From 3c2a7da6059ca5e770bb67104b9e9114535ed5de Mon Sep 17 00:00:00 2001 From: aloekun Date: Mon, 4 May 2026 04:58:11 +0900 Subject: [PATCH 1/3] =?UTF-8?q?docs(todo):=20PR=20#105=20post-merge-feedba?= =?UTF-8?q?ck=20=E6=8E=A1=E7=94=A8=E5=88=86=20+=20format=20=E6=8B=A1?= =?UTF-8?q?=E5=BC=B5=E3=82=BF=E3=82=B9=E3=82=AF=E3=82=92=E8=BF=BD=E5=8A=A0?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - 順位 57: Aggregation cap integration test (PR #105 T2-1 採用、Tier 2 / S) — collect_all_violations の MAX_VIOLATIONS contract を test 化、将来の lint 追加時の truncate 削除 regression を防止 - 順位 58: post-merge-feedback findings table format 拡張 (Tier 2 / S) — Severity / Frequency / Adoption Risk / Recommendation 必須列化、AI 採用 判定を rubric ベースで安定化、卻下根拠の言語化 PR #105 post-merge-feedback の他 7 件 (T1-1/2/3, T2-2/3, T3-1, T3-2) は採用 見送り (低頻度 + 高実装コスト / NLP 必要 / 過剰一般化 / 重複ルール 等)。 --- docs/todo.md | 2 + docs/todo5.md | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 107 insertions(+) diff --git a/docs/todo.md b/docs/todo.md index e08ea6c..5afbaa1 100644 --- a/docs/todo.md +++ b/docs/todo.md @@ -69,6 +69,8 @@ | 54 | 🔧 Tier 2 | **review 完了待ちの CronCreate 化 + observer 廃止 (Bundle b PR-2) ★ Bundle b** | todo5.md | M | 順位 53 land 後 (Bb-1 で導入する Cron 機構を review 完了待ちにも展開、45s polling + 5s observer polling を完全排除、固定値 wakeup 化) | | 55 | 💎 Tier 3 | **config 拡張 + SessionStart catch-up (Bundle b PR-3) ★ Bundle b** | todo5.md | S | 順位 53 / 54 land 後 (固定値の `monitor.toml` 化 + Claude Code 不在時に発火した wakeup を SessionStart で catch-up、AI 不在時の silent loss 防止) | | 56 | 🔧 Tier 2 | **comment-lint hook test 拡充 (PR #104 T2-1+T2-2 bundle)** | todo5.md | S | なし (UTF-8 multi-byte 5 パターン + block comment boundary 6 パターンを `locate_string_line_ranges` / `span_overlaps_ranges` の回帰テストとして体系化、PR #104 Critical/Minor fix の固定化) | +| 57 | 🔧 Tier 2 | **Aggregation cap integration test (PR #105 T2-1 採用)** | todo5.md | S | なし (`collect_all_violations` の MAX_VIOLATIONS contract を test 化、将来の lint 追加時に `truncate(MAX)` 削除 regression を防止する explicit 安全網) | +| 58 | 🔧 Tier 2 | **post-merge-feedback findings table format 拡張 (Severity / Frequency / Adoption Risk / Recommendation を必須列化)** | todo5.md | S | なし (PR #105 評価で Effort + Rationale のみでは AI 採用判定が安定しないことを確認、rubric ベースの format 固定化で評価コスト削減 + 卻下根拠の言語化) | **戦略**: Tier 1 を 2〜3 セッションで片付け → Tier 2 で ADR-032 の前提 + rate-limit + convergence cost 削減を進める → Tier 3 で ADR-032 を land + ドキュメント整備。Tier 4-5 は cleanup / 外部展開で daily efficiency への直接効果は小さい。 diff --git a/docs/todo5.md b/docs/todo5.md index 04d59ad..4ed1b67 100644 --- a/docs/todo5.md +++ b/docs/todo5.md @@ -358,3 +358,108 @@ #### 詰まっている箇所 - 結合文字 (`e + ́`) を `new_string` に含むケースは Edit tool が実環境で発生するか不明 (理論的検証としては有効、実際の回帰防止としては効果薄の可能性)。1 パターンで足る + +--- + +### Aggregation cap integration test (PR #105 T2-1 採用) + +> **動機**: PR #105 の auto-fix で `collect_all_violations` に `violations.truncate(MAX_VIOLATIONS)` を追加した (CodeRabbit Minor finding 解消) が、これは contract の暗黙化に過ぎない。将来 `find_xxx_violations` を追加する PR で `extend()` の後に `truncate` を入れ忘れる regression を構造的に防ぐ test がない。 +> +> **本タスクの位置づけ**: PR #105 post-merge-feedback Tier 2 #1 採用。後続の lint 追加 (例: 順位 56 の test 拡充 / 順位 47 の `>=` boundary lint / 将来の Rust 専用 lint) で同 contract を破る regression を test で固定化する。 +> +> **参照**: `.claude/feedback-reports/105.md` Tier 2 #1、`src/hooks-post-tool-comment-lint-rust/src/main.rs` `collect_all_violations` (line 545)、PR #105 Finding #2 (Minor) の auto-fix +> +> **実行優先度**: 🔧 **Tier 2** — Effort S。test 1-2 件追加で完結。 + +#### 設計決定 (案) + +- **シナリオ**: `collect_all_violations(file_path, source_with_15_comments_and_15_long_functions, None)` を呼び、結果が **MAX_VIOLATIONS (= 20) 以下** であることを assert +- **source 構築**: + - 15 個の禁止コメント (`// forbidden 0` 〜 `// forbidden 14`) + - 15 個の 60 行関数 (`fn big_0` 〜 `fn big_14`) + - 合計 30 件の violation 候補 → cap で 20 件に truncate +- **test 名**: `collect_all_violations_truncates_to_max_violations` (spec を test 名に反映、PR #105 T2-3 提案は卻下したが naming-as-spec 自体は意義あり) +- **追加検証** (任意): 個別 `find_violations` / `find_function_length_violations` がそれぞれ 20 件以上返しうることも assert (truncate なしだと 30 件返ることを示す) + +#### 作業計画 + +- [ ] 30 件の violation 候補を含む synthetic source を生成する helper 関数を test module に追加 +- [ ] `collect_all_violations_truncates_to_max_violations` test を追加 +- [ ] 個別 finder の non-truncate 挙動を assert する補助 test を追加 +- [ ] cargo test pass 確認 +- [ ] 派生プロジェクト deploy は不要 (test のみ) +- [ ] 本 todo5.md エントリを削除 + +#### 完了基準 + +- 結合後の violation 件数が `MAX_VIOLATIONS` 以下であることが test で固定化 +- 将来 `find_xxx_violations` を追加した PR で truncate 削除すると test fail で検出される + +#### 詰まっている箇所 + +- 順位 56 (PR #104 T2-1+T2-2 test 拡充) と同 PR で bundle するか別 PR とするか。両者とも test additions、同ファイル同 test module で scope clean、bundle 推奨。 + +--- + +### post-merge-feedback findings table format 拡張 (Severity / Frequency / Adoption Risk / Recommendation 必須化) + +> **動機**: PR #105 post-merge-feedback の評価セッションで、現フォーマット (Effort + Rationale のみ) では AI が採用判定根拠を systematically 落とすことが確認された。8 件中 1 件のみ採用 (T2-1) という評価をユーザーが下せたのは、AI が手動で各 finding の Severity / Frequency / 実装コスト / 過剰一般化リスクを補完したから。format に組み込めば AI 出力が rubric ベースで安定し、ユーザーの審査時間を削減できる。 +> +> **本タスクの位置づけ**: PR #105 評価セッションで合意。post-merge-feedback workflow の aggregate facet を拡張し、AI に rubric を強制することで卻下根拠の言語化と採用判定の安定化を図る。 +> +> **参照**: PR #105 評価セッション (本対話)、`.claude/feedback-reports/105.md` の現フォーマット、~/.claude/.takt/facets/ または equivalent (実際の facet 配置先は要調査) +> +> **実行優先度**: 🔧 **Tier 2** — Effort S。facet prompt 1-2 セクションの修正、派生プロジェクトへの展開も含む。 + +#### 設計決定 (案) + +##### 拡張する table 列 + +| 既存 | 拡張後 | +|---|---| +| `# / Type / Description / Target / Effort / Rationale` | `# / Type / Description / Target / Severity / Frequency / Effort / Adoption Risk / Recommendation / Rationale` | + +##### 各列の rubric (facet instructions に明記) + +- **Severity**: + - `Critical`: data loss / security / 致命的バグの再発防止 + - `High`: 機能 bug / silent failure / data integrity + - `Medium`: silent degrade / UX 低下 / 開発体験劣化 + - `Low`: style / micro-optimization / 局所改善 +- **Frequency** (再発リスク): + - `High`: 複数 PR で観測済 / systemic pattern + - `Medium`: 1 PR + 類似コードベースで再発見込み + - `Low`: single observation / 局所 +- **Adoption Risk** (採用時のリスク要素を 1-2 語で記述): + - 例: `既存ルール重複` / `過剰一般化` / `NLP 必要` / `OS 依存` / `派生プロジェクト deploy コスト` / `false positive リスク` / `None` +- **Recommendation** (必須付記): + - `✅ 採用`: 高コスパ判定 (Effort=S かつ Severity Medium+ または Frequency Medium+ かつ 根本原因が適切) + - `🤔 様子見`: 採用根拠は弱いが将来発生時に再評価したい (一般原則 / 不確実性高) + - `❌ 卻下`: 頻度 Low かつ 実装コスト High、または Adoption Risk が overlap / 過剰一般化 / NLP 必要 / 重複ルール +- **Rationale** (採用 / 卻下の根拠を必ず記述): + - なぜ Severity × Frequency × Effort × Adoption Risk から Recommendation に至ったかを 1-2 文で + +##### facet 配置の調査 + +- post-merge-feedback の aggregate step は takt facet として実装されている想定 +- 実際の prompt 配置: `~/.claude/.takt/facets/post-merge-feedback/*.md` か `.claude/feedback-reports/` か要調査 +- 派生プロジェクト (techbook-ledger / auto-review-fix-vc) は Python ベースで facet 配置が異なる可能性 + +#### 作業計画 + +- [ ] post-merge-feedback aggregate facet の実体を grep / find で特定 (`takt facet` / `post-merge` 等で検索) +- [ ] facet prompt の table format 部分を改訂 (列追加 + rubric 説明文) +- [ ] sample output (この PR の評価結果を例として facet に inline) を追加 +- [ ] dogfood: 次の post-merge-feedback で新フォーマットが出力されること確認 +- [ ] 派生プロジェクトの同 facet にも展開 (Python ベース facet があれば独立に対応) +- [ ] 本 todo5.md エントリを削除 + +#### 完了基準 + +- 次回以降の post-merge-feedback で 9 列フォーマット (Severity / Frequency / Adoption Risk / Recommendation 必須) が出力される +- ユーザーの採用判定セッションで「AI が rubric を埋めていない」ことに起因する補完作業が消滅 + +#### 詰まっている箇所 + +- aggregate facet が AI prompt なのか deterministic template なのか要調査。前者なら自然言語ルール追加で済むが、後者ならコード変更が必要 +- rubric の閾値 (Severity Medium 等) が AI 判定で揺れる可能性 → sample output を facet に inline することで cluster 化を狙うが、初回 dogfood で false positive / negative の頻度を観測する必要あり From 2c40b188dfd4e1c83cd6ecdbf12a016d4362ebb1 Mon Sep 17 00:00:00 2001 From: aloekun Date: Mon, 4 May 2026 05:14:36 +0900 Subject: [PATCH 2/3] =?UTF-8?q?feat(takt):=20Bundle=20Z=20Phase=203=20?= =?UTF-8?q?=E2=80=94=20reviewer=20=E5=BD=B9=E5=89=B2=E5=A4=89=E6=9B=B4=20+?= =?UTF-8?q?=20fix-trust=20=E9=80=A3=E5=B8=AF=20(PR=203=20/=20#B-=CE=B3=20+?= =?UTF-8?q?=20#C-2)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 決定論層 (Phase 1 #B-α + Phase 2 #B-β) を通過した状態を前提に、reviewer の責務を 「lint で防げない高次違反のみ flag」 に再定義し、fix step 報告に基づく Iter 3 短絡を追加する。 #B-γ reviewer 役割変更: - review-simplicity.md: 7 criteria enumerate 義務を削除、anomaly detection モードに転換 - 決定論層 intercept 項目 (comment policy / function length / function metrics) を skip - 異常検知 criteria: Unexplained complexity / Inconsistent style / Dead-on-arrival code / Hidden coupling / Missing failure paths / Non-obvious magic values - DRY/YAGNI scope guards は維持 - 'Calibration: avoid over-narrowing' セクションで二重 miss リスク対策 - review-security.md: vulnerability checklist を memory aid に格下げ、anomaly mode を primary entry point に - concrete exploit path 要件を強化 (input control / what becomes possible) - docs-only trust boundary criterion は維持 #C-2 fix-trust 連帯: - fix.md: Convergence verdict セクション追加 (REQUIRED) - persists==0 AND misdirected==0 → 'convergence_verdict: fully_resolved' - else → 'convergence_verdict: partial' - 'Honesty constraint' で安全網 bypass リスクを明示 - post-pr-review.yaml + pre-push-review.yaml: fix step rules に新 condition 追加 - fully_resolved → COMPLETE 直行 (analyze/reviewers 再実行 skip) - partial → analyze/reviewers (既存挙動) - Unable to proceed → supervise (既存挙動) 期待効果: - attention drift 問題消滅 (検出対象が absolute に narrow に) - 1-iter ALL APPROVE 率 90% 超 (決定論層で大半 intercept) - review 所要時間短縮 (1m30s〜3m → 30s〜1m 期待) - post-pr-review 3-iter outlier を 2-iter に圧縮 (~3 分削減/run) 進捗管理 (docs/pipeline-token-efficiency.md): - #C-3 / #A-2 / #B-β を 'Completed' に更新 - #B-γ / #C-2 を '実装中' (本 commit) に更新 リスク対策: - 二重 miss: reviewer の anomaly mode に 'Calibration: avoid over-narrowing' を残置 - 自己評価信頼性: fix.md の 'Honesty constraint' で fully_resolved を慎重に emit するよう明示 --- .takt/facets/instructions/fix.md | 21 +++++++ .takt/facets/instructions/review-security.md | 46 +++++++++++----- .../facets/instructions/review-simplicity.md | 55 ++++++++++++------- .takt/workflows/post-pr-review.yaml | 12 +++- .takt/workflows/pre-push-review.yaml | 12 +++- docs/pipeline-token-efficiency.md | 10 ++-- 6 files changed, 115 insertions(+), 41 deletions(-) diff --git a/.takt/facets/instructions/fix.md b/.takt/facets/instructions/fix.md index 24a69d4..732c6df 100644 --- a/.takt/facets/instructions/fix.md +++ b/.takt/facets/instructions/fix.md @@ -78,3 +78,24 @@ New files (post-only) are reported `metrics_check: skipped` and do not block fix | reopened (recurrence fixed) | {N} | | persists (carried over, not addressed this iteration) | {N} | | misdirected (suggestion pointed at a read-only zone, skipped) | {N} | + +## Convergence verdict (REQUIRED — Phase 3 / #C-2 fix-trust shortcut) + +After completing fixes, evaluate the gate above and emit one of two verdicts. The next workflow step is selected from this verdict, so it must accurately reflect the gate state. + +- **fully_resolved** — `persists == 0` AND `misdirected == 0`. All findings of this iteration were either fixed or correctly skipped. No remaining work for the analyze step to re-examine. +- **partial** — `persists > 0` OR `misdirected > 0`. Some findings carried over (still need fixing in a later iteration) or were skipped due to misdirection (and need to be reported). Re-analysis is required. + +Place the verdict at the **end of your report** as a single bare line in this exact form (no surrounding quotes, no trailing punctuation): + +```text +convergence_verdict: fully_resolved +``` + +or: + +```text +convergence_verdict: partial +``` + +**Honesty constraint**: This verdict gates whether the analyze step runs again. Reporting `fully_resolved` while leaving findings unaddressed bypasses the safety re-check. If you are uncertain whether a finding was truly resolved (e.g., you applied a fix but did not verify the build passes), emit `partial` so the analyze step can re-evaluate. diff --git a/.takt/facets/instructions/review-security.md b/.takt/facets/instructions/review-security.md index 26dc6d1..8ade2a0 100644 --- a/.takt/facets/instructions/review-security.md +++ b/.takt/facets/instructions/review-security.md @@ -1,10 +1,4 @@ -Review the changes from a security perspective. Check for the following vulnerabilities: -- Injection attacks (SQL, command, XSS) -- Authentication and authorization flaws -- Data exposure risks (hardcoded secrets, API keys, tokens) -- Cryptographic weaknesses -- Unsafe code without safety comments (Rust) -- Path traversal risks +Focus on **security anomaly detection** in the changed diff. Categorical vulnerability classes (injection / auth flaws / data exposure / crypto weakness / unsafe code / path traversal) remain in scope, but the bar for raising a finding is the same as the simplicity facet: an articulable concern with a concrete exploit path, not a checklist tick. ## Obtaining the diff @@ -21,16 +15,38 @@ Before evaluating the change, **read the following project documents** using the **Important:** - Do not treat documented precedence rules, extension points, or configuration override behavior as vulnerabilities by themselves. -- To issue a blocking finding, make the exploit path concrete: who controls what input, and what newly becomes possible. +- To raise a blocking finding, make the exploit path concrete: who controls what input, and what newly becomes possible. -## Judgment Procedure +## Vulnerability dimensions (use as memory aid, not a checklist) -1. Review the change diff and extract issue candidates -2. For each candidate, verify the concrete exploit path - - Which actor controls the input or configuration - - Whether the change enables new privilege, data access, code execution, or prompt modification -3. For each detected issue, classify it as blocking or non-blocking -4. If there is even one blocking issue, judge as REJECT +The following classes remain reviewable, but flag them only when you can construct a concrete exploit path: + +- **Injection attacks**: SQL, command, XSS — actor-controlled input flowing into an interpreter without escaping +- **Authentication and authorization flaws**: Missing checks, scope confusion, privilege escalation paths +- **Data exposure risks**: Hardcoded secrets, API keys, tokens, sensitive logs +- **Cryptographic weaknesses**: Broken algorithms, missing randomness, weak key handling +- **Unsafe code without safety comments** (Rust): `unsafe` blocks lacking `// SAFETY:` justification +- **Path traversal**: Unsanitized file paths reaching filesystem APIs + +## Anomaly mode (preferred entry point) + +Read the diff once, end-to-end, before consulting the dimensions list. If a pattern reads as **unusual / unexplained / hard to justify** from a security standpoint, that is your primary signal. The dimensions above are a memory aid for what to do with that signal, not a substitute for it. + +For each finding, articulate: + +- **What is unusual or risky** +- **Who controls the relevant input or configuration** +- **What newly becomes possible** (data access, privilege, code execution, prompt modification) + +If you cannot articulate the third bullet, the finding is speculative — downgrade or drop it. + +## Judgment procedure + +1. Read the diff from `.takt/review-diff.txt` +2. Read straight through. Note any pattern that triggers a security concern +3. For each candidate, verify the concrete exploit path (input control, what becomes possible) +4. Classify each verified concern as blocking or non-blocking +5. If there is even one blocking concern with a concrete exploit path, judge as REJECT ## Docs-only changes: trust boundary criterion diff --git a/.takt/facets/instructions/review-simplicity.md b/.takt/facets/instructions/review-simplicity.md index 2e156e2..fd554bc 100644 --- a/.takt/facets/instructions/review-simplicity.md +++ b/.takt/facets/instructions/review-simplicity.md @@ -1,4 +1,4 @@ -Focus on reviewing **code simplicity** within the changed diff only. +Focus on **anomaly detection** in the changed diff -- patterns that look unusual, unexplained, or out of step with the surrounding codebase. Do NOT enumerate against a fixed checklist; the deterministic layer already handles structural metrics. ## Obtaining the diff @@ -6,38 +6,55 @@ The diff has been pre-collected by push-runner (Rust exe) and saved to `.takt/re **Read this file first** using the Read tool. This is the authoritative review target. Do NOT run `git diff` or `jj diff` yourself -- the file already contains the correct diff scope. -## Scope constraint +## Determinism layer guarantees (do NOT duplicate) + +The following dimensions are enforced by deterministic hooks at write time and by `fix-metrics-check.ps1` during fix iterations. Skip them — flagging them duplicates the deterministic layer and produces noise: + +- **Comment policy** (Bundle Z #B-α / `hooks-post-tool-comment-lint-rust`): Non-doc comments are blocked at PostToolUse. Existing comments in the diff have already passed the allowlist (`// SAFETY:` / `// TODO:` / rustdoc etc.). +- **Function length** (順位 48, same hook): Functions >50 lines are blocked at write time (touch-trigger ratchet, grandfathered until touched). New >50 functions or growth past 50 cannot land in changed regions. +- **Function metrics during fix** (Bundle Z #B-β / `fix-metrics-check.ps1`): non-doc comment count, function length, max nesting depth cannot increase per function during fix iterations. Pre/post comparison enforces this structurally. + +Reviewing these dimensions is duplicative. Skip them. + +## Anomaly criteria (subjective judgment required) -Review ONLY the lines changed in the diff. Do NOT explore cross-file dependencies, call chains, or project-wide architecture. Every finding must be traceable to a specific hunk in the diff. +Read the diff straight through. Note any pattern that prompted "this looks unusual / unexpected / hard to explain" — patterns deterministic checks cannot catch: -## Review criteria (all diff-local) +- **Unexplained complexity**: Logic choices with no obvious motivation given the surrounding code; algorithm complexity that seems disproportionate to the problem +- **Inconsistent style**: Naming or structural patterns that diverge from neighboring code without rationale +- **Dead-on-arrival code**: Branches, parameters, or abstractions with no apparent caller or use site +- **Hidden coupling**: Changes that silently depend on global state, environment, ordering, or undocumented invariants +- **Missing failure paths**: Operations that can fail (I/O, parse, network, optional unwrap) with no visible error handling +- **Non-obvious magic values**: Numeric or string literals whose meaning isn't clear from context -- **Nesting depth**: Flag blocks nested >4 levels; suggest flattening via early returns or extraction -- **Function length**: Flag functions exceeding 50 lines -- **Early return opportunities**: Identify guard clauses that would reduce nesting -- **Redundant / duplicate code**: Flag copy-paste patterns or unnecessarily verbose logic within the diff -- **Magic numbers**: Flag unexplained numeric or string literals; suggest named constants -- **YAGNI violations**: Flag speculative abstractions, unused parameters, or over-engineered patterns that serve no current need -- **Naming clarity**: Flag ambiguous variable/function names that obscure intent +For each anomaly, articulate **what looks unusual**, **why it caught your attention**, and **what alternative would be expected**. If you cannot articulate the "why", it likely isn't an anomaly worth flagging. + +## Scope constraint + +Review ONLY the lines changed in the diff. Do NOT explore cross-file dependencies, call chains, or project-wide architecture. Every anomaly finding must be traceable to a specific hunk in the diff. ## Scope of DRY / YAGNI (do NOT raise findings outside this scope) -The DRY and YAGNI criteria above apply **only to executable code logic**. +The DRY and YAGNI dimensions in anomaly detection apply **only to executable code logic**. - **DRY scope**: Flag duplicated *code logic* (copy-paste functions, repeated control flow, redundant computations). Do NOT flag: - Documentation hierarchies that intentionally restate context (e.g., a summary table followed by detailed bullet points) - Repetition between docs and code (docs explain, code executes — they serve different audiences) - Test code mirroring production code structure (test independence > test DRY) - **YAGNI scope**: Flag *speculative code abstractions* (unused parameters, premature interfaces, over-engineered patterns in production code). Do NOT flag: - - Planning documents listing "future candidates", "Phase 2 検討", or "out of scope but worth considering" sections — these capture design intent for shared understanding, not speculative implementation - - ADR alternatives sections describing rejected options — these document the decision rationale - - Comments documenting *known constraints or limitations* of the current implementation (these are not speculation; they are recorded reality) + - Planning documents listing "future candidates", "Phase 2 検討", or "out of scope but worth considering" sections + - ADR alternatives sections describing rejected options + - Comments documenting *known constraints or limitations* of the current implementation + +If a finding cannot be tied to executable code logic, it is out of scope. + +## Calibration: avoid over-narrowing -If a finding cannot be tied to executable code logic, it is out of scope — do not raise it. +The shift to anomaly detection is meant to remove the duplicative checklist work, not to skip review. If reading the diff leaves you with a concrete unease that you can articulate, raise it — even if it doesn't fit a named criterion. Conversely, if you can only flag something by mechanically applying a rule, the deterministic layer already handles that case. ## Judgment procedure 1. Read the diff from `.takt/review-diff.txt` -2. For each changed hunk, check against the 7 criteria above -3. For each detected issue, classify as blocking (significantly harms readability/maintainability) or non-blocking (minor suggestion) -4. If there is even one blocking issue, judge as REJECT +2. Read straight through. After the first pass, list any pattern that read as "unusual / unexpected / hard to explain" +3. For each anomaly, classify as blocking (significant unexplained risk) or non-blocking (worth raising but not a blocker) +4. If there is even one blocking anomaly, judge as REJECT diff --git a/.takt/workflows/post-pr-review.yaml b/.takt/workflows/post-pr-review.yaml index 7fc642e..2582c29 100644 --- a/.takt/workflows/post-pr-review.yaml +++ b/.takt/workflows/post-pr-review.yaml @@ -93,7 +93,17 @@ steps: pass_previous_response: false instruction: fix rules: - - condition: Fixes complete + # Phase 3 / #C-2 fix-trust shortcut: fix step が Convergence verdict + # `convergence_verdict: fully_resolved` を report したら analyze 再実行を + # skip して直接 COMPLETE。3-iter run を 2-iter に圧縮する。 + - condition: | + Report ends with `convergence_verdict: fully_resolved` (Convergence + gate shows persists: 0 AND misdirected: 0). All findings of this + iteration are fully resolved with no remaining work. + next: COMPLETE + - condition: | + Fixes applied but `convergence_verdict: partial` (some findings + persist or were misdirected, re-analysis needed). next: analyze - condition: Unable to proceed with fixes next: supervise diff --git a/.takt/workflows/pre-push-review.yaml b/.takt/workflows/pre-push-review.yaml index 5ba0101..8eb7bbd 100644 --- a/.takt/workflows/pre-push-review.yaml +++ b/.takt/workflows/pre-push-review.yaml @@ -111,7 +111,17 @@ steps: pass_previous_response: false instruction: fix rules: - - condition: Fixes complete + # Phase 3 / #C-2 fix-trust shortcut: fix step が Convergence verdict + # `convergence_verdict: fully_resolved` を report したら reviewers 再実行を + # skip して直接 COMPLETE。multi-iter の review-fix loop を圧縮する。 + - condition: | + Report ends with `convergence_verdict: fully_resolved` (Convergence + gate shows persists: 0 AND misdirected: 0). All findings of this + iteration are fully resolved with no remaining work. + next: COMPLETE + - condition: | + Fixes applied but `convergence_verdict: partial` (some findings + persist or were misdirected, re-review needed). next: reviewers - condition: Unable to proceed with fixes next: supervise diff --git a/docs/pipeline-token-efficiency.md b/docs/pipeline-token-efficiency.md index cf9a2c0..f499463 100644 --- a/docs/pipeline-token-efficiency.md +++ b/docs/pipeline-token-efficiency.md @@ -454,11 +454,11 @@ docs/pipeline-token-efficiency.md の「検証方法」を実行。 | 改善案 | 配置 | 状態 | 採用日 | 完了 PR | 備考 | |---|---|---|---|---|---| -| #C-3 rate-limit skip | **PR 1** | 計画 | - | - | PR #97 の rate-limit 検出を流用 | -| #A-2 trivial PR skip | **PR 1** | 計画 | - | - | 単独実施可 | -| #B-β 制約付き fix instruction | **PR 2** (Bundle Z Phase 2) | 計画 | 2026-05-01 | - | PR #99 の例外マーカー定数と同期必須 | -| #B-γ reviewer 役割変更 | **PR 3** (Bundle Z Phase 3) | 計画 | 2026-05-01 | - | PR 2 dogfood 完了が前提 | -| #C-2 Iter 3 短絡 | **PR 3** (fix-trust 連帯) | 計画 | - | - | PR 3 で #B-γ と同梱 | +| #C-3 rate-limit skip | **PR 1** | 完了 | 2026-05-03 | #102 | PR #97 の rate-limit 検出を流用 | +| #A-2 trivial PR skip | **PR 1** | 完了 | 2026-05-03 | #102 | 単独実施可 | +| #B-β 制約付き fix instruction | **PR 2** (Bundle Z Phase 2) | 完了 | 2026-05-03 | #103 | PR #99 の例外マーカー定数と同期必須 | +| #B-γ reviewer 役割変更 | **PR 3** (Bundle Z Phase 3) | 実装中 | 2026-05-04 | (進行中) | PR #103/104/105 dogfood 完了、anomaly mode に転換 | +| #C-2 Iter 3 短絡 | **PR 3** (fix-trust 連帯) | 実装中 | 2026-05-04 | (進行中) | PR 3 で #B-γ と同梱、convergence_verdict マーカー導入 | | #A-3 transcript filter | **番外** | 計画 | - | - | スキマ時間の単独 PR か #D-4 再評価時に同梱 | | #D-4 応答スタイル簡素化 | (保留) | 保留 (ADR-034) | 2026-05-02 | - | Bundle Z (PR 2 + PR 3) 完了後再評価 | From 23ba4686036eaff8d07188a07e6a705e1afa8b0a Mon Sep 17 00:00:00 2001 From: aloekun Date: Mon, 4 May 2026 05:35:08 +0900 Subject: [PATCH 3/3] fix(review): apply CodeRabbit Major fix for #106 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Finding #1 (Major / .takt/facets/instructions/review-simplicity.md:35): 'Hidden coupling' を anomaly criteria に残しつつ cross-file 全面禁止すると 裏取り不可能で矛盾する指摘 (resolved)。Scope constraint を softening し、 limited cross-file lookups を anomaly verification 目的に限定して許可する。 findings 自体は依然 diff hunk traceable のまま。 Finding #2 (Minor / docs/todo.md:73): reject。docs/todo.md の運用ルール '新規タスクは追加しない' は task body 追加禁止で、priority table 行追加は 既存運用 (順位 32 で既知の文言整合タスクとして登録済) のため本 PR では 変更しない。 --- .takt/facets/instructions/review-simplicity.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.takt/facets/instructions/review-simplicity.md b/.takt/facets/instructions/review-simplicity.md index fd554bc..3f56940 100644 --- a/.takt/facets/instructions/review-simplicity.md +++ b/.takt/facets/instructions/review-simplicity.md @@ -31,7 +31,7 @@ For each anomaly, articulate **what looks unusual**, **why it caught your attent ## Scope constraint -Review ONLY the lines changed in the diff. Do NOT explore cross-file dependencies, call chains, or project-wide architecture. Every anomaly finding must be traceable to a specific hunk in the diff. +Review primarily within the changed diff. **Limited** cross-file lookups are permitted only to *verify* an anomaly already raised by the diff (e.g., confirming a hidden coupling, checking whether a referenced symbol exists, distinguishing dead-on-arrival code from a legitimate caller elsewhere). Do NOT use this allowance to expand into project-wide architecture review, unrelated call chains, or speculative exploration. Every anomaly finding must still be traceable to a specific hunk in the diff — cross-file evidence supports the finding, it does not become its own finding. ## Scope of DRY / YAGNI (do NOT raise findings outside this scope)