diff --git a/.takt/facets/instructions/aggregate-weekly.md b/.takt/facets/instructions/aggregate-weekly.md new file mode 100644 index 0000000..b5ac503 --- /dev/null +++ b/.takt/facets/instructions/aggregate-weekly.md @@ -0,0 +1,188 @@ +# Aggregate Weekly Review + +3 つの whole-tree レビュー (simplicity / security / architecture) を統合し、週次レビューレポートと構造化 findings JSON を生成する。 + +ADR-031 § Findings スキーマ + § 採否フロー の input source として findings JSON を produce する設計。skill 側 (Phase C 予定) が JSON を読んで AskUserQuestion で採否を確認するため、本 facet は構造化データの単一 source。 + +**重要な原則:** + +- 読み取り専用 (`edit: false`)。コードの修正は一切行わない (採否は Phase C skill とユーザー判断で行う) +- findings がない場合は「特筆事項なし」で正常終了する。無理に findings を捻出しない +- 重複する findings はマージし、`location` と `rationale` を統合する +- severity の自動配点を最終手段とせず、3 reports の articulation を尊重する +- **各 finding に Severity / Category を必須付与する** (ADR-031 § Findings スキーマ) + +--- + +## Input + +### Report Directory (takt が提供) + +本 step (`pass_previous_response: false`) は前 step の response を受け取らない。代わりに Report Directory に保存された 3 reports を Read で読み取る: + +- `simplicity-whole-review.md` — review-simplicity-whole facet の出力 +- `security-whole-review.md` — review-security-whole facet の出力 +- `architecture-whole-review.md` — review-architecture-whole facet の出力 + +### Context + +実行日は本 step の wall clock を `YYYY-MM-DD` 形式で取得 (UTC でも JST でも一貫していればよい。findings id の prefix に使う)。 + +## Phase 1: 3 reports の統合 + +各 report の findings を抽出し以下のルールで統合する: + +1. **重複検出**: 同じ `location.path` + 似た description / 同じ category の finding はマージする (3 facets 間で観点が重なるケースあり、例: simplicity が dead code、architecture が ADR-012 違反として同じ symbol を flag) +2. **rationale 統合**: マージした finding の rationale 部分に複数 facet (simplicity / security / architecture) を併記する +3. **severity 確定**: 各 finding の severity は facet が articulate した severity を尊重する。複数 facet が異なる severity を articulate した場合は **高い方** を採用する (例: simplicity が medium、security が high なら high) +4. **品質フィルタ** (最初から表に乗せない): + - 一般的なベストプラクティスの押し付け (具体的な file / line evidence なし) + - すでに hooks-config.toml / custom-lint-rules.toml / cli-docs-lint で機械的に検出される pattern (Read で確認可能) + - 対象ファイルが read-only zone (`.takt/runs/`, `.claude/feedback-reports/` 等の generated artifact) のみで意味のある編集箇所が示せないもの + +## Phase 2: 各 finding に Severity / Category / Recommendation を確定する + +各 finding について以下の rubric に基づいて判定列を埋める。**この評価は採用判定をユーザーへ委ねるための材料**であり、AI が判定を独占しない。明確に判定できない場合は中庸な値 (`medium` / `🤔 様子見`) を選び、rationale で不確実性を明示する。 + +> **AI agent への明示禁則**: 本 report の生成完了 = ユーザーへの提示完了に過ぎず、Claude / Codex / Opencode 等の agent は `✅ 採用候補` を読んだだけで採用処理 (`docs/todo*.md` への entry 追加 / 実装着手 / ADR 編集 等) に進んではならない。**必ずユーザーの明示承認 (AskUserQuestion 回答 or テキスト承認のいずれか) を待つこと**。本 report の Recommendation 列は analyzer 推奨であり、確定判断ではない。 + +### Severity rubric (ADR-031 § Findings スキーマ準拠) + +| 値 | 該当する状況 | +|---|---| +| `critical` | data loss / security 脆弱性 / 致命的バグ / production-down リスク | +| `high` | 機能 bug / silent failure / data integrity 違反 / systemic harness drift | +| `medium` | UX 低下 / 累積複雑度 / dead code / 局所 ADR drift | +| `low` | style / micro-optimization / docs typo | + +### Category rubric + +simplicity / security / architecture facets が emit する category を以下に正規化する: + +- `harness-duplication` — rule / pipeline / hook 重複 +- `adr-alignment` — ADR と実装の drift +- `docs-internal` — docs 間 cross-ref drift (cli-docs-lint で取れない meta pattern) +- `docs-source-drift` — docs と source の矛盾 +- `module-boundary` — モジュール境界違反 +- `cyclic-dep` — 循環依存 +- `layer-violation` — レイヤ侵犯 +- `adr-naming` — ADR-012 命名違反 +- `test-anti-pattern` — TDD anti-pattern / 境界欠落 +- `cumulative-complexity` — 累積複雑度 +- `dead-code` — 未参照コード +- `overspec` — overspec'd abstraction +- `secret-exposure` — 機密漏出パターン +- `injection` / `auth-flaw` / `crypto-weak` / `unsafe-no-safety` / `path-traversal` / `prompt-injection` — security category + +category が複数該当する場合は最も特徴的な 1 つを採用、補助 category は description で言及する。 + +### Recommendation rubric + +3 種類のいずれかを必ず emit する: + +| 値 | 該当する状況 | +|---|---| +| `✅ 採用候補` | `severity ∈ {medium, high, critical}` AND `category が systemic (= harness-duplication / adr-alignment / test-anti-pattern / secret-exposure 等)` AND `Adoption Risk が弱い`。**ユーザー承認後に採用確定**。 | +| `🤔 様子見` | 採用根拠は弱いが将来発生時に再評価したい (Severity 高だが location が局所的、Adoption Risk が中間、Phase C/D で再評価したい等)。✅ にも ❌ にも振り切れない場合の中庸 | +| `❌ 却下推奨` | `severity ∈ {low}` AND `category が局所 (= docs typo / style)` OR `(Adoption Risk が strong: 過剰一般化 / NLP 必要 / false positive リスク / takt test infra 未調査)` OR `(実害観測前の preventive over-engineering)`。**ユーザー承認後に却下確定** (Claude 単独で却下処理しない)。 | + +## Phase 3: findings id 採番と JSON 生成 + +各 finding に id を採番: + +- format: `WR--` +- facet_initial: `S` (simplicity) / `C` (security) / `A` (architecture) / `M` (multi-facet merged) +- sequence: 同 facet 内で 01 から連番 (`01` / `02` / ...) + +例: `WR-2026-05-29-A03` = 2026-05-29 実行、architecture facet 由来、3 番目。 + +JSON は ADR-031 § Findings スキーマ準拠で `findings.json` というファイル名で write する (workflow の output contract では `name: findings.json` + `format: findings-json` として宣言されている — `findings.json` がファイル名、`findings-json` が契約 (format) 名): + +```json +{ + "run_date": "2026-05-29", + "report_path": ".claude/weekly-reviews/2026-05-29.md", + "findings": [ + { + "id": "WR-2026-05-29-A03", + "facet": "architecture", + "severity": "high", + "category": "harness-duplication", + "location": { "path": "src/foo.rs", "line_range": "120-145" }, + "description": "...", + "proposal": "...", + "decision": "pending", + "recommendation": "✅ 採用候補", + "rationale": "..." + } + ] +} +``` + +`decision` field は常に `pending` で出力 (Phase C skill が AskUserQuestion 経由で `adopted` / `rejected` / `deferred` に書き換える)。`recommendation` / `rationale` は analyzer 推奨を保持する補助 field。 + +## Phase 4: Markdown report 生成 + +Markdown は人間 / Claude が読む summary 層。findings table を severity 順 (critical → low) で並べ、facet ごとの観察メモを追記する。 + +### Required output (Markdown) + +```markdown +## Weekly Review Report () + +### スコープ +- 対象ツリー: `src/` / `scripts/` / `.claude/` / `.takt/` / `docs/` +- レビューファセット: simplicity-whole / security-whole / architecture-whole +- 採否方針: Phase C skill `/weekly-review` で AskUserQuestion 経由 + +### 統合 findings + +#### Severity: critical / high + +| ID | Facet | Category | Location | Description | Proposal | Recommendation | Rationale | +|---|---|---|---|---|---|---|---| + +#### Severity: medium + +| ID | Facet | Category | Location | Description | Proposal | Recommendation | Rationale | +|---|---|---|---|---|---|---|---| + +#### Severity: low + +| ID | Facet | Category | Location | Description | Proposal | Recommendation | Rationale | +|---|---|---|---|---|---|---|---| + +### Facet 観察メモ + +- **simplicity-whole**: +- **security-whole**: +- **architecture-whole**: + +### 次のアクション + +**重要**: 本 report の Recommendation 列はすべて analyzer の推奨であり、ユーザー明示承認なしに採用・却下を確定してはならない。Claude / 他 AI agent は report を読んだだけで `docs/todo*.md` への entry 追加、実装着手、ADR 編集等を実行してはならず、**必ずユーザー承認 (AskUserQuestion 回答 or テキスト承認のいずれか) を待つこと**。 + +- `✅ 採用候補`: Phase C skill `/weekly-review` での AskUserQuestion 採用、`docs/todo*.md` 系列への登録または直接実装 +- `🤔 様子見`: Phase C/D の dogfood トリガで再評価、現時点で action なし +- `❌ 却下推奨`: ユーザー承認後に却下確定、`docs/todo*.md` への登録不要 (Claude 単独で却下処理しない) +- このレポートは `.claude/weekly-reviews/.md` に保存される (`.gitignore` 除外、内部 artifact) +- 構造化 findings は `findings.json` として並置保存される (Phase C skill 入力) +``` + +findings がゼロ severity (= 該当 finding なし) の section は省略する。 + +findings 全体がゼロの場合は以下を出力: + +```markdown +## Weekly Review Report () + +### スコープ +- 対象ツリー: `src/` / `scripts/` / `.claude/` / `.takt/` / `docs/` +- レビューファセット: simplicity-whole / security-whole / architecture-whole + +特筆すべき findings なし。3 facet いずれも whole-tree レビューで blocking concern を発見しませんでした。 + +決定論層 + diff-local レビュー + post-pr-review が現状の coherence を保っている状態と解釈できます。 +``` + +最後に `aggregation complete` で終了する。 diff --git a/.takt/facets/instructions/review-architecture-whole.md b/.takt/facets/instructions/review-architecture-whole.md new file mode 100644 index 0000000..6b21099 --- /dev/null +++ b/.takt/facets/instructions/review-architecture-whole.md @@ -0,0 +1,100 @@ +Focus on **whole-tree architecture coherence**. This facet detects drift between the codebase's actual structure and the design rationale codified in ADRs / global rules / harness layers. It is invoked by the weekly-review workflow (ADR-031) and is the **primary detection layer** for harness drift, docs↔code divergence, and ADR violations that no diff-local facet can see. + +There is no diff-local analog of this facet. Architecture coherence is intrinsically whole-tree. + +## Criterion 0 (MVP top priority): Harness adherence — rule / pipeline / hook duplication + +This project deliberately migrates "rules" (LLM-prompt-enforced) → "pipelines" (procedural) → "hooks" (mechanically enforced) when feasible (順位 146-151 Bundle "既存ルール仕組み化"). Detect three failure modes: + +1. **Rule + hook overlap**: A rule in `~/.claude/rules/common/*.md` or `CLAUDE.md` codifies a constraint that a PostToolUse / PreToolUse hook also enforces. Either the rule is residual after hook landing (delete the rule, point to the hook) or the hook is a partial mechanization (consolidate the rule into the hook's coverage). +2. **Pipeline + hook overlap**: A push-runner stage in `src/cli-push-runner/src/stages/` reproduces a check that the same hook performs at write time. Either the stage is defense-in-depth (acceptable, but `docs/adr/` should justify) or the stage is residual. +3. **Multi-layer rule fragmentation**: One conceptual constraint expressed in 3+ places (rule + pipeline + hook + ADR) without `docs/adr/` documenting which is the authoritative source and which are reminders. + +For each finding, name the **specific files** that overlap, and propose which layer should own the constraint going forward. Memory rule `feedback_pipeline_over_rules.md` is the bias to apply: when a constraint can be mechanized, the pipeline / hook layer should own it; rules become pointers, not enforcement. + +Hook + pipeline + rule entry points to enumerate: + +- `~/.claude/rules/common/*.md` — global rules (referenced via `Read` if needed; do not modify) +- `CLAUDE.md` — project rules +- `.claude/custom-lint-rules.toml` + `.claude/hooks-config.toml` — declarative hook config +- `src/cli-push-runner/src/stages/*.rs` — push-time stages +- `src/hooks-*/src/main.rs` — hook implementations +- `.takt/workflows/*.yaml` — facet-time review prompts (rules expressed as LLM instructions) + +## Criterion 1: ADR alignment + +Spot drift between ADRs and the code they describe. Priority ADRs (read these whenever you raise an alignment finding): + +- ADR-007 (custom-linter layer boundary) +- ADR-012 (src/ naming convention) +- ADR-021 (jj change detection principles) +- ADR-022 (automation responsibility separation — `edit: false` + write zones) +- ADR-030 (deterministic post-merge-feedback) +- ADR-031 (weekly review pipeline — this facet's own spec) +- ADR-036 (Bundle Z 3-layer review architecture) + +For non-priority ADRs, read on demand only when a finding suggests violation. Do NOT load all ADRs end-to-end (context budget). + +Patterns to flag: + +- ADR specifies behavior X, code implements behavior X' (drift) +- Code introduces an architecture decision that no ADR captures (undocumented decision) +- ADR was superseded but the code still references the superseded ADR's structure +- ADR documents a 3-layer pattern (e.g. ADR-036) but a workflow uses only 2 layers without rationale + +## Criterion 2 (sub criterion): docs 内整合性 + +Cross-document consistency within `docs/`: + +- ADR cross-references that point to non-existent sections (similar to PR #94 / #111 / #132 broken-cross-ref pattern; cli-docs-lint (順位 95+96 = PR #179) is the mechanical baseline — flag patterns that escape the mechanical check) +- ADR families that should be linked but are not (e.g. two ADRs covering related decisions with no "see also") +- Index drift: `CLAUDE.md` ADR list missing recently landed ADRs, or listing ADRs that have been retired without "Superseded" annotation + +Skip individual broken-link cases that cli-docs-lint already catches; this criterion is the meta-pattern detector ("the mechanical check missed a class"). + +## Criterion 3 (sub criterion): docs ↔ source 矛盾 + +Drift between docs and the source they describe. Read only the priority docs: + +- `CLAUDE.md` +- ADRs from § Criterion 1 priority list + +Patterns: + +- ADR says "Rust impl uses X pattern", code uses Y (e.g. ADR-024 specifies shared-jj-helpers location, code uses inline duplication) +- ADR specifies a directory layout, code violates it (ADR-012 naming convention) +- ADR documents a kill-switch / experimental flag, code lacks the flag or applies a different env var name +- `~/.claude/rules/common/*.md` describes a convention, code violates it for a non-trivial fraction of cases (single exceptions are local; a pattern of violations is architectural) + +Do NOT chase low-level style drift that the deterministic layer (`hooks-post-tool-comment-lint-rust` etc.) already catches. This criterion is for whole-tree-shaped drift only. + +## Criterion 4: Module boundaries, cyclic dependencies, layer violations + +Whole-tree patterns the diff-local facet cannot see: + +- **Cyclic dependencies** between crates / modules: cross-grep `[dependencies]` blocks in `src/*/Cargo.toml` with `Grep` to find A→B→A patterns. Symptom-level: `use crate::` imports forming round trips +- **Layer violations**: e.g. a `stages/` module reaching into another crate's private impl, or a `cli-*` crate calling another `cli-*` crate directly instead of via a shared lib (ADR-024) +- **God modules**: single modules that absorb responsibilities from 3+ unrelated concerns +- **ADR-012 violations**: directory names that do not match the ADR-012 naming convention + +For each finding, name the specific files / crates and propose the smallest restructuring (`extract this trait`, `move X to lib Y`, etc.). Do NOT recommend large refactors without estimating effort and dependency depth. + +> **任意の追加検証**: `cargo tree` / `cargo modules` で循環依存を構造的に確認したい場合は、本 facet の `allowed_tools` (`Read` / `Glob` / `Grep`) では実行できないため、finding として上げる際に「reviewer が手動で `cargo tree -d` を実行して確認すること」と proposal で明示する。本 facet 単独で完結する判定は Grep ベースで行う。 + +## Scope constraints + +- Read selectively. Glob to enumerate, Read to verify, Grep to confirm. Do NOT load the full `src/` tree. +- For ADRs: priority list above; load others on demand only when a finding suggests they are relevant. +- When in doubt about scope, prefer fewer high-confidence findings over many low-confidence ones. Aggregate-weekly will weight by severity / frequency / adoption risk — your role is to surface the architectural concern with evidence. + +## Judgment procedure + +1. Glob the structural surfaces (`.claude/`, `.takt/`, `src/cli-*/src/stages/`, `src/hooks-*/src/`, `docs/adr/`, `CLAUDE.md`). +2. Start with Criterion 0 (harness adherence) — this is the MVP top priority and the continuous source of "既存ルール仕組み化" candidates. +3. For each suspected pattern, verify the file evidence (`Grep` for symbol existence, `Read` ADR sections referenced). +4. Classify each verified concern by severity (`critical` / `high` / `medium` / `low`) and category (one of: `harness-duplication` / `adr-alignment` / `docs-internal` / `docs-source-drift` / `module-boundary` / `cyclic-dep` / `layer-violation` / `adr-naming`). +5. Write the report per the output contract (`architecture-whole-review.md`). End with `analysis complete`. + +## Scope boundary + +Severity / Frequency / Adoption Risk / Recommendation rubric is delegated to aggregate-weekly. Surface the concern with file-level evidence; let aggregate weight it. diff --git a/.takt/facets/instructions/review-security-whole.md b/.takt/facets/instructions/review-security-whole.md new file mode 100644 index 0000000..e3c2192 --- /dev/null +++ b/.takt/facets/instructions/review-security-whole.md @@ -0,0 +1,74 @@ +Focus on **whole-tree security anomaly detection**. Categorical vulnerability classes (injection / auth / data exposure / crypto / unsafe code / path traversal) remain in scope, but applied at tree level: look for **patterns that span multiple files** or **invariants that the whole codebase silently relies on**, not single-line vulnerabilities the diff-local facet already covers. + +The diff-local `review-security.md` is a separate facet with diff scope; this file is NOT a derivative for shared use. ADR-031 § アンチパターン: `review-simplicity.md` を whole-tree 用と共有してはならない (security 版でも同原則) prohibits common-izing whole-tree review with diff-local review. + +## Reading the source tree + +Read selectively rather than end-to-end: + +1. `src/**/*.rs` — Rust crates, with priority on: + - Anything containing `unsafe` blocks (use `Grep` on `src/**/*.rs` for the literal `unsafe` keyword to enumerate hotspots recursively) + - Anything reading env vars / arguments / external input (`std::env::var` / `clap` argument structs / `serde_json::from_str` on untrusted input) + - Anything writing files / spawning processes (`std::fs::write` / `std::process::Command`) +2. `scripts/**/*.{ps1,sh}` — shell scripts, with priority on commands that take user input or call subprocesses +3. `.claude/**/*.{toml,json}` — hook configuration (block lists, allow lists, secret patterns) +4. `.takt/facets/instructions/**/*.md` + `.takt/workflows/**/*.yaml` — facet prompts (prompt injection surface) and workflow `allowed_tools` (privilege scope) +5. `docs/adr/*.md` — referenced when an ADR documents a security boundary (e.g. ADR-022 responsibility separation, ADR-028 exec gates) + +Use `Glob` + `Grep` to locate hotspots; do NOT load every file. + +## Project-Specific Context (read before judging) + +Before evaluating, read: + +1. `CLAUDE.md` — Project overview and ADR index +2. `docs/adr/adr-012-src-naming-convention.md` — naming convention (so each crate's responsibility is clear) +3. `docs/adr/adr-022-automation-responsibility-separation.md` — facet `edit: false` and write zone constraints (a security-relevant invariant) +4. `docs/adr/adr-028-pnpm-create-pr-gate.md` — external-visible artifact execution gates + +Do not treat documented precedence rules, override mechanisms, or kill-switches as vulnerabilities by themselves. To raise a blocking finding, make the exploit path concrete: who controls what input, and what newly becomes possible. + +## Vulnerability dimensions (use as memory aid, not a checklist) + +Same dimensions as the diff-local facet, but flagging criterion is whole-tree-shaped: **the exploit path may span multiple files**. + +- **Injection attacks** (whole-tree): actor-controlled input that flows through 2+ files before reaching an interpreter without escaping +- **Authentication / authorization flaws** (whole-tree): missing checks at one entry point that another entry point assumes are present (asymmetric guard coverage) +- **Data exposure risks** (whole-tree): secret patterns that appear in commit-able files (logs, examples, test fixtures) — pair with `grep -rE` for known token prefixes (`AKIA` / `sk-` / `ghp_` / `sk-ant-` etc.) +- **Cryptographic weaknesses** (whole-tree): weak algorithms or PRNGs centralized in one module and consumed by many call sites — single fix point but wide blast radius +- **Unsafe code coverage** (Rust): all `unsafe` blocks reviewed for `// SAFETY:` comments covering every invariant; if comments omit invariants the code relies on, flag the omission +- **Path traversal** (whole-tree): unsanitized path handling in shared helpers (e.g. `~/.takt/shared-jj-helpers` per ADR-024) +- **Prompt injection surface** (LLM-specific, whole-tree): facet prompts that read user-controlled artifacts (PR titles, commit messages, transcript text) without quoting or sanitization. `analyze-pr.md` / `analyze-session.md` and similar are the primary surface + +## Anomaly mode (preferred entry point) + +Read the tree once, hopping by reference rather than file order. If a pattern reads as **unusual / unexplained / hard to justify** from a security standpoint, that is your primary signal. Dimensions above are memory aids, not substitutes. + +For each finding, articulate: + +- **What is unusual or risky** +- **Who controls the relevant input or configuration** (user / external API / commit / PR title / etc.) +- **What newly becomes possible** (data access, privilege, code execution, prompt modification, secret exfiltration) +- **Files involved** (whole-tree findings often span 2+ files) + +If you cannot articulate the third bullet, the finding is speculative — downgrade or drop it. + +## Asymmetric guard coverage (whole-tree specific) + +A class of whole-tree finding the diff-local facet structurally cannot see: one entry point that protects an invariant, another entry point that does not. + +Example: PR body building in cli-pr-monitor sanitizes against shell argument truncation (PR #181 dogfood) but a hypothetical second PR-creation path elsewhere skips the sanitization. Flag the asymmetry, not the unprotected path alone. + +Use `Grep` to find sibling entry points that perform conceptually similar operations, and verify whether their guard coverage matches. + +## Judgment procedure + +1. Glob the primary directories (§ Reading the source tree). Prioritize hotspots noted above. +2. Read selectively; follow references via `Grep` rather than depth-first traversal. +3. For each pattern, verify the concrete exploit path (input control, files traversed, what becomes possible). +4. Classify each verified concern by severity (`critical` / `high` / `medium` / `low`) per ADR-031 § Findings スキーマ. +5. Write the report per the output contract (`security-whole-review.md`). End with `analysis complete`. + +## Scope boundary + +The aggregate-weekly facet decides the final severity for the weekly report. Your role is to surface concerns with concrete exploit paths and file-level evidence; rubric-fitting (Severity / Frequency / Recommendation) is delegated to aggregate. diff --git a/.takt/facets/instructions/review-simplicity-whole.md b/.takt/facets/instructions/review-simplicity-whole.md new file mode 100644 index 0000000..5b015a9 --- /dev/null +++ b/.takt/facets/instructions/review-simplicity-whole.md @@ -0,0 +1,84 @@ +Focus on **whole-tree anomaly detection** for simplicity — patterns that read as cumulative complexity, dead-on-arrival abstractions, or test design that does not protect behavior. This facet is invoked by the weekly-review workflow (ADR-031) and reviews the entire source tree, not a diff. + +The diff-local `review-simplicity.md` is a **separate facet** with different scope; this file MUST NOT be merged with it (ADR-031 § アンチパターン: `review-simplicity.md` を whole-tree 用と共有してはならない). Their concerns are orthogonal: diff-local guards "is the change locally readable", whole-tree guards "has accumulated complexity outgrown the reader's capacity". + +## Reading the source tree + +Glob the primary directories in this order and read selectively. Do NOT load every file end-to-end; sample the largest / most-recently-changed files first and follow references: + +1. `src/**/*.rs` — Rust crates (largest surface area) +2. `scripts/**/*.ps1` / `scripts/**/*.sh` — automation scripts +3. `.takt/facets/instructions/**/*.md` + `.takt/workflows/**/*.yaml` — facet prompts and workflow chains (LLM behavior surface) +4. `.claude/**/*.toml` + `.claude/**/*.json` — hook configuration (deterministic layer surface) +5. `docs/adr/*.md` + `CLAUDE.md` — design rationale (referenced for ADR alignment checks) + +Use `Glob` first to enumerate, then `Read` selectively. Use `Grep` to follow specific symbols / patterns. Do NOT run `git diff` / `jj diff` — this is a whole-tree review, not a diff review. + +## Determinism layer guarantees (do NOT duplicate) + +The following dimensions are enforced by deterministic hooks (PostToolUse / push-runner quality_gate) and need not be re-checked here: + +- **Comment policy** (`hooks-post-tool-comment-lint-rust`): Non-doc comments blocked at write time +- **Function length** (順位 48, same hook): Functions >50 lines blocked (touch-trigger ratchet) +- **Function metrics during fix** (`fix-metrics-check.ps1`): non-doc comment count, function length, max nesting depth cannot increase per function during fix iterations +- **File length** (順位 147, planned): when landed, replaces manual whole-tree file size review + +Skip these dimensions. If the determinism layer is the right home for a pattern, raise a finding suggesting the layer extension, but do not enumerate per-file violations the deterministic layer already catches. + +## Criterion 0 (MVP top priority): Test logic anomalies + +Test code is the part of the tree the deterministic layer least covers. Examine `**/*_test.rs`, `**/*tests/*.rs`, `**/__test*.ps1`, `**/test_*.py` etc. for: + +- **Behavior vs implementation detail drift**: tests that assert internal call ordering, mock invocation count, or struct field shape when the behavior they purport to verify is functional output +- **Boundary coverage gaps**: pure functions or guards whose `None` / empty / threshold-equal / overflow cases are not independently exercised (see also order-of-operations guards under § Multi-condition guard test isolation in `~/.claude/rules/common/code-review.md`) +- **Mock-heavy integration tests**: tests that mock the very dependency they claim to integrate-test (mocked DB in a migration test, mocked HTTP in an end-to-end test) +- **Silent regressions**: tests that pass without observably exercising the production path (e.g. `assert!(true)` after a failing setup that was swallowed) +- **Test DRY violations**: shared helper that hides variant differences. Per `feedback_test_dry_antipattern.md`, each test variant should be independently set up; helpers that conceal the per-variant `setup → act → assert` triple are an anti-pattern even if they reduce LoC + +For each finding, articulate **what the test fails to protect** and **what production change would slip past it**. If you cannot articulate the second bullet, downgrade the finding. + +## Criterion 1: Cumulative complexity (whole-tree only) + +Patterns that no single diff can flag because each individual hop looks reasonable: + +- **Indirection chains longer than 3 hops**: A → B → C → D where each layer adds one transformation but no decision +- **Parallel hierarchies**: parallel module trees that re-implement the same conceptual operation (e.g. two error converters that diverge slowly) +- **Abstraction premium without callers**: traits / interfaces with one implementor, or generic types with one concrete instantiation, that were introduced for "future extensibility" but no second caller materialized + +For each finding, name the **specific files** that constitute the chain and propose a concrete consolidation. If you cannot name files, the finding is speculative. + +## Criterion 2: Dead-on-arrival code (whole-tree) + +Code paths with no observable caller anywhere in the tree: + +- Functions / methods / types reachable only from their own unit tests +- Configuration fields / enum variants never read +- Generic parameters never instantiated with more than one type + +Use `Grep` to verify zero callers before raising. A finding without a verified zero-caller search is speculative. + +## Criterion 3: Overspec'd abstractions (whole-tree) + +Abstractions that exceed the requirement they document: + +- Generic types where a concrete type would compile +- Builder patterns for 1-2-parameter structs +- Trait objects where a function pointer or enum would suffice +- Layered error types that wrap each other without adding context + +For each finding, articulate **what the simpler form would lose** (and verify the loss is not currently exploited). + +## Calibration + +Whole-tree review tempts checklist-thinking ("file X has 800 lines, so flag it"). Resist. The deterministic layer already enforces objective metrics. This facet exists to catch the patterns no checklist names. If you can only flag something by mechanically applying a rule, the deterministic layer already handles it (or should — flag the gap, not the per-file count). + +Conversely, if reading the tree leaves you with a concrete unease that you can articulate, raise it — even if it doesn't fit a named criterion. Per `feedback_pipeline_over_rules.md`, fuzzy detection belongs in the pipeline layer (this facet), not in unenforceable rules. + +## Judgment procedure + +1. Glob the primary directories (§ Reading the source tree). Skim file names + sizes. +2. Sample the largest / most-recently-modified files first; cover all 5 directory groups. +3. For each section read, note any pattern matching Criteria 0-3. +4. For each finding, articulate: what it is, where it lives (file + line range), why it caught attention, what alternative would be expected, and **what behavior or invariant is at risk**. +5. Classify each finding by severity (`critical` / `high` / `medium` / `low`) per ADR-031 § Findings スキーマ. +6. Write the report per the output contract (`simplicity-whole-review.md`). End with `analysis complete`. diff --git a/.takt/workflows/weekly-review.yaml b/.takt/workflows/weekly-review.yaml new file mode 100644 index 0000000..420e11f --- /dev/null +++ b/.takt/workflows/weekly-review.yaml @@ -0,0 +1,135 @@ +name: weekly-review +description: > + プロジェクト全体を対象にした週次レビュー workflow (ADR-031)。 + 既存 3 パイプライン (pre-push-review / post-pr-review / post-merge-feedback) + が見ない whole-tree 観点 (累積複雑度 / 横断的 ADR 整合性 / ハーネス遵守 / + test 設計) を補完する。 + 3 つの whole-tree review facet を parallel で走らせて aggregate-weekly が + findings JSON + markdown を生成する。post-merge-feedback.yaml の構造を流用。 + fix loop なし: コードではなく findings レポートを生成するため + reviewers/fix の構造は不要。 + 出力: weekly-review.md + findings.json (Report Directory に保存) + Phase C skill `/weekly-review` が findings.json を読んで AskUserQuestion で + 採否を確認、採用分を docs/todo*.md に展開する。 + +workflow_config: + provider_options: + claude: + network_access: false + codex: + network_access: false + opencode: + network_access: false + +max_steps: 10 +initial_step: reviewers + +steps: + # --------------------------------------------------------------------------- + # Step 1: reviewers (parallel) + # 3 つの whole-tree review facet を並列実行する。各 facet は独立した観点で + # ソースツリー全体を Glob/Read/Grep し、findings を articulate する。 + # Report Directory には simplicity-whole-review.md / + # security-whole-review.md / architecture-whole-review.md が並ぶ。 + # --------------------------------------------------------------------------- + - name: reviewers + parallel: + # 観点 ⑥ テストロジック (TDD anti-pattern + 境界欠落) を criteria 筆頭、 + # 累積複雑度 / dead code / overspec'd abstraction を補助観点。 + - name: simplicity-whole-review + edit: false + persona: simplicity-reviewer + model: haiku + policy: review + provider_options: + claude: + allowed_tools: + - Read + - Glob + - Grep + instruction: review-simplicity-whole + output_contracts: + report: + - name: simplicity-whole-review.md + format: simplicity-whole-review + rules: + - condition: analysis complete + + # 観点 ④ セキュリティ。whole-tree 限定の発見 = 機密漏出 pattern / + # 横断的 prompt injection surface / asymmetric guard coverage。 + - name: security-whole-review + edit: false + persona: security-reviewer + model: haiku + policy: review + knowledge: security + provider_options: + claude: + allowed_tools: + - Read + - Glob + - Grep + instruction: review-security-whole + output_contracts: + report: + - name: security-whole-review.md + format: security-whole-review + rules: + - condition: analysis complete + + # 観点 ① ハーネス遵守 (rule/pipeline/hook 重複) を criteria 筆頭、 + # 観点 ② docs 内整合性 + ③ docs-source 矛盾 を sub criterion で組込。 + # ADR 整合性 / モジュール境界 / ADR-012 命名 / 循環依存 / レイヤ侵犯。 + - name: architecture-whole-review + edit: false + persona: architecture-reviewer + model: haiku + policy: review + knowledge: architecture + provider_options: + claude: + allowed_tools: + - Read + - Glob + - Grep + instruction: review-architecture-whole + output_contracts: + report: + - name: architecture-whole-review.md + format: architecture-whole-review + rules: + - condition: analysis complete + + rules: + - condition: all("analysis complete") + next: aggregate-weekly + + # --------------------------------------------------------------------------- + # Step 2: aggregate-weekly + # 3 つの whole-tree review reports を統合し、severity 順に並べた + # findings JSON + markdown を生成する。 + # pass_previous_response: false で、Report Directory の各 *-review.md を + # 直接読む。post-merge-feedback の aggregate-feedback と同パターン。 + # --------------------------------------------------------------------------- + - name: aggregate-weekly + edit: false + persona: supervisor + model: sonnet + policy: review + provider_options: + claude: + allowed_tools: + - Read + - Glob + - Grep + instruction: aggregate-weekly + pass_previous_response: false + output_contracts: + report: + - name: weekly-review.md + format: weekly-review + - name: findings.json + format: findings-json + rules: + - condition: aggregation complete + next: COMPLETE diff --git a/docs/todo-summary.md b/docs/todo-summary.md index 1a80dd8..498366a 100644 --- a/docs/todo-summary.md +++ b/docs/todo-summary.md @@ -78,6 +78,8 @@ | 162 | 🔧 Tier 2 | **fail-closed error path (Option::None) 個別テスト追加 (PR #177 T2-#2 採用)** | todo9.md | S | なし (PR #177 Major #1 「`behind.unwrap_or(0)` fail-closed 漏れ」fix の回帰テスト、`check_todo_staleness` / `build_todo_staleness_message` の None ケース独立検証、Severity High + Frequency Medium、security gate + Option return pattern の reference) | | 163 | 🔧 Tier 2 | **Cross-ref edge case test coverage 追加 — percent-encode / GFM heading slug / relative path normalize (PR #179 T2-#1 採用)** | todo9.md | S | なし (PR #179 で cli-docs-lint の cross_ref validator を新規実装したが、percent-encode (`%20` / `%23`) / heading slug / `../` resolve の edge case が fixture テストで明示的に保護されていない silent regression リスク回避) | | 164 | 💎 Tier 3 | **ADR-039 kill-switch standard pattern に「診断メッセージは実装の受理値を網羅」原則追記 (PR #179 T3-#1 採用)** | todo9.md | XS | なし (PR #179 で `CLI_DOCS_LINT_DISABLE` の kill-switch message が `=1` 固定で実受理値 `true`/`TRUE`/`True` を反映しなかった spec-impl drift、ADR-039 は全 experimental feature の標準参照のため systemic reach、`docs/adr/adr-039-*.md` 編集) | +| 165 | 🔧 Tier 2 | **`pnpm create-pr` PR body truncation 回避を検証する e2e/integration test 追加 (PR #181 T2-#1 採用)** | todo9.md | S | なし (PR #134 + #181 の 2 回観測で Medium frequency に昇格、memory `feedback_pnpm_create_pr_body` の `--body-file` workaround を自動 regression gate 化、shell argument truncation の境界 (行数/バイト数) を fixture で測定、silent UX 劣化の早期検出、cli-pr-monitor の argv 組み立て層を test 対象、順位 166 と相補 = test 層 vs docs 層) | +| 166 | 💎 Tier 3 | **ADR-028 に PR body 複数行時の `--body-file` 推奨 + shell argument truncation の why/how 補足追記 (PR #181 T3-#1 採用)** | todo9.md | XS | なし (順位 165 の構造的予防策として ADR-028 = pnpm create-pr gate に codify、改行が shell delimiter として処理される why と `--body-file` / `gh pr edit --body-file` の how を 2-3 段落で補足、`docs/adr/adr-028-pnpm-create-pr-gate.md` 編集、memory `feedback_pnpm_create_pr_body` との back-link 整合) | **戦略**: Tier 1 を 2〜3 セッションで片付け → Tier 2 で ADR-032 の前提 + rate-limit + convergence cost 削減を進める → Tier 3 で ADR-032 を land + ドキュメント整備。Tier 4-5 は cleanup / 外部展開で daily efficiency への直接効果は小さい。 diff --git a/docs/todo9.md b/docs/todo9.md index a629269..b6fdd2d 100644 --- a/docs/todo9.md +++ b/docs/todo9.md @@ -738,6 +738,81 @@ ADR-039 § 決定 2 (Kill-switch) に以下の原則を追記: --- +### `pnpm create-pr` PR body truncation 回避を検証する e2e/integration test 追加 (PR #181 T2-#1 採用) + +> **動機**: PR #134 + #181 で 2 回観測された `pnpm create-pr` (= `cli-pr-monitor.exe` の PR 作成モード) における PR body 切り詰め問題。複数 section・複数行の body を `--body "..."` で渡すと shell argument 解釈で改行が delimiter 処理されて body が途中で切れる silent UX 劣化が発生する。memory `feedback_pnpm_create_pr_body` で `--body-file ` workaround を採用済だが、回避策が正常動作することを担保する自動 regression gate が存在しない。 +> +> **本タスクの位置づけ**: PR #181 post-merge-feedback Tier 2 #1 採用 (Severity Medium / Frequency Medium / Effort S / Adoption Risk None、2026-05-29 ユーザー承認)。PR #134・#181 の 2 回観測で Medium frequency に昇格、`--body-file` workaround の regression gate として採用条件成立。 +> +> **参照**: `.claude/feedback-reports/181.md` Tier 2 #1、memory `feedback_pnpm_create_pr_body`、`src/cli-pr-monitor/src/main.rs` (PR 作成モード本体)、`src/cli-pr-monitor/src/stages/` 周辺の `run_create_pr` 実装、PR #134 / #181 の create-pr 実行例 +> +> **実行優先度**: 🔧 **Tier 2** — Effort S。既存 cli-pr-monitor test infra の流用、shell argument truncation 境界の fixture 測定。 + +#### 設計決定 (案) + +- **検証対象**: `pnpm create-pr -- --title "..." --body-file ` 経由で PR を作成した際、body 内容が source file と一致すること (truncation なし、改行保持) +- **境界測定**: PR body 文字数 (行数 / バイト数) を段階的に増やし、shell 直渡し `--body "..."` パスが切り詰める閾値と `--body-file` が切り詰めない閾値の境界を fixture で測定、regression gate として記録 +- **test 方式**: `gh pr create` の dry-run option がないため、cli-pr-monitor の argv 組み立て層を unit test 対象にする (実 PR 作成は行わない)、または integration test で mock gh CLI を介して argv の最終 shape を assert +- **memory `feedback_test_dry_antipattern`**: 各 variant 独立 setup、共通 helper 化しない + +#### 作業計画 + +- [ ] cli-pr-monitor の PR 作成モードで argv 組み立て層を関数化 (test 可能な shape に refactor、必要なら) +- [ ] `#[cfg(test)]` mod に 3 fixture を追加: (a) 短い single-line body、(b) 複数行 body 経由 `--body-file`、(c) 直接 `--body` を渡した場合の truncation 再現 +- [ ] cargo test で pass 確認 + 既存 cli-pr-monitor test との独立性確認 +- [ ] truncation 境界の測定結果を test コメントに記録 (将来の閾値変更時の reference) +- [ ] 本エントリ削除 + todo-summary.md 行削除 + +#### 完了基準 + +- `--body-file` 経由が複数行 body で truncation なしに動作することが unit test で保護 +- shell 直渡し `--body "..."` で truncation が起こる境界が fixture で測定済 +- silent regression を test で 1 件以上検出できる構造 +- 既存 cli-pr-monitor test との独立性 (mock 設定の交差なし) + +#### 詰まっている箇所 + +`gh pr create` 自体に dry-run option がないため、実 PR 作成を伴わない検証戦略を要設計 (argv 組み立て層の関数化 or mock gh CLI)。Effort S 想定だが test 戦略次第で M に膨らむ可能性あり。 + +--- + +### ADR-028 に PR body 複数行時の `--body-file` 推奨 + shell argument truncation の why/how 補足追記 (PR #181 T3-#1 採用) + +> **動機**: PR #134 + #181 で 2 回観測された `pnpm create-pr` の PR body 切り詰め問題 (順位 165 と同根)。memory `feedback_pnpm_create_pr_body` で recurring issue として記録されているが、ADR-028 (pnpm create-pr gate) には why (改行が shell delimiter として処理される) / how (`--body-file` または `gh pr edit --body-file` を使う) が codify されていない。順位 165 が test で防御層を作るのに対し、本タスクは ADR で permanent reference 層を作って後発の AI / reviewer が逆引き可能な状態にする構造的予防策。 +> +> **本タスクの位置づけ**: PR #181 post-merge-feedback Tier 3 #1 採用 (Severity Medium / Frequency Medium / Effort XS / Adoption Risk None、2026-05-29 ユーザー承認)。順位 165 が test 層、本タスクが docs 層で同根を別レイヤで補強する関係。 +> +> **参照**: `.claude/feedback-reports/181.md` Tier 3 #1、memory `feedback_pnpm_create_pr_body`、`docs/adr/adr-028-pnpm-create-pr-gate.md` (補足追記対象)、PR #134 / #181 の create-pr 観測 + +#### 設計決定 (案) + +ADR-028 に以下を補足セクションとして追記: + +- **PR body が複数行/長文の場合は `--body-file ` を使う**: shell argument 直渡し (`--body "..."`) は OS / シェル / `gh` CLI の引数解釈で改行が delimiter 処理され body が途中で切れるケースが PR #134・#181 で観測されている +- **why**: shell が改行を区切りとして解釈、`gh` CLI が受け取る argv に改行が含まれた時点で body が partial 化 +- **how**: (a) PR 作成時は `pnpm create-pr -- --title "..." --body-file ` で file 経由、(b) 既存 PR の body 修正は `gh pr edit --body-file `、(c) scratch file は `__pr-body.md` (gitignore 対象、CLAUDE.md scratch 命名規約) +- **配置**: ADR-028 の決定セクションまたは「実装上の注意」セクションに 1-2 段落で追記、メモリ entry `feedback_pnpm_create_pr_body` への back-link + +#### 作業計画 + +- [ ] `docs/adr/adr-028-pnpm-create-pr-gate.md` の適切な section (決定 / 実装注意点) に上記補足を 2-3 段落で追記 +- [ ] PR #134 / #181 を実例として inline cite +- [ ] markdownlint clean 確認 +- [ ] 本エントリ削除 + todo-summary.md 行削除 + +#### 完了基準 + +- ADR-028 に PR body 複数行ケースの why / how が codify される +- 後発の AI / reviewer が ADR から逆引き可能になる +- memory `feedback_pnpm_create_pr_body` との整合 (memory が ADR 参照を持つ or vice versa) +- markdownlint clean + +#### 詰まっている箇所 + +なし。Effort XS、ADR の section 追記のみで副作用最小。 + +--- + ## 既知課題 (記録のみ、本セッションで未対応) (現時点で本ファイルへの既知課題は無し。docs/todo8.md 末尾の post-merge-feedback workflow stale marker 問題を参照。)