feat(review): enhance /review with deterministic analysis, autofix, and security hardening#2932
feat(review): enhance /review with deterministic analysis, autofix, and security hardening#2932
Conversation
…, autofix, and security hardening
Comprehensive improvements to the /review skill based on competitive analysis
of Copilot Code Review, Claude Code /ultrareview, and Gemini CLI async-pr-review.
Key changes (all prompt-only, no TypeScript code changes):
- P0: Integrate linter/typecheck (Step 1.5) — run project tools before LLM agents,
with error/warning severity distinction
- P1: Add Agent 5 for build & test verification with env/code failure distinction
- P1: Cross-file impact analysis for Agents 1-4 with 10-symbol prioritization limit
- P1: Project custom review rules (.qwen/review-rules.md, copilot-instructions.md,
AGENTS.md, QWEN.md) with base-branch reading for PR security
- P2: Autofix with user confirmation, PR branch commit, and verdict split
(terminal vs PR submission)
- P2: Pattern aggregation for same-type findings across locations
- P2: Confidence levels (high/low) with "Needs Human Review" section
- P2: Skip "Nice to have" from PR inline comments to reduce noise
- P3: Incremental review via .qwen/review-cache/ with rebase fallback
- P3: Report persistence to .qwen/reviews/ with timestamp filenames
Security hardening:
- PR description prompt injection defense (untrusted DATA marker)
- Base-branch rule loading prevents review-bypass injection
- Concurrency-safe temp file paths with {target} suffix
- Safe git stash handling (conditional pop)
- Argument disambiguation (integer vs URL vs file path)
Audited through 14 rounds of undirected review with 40 issues found and fixed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📋 Review SummaryThis PR enhances the 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
- Add tsc --incremental flag to speed up repeated type checks
- Increase type checker timeout to 120s (linters remain 60s)
- Improve cross-file grep patterns to cover .functionName, import { functionName }
- Don't truncate Critical pattern groups — list all locations
- Clarify pre-commit hook as a commit failure scenario in autofix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
There was a problem hiding this comment.
Pull request overview
This PR updates the bundled /review skill instructions to add more deterministic, lower-noise review capabilities (lint/typecheck + build/test), support incremental PR reviews, enable optional user-confirmed autofix, and harden against prompt injection via PR descriptions and PR-authored rule files.
Changes:
- Add Step 1.1 rule loading (base-branch for PRs) and Step 1.5 deterministic analysis (lint/typecheck) with severity mapping.
- Add a 5th parallel agent for deterministic build/test verification plus cross-file impact analysis guidance.
- Add incremental review caching/report persistence, autofix flow, and PR comment noise controls (Critical/Suggestion only).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix filter-then-truncate ordering: capture full linter output first, filter to changed files, then truncate (not head before filter) - Record informational notes for skipped checks instead of silent skip - Agent 5: capture full build/test output, keep first 50 + last 100 lines instead of tail-only (preserves error context) - Fix [Needs Review] vs severity tag contradiction: use both [Needs Review][Suggestion] format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix Step 1.5 intro: clarify whole-project vs per-file tool handling and filter-then-report approach - Fix dedup + deterministic finding ambiguity: merged findings with any deterministic source are pre-confirmed and skip verification - Fix autofix stash orphan: stop and let user handle commit failure instead of silently stashing (which Step 5 wouldn't pop) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The autofix step (Step 3.5) needs targeted text replacement to apply fixes safely. Without the edit tool, only full-file rewrites via write_file would be available, which is risky for partial fixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix golangci-lint: use ./... (package pattern) instead of file paths - Unify PR comment prefix format: define canonical prefixes for normal, auto-fixed, and low-confidence findings in the template - Stop workflow entirely on autofix commit failure (dirty tree would block Step 5 branch restore) - Accept broader .gitignore patterns like .qwen/* for cache/reviews dirs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
LLM ignored the "do not submit summary for Comment verdict" rule despite it being written. Restructured: warning emoji + STOP instruction first, then the exception cases. The default is "don't post", not "decide whether to post." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1. Step 9 --comment guard blocked "post comments" follow-up path.
Fixed: allow entry via either --comment flag OR follow-up request.
2. Cross-repo gh pr review needs -R {owner}/{repo}. Added note.
3. "post comments" tip shown even when --comment already set (double
posting risk). Fixed: tip only shown when --comment NOT specified.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Previously: fetch → create worktree → incremental check → "no changes" → delete worktree (wasted time creating it). Now: fetch → incremental check (via git rev-parse on fetched ref) → if no changes, delete ref and stop (no worktree ever created). Worktree only created when review will actually proceed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1. CI config auto-discovery: read from base branch for PR reviews (PR branch is untrusted, malicious PR could inject commands) 2. Incremental early-exit: don't block --comment on unchanged PR — allow posting comments from previous review findings 3. Doc: review summary not always posted (Comment verdict skips it) 4. Doc: cross-repo reviews skip report persistence 5. Doc: clarify "Agents 1-4 findings verified" (not all — reverse audit findings skip verification) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
LLM was ignoring the {{model}} template and writing its own footer
("— Qwen Code /review" instead of "— glm-5.1 via Qwen Code /review").
Added explicit warning: footer must appear EXACTLY as shown, do NOT
shorten or rephrase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --comment is used on an unchanged PR, Step 9 needs prior findings to post. Cache now stores reportPath pointing to the saved report from Step 10, allowing findings to be loaded without re-running the review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loading cached findings from a Markdown report is fragile (unstructured prose, LLM might misparse). Instead, when --comment is specified on an unchanged PR, simply run the full review. The user explicitly wants comments posted — spending 7 LLM calls is acceptable. Removed reportPath from cache schema (no longer needed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When PR review finds no issues and --comment was not specified, suggest "post comments" so the user can formally approve the PR on GitHub. Without this, the LGTM only appears in terminal — no approval status on the PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
User doc and PR description now include the "PR review, zero findings → post comments → approve PR" row in the follow-up actions table. Also fixed PR description: "Step 4" → "Step 9" for post comments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM was writing detailed analysis in the review summary body despite "minimal body" instruction. Strengthened to "one-line body only, do NOT include analysis/findings/explanations" with concrete examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review comments, findings, and summaries must use the same language as the PR (title/description/code comments). English PR → English review. Chinese PR → Chinese review. No language switching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three issues found in real review output: 1. Summary repeated findings already posted as inline comments 2. "Review Stats" (agent count, raw/confirmed) is internal noise 3. Summary was too verbose Fix: partial-failure summary must contain ONLY the failed findings. Distinguish terminal output (stats OK) from PR comments (no stats). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logic errors causing incorrect behavior (wrong return values, skipped code paths) were being classified as Suggestion instead of Critical. Added explicit examples: "if code does something wrong, it's Critical — not Suggestion." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inline comments now use ```suggestion blocks when the fix is a direct line replacement. PR authors can accept fixes with one click instead of manually copying code. Falls back to regular code blocks when the fix spans multiple locations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t GitHub Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prettier mangled the nested code fences (4-backtick outer + 3-backtick suggestion inner). Replaced with plain-text numbered structure + indented example. Also fixed orphaned 4-backtick fence closing Step B. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced 5 numbered rules + example with example-first format. LLMs pattern-match from examples better than parsing rules. Rules condensed to 2 sentences after the example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues found from real review (PR #2826): 1. Multiple /review runs on same PR create duplicate comments. Now Step 9 checks for existing "via Qwen Code /review" comments before posting and warns the user about potential duplicates. 2. Comments posted without line numbers appear as orphaned PR comments. Now enforced: every inline comment MUST reference a specific line in the diff. Findings that can't be mapped to diff lines go in the summary instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three issues found from real /review output on PR #2921: 1. Critical findings but verdict submitted as --comment instead of --request-changes. Added explicit: "Do NOT use --comment when verdict is Request changes — this loses the blocking status." 2. Nice to have findings appeared in PR summary. Added: "Do NOT include Nice to have findings" to all summary rules. 3. Clarified that failed-inline summary should only contain Critical/Suggestion, never Nice to have. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the two-phase posting (individual gh api comments + separate gh pr review verdict) with a single Create Review API call that bundles inline comments + verdict together — same approach as Copilot Code Review. Benefits: - No summary comment needed (inline comments ARE the review) - No "two-phase posting" complexity - No "STOP for Comment verdict" rules - No duplicate/orphaned reviews - One API call instead of N+1 - Verdict (approve/request_changes/comment) correctly attached Eliminates ~40 lines of complex posting rules replaced by ~30 lines of straightforward JSON construction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM was putting all findings in the review body (creating a summary comment) instead of the comments array (inline comments). Added prominent warning: "Findings go in comments array, NOT in body." Also: "Do NOT use COMMENT when there are Critical findings." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
已经测试效果符合预期,比如: |
Background
The current
/review4-Agent parallel + independent verification architecture provides good coverage, but has notable capability gaps compared to Copilot Code Review (60M+ reviews), Claude Code/ultrareview, and Gemini CLIasync-pr-review.Detailed competitive analysis: qwen-code-review-improvements.md
Pipeline
Fixed 7 LLM calls regardless of finding count.
Before / After
gh pr diff <url>— LLM review + PR comments, no linter/build/autofixKey Design Decisions
Primarily prompt changes (SKILL.md), with a small TS addition:
{{model}}template variable in BundledSkillLoader for model attribution in PR review summaries"Silence is better than noise" — aligned with Copilot's design philosophy; linter warnings excluded from PR comments, uncertain issues not reported
PR security hardening — review rules read from base branch (prevents injection), PR descriptions marked as untrusted DATA, concurrent-safe temp file paths
Autofix does not auto-approve — even if all Critical findings are fixed, PR submission uses the pre-fix verdict (remote code not yet updated)
Worktree isolation for PR reviews — ephemeral worktree replaces stash/checkout/restore, eliminating stash orphan and wrong-branch risks entirely
Fixed token cost — batch verification (1 agent for all findings) + 5 review agents + 1 reverse audit = 7 LLM calls regardless of finding count
Follow-up tips leverage the suggestion system — context-aware tips appear as ghost text via the existing follow-up suggestion engine. Users Tab to accept, no blocking prompts:
fix these issuesedittoolpost commentspost commentscommitScope
Test plan
/reviewwith local uncommitted changes — verify linter runs, 5 agents launch, findings displayed/review <pr-number>— verify worktree creation, dependency install, base-branch rules, incremental cache/review <pr-number> --comment— verify inline comments (high-confidence Critical/Suggestion only)/review <file-path>with no diff — verify full-file review, linter runs on file.qwen/review-rules.md— verify rules injected into Agents 1-4 only.qwen/tmp/review-pr-*removed after review completes