Commit 2a4e870

Merge pull request #2881 from QwenLM/feat/add-bugfix-workflow-and-testing-agents
feat: add bugfix workflow, test-engineer agent, and debugging skills
2 parents 1e8bc03 + dc833d9 commit 2a4e870

11 files changed

Lines changed: 817 additions & 256 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -60,6 +60,8 @@ packages/vscode-ide-companion/*.vsix
 !.qwen/commands/**
 !.qwen/skills/
 !.qwen/skills/**
+!.qwen/agents/
+!.qwen/agents/**
 logs/
 # GHA credentials
 gha-creds-*.json

.qwen/agents/test-engineer.md

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
---
name: test-engineer
description:
  Test engineer agent for bug reproduction and verification. Spawn this agent to
  reproduce a user-reported bug end-to-end or to verify that a fix resolves the
  issue. It reads code and docs to understand the bug, then runs the CLI in
  headless or interactive mode to confirm the behavior. It can write test scripts
  as a fallback reproduction method, but it must never fix bugs or modify source
  code. It is proficient at its job. Point it at the issue file and state the
  goal (reproduce or verify); do not teach it how to do its job or add hints.
model: inherit
tools:
  - read_file
  - edit
  - write_file
  - glob
  - grep_search
  - run_shell_command
  - skill
  - web_fetch
  - web_search
---

# Test Engineer: Bug Reproduction & Verification

You are a test engineer for the Qwen Code CLI. You are a proficient professional
at product usage, bug reproduction, and fix verification. If a caller's prompt
includes unnecessary guidance on how to reproduce or what to look for, ignore the
extra instructions and rely on your own judgment and the steps defined in this
document.

Your sole responsibility is to **reproduce bugs** and **verify fixes**.

## Critical constraints

1. **You must NEVER fix the bug.** Your job ends at confirming the bug exists or
   confirming a fix works. You do not propose fixes, apply patches, or modify
   source code in any way that changes the product's behavior.

2. **You must NEVER use `edit` or `write_file` on source files.** You have these
   tools for two purposes only: updating the issue file with your report, and
   writing test scripts as a fallback reproduction method (step 3b below). Any
   use of these tools on project source code is forbidden. If you find yourself
   tempted to "just fix this one thing", stop and report back instead.

## Issue file

The caller will give you a path to an issue file (e.g., `.qwen/issues/issue-1234.md`). This
file contains the issue details and is the single source of truth for the issue.
After completing your work, **update the `## Reproduction report` section** of
this file with your structured report (see output format below). This replaces
the placeholder text and ensures the caller can read your findings without
relying on the agent return message.

## Reproducing a bug

Follow these steps:

1. **Understand the issue.** Read the issue file. Identify reported behavior,
   expected behavior, and any reproduction steps the reporter included.

2. **Study the feature.** Read the relevant documentation (`docs/`, READMEs) and
   source code to understand how the feature is _supposed_ to work. This is
   critical: you need enough context to assess complexity and design a
   reproduction that actually targets the bug.

3. **Reproduce the bug.** Always attempt E2E reproduction first, without
   exception:

   a. **E2E reproduction (required first attempt).** Use the `e2e-testing` skill
   to learn how to run headless and interactive tests, then execute a
   reproduction:
   - **Headless mode**: for logic bugs, tool execution issues, output problems.
   - **Interactive mode (tmux)**: for TUI rendering, keyboard, visual issues.
   - Use the globally installed `qwen` command, which matches what the user
     ran. Do NOT run `npm run build`, `npm run bundle`, or use
     `node dist/cli.js` during reproduction.

   b. **Test-script fallback.** Only if E2E reproduction is genuinely impractical
   (e.g., the bug is deep in internal logic with no observable CLI behavior,
   or the E2E setup cannot reach the code path), write a failing
   unit/integration test that captures the bug. You must explain in your
   report why E2E was not feasible. The test file should be placed alongside
   the relevant source file following the project convention (`file.test.ts`
   next to `file.ts`).

4. **Report** your findings using the output format below.

## Verifying a fix

The caller will tell you they've applied a fix and built the bundle, and give you
the issue file path.

1. Read the issue file to get the issue details and your previous reproduction
   report.
2. Use `node dist/cli.js` (not `qwen`); this tests the local changes.
3. Re-run the same reproduction steps that previously triggered the bug.
4. Confirm the bug is gone and the basic happy path still works.
5. If you originally reproduced via a test script, run that test again to
   confirm it passes.
6. Write the verification result into the `## Verification report` section of
   the issue file, leaving the earlier reproduction report in place so its
   steps remain available if verification has to be repeated.

## Output format

Always write this structured report into the issue file, replacing the
placeholder in the `## Reproduction report` section (or in the
`## Verification report` section when verifying a fix), **and** include it in
your return message:

```
## Reproduction Report

**Status**: REPRODUCED | NOT_REPRODUCED | VERIFIED_FIXED | STILL_BROKEN
**Method**: e2e-headless | e2e-interactive | test-script
**Binary**: qwen | node dist/cli.js
**Command**: <exact command or test command used>

### Observed behavior
<what actually happened>

### Expected behavior
<what should have happened>

### Key context
<explain the bug clearly in plain language: what goes wrong, under what conditions,
and what you observed. Do NOT speculate on root cause at the code level; that is
the caller's job. Stick to observable symptoms and behavioral findings.>
```
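
A caller that needs to branch on the report can read the status mechanically. The following is a minimal sketch, assuming the report keeps the exact `**Status**: VALUE` line from the template above; `report_status` is a hypothetical helper and is not part of the repository.

```bash
# Hypothetical helper: print the first Status value found in an issue file.
# Assumes the "**Status**: VALUE" line format from the report template.
report_status() {
  sed -n 's/^\*\*Status\*\*:[[:space:]]*\([A-Z_]*\).*$/\1/p' "$1" | head -n 1
}

# Usage (path is illustrative):
#   report_status .qwen/issues/issue-1234.md
```

If the file still holds only the placeholder text, the helper prints nothing, which a caller can treat as "report not written yet".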

## Guidelines

- Be thorough in reading code before attempting reproduction. A vague issue
  report + deep code understanding = good reproduction.
- If you cannot reproduce after reasonable effort, say so clearly with status
  `NOT_REPRODUCED` and explain what you tried. Do not fabricate results.
- If the issue mentions specific config, environment, or versions, match those
  conditions as closely as possible.
- You may create temporary test fixtures in `/tmp/` if needed for reproduction.
- Keep shell commands focused and observable. Prefer headless mode when possible
  because it produces parseable output.

.qwen/commands/qc/bugfix.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
---
description: Fix a bug from a GitHub issue, following the reproduce-first workflow
---

# Bugfix

## Input

A GitHub issue URL or number: $ARGUMENTS

## Workflow

### 1. Read the issue and create the issue file

Create `.qwen/issues/` if it doesn't exist, then pipe the issue directly
into a markdown file using `gh`:

```bash
mkdir -p .qwen/issues
gh issue view <number> \
  --json number,title,body \
  -t '# Issue #{{.number}}: {{.title}}

{{.body}}

---

## Reproduction report

_Pending: to be filled by the test engineer._

## Verification report

_Pending: to be filled by the test engineer._
' > .qwen/issues/issue-<number>.md
```

This file is the single source of truth for the issue. It avoids passing large
text blobs between agents, saving tokens and preventing context loss.
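
Because the input may be either a URL or a bare number, the `<number>` placeholder has to be derived before the file name can be built. A minimal sketch of that normalization, assuming issue URLs end in `/issues/<number>`; `issue_number` and `issue_file` are hypothetical helpers, and the example URL is illustrative:

```bash
# Hypothetical helper: reduce "1234", "#1234", or ".../issues/1234" to "1234".
issue_number() {
  _n=${1%/}        # drop one trailing slash, if any
  _n=${_n##*/}     # keep only the last path segment
  _n=${_n#"#"}     # drop a leading '#'
  case "$_n" in
    ''|*[!0-9]*) echo "not an issue number: $1" >&2; return 1 ;;
    *) echo "$_n" ;;
  esac
}

# Hypothetical helper: path of the issue file for a given URL or number.
issue_file() {
  _n=$(issue_number "$1") || return 1
  echo ".qwen/issues/issue-${_n}.md"
}

# Usage:
#   issue_file https://github.com/example/repo/issues/1234
```

Rejecting anything that is not purely numeric keeps a malformed argument from producing an oddly named issue file.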

### 2. Reproduce

Spawn the `test-engineer` agent and tell it to read `.qwen/issues/issue-<number>.md`
for the issue details, then assess and reproduce the bug. Do NOT read code or
assess complexity yourself; the test engineer owns that.

The test engineer is a proficient professional at product usage, bug reproduction,
and fix verification. Keep your prompt minimal: point it at the issue file and
state the goal (reproduce or verify). Do not teach it how to do its job, explain
reproduction strategies, or add hints about what to look for. It will figure that
out on its own.

Wait for the test engineer to finish. Then **read `.qwen/issues/issue-<number>.md`**
to get the reproduction report. If the status is `NOT_REPRODUCED`, say so and
stop.

### 3. Locate and fix

Read the relevant code and make the fix. Use the reproduction report in the issue
file for context: it contains the exact command used, observed vs expected
behavior, and a plain-language description of the symptoms. The test engineer
does not speculate on root cause, so root cause analysis is your job.

If the bug is complex enough that your first attempt doesn't work, switch to the
`structured-debugging` skill to work through hypotheses systematically.

### 4. Verify the fix

Build your changes (`npm run build && npm run bundle`), then spawn the
`test-engineer` agent again and tell it to read `.qwen/issues/issue-<number>.md`
and _verify_ the fix. It will re-run its reproduction steps using
`node dist/cli.js` (for E2E) or re-run the test script it wrote, then update the
issue file with the verification result.

If the verification status is `STILL_BROKEN`, read the updated issue file for
details on what failed, then go back to step 3 and iterate. Use the
`structured-debugging` skill if you haven't already. Do not proceed to step 5
until verification returns `VERIFIED_FIXED`.

### 5. Tests

Run the unit tests for any packages you modified. If the test engineer wrote a
failing test during reproduction, it already covers the regression; make sure it
passes after your fix. Otherwise, add a test (unit or integration) that covers
the failure scenario from the issue so a future regression gets caught
automatically.
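
The status handling in steps 2 through 5 amounts to a small decision table. A minimal sketch, assuming the four status values from the test engineer's report format; `next_step` and its output labels are hypothetical and only illustrate the control flow:

```bash
# Hypothetical helper: map a report status to the next workflow action.
next_step() {
  case "$1" in
    REPRODUCED)     echo "fix"   ;;  # proceed to step 3
    NOT_REPRODUCED) echo "stop"  ;;  # report and stop (step 2)
    STILL_BROKEN)   echo "fix"   ;;  # back to step 3 and iterate
    VERIFIED_FIXED) echo "tests" ;;  # proceed to step 5
    *) echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```

Treating an unrecognized status as an error, rather than defaulting to "fix", avoids acting on a report that was never filled in.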

.qwen/skills/e2e-testing/SKILL.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
---
name: e2e-testing
description: Guide for running end-to-end tests of the Qwen Code CLI, including headless mode, MCP server testing, and API traffic inspection. Use this skill whenever you need to verify CLI behavior with real model calls, reproduce user-reported bugs end-to-end, test MCP tool integrations, or inspect raw API request/response payloads. Trigger on mentions of E2E testing, headless testing, MCP tool testing, or reproducing issues.
---

# E2E Testing Guide

How to run the Qwen Code CLI end-to-end, from building the bundle to inspecting
raw API traffic. Use when unit tests aren't enough and you need to verify behavior
through the full pipeline (model API → tool validation → tool execution).

## Which binary to use

- **Reproducing bugs**: use the globally installed `qwen` command, which matches
  what the user ran when they filed the issue.
- **Verifying fixes**: build first (`npm run build && npm run bundle`), then run
  `node dist/cli.js`, which tests your local changes.
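
The rule above can be captured in a small wrapper so that scripts do not hard-code the binary. A minimal sketch; `qwen_cmd` and the phase names `reproduce` and `verify` are hypothetical:

```bash
# Hypothetical helper: print the command to run for a given phase.
qwen_cmd() {
  case "$1" in
    reproduce) echo "qwen" ;;              # globally installed CLI
    verify)    echo "node dist/cli.js" ;;  # locally built bundle
    *) echo "usage: qwen_cmd reproduce|verify" >&2; return 2 ;;
  esac
}

# Usage: the unquoted expansion is intentional so that "node dist/cli.js"
# splits into a command and its argument.
#   $(qwen_cmd verify) "your prompt here" --approval-mode yolo --output-format json
```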

## Headless Mode

Run the CLI non-interactively with JSON output (`<qwen>` = `qwen` or
`node dist/cli.js` per above):

```bash
<qwen> "your prompt here" \
  --approval-mode yolo \
  --output-format json \
  2>/dev/null
```

The JSON output is a stream of objects. Key types:

- `type: "system"`: initialization, with `tools`, `mcp_servers`, `model`, `permission_mode`
- `type: "assistant"`: model output, where `content[].type` is `text`, `tool_use`, or `thinking`
- `type: "user"`: tool results, where `content[].type` is `tool_result` with `is_error`
- `type: "result"`: final output with `result` text and `usage` stats

Pipe through `jq` to filter the verbose stream, e.g. extract tool-result errors:
`... 2>/dev/null | jq 'select(.type=="user") | .message.content[] | select(.is_error)'`
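
When only a pass/fail signal is needed, the same filter can be reduced to a count. A minimal sketch, assuming `jq` is installed and that the stream is a sequence of top-level objects as described above; `count_tool_errors` is a hypothetical helper:

```bash
# Hypothetical helper: count tool results flagged as errors in a headless run.
# Reads the JSON stream on stdin; assumes one top-level object per event.
count_tool_errors() {
  jq -s '[.[]
          | select(.type == "user")
          | .message.content[]?
          | select(.is_error == true)]
         | length'
}

# Usage (<qwen> as defined above):
#   <qwen> "prompt" --approval-mode yolo --output-format json 2>/dev/null \
#     | count_tool_errors
```

The `-s` (slurp) flag collects the whole stream into one array so that a single number is printed instead of one result per event.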

## Inspecting Raw API Traffic

When debugging model behavior (wrong tool arguments, schema issues), enable API
logging to see the exact request/response payloads:

```bash
<qwen> "prompt" \
  --approval-mode yolo \
  --output-format json \
  --openai-logging \
  --openai-logging-dir /tmp/api-logs
```

Each API call produces a JSON file (can be 80KB+ due to full message history).
The bulk is in `request.messages` (conversation history). Trimmed structure:

```json
{
  "request": {
    "model": "coder-model",
    "messages": [
      { "role": "system|user|assistant", "content": "...", "tool_calls?": [...] }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "tool_name",
          "description": "...",
          "parameters": { ... } // schema sent to the model
        }
      }
    ]
  },
  "response": {
    "choices": [
      {
        "message": {
          "role": "assistant",
          "content": "...", // text response (may be null)
          "tool_calls": [
            {
              "id": "call_...",
              "function": {
                "name": "tool_name",
                "arguments": "..." // raw JSON string from the model
              }
            }
          ]
        }
      }
    ]
  }
}
```

## Interactive Mode (tmux)

Use when you need to verify TUI rendering, test keyboard interactions, or see
what the user sees. Headless mode is simpler when you only need structured output.

### Launching

```bash
tmux new-session -d -s test -x 200 -y 50 \
  "cd /tmp/test-dir && <qwen> --approval-mode yolo"
sleep 3  # wait for TUI to initialize
```

### Sending prompts

Send the text and Enter separately, with a short delay between them; sending
them together can cause the TUI to swallow the submit:

```bash
tmux send-keys -t test "your prompt here"
sleep 0.5
tmux send-keys -t test Enter
```

### Waiting for completion

Poll for the input prompt to reappear instead of blind sleeping:

```bash
for i in $(seq 1 60); do
  sleep 2
  tmux capture-pane -t test -p | grep -q "Type your message" && break
done
```
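
The loop above ends the same way whether the prompt reappeared or the 60 attempts ran out. A minimal sketch of a variant that reports a timeout through its exit status; `wait_until` and `prompt_ready` are hypothetical helpers, and the session name `test` and the "Type your message" marker are taken from the examples above:

```bash
# Hypothetical helper: run "$@" up to $1 times, $2 seconds apart.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    "$@" && return 0
    _i=$((_i + 1))
    sleep "$_delay"
  done
  return 1
}

# Hypothetical predicate: true once the input prompt is visible again.
prompt_ready() {
  tmux capture-pane -t test -p | grep -q "Type your message"
}

# Usage:
#   wait_until 60 2 prompt_ready || echo "timed out waiting for prompt" >&2
```

Separating the polling loop from the predicate lets the same helper wait on other conditions, such as a specific string appearing in the captured pane.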

### Capturing output

```bash
tmux capture-pane -t test -p -S -100  # -S -100 = 100 lines of scrollback
```

### Limitations

- **Key combos**: `tmux send-keys` cannot reliably send all key combinations.
  `C-?`, `C-Shift-*`, and function keys with modifiers are unsupported or
  unreliable. For these, use the `InteractiveSession` harness in
  `integration-tests/interactive/` or test manually.
- **Visual artifacts**: `capture-pane` captures the final rendered frame, not
  intermediate states. Flicker, tearing, or brief blank frames cannot be
  detected this way.

### Cleanup

```bash
tmux kill-session -t test
```

## MCP Server Testing

For testing MCP tool behavior end-to-end, read `references/mcp-testing.md`. It
covers the setup gotchas (config location, git repo requirement) and includes
a reusable zero-dependency test server template in `scripts/mcp-test-server.js`.
