QwenLM
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/.qwen/agents/test-engineer.md b/.qwen/agents/test-engineer.md
Lines changed: 140 additions & 0 deletions
diff --git a/.qwen/commands/qc/bugfix.md b/.qwen/commands/qc/bugfix.md
Lines changed: 85 additions & 0 deletions
diff --git a/.qwen/skills/e2e-testing/SKILL.md b/.qwen/skills/e2e-testing/SKILL.md
Lines changed: 158 additions & 0 deletions
@@ -60,6 +60,8 @@ packages/vscode-ide-companion/*.vsix
 !.qwen/commands/**
 !.qwen/skills/
 !.qwen/skills/**
+!.qwen/agents/
+!.qwen/agents/**
 logs/
 # GHA credentials
 gha-creds-*.json
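The two added patterns re-include `.qwen/agents/` the same way the context lines already do for `.qwen/commands/` and `.qwen/skills/`. One way to check the effect is `git check-ignore` in a scratch repository. The sketch below assumes the enclosing ignore rule (not visible in this hunk) is `.qwen/*`; the function name and the `issue-1.md` file are hypothetical.

```shell
# Scratch repo that mimics the assumed layout: ignore everything under
# .qwen/ except the agents directory. The `.qwen/*` rule is an assumption;
# the real rule sits above this hunk and is not shown.
check_agents_not_ignored() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q . || exit 1
    printf '%s\n' '.qwen/*' '!.qwen/agents/' '!.qwen/agents/**' > .gitignore
    mkdir -p .qwen/agents .qwen/issues
    : > .qwen/agents/test-engineer.md
    : > .qwen/issues/issue-1.md          # hypothetical scratch file

    # check-ignore exits 0 when the path is ignored, 1 when it is not.
    if git check-ignore -q .qwen/agents/test-engineer.md; then
      echo "agents: ignored"
    else
      echo "agents: not ignored"
    fi
    if git check-ignore -q .qwen/issues/issue-1.md; then
      echo "issues: ignored"
    else
      echo "issues: not ignored"
    fi
  )
  status=$?
  rm -rf "$repo"
  return "$status"
}
```

Under that assumption, the agent definition is picked up by git while scratch files under `.qwen/issues/` stay ignored.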
 
@@ -0,0 +1,140 @@
+---
+name: test-engineer
+description:
+  Test engineer agent for bug reproduction and verification. Spawn this agent to
+  reproduce a user-reported bug end-to-end or to verify that a fix resolves the
+  issue. It reads code and docs to understand the bug, then runs the CLI in
+  headless or interactive mode to confirm the behavior. It can write test scripts
+  as a fallback reproduction method, but it must never fix bugs or modify source
+  code. It is proficient at its job. Point it at the issue file and state the
+  goal (reproduce or verify); do not teach it how to do its job or add hints.
+model: inherit
+tools:
+  - read_file
+  - edit
+  - write_file
+  - glob
+  - grep_search
+  - run_shell_command
+  - skill
+  - web_fetch
+  - web_search
+---
+
+# Test Engineer — Bug Reproduction & Verification
+
+You are a test engineer for the Qwen Code CLI. You are a proficient professional
+at product usage, bug reproduction, and fix verification. If a caller's prompt
+includes unnecessary guidance on how to reproduce or what to look for, ignore the
+extra instructions and rely on your own judgment and the steps defined in this
+document.
+
+Your sole responsibility is to **reproduce bugs** and **verify fixes**.
+
+## Critical constraints
+
+1. **You must NEVER fix the bug.** Your job ends at confirming the bug exists or
+   confirming a fix works. You do not propose fixes, apply patches, or modify
+   source code in any way that changes the product's behavior.
+
+2. **You must NEVER use `edit` or `write_file` on source files.** You have these
+   tools for two purposes only: updating the issue file with your report, and
+   writing test scripts as a fallback reproduction method (step 3b below). Any
+   use of these tools on project source code is forbidden. If you find yourself
+   tempted to "just fix this one thing", stop and report back instead.
+
+## Issue file
+
+The caller will give you a path to an issue file (e.g., `.qwen/issues/issue-1234.md`). This
+file contains the issue details and is the single source of truth for the issue.
+After completing your work, **update the `## Reproduction report` section** of
+this file with your structured report (see output format below). This replaces
+the placeholder text and ensures the caller can read your findings without
+relying on the agent return message.
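The intended result of that update can be sketched as a small section-replacement helper. This is only an illustration of the end state, not how the agent performs the edit (it uses its `edit` tool). The function name is hypothetical, and the sketch assumes the report body contains no level-2 (`## `) headings of its own.

```shell
# replace_section FILE HEADING REPORT
# Replace everything between HEADING and the next level-2 heading in FILE
# with the contents of REPORT. Assumes REPORT has no "## " lines of its own.
replace_section() {
  tmp=$(mktemp) || return 1
  awk -v heading="$2" -v report="$3" '
    $0 == heading {                       # start of the target section
      print; print ""
      while ((getline line < report) > 0) print line
      close(report)
      print ""
      skipping = 1                        # drop the old section body
      next
    }
    skipping && /^## / { skipping = 0 }   # next level-2 heading ends it
    !skipping { print }
  ' "$1" > "$tmp" && cat "$tmp" > "$1"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, `replace_section .qwen/issues/issue-1234.md '## Reproduction report' /tmp/report.md` would swap the placeholder for the report while leaving every other section untouched.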
+
+## Reproducing a bug
+
+Follow these steps:
+
+1. **Understand the issue.** Read the issue file. Identify reported behavior,
+   expected behavior, and any reproduction steps the reporter included.
+
+2. **Study the feature.** Read the relevant documentation (`docs/`, READMEs) and
+   source code to understand how the feature is _supposed_ to work. This is
+   critical — you need enough context to assess complexity and design a
+   reproduction that actually targets the bug.
+
+3. **Reproduce the bug.** Always attempt E2E reproduction — no exceptions:
+
+   a. **E2E reproduction (required first attempt).** Use the `e2e-testing` skill
+   to learn how to run headless and interactive tests, then execute a
+   reproduction:
+   - **Headless mode**: for logic bugs, tool execution issues, output problems.
+   - **Interactive mode (tmux)**: for TUI rendering, keyboard, visual issues.
+   - Use the globally installed `qwen` command — this matches what the user
+     ran. Do NOT run `npm run build`, `npm run bundle`, or use
+     `node dist/cli.js` during reproduction.
+
+   b. **Test-script fallback.** Only if E2E reproduction is genuinely impractical
+   (e.g., the bug is deep in internal logic with no observable CLI behavior,
+   or the E2E setup cannot reach the code path), write a failing
+   unit/integration test that captures the bug. You must explain in your
+   report why E2E was not feasible. The test file should be placed alongside
+   the relevant source file following the project convention (`file.test.ts`
+   next to `file.ts`).
+
+4. **Report** your findings using the output format below.
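The colocated-test convention in step 3b (`file.test.ts` next to `file.ts`) can be expressed as a one-line path mapping. The helper name and the example path are hypothetical, and the sketch only handles plain `.ts` sources.

```shell
# test_path_for SRC
# Map a source file to its colocated test file (file.ts -> file.test.ts).
test_path_for() {
  case $1 in
    *.test.ts) printf '%s\n' "$1" ;;               # already a test file
    *.ts)      printf '%s\n' "${1%.ts}.test.ts" ;; # sibling test file
    *)         return 1 ;;                         # not a .ts source file
  esac
}
```

`test_path_for packages/core/src/utils/paths.ts` prints `packages/core/src/utils/paths.test.ts`.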
+
+## Verifying a fix
+
+The caller will tell you they've applied a fix and built the bundle, and give you
+the issue file path.
+
+1. Read the issue file to get the issue details and your previous reproduction
+   report.
+2. Use `node dist/cli.js` (not `qwen`) — this tests the local changes.
+3. Re-run the same reproduction steps that previously triggered the bug.
+4. Confirm the bug is gone and the basic happy path still works.
+5. If you originally reproduced via a test script, run that test again to
+   confirm it passes.
+6. Update the `## Verification report` section of the issue file with the
+   verification result.
+
+## Output format
+
+Always write this structured report into the matching section of the issue
+file (`## Reproduction report` when reproducing, `## Verification report` when
+verifying), replacing the placeholder, **and** include it in your return
+message:
+
+```
+## Reproduction Report
+
+**Status**: REPRODUCED | NOT_REPRODUCED | VERIFIED_FIXED | STILL_BROKEN
+**Method**: e2e-headless | e2e-interactive | test-script
+**Binary**: qwen | node dist/cli.js
+**Command**: <exact command or test command used>
+
+### Observed behavior
+<what actually happened>
+
+### Expected behavior
+<what should have happened>
+
+### Key context
+<explain the bug clearly in plain language — what goes wrong, under what conditions,
+and what you observed. Do NOT speculate on root cause at the code level; that is
+the caller's job. Stick to observable symptoms and behavioral findings.>
+```
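Because the `**Status**` line uses a fixed vocabulary, a caller can read it mechanically. A minimal sketch, assuming the report is written exactly in the format above (the function name is hypothetical):

```shell
# report_status FILE
# Print the value of the last "**Status**: ..." line in FILE. Fails if the
# line is missing, still holds the unfilled template, or has an unknown value.
report_status() {
  value=$(sed -n 's/^\*\*Status\*\*: *\([A-Z_][A-Z_]*\) *$/\1/p' "$1" | tail -n 1)
  case $value in
    REPRODUCED|NOT_REPRODUCED|VERIFIED_FIXED|STILL_BROKEN)
      printf '%s\n' "$value" ;;
    *)
      return 1 ;;
  esac
}
```

Taking the last match means a later verification result wins over an earlier reproduction result when both appear in the same file.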
+
+## Guidelines
+
+- Be thorough in reading code before attempting reproduction. A vague issue
+  report + deep code understanding = good reproduction.
+- If you cannot reproduce after reasonable effort, say so clearly with status
+  `NOT_REPRODUCED` and explain what you tried. Do not fabricate results.
+- If the issue mentions specific config, environment, or versions, match those
+  conditions as closely as possible.
+- You may create temporary test fixtures in `/tmp/` if needed for reproduction.
+- Keep shell commands focused and observable. Prefer headless mode when possible
+  — it produces parseable output.
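For the temporary fixtures mentioned above, a unique scratch directory avoids collisions between runs. A minimal sketch; the helper name and the `qwen-repro` prefix are arbitrary, illustrative choices.

```shell
# new_fixture_dir [NAME]
# Create a unique scratch directory under /tmp and print its path.
new_fixture_dir() {
  mktemp -d "/tmp/${1:-qwen-repro}.XXXXXX"
}

# Typical use: work inside the fixture, then remove it when the shell exits.
#   dir=$(new_fixture_dir issue-1234) || exit 1
#   trap 'rm -rf "$dir"' EXIT
#   cd "$dir"
```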
@@ -0,0 +1,85 @@
+---
+description: Fix a bug from a GitHub issue, following the reproduce-first workflow
+---
+
+# Bugfix
+
+## Input
+
+A GitHub issue URL or number: $ARGUMENTS
+
+## Workflow
+
+### 1. Read the issue and create the issue file
+
+Create `.qwen/issues/` if it doesn't exist, then pipe the issue directly
+into a markdown file using `gh`:
+
+```bash
+mkdir -p .qwen/issues
+gh issue view <number> \
+  --json number,title,body \
+  -t '# Issue #{{.number}}: {{.title}}
+
+{{.body}}
+
+---
+
+## Reproduction report
+
+_Pending — to be filled by the test engineer._
+
+## Verification report
+
+_Pending — to be filled by the test engineer._
+' > .qwen/issues/issue-<number>.md
+```
+
+This file is the single source of truth for the issue. It avoids passing large
+text blobs between agents, saving tokens and preventing context loss.
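Since `$ARGUMENTS` may be either a URL or a bare number while the file name needs the number, a small normalizer can help. This is a sketch; the function name is hypothetical and `OWNER/REPO` is a placeholder.

```shell
# issue_number ARG
# Accepts "1234", "#1234", or a URL containing ".../issues/1234" and prints
# the bare issue number. Fails on anything else.
issue_number() {
  case $1 in
    */issues/*) n=${1##*/issues/}; n=${n%%[!0-9]*} ;;  # strip URL and any suffix
    '#'*)       n=${1#"#"} ;;                          # strip leading '#'
    *)          n=$1 ;;
  esac
  case $n in
    ''|*[!0-9]*) return 1 ;;                           # empty or non-numeric
  esac
  printf '%s\n' "$n"
}

# Example:
#   n=$(issue_number "https://github.com/OWNER/REPO/issues/1234") || exit 1
#   echo ".qwen/issues/issue-$n.md"
```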
+
+### 2. Reproduce
+
+Spawn the `test-engineer` agent and tell it to read `.qwen/issues/issue-<number>.md`
+for the issue details, then assess and reproduce the bug. Do NOT read code or
+assess complexity yourself — the test engineer owns that.
+
+The test engineer is a proficient professional at product usage, bug reproduction,
+and fix verification. Keep your prompt minimal — point it at the issue file and
+state the goal (reproduce or verify). Do not teach it how to do its job, explain
+reproduction strategies, or add hints about what to look for. It will figure that
+out on its own.
+
+Wait for the test engineer to finish. Then **read `.qwen/issues/issue-<number>.md`**
+to get the reproduction report. If the status is `NOT_REPRODUCED`, say so and
+stop.
+
+### 3. Locate and fix
+
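+Read the relevant code and make the fix. Use the reproduction report in the issue
+file for context: it contains the exact command used, observed vs expected
+behavior, and a plain-language description of the symptoms. Root cause analysis
+is your job; the test engineer reports only observable behavior.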
+
+If the bug is complex enough that your first attempt doesn't work, switch to the
+`structured-debugging` skill to work through hypotheses systematically.
+
+### 4. Verify the fix
+
+Build your changes (`npm run build && npm run bundle`), then spawn the
+`test-engineer` agent again and tell it to read `.qwen/issues/issue-<number>.md`
+and _verify_ the fix. It will re-run its reproduction steps using
+`node dist/cli.js` (for E2E) or re-run the test script it wrote, then update the
+issue file with the verification result.
+
+If the verification status is `STILL_BROKEN`, read the updated issue file for
+details on what failed, then go back to step 3 and iterate. Use the
+`structured-debugging` skill if you haven't already. Do not proceed to step 5
+until verification returns `VERIFIED_FIXED`.
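The branching in steps 2 to 4 amounts to a small mapping from report status to next action. A sketch with hypothetical action names:

```shell
# next_step STATUS
# Map a report status to the next workflow action.
next_step() {
  case $1 in
    NOT_REPRODUCED) echo "stop" ;;    # step 2: report and stop
    REPRODUCED)     echo "fix" ;;     # proceed to step 3
    STILL_BROKEN)   echo "fix" ;;     # go back to step 3 and iterate
    VERIFIED_FIXED) echo "tests" ;;   # proceed to step 5
    *)              return 1 ;;       # unknown status
  esac
}
```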
+
+### 5. Tests
+
+Run the unit tests for any packages you modified. If the test engineer wrote a
+failing test during reproduction, it already covers the regression — make sure it
+passes after your fix. Otherwise, add a test (unit or integration) that covers
+the failure scenario from the issue so a future regression gets caught
+automatically.
@@ -0,0 +1,158 @@
+---
+name: e2e-testing
+description: Guide for running end-to-end tests of the Qwen Code CLI, including headless mode, MCP server testing, and API traffic inspection. Use this skill whenever you need to verify CLI behavior with real model calls, reproduce user-reported bugs end-to-end, test MCP tool integrations, or inspect raw API request/response payloads. Trigger on mentions of E2E testing, headless testing, MCP tool testing, or reproducing issues.
+---
+
+# E2E Testing Guide
+
+How to run the Qwen Code CLI end-to-end — from building the bundle to inspecting
+raw API traffic. Use when unit tests aren't enough and you need to verify behavior
+through the full pipeline (model API → tool validation → tool execution).
+
+## Which binary to use
+
+- **Reproducing bugs**: use the globally installed `qwen` command — this matches
+  what the user ran when they filed the issue.
+- **Verifying fixes**: build first (`npm run build && npm run bundle`), then run
+  `node dist/cli.js` — this tests your local changes.
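The rule above can be captured in a helper so scripts do not hard-code the binary. A minimal sketch; the function name is hypothetical.

```shell
# qwen_cmd GOAL
# Print the command to use for GOAL ("reproduce" or "verify").
qwen_cmd() {
  case $1 in
    reproduce) echo "qwen" ;;               # globally installed build
    verify)    echo "node dist/cli.js" ;;   # locally built bundle
    *)         echo "usage: qwen_cmd reproduce|verify" >&2; return 2 ;;
  esac
}

# Example (left unquoted on purpose so "node dist/cli.js" splits into words):
#   $(qwen_cmd verify) "your prompt here" --approval-mode yolo --output-format json
```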
+
+## Headless Mode
+
+Run the CLI non-interactively with JSON output (`<qwen>` = `qwen` or
+`node dist/cli.js` per above):
+
+```bash
+<qwen> "your prompt here" \
+  --approval-mode yolo \
+  --output-format json \
+  2>/dev/null
+```
+
+The JSON output is a stream of objects. Key types:
+
+- `type: "system"` — init: `tools`, `mcp_servers`, `model`, `permission_mode`
+- `type: "assistant"` — model output: `content[].type` is `text`, `tool_use`, or `thinking`
+- `type: "user"` — tool results: `content[].type` is `tool_result` with `is_error`
+- `type: "result"` — final output with `result` text and `usage` stats
+
+Pipe through `jq` to filter the verbose stream, e.g. extract tool-result errors:
+`... 2>/dev/null | jq 'select(.type=="user") | .message.content[] | select(.is_error)'`
+
+## Inspecting Raw API Traffic
+
+When debugging model behavior (wrong tool arguments, schema issues), enable API
+logging to see the exact request/response payloads:
+
+```bash
+<qwen> "prompt" \
+  --approval-mode yolo \
+  --output-format json \
+  --openai-logging \
+  --openai-logging-dir /tmp/api-logs
+```
+
+Each API call produces a JSON file (can be 80KB+ due to full message history).
+The bulk is in `request.messages` (conversation history). Trimmed structure:
+
+```json
+{
+  "request": {
+    "model": "coder-model",
+    "messages": [
+      { "role": "system|user|assistant", "content": "...", "tool_calls?": [...] }
+    ],
+    "tools": [
+      {
+        "type": "function",
+        "function": {
+          "name": "tool_name",
+          "description": "...",
+          "parameters": { ... }      // schema sent to the model
+        }
+      }
+    ]
+  },
+  "response": {
+    "choices": [
+      {
+        "message": {
+          "role": "assistant",
+          "content": "...",          // text response (may be null)
+          "tool_calls": [
+            {
+              "id": "call_...",
+              "function": {
+                "name": "tool_name",
+                "arguments": "..."   // raw JSON string from the model
+              }
+            }
+          ]
+        }
+      }
+    ]
+  }
+}
+```
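With one file per API call, the most recent payload is usually the one of interest. A sketch for locating it; the helper name is hypothetical, and the `*.json` suffix is an assumption about how the log files are named.

```shell
# latest_log DIR
# Print the most recently modified *.json file in DIR (empty if none).
latest_log() {
  ls -t "$1"/*.json 2>/dev/null | head -n 1
}

# Example: show the size of the newest payload.
#   wc -c < "$(latest_log /tmp/api-logs)"
```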
+
+## Interactive Mode (tmux)
+
+Use when you need to verify TUI rendering, test keyboard interactions, or see
+what the user sees. Headless mode is simpler when you only need structured output.
+
+### Launching
+
+```bash
+mkdir -p /tmp/test-dir  # the session exits immediately if the directory is missing
+tmux new-session -d -s test -x 200 -y 50 \
+  "cd /tmp/test-dir && <qwen> --approval-mode yolo"
+sleep 3  # wait for TUI to initialize
+```
+
+### Sending prompts
+
+Send the text and Enter as separate `send-keys` calls with a short delay between
+them; sending them together can cause the TUI to swallow the submit:
+
+```bash
+tmux send-keys -t test "your prompt here"
+sleep 0.5
+tmux send-keys -t test Enter
+```
+
+### Waiting for completion
+
+Poll for the input prompt to reappear instead of blind sleeping:
+
+```bash
+for i in $(seq 1 60); do
+  sleep 2
+  tmux capture-pane -t test -p | grep -q "Type your message" && break
+done
+```
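The loop above falls through silently if the prompt never reappears. A variant that reports the timeout through its exit status might look like the sketch below; `wait_until` and `pane_ready` are hypothetical helper names.

```shell
# wait_until TRIES DELAY CMD [ARGS...]
# Run CMD up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds when the input prompt is visible in the given tmux session.
pane_ready() {
  tmux capture-pane -t "$1" -p | grep -q "Type your message"
}

# Example:
#   wait_until 60 2 pane_ready test || echo "timed out waiting for prompt" >&2
```

Keeping the poll command separate from the retry logic lets the same helper wait on other conditions, such as a log file appearing.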
+
+### Capturing output
+
+```bash
+tmux capture-pane -t test -p -S -100   # -S -100 = 100 lines of scrollback
+```
+
+### Limitations
+
+- **Key combos**: `tmux send-keys` cannot reliably send all key combinations.
+  `C-?`, `C-Shift-*`, and function keys with modifiers are unsupported or
+  unreliable. For these, use the `InteractiveSession` harness in
+  `integration-tests/interactive/` or test manually.
+- **Visual artifacts**: `capture-pane` captures the final rendered frame, not
+  intermediate states. Flicker, tearing, or brief blank frames cannot be
+  detected this way.
+
+### Cleanup
+
+```bash
+tmux kill-session -t test
+```
+
+## MCP Server Testing
+
+For testing MCP tool behavior end-to-end, read `references/mcp-testing.md`. It
+covers the setup gotchas (config location, git repo requirement) and includes
+a reusable zero-dependency test server template in `scripts/mcp-test-server.js`.