QwenLM · wenshao · Apr 7, 2026
diff --git a/docs/design/fork-subagent/fork-subagent-design.md b/docs/design/fork-subagent/fork-subagent-design.md
@@ -0,0 +1,112 @@
+# Fork Subagent Design
+
+> Implicit fork subagent that inherits the parent's full conversation context and shares prompt cache for cost-efficient parallel task execution.
+
+## Overview
+
+When the Agent tool is called without `subagent_type`, it triggers an implicit **fork** — a background subagent that inherits the parent's conversation history, system prompt, and tool definitions. The fork uses `CacheSafeParams` to ensure its API requests share the same prefix as the parent's, enabling DashScope prompt cache hits.
+
+## Architecture
+
+```
+Parent conversation: [SystemPrompt | Tools | Msg1 | Msg2 | ... | MsgN (model)]
+                              ↑ identical prefix for all forks ↑
+
+Fork A: [...MsgN | placeholder results | "Research A"]  ← shared cache
+Fork B: [...MsgN | placeholder results | "Modify B"]    ← shared cache
+Fork C: [...MsgN | placeholder results | "Test C"]      ← shared cache
+```
+
+## Key Components
+
+### 1. FORK_AGENT (`forkSubagent.ts`)
+
+`FORK_AGENT` is a synthetic agent config that is not registered in `builtInAgents`. It defines a fallback `systemPrompt`, but in practice the fork uses the parent's rendered system prompt via `generationConfigOverride`.
+
+### 2. CacheSafeParams Integration (`agent.ts` + `forkedQuery.ts`)
+
+```
+agent.ts (fork path)
+  │
+  ├── getCacheSafeParams()          ← parent's generationConfig snapshot
+  │     ├── generationConfig        ← systemInstruction + tools + temp/topP
+  │     └── history                 ← (not used — we build extraHistory instead)
+  │
+  ├── forkGenerationConfig          ← passed as generationConfigOverride
+  └── forkToolsOverride             ← FunctionDeclaration[] extracted from tools
+        │
+        ▼
+  AgentHeadless.execute(context, signal, {
+    extraHistory,                   ← parent conversation history
+    generationConfigOverride,       ← parent's exact systemInstruction + tools
+    toolsOverride,                  ← parent's exact tool declarations
+  })
+        │
+        ▼
+  AgentCore.createChat(context, {
+    extraHistory,
+    generationConfigOverride,       ← bypasses buildChatSystemPrompt()
+  })                                   AND skips getInitialChatHistory()
+        │                              (extraHistory already has env context)
+        ▼
+  new GeminiChat(config, generationConfig, startHistory)
+                          ↑ byte-identical to parent's config
+```
+
+### 3. History Construction (`agent.ts` + `forkSubagent.ts`)
+
+The fork's `extraHistory` must end with a `model` message so that, when `agent-headless` sends the `task_prompt` as the next `user` message, the Gemini API's user/model alternation is preserved.
+
+Three cases:
+
+| Parent history ends with      | extraHistory construction                                              | task_prompt                    |
+| ----------------------------- | ---------------------------------------------------------------------- | ------------------------------ |
+| `model` (no function calls)   | `[...rawHistory]` (unchanged)                                          | `buildChildMessage(directive)` |
+| `model` (with function calls) | `[...rawHistory, model(clone), user(responses+directive), model(ack)]` | `'Begin.'`                     |
+| `user` (unusual)              | `rawHistory.slice(0, -1)` (drop trailing user)                         | `buildChildMessage(directive)` |
+
+### 4. Recursive Fork Prevention (`forkSubagent.ts`)
+
+`isInForkChild()` scans the `user` messages in the conversation history for the `<fork-boilerplate>` tag. If the tag is found, the fork attempt is rejected with an error message.
+
+### 5. Background Execution (`agent.ts`)
+
+The fork calls `void executeSubagent()` (fire-and-forget) and immediately returns `FORK_PLACEHOLDER_RESULT` to the parent. Errors in the background task are caught, logged, and reflected in the display state.
+
+## Data Flow
+
+```
+1. Model calls Agent tool (no subagent_type)
+2. agent.ts: import forkSubagent.js
+3. agent.ts: getCacheSafeParams() → forkGenerationConfig + forkToolsOverride
+4. agent.ts: build extraHistory from parent's getHistory(true)
+5. agent.ts: build forkTaskPrompt (directive or 'Begin.')
+6. agent.ts: createAgentHeadless(FORK_AGENT, ...)
+7. agent.ts: void executeSubagent() — background
+8. agent.ts: return FORK_PLACEHOLDER_RESULT to parent immediately
+9. Background:
+   a. AgentHeadless.execute(context, signal, {extraHistory, generationConfigOverride, toolsOverride})
+   b. AgentCore.createChat() — uses parent's generationConfig (cache-shared)
+   c. runReasoningLoop() — uses parent's tool declarations
+   d. Fork executes tools, produces result
+   e. updateDisplay() with final status
+```
+
+## Graceful Degradation
+
+If `getCacheSafeParams()` returns null (first turn, no history yet), the fork falls back to:
+
+- `FORK_AGENT.systemPrompt` for system instruction
+- `prepareTools()` for tool declarations
+
+This ensures the fork always works, even without cache sharing.
+
+## Files
+
+| File                                                 | Role                                                                                  |
+| ---------------------------------------------------- | ------------------------------------------------------------------------------------- |
+| `packages/core/src/agents/runtime/forkSubagent.ts`   | FORK_AGENT config, buildForkedMessages(), isInForkChild(), buildChildMessage()        |
+| `packages/core/src/tools/agent.ts`                   | Fork path: CacheSafeParams retrieval, extraHistory construction, background execution |
+| `packages/core/src/agents/runtime/agent-headless.ts` | execute() options: generationConfigOverride, toolsOverride                            |
+| `packages/core/src/agents/runtime/agent-core.ts`     | CreateChatOptions.generationConfigOverride                                            |
+| `packages/core/src/followup/forkedQuery.ts`          | CacheSafeParams infrastructure (existing, no changes)                                 |
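Reviewer sketch (not part of the patch): the three rows of the History Construction table can be read as a classification of the parent history's last message. The helper names `classifyParentTail` and `endsWithModel` are hypothetical, and `Content`/`Part` below are simplified stand-ins for the `@google/genai` types. The real logic lives in `agent.ts`, which is not shown in this section, so treat this as an illustration of the stated invariant rather than the actual implementation.

```typescript
// Simplified stand-ins for Content/Part from '@google/genai' (assumption).
interface Part {
  text?: string;
  functionCall?: { id?: string; name?: string };
}
interface Content {
  role?: string;
  parts?: Part[];
}

type ParentTail = 'model-text' | 'model-function-calls' | 'user' | 'empty';

// Hypothetical helper: pick the table row that applies to the parent history.
function classifyParentTail(rawHistory: Content[]): ParentTail {
  const last = rawHistory[rawHistory.length - 1];
  if (last === undefined) return 'empty';
  if (last.role !== 'model') return 'user';
  const hasCalls = (last.parts ?? []).some((p) => p.functionCall !== undefined);
  return hasCalls ? 'model-function-calls' : 'model-text';
}

// Invariant stated in the design doc: extraHistory must end with a model message.
function endsWithModel(history: Content[]): boolean {
  const last = history[history.length - 1];
  return last !== undefined && last.role === 'model';
}

const parentHistory: Content[] = [
  { role: 'user', parts: [{ text: 'Investigate modules A and B' }] },
  { role: 'model', parts: [{ functionCall: { id: 'call-1', name: 'agent' } }] },
];

console.log(classifyParentTail(parentHistory)); // model-function-calls
console.log(endsWithModel(parentHistory)); // true
```

A check like `endsWithModel` could serve as an assertion on the constructed `extraHistory` before it is handed to `AgentHeadless.execute()`, assuming the alternation rule described above.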
diff --git a/docs/users/features/sub-agents.md b/docs/users/features/sub-agents.md
@@ -12,18 +12,54 @@ Subagents are independent AI assistants that:
 - **Work autonomously** - Once given a task, they work independently until completion or failure
 - **Provide detailed feedback** - You can see their progress, tool usage, and execution statistics in real-time
 
+## Fork Subagent (Implicit Fork)
+
+In addition to named subagents, Qwen Code supports **implicit forking** — when the AI omits the `subagent_type` parameter, it triggers a fork that inherits the parent's full conversation context.
+
+### How Fork Differs from Named Subagents
+
+|               | Named Subagent                    | Fork Subagent                                         |
+| ------------- | --------------------------------- | ----------------------------------------------------- |
+| Context       | Starts fresh, no parent history   | Inherits parent's full conversation history           |
+| System prompt | Uses its own configured prompt    | Uses parent's exact system prompt (for cache sharing) |
+| Execution     | Blocks the parent until done      | Runs in background, parent continues immediately      |
+| Use case      | Specialized tasks (testing, docs) | Parallel tasks that need the current context          |
+
+### When Fork is Used
+
+The AI automatically uses fork when it needs to:
+
+- Run multiple research tasks in parallel (e.g., "investigate module A, B, and C")
+- Perform background work while continuing the main conversation
+- Delegate tasks that require understanding of the current conversation context
+
+### Prompt Cache Sharing
+
+All forks share the parent's exact API request prefix (system prompt, tools, conversation history), enabling DashScope prompt cache hits. When 3 forks run in parallel, the shared prefix is cached once and reused — saving 80%+ token costs compared to independent subagents.
+
+### Recursive Fork Prevention
+
+Fork children cannot create further forks. This is enforced at runtime — if a fork attempts to spawn another fork, it receives an error instructing it to execute tasks directly.
+
+### Current Limitations
+
+- **No result feedback**: Fork results are reflected in the UI progress display but are not automatically fed back into the main conversation. The parent AI sees a placeholder message and cannot act on the fork's output.
+- **No worktree isolation**: Forks share the parent's working directory. Concurrent file modifications from multiple forks may conflict.
+
 ## Key Benefits
 
 - **Task Specialization**: Create agents optimized for specific workflows (testing, documentation, refactoring, etc.)
 - **Context Isolation**: Keep specialized work separate from your main conversation
+- **Context Inheritance**: Fork subagents inherit the full conversation for context-heavy parallel tasks
+- **Prompt Cache Sharing**: Fork subagents share the parent's cache prefix, reducing token costs
 - **Reusability**: Save and reuse agent configurations across projects and sessions
 - **Controlled Access**: Limit which tools each agent can use for security and focus
 - **Progress Visibility**: Monitor agent execution with real-time progress updates
 
 ## How Subagents Work
 
 1. **Configuration**: You create Subagents configurations that define their behavior, tools, and system prompts
-2. **Delegation**: The main AI can automatically delegate tasks to appropriate Subagents
+2. **Delegation**: The main AI can automatically delegate tasks to appropriate Subagents — or implicitly fork when no specific subagent type is needed
 3. **Execution**: Subagents work independently, using their configured tools to complete tasks
 4. **Results**: They return results and execution summaries back to the main conversation
 

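Reviewer sketch (not part of the patch): the "Prompt Cache Sharing" paragraph rests on every fork request starting with the same messages. The snippet below illustrates that idea at the message level only. `Msg` and `sharedPrefixLength` are hypothetical names, and whether a cache hit actually occurs is decided by the provider on the serialized request, which this sketch does not model.

```typescript
type Msg = { role: 'user' | 'model'; text: string };

// Number of leading messages that are identical in both requests.
function sharedPrefixLength(a: Msg[], b: Msg[]): number {
  let i = 0;
  while (i < a.length && i < b.length) {
    const x = a[i];
    const y = b[i];
    if (!x || !y || x.role !== y.role || x.text !== y.text) break;
    i++;
  }
  return i;
}

const parent: Msg[] = [
  { role: 'user', text: 'Investigate modules A, B, and C' },
  { role: 'model', text: 'I will fork one worker per module.' },
];

// Each fork appends only its own directive after the shared history.
const forkA: Msg[] = [...parent, { role: 'user', text: 'Directive: Research A' }];
const forkB: Msg[] = [...parent, { role: 'user', text: 'Directive: Research B' }];

console.log(sharedPrefixLength(forkA, forkB)); // 2
```

The shared prefix covers the whole parent history; only the final, fork-specific message differs.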
diff --git a/packages/core/src/agents/runtime/agent-core.ts b/packages/core/src/agents/runtime/agent-core.ts
@@ -101,6 +101,15 @@ export interface CreateChatOptions {
    * conversational context (e.g., from the main session that spawned it).
    */
   extraHistory?: Content[];
+  /**
+   * When provided, replaces the auto-built generationConfig
+   * (systemInstruction, temperature, etc.) with this exact config.
+   * Used by fork subagents to share the parent conversation's cache
+   * prefix for DashScope prompt caching.
+   */
+  generationConfigOverride?: GenerateContentConfig & {
+    systemInstruction?: string | Content;
+  };
 }
 
 /**
@@ -222,30 +231,43 @@ export class AgentCore {
       );
     }
 
-    const envHistory = await getInitialChatHistory(this.runtimeContext);
+    // When generationConfigOverride is provided (fork path), the extraHistory
+    // already contains the parent's env context. Skip getInitialChatHistory
+    // to avoid duplicating the env messages and breaking cache prefix match.
+    const envHistory = options?.generationConfigOverride
+      ? []
+      : await getInitialChatHistory(this.runtimeContext);
 
     const startHistory = [
       ...envHistory,
       ...(options?.extraHistory ?? []),
       ...(this.promptConfig.initialMessages ?? []),
     ];
 
-    const systemInstruction = this.promptConfig.systemPrompt
-      ? this.buildChatSystemPrompt(context, options)
-      : undefined;
+    // If an override is provided (fork path), use it directly for cache
+    // sharing. Otherwise, build the config from this agent's promptConfig.
+    // Note: buildChatSystemPrompt is called OUTSIDE the try/catch so template
+    // errors propagate to the caller (not swallowed by reportError).
+    let generationConfig: GenerateContentConfig & {
+      systemInstruction?: string | Content;
+    };
 
-    try {
-      const generationConfig: GenerateContentConfig & {
-        systemInstruction?: string | Content;
-      } = {
+    if (options?.generationConfigOverride) {
+      generationConfig = options.generationConfigOverride;
+    } else {
+      const systemInstruction = this.promptConfig.systemPrompt
+        ? this.buildChatSystemPrompt(context, options)
+        : undefined;
+      generationConfig = {
         temperature: this.modelConfig.temp,
         topP: this.modelConfig.top_p,
       };
-
       if (systemInstruction) {
         generationConfig.systemInstruction = systemInstruction;
       }
+    }
 
+    try {
       return new GeminiChat(
         this.runtimeContext,
         generationConfig,
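Reviewer sketch (not part of the patch): the new branch in `createChat()` can be restated as a pure function, which makes it easy to see that the system-prompt builder is never invoked on the fork path. `GenConfig` is a simplified stand-in for `GenerateContentConfig`, and `resolveGenerationConfig` is a hypothetical name used only for this illustration.

```typescript
// Simplified stand-in for GenerateContentConfig (assumption).
interface GenConfig {
  temperature?: number;
  topP?: number;
  systemInstruction?: string;
}

// Hypothetical pure-function version of the branch in createChat().
function resolveGenerationConfig(
  override: GenConfig | undefined,
  modelConfig: { temp?: number; top_p?: number },
  buildSystemPrompt?: () => string,
): GenConfig {
  // Fork path: reuse the parent's config object as-is; nothing is rebuilt.
  if (override) return override;

  const config: GenConfig = {
    temperature: modelConfig.temp,
    topP: modelConfig.top_p,
  };
  const systemInstruction = buildSystemPrompt ? buildSystemPrompt() : undefined;
  if (systemInstruction) config.systemInstruction = systemInstruction;
  return config;
}

let builderCalls = 0;
const builder = () => {
  builderCalls++;
  return 'You are a test agent.';
};

const parentConfig: GenConfig = { temperature: 0.2, systemInstruction: 'parent prompt' };
const forked = resolveGenerationConfig(parentConfig, { temp: 0.9 }, builder);
const named = resolveGenerationConfig(undefined, { temp: 0.9, top_p: 0.8 }, builder);

console.log(forked === parentConfig); // true
console.log(named.systemInstruction); // You are a test agent.
console.log(builderCalls); // 1
```

Returning the override by reference means the fork's own `modelConfig` temperature is ignored on that path, which matches the stated goal of reusing the parent's exact config.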

diff --git a/packages/core/src/agents/runtime/agent-headless.ts b/packages/core/src/agents/runtime/agent-headless.ts
@@ -192,8 +192,18 @@ export class AgentHeadless {
   async execute(
     context: ContextState,
     externalSignal?: AbortSignal,
+    options?: {
+      extraHistory?: Array<import('@google/genai').Content>;
+      /** Override generationConfig for cache sharing (fork subagent). */
+      generationConfigOverride?: import('@google/genai').GenerateContentConfig;
+      /** Override tool declarations for cache sharing (fork subagent). */
+      toolsOverride?: Array<import('@google/genai').FunctionDeclaration>;
+    },
   ): Promise<void> {
-    const chat = await this.core.createChat(context);
+    const chat = await this.core.createChat(context, {
+      extraHistory: options?.extraHistory,
+      generationConfigOverride: options?.generationConfigOverride,
+    });
 
     if (!chat) {
       this.terminateMode = AgentTerminateMode.ERROR;
@@ -212,7 +222,7 @@ export class AgentHeadless {
       abortController.abort();
     }
 
-    const toolsList = this.core.prepareTools();
+    const toolsList = options?.toolsOverride ?? this.core.prepareTools();
 
     const initialTaskText = String(
       (context.get('task_prompt') as string) ?? 'Get Started!',
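Reviewer sketch (not part of the patch): `options?.toolsOverride ?? this.core.prepareTools()` falls back only when the override is `undefined` or `null`. An empty array is therefore kept as-is and would give the fork no tools. Whether an empty override can occur in practice is not visible in this section. `pickTools` and the tool names below are hypothetical.

```typescript
// Hypothetical helper mirroring `options?.toolsOverride ?? this.core.prepareTools()`.
function pickTools<T>(override: T[] | undefined, prepare: () => T[]): T[] {
  return override ?? prepare();
}

// Stand-in for prepareTools(); the tool names are made up for the example.
const prepared = (): string[] => ['read_file', 'run_shell'];

console.log(pickTools<string>(undefined, prepared)); // falls back to prepared()
console.log(pickTools<string>(['agent'], prepared)); // uses the override
console.log(pickTools<string>([], prepared).length); // 0: an empty override is kept
```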

diff --git a/packages/core/src/agents/runtime/forkSubagent.ts b/packages/core/src/agents/runtime/forkSubagent.ts
@@ -0,0 +1,116 @@
+import type { Content } from '@google/genai';
+
+export const FORK_SUBAGENT_TYPE = 'fork';
+
+export const FORK_BOILERPLATE_TAG = 'fork-boilerplate';
+export const FORK_DIRECTIVE_PREFIX = 'Directive: ';
+
+export const FORK_AGENT = {
+  name: FORK_SUBAGENT_TYPE,
+  description:
+    'Implicit fork — inherits full conversation context. Not selectable via subagent_type; triggered by omitting subagent_type.',
+  tools: ['*'],
+  systemPrompt:
+    'You are a forked worker process. Follow the directive in the conversation history. Execute tasks directly using available tools. Do not spawn sub-agents.',
+  level: 'session' as const,
+};
+
+export function isInForkChild(messages: Content[]): boolean {
+  return messages.some((m) => {
+    if (m.role !== 'user') return false;
+    return m.parts?.some(
+      (part) => part.text && part.text.includes(`<${FORK_BOILERPLATE_TAG}>`),
+    );
+  });
+}
+
+export const FORK_PLACEHOLDER_RESULT =
+  'Fork started — processing in background';
+
+/**
+ * Build extra history messages for a forked subagent.
+ *
+ * When the last model message has function calls, we must include matching
+ * function responses in a user message (Gemini API requirement). The
+ * directive is embedded in this same user message to avoid consecutive
+ * user messages.
+ *
+ * When there are no function calls, we return [] — the parent history
+ * already ends with a model text message and the directive will be sent
+ * as the task_prompt by agent-headless (model → user alternation is OK).
+ *
+ * @param directive - The fork directive text (user's prompt)
+ * @param assistantMessage - The last model message from the parent history
+ * @returns Extra messages to append to history (may be empty)
+ */
+export function buildForkedMessages(
+  directive: string,
+  assistantMessage: Content,
+): Content[] {
+  const toolUseParts =
+    assistantMessage.parts?.filter((part) => part.functionCall) || [];
+
+  if (toolUseParts.length === 0) {
+    // No function calls — no extra messages needed.
+    // The parent history already ends with this model message.
+    return [];
+  }
+
+  // Clone the assistant message to avoid mutating the original
+  const fullAssistantMessage: Content = {
+    role: assistantMessage.role,
+    parts: [...(assistantMessage.parts || [])],
+  };
+
+  // Build a functionResponse part for every functionCall, all with the same
+  // placeholder output. The directive text goes in the same user message to
+  // maintain proper user/model alternation.
+  const toolResultParts = toolUseParts.map((part) => ({
+    functionResponse: {
+      id: part.functionCall!.id,
+      name: part.functionCall!.name,
+      response: { output: FORK_PLACEHOLDER_RESULT },
+    },
+  }));
+
+  const toolResultMessage: Content = {
+    role: 'user',
+    parts: [
+      ...toolResultParts,
+      {
+        text: buildChildMessage(directive),
+      },
+    ],
+  };
+
+  return [fullAssistantMessage, toolResultMessage];
+}
+
+export function buildChildMessage(directive: string): string {
+  return `<${FORK_BOILERPLATE_TAG}>
+STOP. READ THIS FIRST.
+
+You are a forked worker process. You are NOT the main agent.
+
+RULES (non-negotiable):
+1. You ARE the fork. Do NOT spawn sub-agents; execute directly.
+2. Do NOT converse, ask questions, or suggest next steps
+3. Do NOT editorialize or add meta-commentary
+4. USE your tools directly: Bash, Read, Write, etc.
+5. If you modify files, commit your changes before reporting. Include the commit hash in your report.
+6. Do NOT emit text between tool calls. Use tools silently, then report once at the end.
+7. Stay strictly within your directive's scope. If you discover related systems outside your scope, mention them in one sentence at most — other workers cover those areas.
+8. Keep your report under 500 words unless the directive specifies otherwise. Be factual and concise.
+9. Your response MUST begin with "Scope:". No preamble, no thinking-out-loud.
+10. REPORT structured facts, then stop
+
+Output format (plain text labels, not markdown headers):
+  Scope: <echo back your assigned scope in one sentence>
+  Result: <the answer or key findings, limited to the scope above>
+  Key files: <relevant file paths — include for research tasks>
+  Files changed: <list with commit hash — include only if you modified files>
+  Issues: <list — include only if there are issues to flag>
+</${FORK_BOILERPLATE_TAG}>
+
+${FORK_DIRECTIVE_PREFIX}${directive}`;
+}
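Reviewer sketch (not part of the patch): a small usage example of the recursion guard. `isInForkChild` is restated over simplified local types, and `buildChildMessageStub` is a hypothetical, abbreviated stand-in for `buildChildMessage()` so the block stays self-contained.

```typescript
// Simplified stand-ins for Content/Part from '@google/genai' (assumption).
interface Part {
  text?: string;
}
interface Content {
  role?: string;
  parts?: Part[];
}

const FORK_BOILERPLATE_TAG = 'fork-boilerplate';

// Same check as isInForkChild() in the patch, restated over the local types.
function isInForkChild(messages: Content[]): boolean {
  return messages.some(
    (m) =>
      m.role === 'user' &&
      (m.parts ?? []).some(
        (part) =>
          part.text !== undefined &&
          part.text.includes(`<${FORK_BOILERPLATE_TAG}>`),
      ),
  );
}

// Abbreviated stand-in for buildChildMessage(); the real boilerplate is longer.
function buildChildMessageStub(directive: string): string {
  return `<${FORK_BOILERPLATE_TAG}>\n(rules elided)\n</${FORK_BOILERPLATE_TAG}>\n\nDirective: ${directive}`;
}

const parentHistory: Content[] = [
  { role: 'user', parts: [{ text: 'Refactor the parser' }] },
  { role: 'model', parts: [{ text: 'Forking a worker.' }] },
];
const childHistory: Content[] = [
  ...parentHistory,
  { role: 'user', parts: [{ text: buildChildMessageStub('Research A') }] },
];
// The tag in a model message does not count: only user messages are scanned.
const quotedByModel: Content[] = [
  { role: 'model', parts: [{ text: 'The tag is <fork-boilerplate>.' }] },
];

console.log(isInForkChild(parentHistory)); // false
console.log(isInForkChild(childHistory)); // true
console.log(isInForkChild(quotedByModel)); // false
```

One consequence that follows from the check as written: any `user` message containing the literal opening tag matches, including one typed by a person in the parent conversation.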
diff --git a/packages/core/src/tools/agent.test.ts b/packages/core/src/tools/agent.test.ts
@@ -398,6 +398,7 @@ describe('AgentTool', () => {
       expect(mockAgent.execute).toHaveBeenCalledWith(
         mockContextState,
         undefined, // signal parameter (undefined when not provided)
+        { extraHistory: undefined }, // options: no extraHistory on this path
       );
 
       const llmText = partToString(result.llmContent);
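Reviewer sketch (not part of the patch): the shape returned by `buildForkedMessages()` could be exercised with a check along these lines. The function below is a condensed restatement over simplified local types. `buildForkedMessagesSketch` and `PLACEHOLDER` are hypothetical names, and the directive is passed through without the boilerplate wrapper to keep the example short.

```typescript
// Simplified stand-ins for the '@google/genai' types (assumption).
interface Part {
  text?: string;
  functionCall?: { id?: string; name?: string };
  functionResponse?: { id?: string; name?: string; response: { output: string } };
}
interface Content {
  role?: string;
  parts?: Part[];
}

// Stand-in for FORK_PLACEHOLDER_RESULT.
const PLACEHOLDER = 'fork placeholder';

// Condensed restatement of buildForkedMessages().
function buildForkedMessagesSketch(
  directive: string,
  assistantMessage: Content,
): Content[] {
  const calls = (assistantMessage.parts ?? []).filter(
    (p) => p.functionCall !== undefined,
  );
  // No function calls: the parent history already ends with a model message.
  if (calls.length === 0) return [];

  // One placeholder functionResponse per functionCall, matched by id and name.
  const responses: Part[] = calls.map((p) => ({
    functionResponse: {
      id: p.functionCall?.id,
      name: p.functionCall?.name,
      response: { output: PLACEHOLDER },
    },
  }));

  return [
    { role: assistantMessage.role, parts: [...(assistantMessage.parts ?? [])] },
    { role: 'user', parts: [...responses, { text: directive }] },
  ];
}

const lastModel: Content = {
  role: 'model',
  parts: [
    { functionCall: { id: 'c1', name: 'agent' } },
    { functionCall: { id: 'c2', name: 'agent' } },
  ],
};

const extra = buildForkedMessagesSketch('Research A', lastModel);
console.log(extra.length); // 2
console.log(extra[1]?.parts?.length); // 3 (two placeholder responses + directive text)
console.log(
  buildForkedMessagesSketch('x', { role: 'model', parts: [{ text: 'done' }] }).length,
); // 0
```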