Jinja2 chat templates stored here serve two purposes:
- Identity comparison: detecting which model is being used (by comparing
tokenizer.chat_templateagainst known templates) to add the appropriate response schema (add_response_schema) or swap in a training template (get_training_chat_template). - Training patches: modified templates that fix training-specific issues (prefix-preservation for GRPO,
{% generation %}markers for SFT assistant-only loss).
Why prefix-preserving? The GRPO tool call loop extracts tool response formatting tokens by comparing tokenizations with and without tool messages appended (_get_tool_suffix_ids). This requires the chat template to be prefix-preserving: appending messages must not change how earlier messages are rendered.
Why generation-tagged? SFT with assistant_only_loss=True requires the chat template to include {% generation %} / {% endgeneration %} markers around assistant output, so return_assistant_tokens_mask=True can produce correct masks. Most model templates don't include these markers natively.
Used for identity comparison only.
Original Qwen3 chat template.
Original Qwen3.5 chat templates.
Patched templates that fix training-specific issues. Swapped in at init when tools are enabled (GRPO) or when assistant_only_loss=True (SFT).
Patched Qwen3 template. Diff vs qwen3.jinja:
Require both <think> and </think> to be present before parsing, to avoid incorrect splitting when the model generates only one tag:
- {%- if '</think>' in content %}
+ {%- if '<think>' in content and '</think>' in content %}Always include the thinking block regardless of message position. The original conditionally omits it based on loop.last, which changes the assistant rendering when a tool message is appended — breaking prefix-preservation:
- {%- if loop.index0 > ns.last_query_index %}
- {%- if loop.last or (not loop.last and reasoning_content) %}
- {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
- {%- else %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
- {%- endif %}
- {%- else %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
- {%- endif %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}Wrap assistant message output with {% generation %} / {% endgeneration %} so that return_assistant_tokens_mask=True produces correct masks for SFT assistant-only loss.