[Gemma4] Fix chat template and stop tokens for OpenAI tool calling compatibility#45257
Draft

lucianommartins wants to merge 2 commits into huggingface:main
Conversation
…atibility

- Chat Template: Added handler for OpenAI-standard 'role: "tool"' messages to render inline as <|tool_response> without initiating a new <|turn> block.
- Chat Template: Extended turn-close condition to inhibit <turn|> emission when the model has pending 'tool_calls' without corresponding responses, preserving the continuous turn structure.
- Generation Config: Updated 'eos_token_id' derivation in convert_gemma4_weights.py to prioritize the terminal '<tool_call|>' token over the starting '<|tool_response>' token, resolving post-call generation hallucinations in HuggingFace inference.

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Chat template patcher (_patch_template_for_openai_tool_role):
- Inject format_tool_response_block macro after strip_thinking to DRY
up tool-response rendering (used by both legacy and OpenAI paths)
- Replace the entire message loop instead of two point patches:
* Skip role:'tool' messages in outer loop; render them proactively
via forward-scan from the preceding assistant message
* Suppress duplicate <|turn>model on consecutive assistant messages
separated only by tool messages (multi-round tool-call loops)
* Resolve tool_call_id back to function name from originating
tool_calls array (prevents response:unknown fallback)
* Handle tool response content as both plain strings and OpenAI
content-parts arrays ([{type:'text', text:'...'}])
* Render reasoning/reasoning_content fields as <|channel>thought
blocks (supports both vLLM and older inference server variants)
- Preserve legacy tool_responses on assistant messages (Gemma native)
- Pre-scan loop_messages for last_user_idx to guard reasoning injection
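The `tool_call_id` resolution and content handling above can be sketched in plain Python (the function names here are illustrative; the actual patch emits equivalent Jinja2 inside the chat template):

```python
def resolve_tool_name(tool_call_id, tool_calls):
    """Map an OpenAI tool_call_id back to the function name that issued it."""
    for call in tool_calls:
        if call.get("id") == tool_call_id:
            return call["function"]["name"]
    return "unknown"  # the fallback the patch is designed to avoid

def normalize_content(content):
    """Accept tool-response content as a plain string or an OpenAI
    content-parts array like [{"type": "text", "text": "..."}]."""
    if isinstance(content, str):
        return content
    return "".join(part["text"] for part in content if part.get("type") == "text")

calls = [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{}"}}]
print(resolve_tool_name("call_1", calls))                    # get_weather
print(normalize_content([{"type": "text", "text": "22C"}]))  # 22C
```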
Stop tokens (eos_token_id):
- Removes <tool_call|> (etc_token) from the stop token list
- Keeps only <eos> + <turn|> (eot_token)
- Enables parallel tool calls without premature truncation after the
first <tool_call|>; <turn|> still terminates the model turn correctly
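The effect on eos_token_id can be illustrated with placeholder ids (the real ids come from the Gemma4 tokenizer; the values below are made up):

```python
# Placeholder token ids for illustration only; real values come from the tokenizer.
EOS_ID, EOT_ID, ETC_ID = 1, 106, 107  # <eos>, <turn|> (eot), <tool_call|> (etc)

old_eos_token_id = [EOS_ID, EOT_ID, ETC_ID]
# Dropping the tool-call terminator lets generation continue through
# parallel tool calls; <turn|> still ends the model turn.
new_eos_token_id = [t for t in old_eos_token_id if t != ETC_ID]
print(new_eos_token_id)  # [1, 106]
```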
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
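The duplicate <|turn>model suppression described in the commit message can be sketched roughly as follows (`should_open_model_turn` is a hypothetical helper; the template expresses the same backward scan in Jinja2):

```python
def should_open_model_turn(messages, i):
    """Open a new <|turn>model header only if the previous non-tool message
    is not an assistant message, i.e. this assistant message is not a
    continuation of a multi-round tool-call loop."""
    j = i - 1
    while j >= 0 and messages[j]["role"] == "tool":
        j -= 1
    return not (j >= 0 and messages[j]["role"] == "assistant")

msgs = [
    {"role": "user"},
    {"role": "assistant"},  # issues tool_calls
    {"role": "tool"},       # tool result, rendered inline
    {"role": "assistant"},  # continuation: no second <|turn>model header
]
print(should_open_model_turn(msgs, 1))  # True
print(should_open_model_turn(msgs, 3))  # False
```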
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma4
cc @Rocketknight1 for chat templates and tool calling, but this seems to be only the conversion. Probably the latest changes you made before release didn't make it to the conversion script 😅
This was referenced Apr 8, 2026
[Gemma4] Fix chat template and stop tokens for OpenAI tool calling compatibility
What does this PR do?
Rewrites the _patch_template_for_openai_tool_role() function in convert_gemma4_weights.py to fully support OpenAI Chat Completions tool-calling semantics for Gemma4 (E4B and 31B).

Chat template patcher

- role: "tool" messages are skipped in the outer loop and rendered proactively as <|tool_response> blocks from the preceding assistant turn that issued the tool_calls
- Duplicate <|turn>model headers are suppressed when consecutive assistant messages are separated only by tool messages (multi-round tool-call loops)
- tool_call_id resolution: matches tool results back to the originating tool_calls array by ID to resolve function names correctly (prevents the response:unknown fallback)
- Handles content as both plain strings and OpenAI content-parts arrays ([{type: "text", text: "..."}])
- format_tool_response_block macro: injects a reusable macro to centralize tool-response rendering (used by both the legacy Gemma-native tool_responses path and the OpenAI-style role: "tool" path)
- reasoning/reasoning_content support: renders thinking fields as <|channel>thought blocks (compatible with vLLM, DeepSeek, and o1-style inference servers)
- Preserves legacy tool_responses on assistant messages (Google/Gemma format)

Stop tokens (eos_token_id)

- Removes <tool_call|> (etc_token) from the stop token list
- Keeps only <eos> + <turn|> (eot_token)
- Enables parallel tool calls without premature truncation after the first <tool_call|>; <turn|> still terminates the model turn correctly

Testing
Validated with 17 functional test scenarios across both E4B and 31B templates:
- single <|turn>model emitted
- legacy tool_responses, tool_call_id resolution, content-parts arrays
- reasoning/reasoning_content field rendering
- add_generation_prompt correctness, Jinja2 syntax validation

Before submitting
Who can review?
Models:
Library: