
Commit a0b6011

[Gemma4] Updated chat template, reasoning property
This documents the updated chat template to use with Gemma 4 models for reasoning and/or tool calling, which was merged in vllm-project/vllm#39027. It also adds instructions for enabling thinking by default, for users who prefer the model to always think, and replaces the deprecated `reasoning_content` field with the updated `reasoning` field.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
1 parent: 2734539

1 file changed (+20 −8 lines)

Google/Gemma4.md
````diff
@@ -524,7 +524,10 @@ print(outputs[0].outputs[0].text)
 
 ## Thinking / Reasoning Mode
 
-Gemma 4 supports structured thinking, where the model can reason step-by-step before producing a final answer. The reasoning process is exposed via the `reasoning_content` field in the API response.
+Gemma 4 supports structured thinking, where the model can reason step-by-step before producing a final answer. The reasoning process is exposed via the `reasoning` field in the API response.
+
+> ℹ️ **Note**
+> The example chat template file is included in the official container and can also be downloaded from the [vLLM repository](https://github.com/vllm-project/vllm/blob/main/examples/tool_chat_template_gemma4.jinja).
 
 ### Launch Server with Thinking Support
 
````
````diff
@@ -533,9 +536,12 @@ vllm serve google/gemma-4-31B-it \
   --max-model-len 16384 \
   --enable-auto-tool-choice \
   --reasoning-parser gemma4 \
-  --tool-call-parser gemma4
+  --tool-call-parser gemma4 \
+  --chat-template examples/tool_chat_template_gemma4.jinja
 ```
 
+If you want thinking enabled by default for all requests, add `--default-chat-template-kwargs '{"enable_thinking": true}'` to the command above.
+
 ### Thinking Mode (OpenAI SDK)
 
 ```python
````
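Besides the server-wide `--default-chat-template-kwargs` flag shown above, chat template kwargs can also be supplied per request in the chat-completions body. A minimal sketch of such a request payload, assuming the server accepts a per-request `chat_template_kwargs` field (the prompt is a placeholder):

```python
import json

# Hypothetical body for POST /v1/chat/completions on a vLLM server.
# "chat_template_kwargs" toggles thinking for this request only.
payload = {
    "model": "google/gemma-4-31B-it",
    "messages": [{"role": "user", "content": "How many primes are below 20?"}],
    "chat_template_kwargs": {"enable_thinking": True},
}

# Serialize exactly as it would be sent over the wire
body = json.dumps(payload)
print(body)
```

Per-request kwargs override the server default, so thinking can be turned on only for the requests that need it.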
````diff
@@ -559,10 +565,10 @@ response = client.chat.completions.create(
 
 message = response.choices[0].message
 
-# The thinking process is in reasoning_content
-if hasattr(message, "reasoning_content") and message.reasoning_content:
+# The thinking process is in reasoning
+if hasattr(message, "reasoning") and message.reasoning:
     print("=== Thinking ===")
-    print(message.reasoning_content)
+    print(message.reasoning)
 
 print("\n=== Answer ===")
 print(message.content)
````
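Because older servers expose the deprecated `reasoning_content` field while newer ones use `reasoning`, client code that must run against both can read whichever is present. A small sketch; the `extract_reasoning` helper is illustrative, not part of any SDK, and the messages are simulated stand-ins for `response.choices[0].message`:

```python
from types import SimpleNamespace

def extract_reasoning(message):
    """Return the thinking text, preferring the new `reasoning` field and
    falling back to the deprecated `reasoning_content`."""
    for field in ("reasoning", "reasoning_content"):
        value = getattr(message, field, None)
        if value:
            return value
    return None

# Simulated messages from a new-style and an old-style server
new_style = SimpleNamespace(reasoning="step-by-step...", content="42")
old_style = SimpleNamespace(reasoning_content="thinking...", content="42")

print(extract_reasoning(new_style))
print(extract_reasoning(old_style))
```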
````diff
@@ -591,14 +597,18 @@ curl http://localhost:8000/v1/chat/completions \
 
 Gemma 4 supports function calling with a dedicated tool-call protocol using custom special tokens (`<|tool_call|>`, `<tool_call|>`, etc.).
 
+> ℹ️ **Note**
+> The example chat template file is included in the official container and can also be downloaded from the [vLLM repository](https://github.com/vllm-project/vllm/blob/main/examples/tool_chat_template_gemma4.jinja).
+
 ### Launch Server with Tool Calling
 
 ```bash
 vllm serve google/gemma-4-31B-it \
   --max-model-len 8192 \
   --enable-auto-tool-choice \
   --tool-call-parser gemma4 \
-  --reasoning-parser gemma4
+  --reasoning-parser gemma4 \
+  --chat-template examples/tool_chat_template_gemma4.jinja
 ```
 
 ### Tool Calling (OpenAI SDK)
````
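When the gemma4 tool-call parser fires, the OpenAI-compatible response carries the parsed calls in `message.tool_calls`, each with a function name and JSON-encoded arguments. A minimal sketch of decoding them; the message object is simulated, and the field layout follows the OpenAI chat-completions schema:

```python
import json
from types import SimpleNamespace

# Simulated `response.choices[0].message` with one parsed tool call
tool_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(
        name="get_weather",
        arguments='{"city": "Boston", "unit": "celsius"}',
    ),
)
message = SimpleNamespace(content=None, tool_calls=[tool_call])

for call in message.tool_calls or []:
    # `arguments` arrives as a JSON string, not a dict
    args = json.loads(call.function.arguments)
    print(call.function.name, args["city"])
```

Checking `message.tool_calls or []` keeps the loop safe when the model answers in plain text and the field is `None`.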
````diff
@@ -878,9 +888,9 @@ response = client.chat.completions.create(
 
 message = response.choices[0].message
 
-if hasattr(message, "reasoning_content") and message.reasoning_content:
+if hasattr(message, "reasoning") and message.reasoning:
     print("=== Thinking ===")
-    print(message.reasoning_content)
+    print(message.reasoning)
 
 print("\n=== Structured Output ===")
 print(message.content)
````
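With the reasoning parser separating the thinking into `message.reasoning`, `message.content` should contain only the final structured output and can be fed straight to a JSON parser. A sketch with a simulated message; the JSON schema here is a made-up example:

```python
import json
from types import SimpleNamespace

# Simulated message: thinking lives in `reasoning`, the answer in `content`
message = SimpleNamespace(
    reasoning="The user wants structured data, so emit JSON only.",
    content='{"answer": 42, "confidence": 0.9}',
)

# Content parses cleanly because no thinking text is mixed in
result = json.loads(message.content)
print(result["answer"])
```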
````diff
@@ -1025,6 +1035,7 @@ Key metrics:
 | `--reasoning-parser gemma4` | Enable Gemma 4 thinking/reasoning parser | Required for thinking mode |
 | `--tool-call-parser gemma4` | Enable Gemma 4 tool call parser | Required for function calling |
 | `--enable-auto-tool-choice` | Auto-detect tool calls in output | Required for function calling |
+| `--chat-template examples/tool_chat_template_gemma4.jinja` | Override the model's default chat template with one optimized for reasoning and tool calling in vLLM | Recommended for thinking and tool calling |
 | `--mm-processor-kwargs '{"max_soft_tokens": N}'` | Set default vision token budget | 280 (default), up to 1120 |
 | `--async-scheduling` | Overlap scheduling with decoding | Recommended for throughput |
 | `--gpu-memory-utilization 0.90` | GPU memory fraction for model + KV cache | 0.85-0.95 |
````
````diff
@@ -1042,6 +1053,7 @@ vllm serve google/gemma-4-31B-it \
   --enable-auto-tool-choice \
   --reasoning-parser gemma4 \
   --tool-call-parser gemma4 \
+  --chat-template examples/tool_chat_template_gemma4.jinja \
   --limit-mm-per-prompt image=4,audio=1 \
   --async-scheduling \
   --host 0.0.0.0 \
````
