Skip to content

[Bugfix] Fix gemma4_utils._parse_tool_arguments truncating strings with internal quotes#39070

Open
Rih0z wants to merge 2 commits intovllm-project:mainfrom
Rih0z:fix/gemma4-utils-parse-tool-arguments
Open

[Bugfix] Fix gemma4_utils._parse_tool_arguments truncating strings with internal quotes#39070
Rih0z wants to merge 2 commits intovllm-project:mainfrom
Rih0z:fix/gemma4-utils-parse-tool-arguments

Conversation

@Rih0z
Copy link
Copy Markdown

@Rih0z Rih0z commented Apr 6, 2026

Purpose

Fix vllm/tool_parsers/gemma4_utils._parse_tool_arguments() silently truncating string values that contain " (double quotes). This breaks offline inference tool call parsing for any tool that passes structured text (HTML, code, JSON) as parameters.

This is not duplicating an existing PR. Checked: #38992 (streaming fix only), #39027 (chat template/reasoning, does not touch gemma4_utils.py), #38909 (streaming HTML duplication). No existing PR modifies _parse_tool_arguments.

Fixes #39069

The Bug

_parse_tool_arguments() replaces <|"|> delimiters with ", then uses regex [^"]* which stops at the first internal quote:

# Before: content:<|"|>She said "hello" loudly<|"|>
# After replace: content:"She said "hello" loudly"
# Regex [^"]* captures: "She said " → TRUNCATED

Measured output: {'content': 'She said '} instead of {'content': 'She said "hello" loudly'}

The Fix

The API server parser (gemma4_tool_parser.py) already has a correct implementation (_parse_gemma4_args) that handles <|"|> delimiters natively without replacement. This PR:

  1. Moves _parse_gemma4_args, _parse_gemma4_array, _parse_gemma4_value from gemma4_tool_parser.py into gemma4_utils.py as public functions (parse_gemma4_args, etc.)
  2. Updates gemma4_tool_parser.py to import from gemma4_utils.py (no behavior change for API server path)
  3. Replaces the broken _parse_tool_arguments with a wrapper around parse_gemma4_args (backward compatible — still returns dict[str, str])
  4. Adds tests/tool_parsers/test_gemma4_utils.py with regression tests for internal quotes, HTML attributes, braces, and code content
  5. Adds test cases to existing test_gemma4_tool_parser.py for the same edge cases

Test commands run and results

# Direct function verification (vLLM not installed — pure Python test):
python -c "... _parse_tool_arguments('content:<|"|>She said \"hello\" loudly<|"|>') ..."
# Result: {'content': 'She said "hello" loudly'} ✅ (was: {'content': 'She said '})

python -c "... _parse_tool_arguments('path:<|"|>out.html<|"|>,content:<|"|><div class=\"main\">hello</div><|"|>') ..."  
# Result: {'path': 'out.html', 'content': '<div class="main">hello</div>'} ✅ (was: {'content': '<div class='})

All 8 manual tests passed (internal quotes, HTML attributes, braces, code, simple string, multi-value, empty, bare numeric).

AI Assistance Disclosure

AI assistance (Claude) was used for code analysis, identifying the root cause, writing tests, and drafting this PR. All changes were reviewed and verified by a human.

🤖 Generated with Claude Code

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 6, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@mergify mergify bot added tool-calling bug Something isn't working labels Apr 6, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Gemma4 tool argument parsing by moving core logic to a shared utility module and enhancing the handling of special characters within delimited strings. It also adds comprehensive unit and regression tests. A bug was identified in the nested array parsing logic where the failure to skip string delimiters could lead to incorrect depth calculation when strings contain brackets.

Comment on lines +238 to +243
while i < n and depth > 0:
if arr_str[i] == "[":
depth += 1
elif arr_str[i] == "]":
depth -= 1
i += 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The nested array parsing logic is missing the string delimiter skipping logic. If a nested array contains a string with [ or ] characters (e.g., items: [ [<|"|>foo]<|"|> ]), the parser will incorrectly increment or decrement the depth counter and terminate early, leading to truncated or malformed results. This should be consistent with the logic used for nested objects and top-level arrays.

Suggested change
while i < n and depth > 0:
if arr_str[i] == "[":
depth += 1
elif arr_str[i] == "]":
depth -= 1
i += 1
while i < n and depth > 0:
if arr_str[i:].startswith(_ESCAPE_TOKEN):
i += len(_ESCAPE_TOKEN)
nd = arr_str.find(_ESCAPE_TOKEN, i)
i = nd + len(_ESCAPE_TOKEN) if nd != -1 else n
continue
if arr_str[i] == "[":
depth += 1
elif arr_str[i] == "]":
depth -= 1
i += 1

Riho and others added 2 commits April 6, 2026 02:03
…th internal quotes

Consolidate the Gemma4 argument parser into gemma4_utils.py so both
the offline parser and the API server parser share the same correct
implementation. The old _parse_tool_arguments() replaced <|"|>
delimiters with " then used a regex that stopped at internal quotes,
silently truncating values like HTML attributes or code.

Fixes vllm-project#39069

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Riho <koki.riho@necam.com>
Address Gemini Code Assist review: nested arrays containing strings
with [ or ] characters (e.g., <|"|>foo]<|"|>) would incorrectly
change the depth counter, leading to truncated results. Add the same
_ESCAPE_TOKEN skipping logic already used in nested object parsing.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Riho <koki.riho@necam.com>
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Apr 8, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Rih0z.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-rebase tool-calling

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug]: gemma4_utils._parse_tool_arguments truncates string values containing internal quotes

1 participant