
Simplify _get_tool_suffix_ids #5440

Merged
qgallouedec merged 12 commits into main from simplify_get_tool_suffix_ids on Apr 7, 2026

Conversation

@qgallouedec
Member

@qgallouedec qgallouedec commented Apr 2, 2026

I think the processor.__call__ path for VLM + tool images is unnecessary.

Proof that both approaches produce identical results:

from transformers import AutoProcessor
from PIL import Image
from trl.data_utils import prepare_multimodal_messages

for model_id in [
    "trl-internal-testing/tiny-Qwen2VLForConditionalGeneration",
    "trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration",
]:
    processor = AutoProcessor.from_pretrained(model_id)
    img = Image.new("RGB", (512, 512), color="red")

    dummy_messages = [{"role": "user", "content": "dummy"}, {"role": "assistant", "content": "dummy"}]
    dummy_messages = prepare_multimodal_messages(dummy_messages, [])
    tool_messages = [
        {
            "role": "tool", "name": "screenshot",
            "content": [
                {"type": "image", "image": img},
                {"type": "text", "text": "Here is the screenshot"},
            ]
        },
    ]

    # Method 1: apply_chat_template(tokenize=True): the new approach
    prefix_1 = processor.apply_chat_template(dummy_messages, add_generation_prompt=False, tokenize=True, return_dict=False)[0]
    full_1 = processor.apply_chat_template(dummy_messages + tool_messages, add_generation_prompt=True, tokenize=True, return_dict=False)[0]
    suffix_1 = full_1[len(prefix_1):]

    # Method 2: processor.__call__ with images: the old approach
    prefix_text = processor.apply_chat_template(dummy_messages, add_generation_prompt=False, tokenize=False)
    prefix_2 = processor(text=prefix_text, return_tensors="pt")["input_ids"][0].tolist()
    full_text = processor.apply_chat_template(dummy_messages + tool_messages, add_generation_prompt=True, tokenize=False)
    full_2 = processor(text=full_text, images=[img], return_tensors="pt")["input_ids"][0].tolist()
    suffix_2 = full_2[len(prefix_2):]

    assert suffix_1 == suffix_2

@sergiopaniego I need your feedback here, because this was added in #5323 on purpose, so I guess you encountered some issues, but I was not able to reproduce the issue.


Note

Medium Risk
Changes how tool-role messages are normalized for multimodal processing and simplifies _get_tool_suffix_ids tokenization; this could affect prompt/suffix alignment for tool-calling, especially with VLM chat templates.

Overview
Simplifies GRPO _get_tool_suffix_ids by removing the special processor.__call__ path for VLM tool-image responses and always deriving prefix/full IDs via apply_chat_template(tokenize=True), after normalizing tool_messages with prepare_multimodal_messages.

Updates prepare_multimodal_messages to also wrap string tool outputs into structured [{"type":"text","text":...}] content, and adjusts the corresponding unit test expectation for tool turns.
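A minimal sketch of this string-wrapping normalization (a hypothetical standalone helper for illustration, not the actual prepare_multimodal_messages implementation in TRL):

```python
def wrap_tool_content(messages):
    """Wrap string tool outputs into structured multimodal content.

    Tool messages whose "content" is a plain string are rewritten to the
    structured list form [{"type": "text", "text": ...}] that multimodal
    chat templates expect; already-structured content is left untouched.
    """
    for message in messages:
        if message["role"] == "tool" and isinstance(message["content"], str):
            message["content"] = [{"type": "text", "text": message["content"]}]
    return messages


messages = [{"role": "tool", "name": "search", "content": "no results"}]
wrapped = wrap_tool_content(messages)
print(wrapped[0]["content"])
# [{'type': 'text', 'text': 'no results'}]
```

With all tool turns in this structured form, a single apply_chat_template(tokenize=True) call can handle both text-only and image-bearing tool responses, which is what makes the separate processor.__call__ path removable.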

Reviewed by Cursor Bugbot for commit 67914bf.

@qgallouedec
Member Author

@codex review

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


@qgallouedec qgallouedec requested a review from AmineDiro April 4, 2026 03:19
@qgallouedec qgallouedec requested a review from kashif April 4, 2026 03:20
Member

@sergiopaniego sergiopaniego left a comment


> I need your feedback here, because this was added in #5323 on purpose, so I guess you encountered some issues, but I was not able to reproduce the issue.

i don't remember the exact reason after so many iterations 🫠 i just tested your idea with Qwen3.5 and it works.

Once the idea raised by Codex is addressed, I think we're good to go.

Member

@albertvillanova albertvillanova left a comment


Thanks.

Just please consider the Cursor review below about "Removed VLM string content normalization for tool messages".


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit df2c013.

@qgallouedec qgallouedec merged commit 0cb9667 into main Apr 7, 2026
12 of 14 checks passed
@qgallouedec qgallouedec deleted the simplify_get_tool_suffix_ids branch April 7, 2026 16:10