
Simplify _get_tool_suffix_ids #5440

Merged
qgallouedec merged 12 commits into main from simplify_get_tool_suffix_ids on Apr 7, 2026

Conversation

@qgallouedec
Member

@qgallouedec qgallouedec commented Apr 2, 2026

I think the processor.__call__ path for VLM + tool images is unnecessary.

Proof that both approaches produce identical results:

from transformers import AutoProcessor
from PIL import Image
from trl.data_utils import prepare_multimodal_messages

for model_id in [
    "trl-internal-testing/tiny-Qwen2VLForConditionalGeneration",
    "trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration",
]:
    processor = AutoProcessor.from_pretrained(model_id)
    img = Image.new("RGB", (512, 512), color="red")

    dummy_messages = [{"role": "user", "content": "dummy"}, {"role": "assistant", "content": "dummy"}]
    dummy_messages = prepare_multimodal_messages(dummy_messages, [])
    tool_messages = [
        {
            "role": "tool", "name": "screenshot",
            "content": [
                {"type": "image", "image": img},
                {"type": "text", "text": "Here is the screenshot"},
            ]
        },
    ]

    # Method 1: apply_chat_template(tokenize=True): the new approach
    prefix_1 = processor.apply_chat_template(dummy_messages, add_generation_prompt=False, tokenize=True, return_dict=False)[0]
    full_1 = processor.apply_chat_template(dummy_messages + tool_messages, add_generation_prompt=True, tokenize=True, return_dict=False)[0]
    suffix_1 = full_1[len(prefix_1):]

    # Method 2: processor.__call__ with images: the old approach
    prefix_text = processor.apply_chat_template(dummy_messages, add_generation_prompt=False, tokenize=False)
    prefix_2 = processor(text=prefix_text, return_tensors="pt")["input_ids"][0].tolist()
    full_text = processor.apply_chat_template(dummy_messages + tool_messages, add_generation_prompt=True, tokenize=False)
    full_2 = processor(text=full_text, images=[img], return_tensors="pt")["input_ids"][0].tolist()
    suffix_2 = full_2[len(prefix_2):]

    assert suffix_1 == suffix_2

@sergiopaniego I need your feedback here, because this was added in #5323 on purpose, so I guess you encountered some issues, but I was not able to reproduce the issue.


Note

Medium Risk
Changes how tool-role messages are normalized for multimodal processing and simplifies _get_tool_suffix_ids tokenization; this could affect prompt/suffix alignment for tool-calling, especially with VLM chat templates.

Overview
Simplifies GRPO _get_tool_suffix_ids by removing the special processor.__call__ path for VLM tool-image responses and always deriving prefix/full IDs via apply_chat_template(tokenize=True), after normalizing tool_messages with prepare_multimodal_messages.

Updates prepare_multimodal_messages to also wrap string tool outputs into structured [{"type":"text","text":...}] content, and adjusts the corresponding unit test expectation for tool turns.
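A minimal sketch of this string-wrapping normalization (a hypothetical standalone helper for illustration, not the actual prepare_multimodal_messages implementation in TRL):

```python
def wrap_tool_content(messages):
    """Wrap string tool outputs into structured multimodal content.

    Tool messages whose "content" is a plain string are rewritten to the
    structured list form [{"type": "text", "text": ...}] that multimodal
    chat templates expect; already-structured content is left untouched.
    """
    for message in messages:
        if message["role"] == "tool" and isinstance(message["content"], str):
            message["content"] = [{"type": "text", "text": message["content"]}]
    return messages


messages = [{"role": "tool", "name": "search", "content": "no results"}]
wrapped = wrap_tool_content(messages)
print(wrapped[0]["content"])
# [{'type': 'text', 'text': 'no results'}]
```

With all tool turns in this structured form, a single apply_chat_template(tokenize=True) call can handle both text-only and image-bearing tool responses, which is what makes the separate processor.__call__ path removable.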

Reviewed by Cursor Bugbot for commit 67914bf.

@qgallouedec
Member Author

@codex review

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


@qgallouedec qgallouedec requested a review from AmineDiro April 4, 2026 03:19
@qgallouedec qgallouedec requested a review from kashif April 4, 2026 03:20
Member

@sergiopaniego sergiopaniego left a comment


> I need your feedback here, because this was added in #5323 on purpose, so I guess you encountered some issues, but I was not able to reproduce the issue.

i don't remember the exact reason after so many iterations 🫠 i just tested your idea with Qwen3.5 and it works.

Once the idea raised by Codex is addressed, I think we're good to go.

Member

@albertvillanova albertvillanova left a comment


Thanks.

Just please consider the Cursor review below about "Removed VLM string content normalization for tool messages".


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit df2c013.

@qgallouedec qgallouedec merged commit 0cb9667 into main Apr 7, 2026
12 of 14 checks passed
@qgallouedec qgallouedec deleted the simplify_get_tool_suffix_ids branch April 7, 2026 16:10