Skip to content

Avoid image deepcopy in prepare_multimodal_messages#5475

Merged
albertvillanova merged 3 commits intohuggingface:mainfrom
albertvillanova:avoid-image-deepcopy-prepare_multimodal_messages
Apr 9, 2026
Merged

Avoid image deepcopy in prepare_multimodal_messages#5475
albertvillanova merged 3 commits intohuggingface:mainfrom
albertvillanova:avoid-image-deepcopy-prepare_multimodal_messages

Conversation

@albertvillanova
Copy link
Copy Markdown
Member

@albertvillanova albertvillanova commented Apr 8, 2026

Avoid image deepcopy in prepare_multimodal_messages.

This PR refactors the prepare_multimodal_messages function by replacing the deepcopy of original input messages with an incremental build of the output list of message dictionaries with transformed content.

Follow-up to:

Motivation

prepare_multimodal_messages used copy.deepcopy(messages) to avoid mutating the caller's input. This becomes a problem now that messages can contain PIL images (e.g. in "tool" role turns: prepare_multimodal_messages(tool_messages)): deepcopying an image is expensive and can fail for certain image types.

Solution

The fix replaces the deepcopy with an incremental build of the output list. A new message dict ({**message, "content": [...]}) is only created when a string "content" is transformed into a structured list; messages whose content is already a list are passed through as-is. Because the image-filling step only writes into newly-created placeholder dicts, the original messages are never mutated.

Changes

Refactoring for input immutability and safer message transformation:

  • The function no longer deep-copies the input messages; instead, it builds a new list by creating new message dictionaries only when transformations are needed, ensuring the original input is left unchanged.
  • All further processing, including counting image placeholders and inserting images, now operates on new messages rather than the original input.
  • The function's return type description is updated to clarify that a new list of messages is returned, not a deep copy.

Note

Medium Risk
Behavior around immutability changes: messages whose content is already structured may now be returned by reference, so downstream mutation of the returned objects could affect the caller’s originals.

Overview
Refactors prepare_multimodal_messages to avoid copy.deepcopy(messages) (which can be expensive/fail when messages contain PIL images), and instead incrementally builds new_messages while transforming only string content into structured blocks.

Updates placeholder counting and image injection to operate on new_messages, and changes image filling to create new content-part dicts (leaving original parts untouched). Docstring return description is updated from deep-copied to new list.

Reviewed by Cursor Bugbot for commit 7107104. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f57630f. Configure here.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown
Member

@qgallouedec qgallouedec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels correct, let's see what @codex review says

@qgallouedec
Copy link
Copy Markdown
Member

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71071043a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +124 to +125
else:
new_content.append(part)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deep-copy non-image blocks when rebuilding content

This branch reuses the original block dict object for every non-image part, so the returned structure aliases nested objects from the caller’s input when messages are already in structured format. Any downstream in-place edit of prepare_multimodal_messages(...) output (for example, adding keys to text/tool blocks) will mutate the original messages, which is a regression from the previous deep-copy behavior and breaks practical immutability expectations for this helper.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

@qgallouedec qgallouedec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@albertvillanova albertvillanova merged commit 1e667d8 into huggingface:main Apr 9, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants