diff --git a/trl/data_utils.py b/trl/data_utils.py
index f72c9e5693..56c20114b5 100644
--- a/trl/data_utils.py
+++ b/trl/data_utils.py
@@ -53,8 +53,6 @@ def prepare_multimodal_messages(messages: list[dict[str, Any]], images: list | N
 the function transforms them into the structured format by wrapping text in `{"type": "text", "text": ...}` and
 inserting `{"type": "image"}` placeholders for the images *before* the first user message. If the number of
 placeholders does not match the number of provided images, an error is raised.
- - When the input `messages` contains either `"tool_calls"` in the `"assistant"` turns, or `"tool"` roles with
- `"content"` and `"name"` those are left as-is, since those don't require any specific handling for multimodal data.
 
 Example:
 ```python