Add Qwen3-VL tool calling support #5469
Merged
Commits
23 commits
4b3aa51 Narrow prefix-preserving check to the actual requirement (qgallouedec)
0894910 Merge branch 'main' into narrow-prefix-preserving-check (qgallouedec)
730070b Update chat template examples to use multiplication function calls (qgallouedec)
4622d77 style (qgallouedec)
08d4c51 Move chat templates from inline strings to `.jinja` files (qgallouedec)
276559d tools in dummy (qgallouedec)
673c35d Add chat template files to MANIFEST.in (qgallouedec)
604c476 Enhance chat template handling to include tool call formatting in mes… (qgallouedec)
83a7ef6 align grpo and async (qgallouedec)
0f28384 Merge branch 'main' into chat-templates-files (qgallouedec)
e5d7cdf revert no content (qgallouedec)
a618809 docstyle ignore (qgallouedec)
a0b81b1 Merge branch 'main' into chat-templates-files (qgallouedec)
67ab0af Merge branch 'main' into chat-templates-files (qgallouedec)
63ec7d3 Merge branch 'main' into chat-templates-files (qgallouedec)
c838146 Merge branch 'main' into chat-templates-files (qgallouedec)
7b7f5d1 revert old modif (qgallouedec)
8e31596 Add Qwen3-VL tool calling support (qgallouedec)
91e940e Merge branch 'main' into qwen3vl-tool-calling (qgallouedec)
116d5c0 Merge branch 'main' into qwen3vl-tool-calling (qgallouedec)
e111044 Merge branch 'main' into qwen3vl-tool-calling (qgallouedec)
39f0f32 Merge branch 'main' into qwen3vl-tool-calling (qgallouedec)
535544b Merge branch 'main' into qwen3vl-tool-calling (qgallouedec)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| {%- if tools %} | ||
| {{- '<|im_start|>system\n' }} | ||
| {%- if messages[0].role == 'system' %} | ||
| {%- if messages[0].content is string %} | ||
| {{- messages[0].content }} | ||
| {%- else %} | ||
| {%- for content in messages[0].content %} | ||
| {%- if 'text' in content %} | ||
| {{- content.text }} | ||
| {%- endif %} | ||
| {%- endfor %} | ||
| {%- endif %} | ||
| {{- '\n\n' }} | ||
| {%- endif %} | ||
| {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }} | ||
| {%- for tool in tools %} | ||
| {{- "\n" }} | ||
| {{- tool | tojson }} | ||
| {%- endfor %} | ||
| {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }} | ||
| {%- else %} | ||
| {%- if messages[0].role == 'system' %} | ||
| {{- '<|im_start|>system\n' }} | ||
| {%- if messages[0].content is string %} | ||
| {{- messages[0].content }} | ||
| {%- else %} | ||
| {%- for content in messages[0].content %} | ||
| {%- if 'text' in content %} | ||
| {{- content.text }} | ||
| {%- endif %} | ||
| {%- endfor %} | ||
| {%- endif %} | ||
| {{- '<|im_end|>\n' }} | ||
| {%- endif %} | ||
| {%- endif %} | ||
| {%- set image_count = namespace(value=0) %} | ||
| {%- set video_count = namespace(value=0) %} | ||
| {%- for message in messages %} | ||
| {%- if message.role == "user" %} | ||
| {{- '<|im_start|>' + message.role + '\n' }} | ||
| {%- if message.content is string %} | ||
| {{- message.content }} | ||
| {%- else %} | ||
| {%- for content in message.content %} | ||
| {%- if content.type == 'image' or 'image' in content or 'image_url' in content %} | ||
| {%- set image_count.value = image_count.value + 1 %} | ||
| {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%} | ||
| <|vision_start|><|image_pad|><|vision_end|> | ||
| {%- elif content.type == 'video' or 'video' in content %} | ||
| {%- set video_count.value = video_count.value + 1 %} | ||
| {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%} | ||
| <|vision_start|><|video_pad|><|vision_end|> | ||
| {%- elif 'text' in content %} | ||
| {{- content.text }} | ||
| {%- endif %} | ||
| {%- endfor %} | ||
| {%- endif %} | ||
| {{- '<|im_end|>\n' }} | ||
| {%- elif message.role == "assistant" %} | ||
| {{- '<|im_start|>' + message.role + '\n' }} | ||
| {%- if message.content is string %} | ||
| {{- message.content }} | ||
| {%- else %} | ||
| {%- for content_item in message.content %} | ||
| {%- if 'text' in content_item %} | ||
| {{- content_item.text }} | ||
| {%- endif %} | ||
| {%- endfor %} | ||
| {%- endif %} | ||
| {%- if message.tool_calls %} | ||
| {%- for tool_call in message.tool_calls %} | ||
| {%- if (loop.first and message.content) or (not loop.first) %} | ||
| {{- '\n' }} | ||
| {%- endif %} | ||
| {%- if tool_call.function %} | ||
| {%- set tool_call = tool_call.function %} | ||
| {%- endif %} | ||
| {{- '<tool_call>\n{"name": "' }} | ||
| {{- tool_call.name }} | ||
| {{- '", "arguments": ' }} | ||
| {%- if tool_call.arguments is string %} | ||
| {{- tool_call.arguments }} | ||
| {%- else %} | ||
| {{- tool_call.arguments | tojson }} | ||
| {%- endif %} | ||
| {{- '}\n</tool_call>' }} | ||
| {%- endfor %} | ||
| {%- endif %} | ||
| {{- '<|im_end|>\n' }} | ||
| {%- elif message.role == "tool" %} | ||
| {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %} | ||
| {{- '<|im_start|>user' }} | ||
| {%- endif %} | ||
| {{- '\n<tool_response>\n' }} | ||
| {%- if message.content is string %} | ||
| {{- message.content }} | ||
| {%- else %} | ||
| {%- for content in message.content %} | ||
| {%- if content.type == 'image' or 'image' in content or 'image_url' in content %} | ||
| {%- set image_count.value = image_count.value + 1 %} | ||
| {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%} | ||
| <|vision_start|><|image_pad|><|vision_end|> | ||
| {%- elif content.type == 'video' or 'video' in content %} | ||
| {%- set video_count.value = video_count.value + 1 %} | ||
| {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%} | ||
| <|vision_start|><|video_pad|><|vision_end|> | ||
| {%- elif 'text' in content %} | ||
| {{- content.text }} | ||
| {%- endif %} | ||
| {%- endfor %} | ||
| {%- endif %} | ||
| {{- '\n</tool_response>' }} | ||
| {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %} | ||
| {{- '<|im_end|>\n' }} | ||
| {%- endif %} | ||
| {%- endif %} | ||
| {%- endfor %} | ||
| {%- if add_generation_prompt %} | ||
| {{- '<|im_start|>assistant\n' }} | ||
| {%- endif %} | ||
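
For reviewers who want to sanity-check the assistant and tool branches of the template without loading a tokenizer, here is a rough pure-Python sketch of the same string assembly. It is not code from this PR: `render_assistant`, `render_tool_run`, and the `multiply` tool are hypothetical names used for illustration, `json.dumps` stands in for the `tojson` filter (whose exact escaping may differ), and tool content is assumed to be a plain string.

```python
import json


def render_assistant(message):
    """Mirror the template's assistant branch for one message (sketch)."""
    out = "<|im_start|>assistant\n"
    content = message.get("content")
    if isinstance(content, str):
        out += content
    elif content:
        # List content: only the text parts are emitted.
        # Assumption: missing/None content is simply skipped here.
        out += "".join(item["text"] for item in content if "text" in item)
    for i, tool_call in enumerate(message.get("tool_calls") or []):
        # A newline precedes every call except a first call with empty content.
        if i > 0 or content:
            out += "\n"
        # OpenAI-style calls nest name/arguments under "function".
        call = tool_call.get("function") or tool_call
        args = call["arguments"]
        if not isinstance(args, str):
            args = json.dumps(args)  # stand-in for the `tojson` filter
        out += '<tool_call>\n{"name": "' + call["name"] + '", "arguments": ' + args + "}\n</tool_call>"
    return out + "<|im_end|>\n"


def render_tool_run(messages, index):
    """Mirror the tool branch: consecutive tool messages share one user turn."""
    out = ""
    # Open the user turn only for the first tool message of a run.
    if index == 0 or messages[index - 1]["role"] != "tool":
        out += "<|im_start|>user"
    out += "\n<tool_response>\n" + messages[index]["content"] + "\n</tool_response>"
    # Close the turn only after the last tool message of a run.
    if index == len(messages) - 1 or messages[index + 1]["role"] != "tool":
        out += "<|im_end|>\n"
    return out


# Hypothetical conversation: one tool call followed by two tool results.
messages = [
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"type": "function", "function": {"name": "multiply", "arguments": {"a": 3, "b": 4}}},
        ],
    },
    {"role": "tool", "content": "12"},
    {"role": "tool", "content": "ok"},
]
rendered = render_assistant(messages[0]) + render_tool_run(messages, 1) + render_tool_run(messages, 2)
print(rendered)
```

The point to check is that both tool results land inside a single `<|im_start|>user` ... `<|im_end|>` turn, and that no leading newline is emitted before the first `<tool_call>` when the assistant content is empty.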
Same template, but it's more natural to use the CausalLM variant instead of the SequenceClassification one.
I can't remember why we used Qwen3MoeForSequenceClassification in the first place