Conversation
…sages + async grpo
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b18e39efd6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
cc @Rocketknight1 for the schema |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 450b9ef. Configure here.
| """ | ||
| if tokenizer.chat_template == gptoss_chat_template: | ||
| tokenizer.response_schema = gptoss_schema | ||
| return tokenizer |
There was a problem hiding this comment.
Prefix-preserving check diverges from suffix extraction construction
Low Severity
is_chat_template_prefix_preserving still uses a hardcoded "dummy" tool name, while _get_tool_suffix_ids was changed to use the real tool name via tool_messages[0]["name"]. The comment on line 350 explicitly states "Use the same dummy messages as _get_tool_suffix_ids", but the constructions now differ. For GPT-OSS, the tool name is embedded in the rendered text (e.g. to=functions.NAME), so the validation function is no longer testing the exact property that _get_tool_suffix_ids relies on.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 450b9ef. Configure here.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>


gptoss.jinjatemplate for identity matching inadd_response_schemaTestAddResponseSchemaandTestParseResponsetest parametrizationsPart of #5460
Warning
Requires/contains #5459
Note
Medium Risk
Introduces a new GPT-OSS response parsing schema and updates GRPO tool-suffix tokenization to depend on real tool names, which could affect tool-call loop formatting and parsing across models if templates behave differently.
Overview
Adds GPT-OSS tool-calling support by introducing a new
gptoss.jinjaidentity template and a correspondinggptoss_schema, and wiring both intoadd_response_schemasoparse_responsecan extractcontentand single tool calls from GPT-OSS outputs.Updates GRPO tool-call suffix extraction in both
GRPOTrainerandAsyncRolloutWorkerto build the dummy conversation using the actual tool name (not"dummy"), matching templates (like GPT-OSS) that derive tool-response headers from the preceding tool call.Extends response-schema/parsing tests to include a
tiny-GptOssForCausalLMtokenizer with model-specific skips/expectations, and documents GPT-OSS as a supported agent-training model ingrpo_trainer.md.Reviewed by Cursor Bugbot for commit 392dece. Bugbot is set up for automated code reviews on this repo. Configure here.