fix _get_per_token_logps_and_entropies return type #5456
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 08c44a1. Configure here.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08c44a1af0
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fix _get_per_token_logps_and_entropies return type
What does this PR do?
Fixes # (issue)
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Low Risk
Type-hint-only change that aligns _get_per_token_logps_and_entropies with its actual (logps, entropies) return shape; low risk aside from potential downstream typing/IDE expectations.
Overview
Updates _get_per_token_logps_and_entropies in both grpo_trainer.py and rloo_trainer.py to annotate the return value as a tuple (per_token_logps, per_token_entropies | None) instead of a dict, matching the function’s actual return.
Reviewed by Cursor Bugbot for commit 3c3f0db. Bugbot is set up for automated code reviews on this repo. Configure here.
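
For readers skimming the thread, here is a rough sketch of the kind of annotation the overview describes: a function that returns a two-element tuple whose second element may be None, annotated as such rather than as a dict. This is not the trainer code. The function name without the leading underscore, the parameters, the compute_entropy flag, and the list-based Tensor alias (a stand-in for torch.Tensor so the snippet runs without PyTorch) are all assumptions made for illustration.

```python
from __future__ import annotations

import math

# Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch.
Tensor = list[float]


def get_per_token_logps_and_entropies(
    logits: list[list[float]],
    token_ids: list[int],
    compute_entropy: bool = False,
) -> tuple[Tensor, Tensor | None]:
    """Return (per_token_logps, per_token_entropies or None) as a tuple, not a dict."""
    logps: Tensor = []
    entropies: Tensor = []
    for row, token_id in zip(logits, token_ids):
        # Numerically stable log-softmax over one token's logits.
        max_logit = max(row)
        log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in row))
        log_probs = [x - log_z for x in row]
        logps.append(log_probs[token_id])
        if compute_entropy:
            # Shannon entropy of the token distribution, in nats.
            entropies.append(-sum(math.exp(lp) * lp for lp in log_probs))
    return logps, (entropies if compute_entropy else None)


# Callers unpack the tuple; dict-style access such as result["logps"] would fail.
logps, entropies = get_per_token_logps_and_entropies([[0.0, 0.0]], [1])
```

With a tuple annotation, a type checker can flag callers that index the result by string key, which a dict annotation would have wrongly permitted.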