Pull requests: huggingface/trl

Pull requests list

Add GPT-OSS tool calling support
#5464 opened Apr 6, 2026 by qgallouedec
Add GLM-4-MoE tool calling support
#5463 opened Apr 6, 2026 by qgallouedec
GOLDTrainer VLM support
#5461 opened Apr 6, 2026 by Strongich
4 of 8 tasks
[docs] Clarify dtype defaults between trf v5 and TRL
#5457 opened Apr 4, 2026 by casinca
2 of 4 tasks
fix _get_per_token_logps_and_entropies return type
#5456 opened Apr 4, 2026 by kashif
2 of 8 tasks
Gemma 4 support
#5453 opened Apr 4, 2026 by qgallouedec
[AsyncGRPO] Support async tool calls in AsyncRolloutWorker
#5446 opened Apr 3, 2026 by PoilZero
5 of 8 tasks
Simplify _get_tool_suffix_ids
#5440 opened Apr 2, 2026 by qgallouedec
FIPO loss
#5434 opened Apr 2, 2026 by kdubovikov
4 of 8 tasks
feat(async-grpo): add sampling parameter parity
#5418 opened Mar 31, 2026 by kdubovikov
4 of 8 tasks
Delta weight sync using Xet buckets
#5417 opened Mar 31, 2026 by AmineDiro Draft
8 tasks
fix(async-grpo): honor model init dtype
#5416 opened Mar 31, 2026 by kdubovikov
3 of 8 tasks
Skip redundant forward pass for on-policy vLLM importance sampling
#5413 opened Mar 31, 2026 by GJ98
3 of 8 tasks
add JEPO trainer
#5411 opened Mar 31, 2026 by zbills
3 of 7 tasks
Add DistillationTrainer for efficient on-policy distillation
#5407 opened Mar 30, 2026 by cmpatino
3 of 5 tasks
Add length-normalized sigmoid loss type to DPO trainer
#5406 opened Mar 30, 2026 by BrownianNotion
5 of 8 tasks
Add per-sample tool filtering to GRPOTrainer via tools column
#5398 opened Mar 27, 2026 by lailanelkoussy
3 tasks done
Add tool calling support to RLOOTrainer
#5395 opened Mar 27, 2026 by qgallouedec
Fix DAPO token-level loss to use prompt-level aggregation
#5381 opened Mar 26, 2026 by matdou
2 of 5 tasks
Remove truncation_mode from DPO
#5372 opened Mar 25, 2026 by albertvillanova