Add length-normalized sigmoid loss type to DPO trainer #5406
BrownianNotion wants to merge 5 commits into huggingface:main from
Conversation
| `"sigmoid"` (default) | Given the preference data, we can fit a binary classifier according to the Bradley-Terry model, and in fact the [DPO](https://huggingface.co/papers/2305.18290) authors propose the sigmoid loss on the normalized likelihood via `logsigmoid` to fit a logistic regression. |
| `"hinge"` | The [RSO](https://huggingface.co/papers/2309.06657) authors propose to use a hinge loss on the normalized likelihood from the [SLiC](https://huggingface.co/papers/2305.10425) paper. In this case, `beta` is the reciprocal of the margin. |
| `"ipo"` | The [IPO](https://huggingface.co/papers/2310.12036) authors argue the logit transform can overfit and propose the identity transform to optimize preferences directly; TRL exposes this as `loss_type="ipo"`. |
| `"sigmoid_norm"` | The [SimPO](https://huggingface.co/papers/2405.14734) authors address the length bias in the original sigmoid loss by normalizing by the number of non-mask tokens; TRL exposes this as `loss_type="sigmoid_norm"`. |
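As described in the table, the new loss would be selected through the existing `loss_type` field on `DPOConfig`. A minimal sketch, assuming this PR's `sigmoid_norm` value is merged; `output_dir` and the `beta` value are illustrative:

```python
from trl import DPOConfig

# `loss_type="sigmoid_norm"` is the value added by this PR; `beta` keeps its
# usual DPO meaning and is applied to the length-normalized score difference.
config = DPOConfig(
    output_dir="dpo-sigmoid-norm",  # illustrative path
    loss_type="sigmoid_norm",
    beta=0.1,
)
```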
Missing paper_index.md update for new loss type
Low Severity
This PR adds the sigmoid_norm loss type implementing the length-normalized DPO loss from the Tulu-3 paper (arXiv 2411.15124), but paper_index.md is not updated with a corresponding subsection. The project rule in AGENTS.md requires that PRs implementing methods from research papers must also add a subsection to paper_index.md. While SimPO is already listed in paper_index.md, it's associated with the CPO trainer, and the Tulu-3 paper is not referenced at all.
Triggered by project rule: ../.ai/AGENTS.md
See PR description: Tulu-3 references SimPO for the length-normalised DPO loss, but SimPO focuses on CPO, not DPO. Waiting on maintainers to indicate the best way to add this to the paper index.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
```python
chosen_avg_score = chosen_scores / chosen_mask.sum(dim=1).clamp(min=1.0)
rejected_avg_score = rejected_scores / rejected_mask.sum(dim=1).clamp(min=1.0)
delta = chosen_avg_score - rejected_avg_score
per_sequence_loss = -F.logsigmoid(self.beta * delta)
```
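Stripped of trainer plumbing, the per-sequence computation quoted above reduces to a few lines. A self-contained numeric sketch — plain Python standing in for the tensor ops, with made-up scores and masks:

```python
import math

def sigmoid_norm_loss(chosen_scores, chosen_mask, rejected_scores, rejected_mask, beta=0.1):
    """Length-normalized sigmoid DPO loss for one preference pair.

    `*_scores` are summed per-token log-prob differences (policy minus
    reference); `*_mask` flags the non-padding completion tokens, mirroring
    the PR's `mask.sum(dim=1).clamp(min=1.0)` normalization.
    """
    chosen_avg = chosen_scores / max(sum(chosen_mask), 1.0)
    rejected_avg = rejected_scores / max(sum(rejected_mask), 1.0)
    delta = chosen_avg - rejected_avg
    # -logsigmoid(beta * delta), written out with math.exp
    return -math.log(1.0 / (1.0 + math.exp(-beta * delta)))

# A longer rejected completion no longer wins just by token count:
loss = sigmoid_norm_loss(
    chosen_scores=6.0, chosen_mask=[1] * 10,      # average 0.6 over 10 tokens
    rejected_scores=8.0, rejected_mask=[1] * 40,  # average 0.2 over 40 tokens
)
print(round(loss, 4))  # → 0.6733
```

With the plain `sigmoid` loss the raw sums (6.0 vs 8.0) would make the longer rejected completion look preferred; the per-token averages flip the comparison, which is the length-bias fix the PR targets.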
Missing required paper_index.md update for new loss
Low Severity
This PR implements the length-normalized sigmoid DPO loss from a research paper (Tulu-3 / SimPO), but paper_index.md is not updated with a corresponding subsection. The project rule in AGENTS.md requires that any PR implementing a method from a research paper must also add an entry to paper_index.md. While SimPO already has an entry under CPO, the Tulu-3 paper (which is the primary source for this DPO variant) has no entry, and there is no DPO-specific subsection for this loss type.
Triggered by project rule: ../.ai/AGENTS.md
@qgallouedec would you be able to take a look when you can? Let me know how you want the paper index to be updated and if any further changes/tests are required. Thank you!


What does this PR do?
Adds the length-normalised DPO loss used for DPO in the Tulu-3/OLMo models. Source: https://arxiv.org/pdf/2411.15124, section 5.1.2, equation 6.
Let me know if a section for paper index should also be added. The source cited by Tulu-3 is SimPO, section 2.2 equation 4, but this paper focuses on CPO (and is already in the paper index) not DPO.
Fixes #2964 (issue was closed without being addressed)
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Note
Medium Risk
Medium risk because it extends core `DPOTrainer` loss computation and could affect training behavior when selected, though it is opt-in and covered by an added loss-type training test.
Overview
Adds a new DPO `loss_type="sigmoid_norm"` that normalizes chosen/rejected scores by completion token count before applying the sigmoid (`-logsigmoid(beta * delta)`) to reduce length bias. Updates `DPOConfig` help text, the DPO trainer docs, and the loss-type parametrized training test list to include `sigmoid_norm` (and improves the unknown-loss error message accordingly).
Reviewed by Cursor Bugbot for commit 8f55007. Bugbot is set up for automated code reviews on this repo.