Skip to content

Add length-normalized sigmoid loss type to DPO trainer#5406

Open
BrownianNotion wants to merge 5 commits intohuggingface:mainfrom
BrownianNotion:dpo-add-sigmoid-norm-loss-type
Open

Add length-normalized sigmoid loss type to DPO trainer#5406
BrownianNotion wants to merge 5 commits intohuggingface:mainfrom
BrownianNotion:dpo-add-sigmoid-norm-loss-type

Conversation

@BrownianNotion
Copy link
Copy Markdown

@BrownianNotion BrownianNotion commented Mar 30, 2026

What does this PR do?

Adds the length-normalised DPO loss which is used in the Tulu-3/OLMo models for DPO. Source: https://arxiv.org/pdf/2411.15124, section 5.1.2 equation 6.

Let me know if a section for paper index should also be added. The source cited by Tulu-3 is SimPO, section 2.2 equation 4, but this paper focuses on CPO (and is already in the paper index) not DPO.

Fixes #2964 (issue was closed without being addressed)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Note

Medium Risk
Medium risk because it extends core DPOTrainer loss computation and could affect training behavior when selected, though it is opt-in and covered by an added loss-type training test.

Overview
Adds a new DPO loss_type="sigmoid_norm" that normalizes chosen/rejected scores by completion token count before applying the sigmoid (-logsigmoid(beta * delta)), to reduce length bias.

Updates DPOConfig help text, DPO trainer docs, and the loss-type parametrized training test list to include sigmoid_norm (and improves the unknown-loss error message accordingly).

Reviewed by Cursor Bugbot for commit 8f55007. Bugbot is set up for automated code reviews on this repo. Configure here.

@BrownianNotion BrownianNotion changed the title Add length-normalised sigmoid loss type to DPO trainer Add length-normalized sigmoid loss type to DPO trainer Mar 30, 2026
| `"sigmoid"` (default) | Given the preference data, we can fit a binary classifier according to the Bradley-Terry model and in fact the [DPO](https://huggingface.co/papers/2305.18290) authors propose the sigmoid loss on the normalized likelihood via the `logsigmoid` to fit a logistic regression. |
| `"hinge"` | The [RSO](https://huggingface.co/papers/2309.06657) authors propose to use a hinge loss on the normalized likelihood from the [SLiC](https://huggingface.co/papers/2305.10425) paper. In this case, the `beta` is the reciprocal of the margin. |
| `"ipo"` | The [IPO](https://huggingface.co/papers/2310.12036) authors argue the logit transform can overfit and propose the identity transform to optimize preferences directly; TRL exposes this as `loss_type="ipo"`. |
| `"sigmoid_norm"` | The [SIMPO](https://huggingface.co/papers/2405.14734) authors address the length-bias in the original sigmoid loss by normalizing by the number of non-mask tokens; TRL exposes this as `loss_type="sigmoid_norm"`. |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing paper_index.md update for new loss type

Low Severity

This PR adds the sigmoid_norm loss type implementing the length-normalized DPO loss from the Tulu-3 paper (arXiv 2411.15124), but paper_index.md is not updated with a corresponding subsection. The project rule in AGENTS.md requires that PRs implementing methods from research papers must also add a subsection to paper_index.md. While SimPO is already listed in paper_index.md, it's associated with the CPO trainer, and the Tulu-3 paper is not referenced at all.

Fix in Cursor Fix in Web

Triggered by project rule: ../.ai/AGENTS.md

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR description: Tulu-3 references SimPO for the length-normalised DPO loss but SimPO focuses on CPO, not DPO. Waiting on Maintainers to indicate the best way to add this to paper index.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

chosen_avg_score = chosen_scores / chosen_mask.sum(dim=1).clamp(min=1.0)
rejected_avg_score = rejected_scores / rejected_mask.sum(dim=1).clamp(min=1.0)
delta = chosen_avg_score - rejected_avg_score
per_sequence_loss = -F.logsigmoid(self.beta * delta)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing required paper_index.md update for new loss

Low Severity

This PR implements the length-normalized sigmoid DPO loss from a research paper (Tulu-3 / SimPO), but paper_index.md is not updated with a corresponding subsection. The project rule in AGENTS.md requires that any PR implementing a method from a research paper must also add an entry to paper_index.md. While SimPO already has an entry under CPO, the Tulu-3 paper (which is the primary source for this DPO variant) has no entry, and there is no DPO-specific subsection for this loss type.

Additional Locations (1)
Fix in Cursor Fix in Web

Triggered by project rule: ../.ai/AGENTS.md

@BrownianNotion
Copy link
Copy Markdown
Author

@qgallouedec would you be able to take a look when you can? Let me know how you want the paper index to be updated and if any further changes/tests are required. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add length normalization option to DPO Trainer

1 participant