Add length-normalized sigmoid loss type to DPO trainer #5406
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1273,6 +1273,13 @@ def _compute_loss(self, model, inputs, return_outputs): | |
| # (Eq. 17) of the paper where beta is the regularization parameter for the IPO loss, denoted by τ. | ||
| per_sequence_loss = (ipo_delta - 1 / (2 * self.beta)) ** 2 | ||
|
|
||
| elif loss_type == "sigmoid_norm": | ||
| chosen_mask, rejected_mask = completion_mask.chunk(2, dim=0) | ||
| chosen_avg_score = chosen_scores / chosen_mask.sum(dim=1).clamp(min=1.0) | ||
| rejected_avg_score = rejected_scores / rejected_mask.sum(dim=1).clamp(min=1.0) | ||
| delta = chosen_avg_score - rejected_avg_score | ||
| per_sequence_loss = -F.logsigmoid(self.beta * delta) | ||
|
||
|
|
||
| elif loss_type == "exo_pair": | ||
| # Implements EXO-pref from the paper https://huggingface.co/papers/2402.00856, (Eq. 16) | ||
| # Minimize KL(p_fθ || p_rh) for K=2; p_fθ = softmax(βπ * (log πθ − log π_ref)) over {chosen, rejected} | ||
|
|
@@ -1373,7 +1380,7 @@ def _compute_loss(self, model, inputs, return_outputs): | |
|
|
||
| else: | ||
| raise ValueError( | ||
| f"Unknown loss type: {loss_type}. Should be one of ['sigmoid', 'hinge', 'ipo', 'exo_pair', " | ||
| f"Unknown loss type: {loss_type}. Should be one of ['sigmoid', 'hinge', 'ipo', 'sigmoid_norm', 'exo_pair', " | ||
| "'nca_pair', 'robust', 'bco_pair', 'sppo_hard', 'aot', 'aot_unpaired', 'apo_zero', 'apo_down', " | ||
| "'discopop', 'sft']" | ||
| ) | ||
|
|
||
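The `sigmoid_norm` branch above can be illustrated on a single preference pair with plain Python. This is a minimal sketch, not the trainer's code: the function name `sigmoid_norm_loss` and the default `beta=0.1` are hypothetical, and it assumes each per-token score is a policy-minus-reference log-probability, so that summing the list gives the sequence-level score that the diff divides by the completion length.

```python
import math


def sigmoid_norm_loss(chosen_token_scores, rejected_token_scores, beta=0.1):
    """Length-normalized sigmoid loss for one preference pair (illustrative).

    Each argument is a list of per-token scores for one completion,
    assumed here to be policy-minus-reference log-probabilities.
    """
    # Average each sequence score over its completion length.
    # max(..., 1) plays the role of clamp(min=1.0) for empty completions.
    chosen_avg = sum(chosen_token_scores) / max(len(chosen_token_scores), 1)
    rejected_avg = sum(rejected_token_scores) / max(len(rejected_token_scores), 1)
    delta = chosen_avg - rejected_avg

    # -log(sigmoid(z)) equals log(1 + exp(-z)); this form avoids overflow.
    z = beta * delta
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Same per-token score, very different lengths: the averages match, so delta is 0.
equal_quality = sigmoid_norm_loss([-0.5] * 4, [-0.5] * 40)
# Chosen completion scores higher per token than the rejected one.
chosen_better = sigmoid_norm_loss([0.2, 0.2], [-0.2])
```

With averaged scores, a rejected completion that is ten times longer but has the same per-token score yields a delta of zero, whereas a comparison of summed scores would be dominated by the length difference.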


Missing paper_index.md update for new loss type
Low Severity
This PR adds the `sigmoid_norm` loss type implementing the length-normalized DPO loss from the Tulu-3 paper (arXiv 2411.15124), but `paper_index.md` is not updated with a corresponding subsection. The project rule in `AGENTS.md` requires that PRs implementing methods from research papers must also add a subsection to `paper_index.md`. While SimPO is already listed in `paper_index.md`, it's associated with the CPO trainer, and the Tulu-3 paper is not referenced at all.

Triggered by project rule: ../.ai/AGENTS.md
See the PR description: Tulu-3 references SimPO for the length-normalised DPO loss, but SimPO focuses on CPO, not DPO. Waiting on maintainers to indicate the best way to add this to the paper index.