
Remove truncation_mode from DPO#5372

Closed

albertvillanova wants to merge 4 commits into huggingface:main from albertvillanova:rm-truncation-mode-dpo

Conversation

@albertvillanova
Member

@albertvillanova albertvillanova commented Mar 25, 2026

Remove truncation_mode from DPO.

This PR simplifies and standardizes the truncation logic for tokenized sequences in the DPO Trainer by removing the truncation_mode option. All truncation now consistently keeps the start of the sequence and truncates from the end, eliminating ambiguity and reducing configuration complexity. Documentation and code are updated accordingly.

Motivation

I'd like to propose a potential simplification: removing the truncation_mode option altogether.

The motivation is to reduce complexity and avoid supporting behaviors that are either rarely used or potentially undesirable in practice. Before going further, I wanted to get feedback on whether this kind of simplification would be acceptable, and whether dropping this functionality would be considered reasonable.

This is just an initial proposal for discussion. If there is agreement, I'd be happy to follow up and apply the same change across other trainers for consistency. Otherwise, no worries at all: feel free to disregard this suggestion and close the PR.

CC: @qgallouedec

Changes

Truncation logic simplification:

  • Removed the truncation_mode parameter and all related logic from DPOConfig, DataCollatorForPreference, and DPOTrainer, so sequences longer than max_length are always truncated from the end.

Documentation updates:

  • Updated docstrings and help texts to reflect the new, simplified truncation behavior, removing references to truncation_mode and clarifying that truncation always happens from the end.
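A minimal sketch of the simplified behavior (illustrative only, not TRL's actual collator code; the helper name `truncate_keep_start` is hypothetical):

```python
# Illustrative sketch of the simplified truncation: with truncation_mode
# removed, sequences longer than max_length always keep the start of the
# sequence and lose the end.
def truncate_keep_start(ids, max_length):
    if max_length is None:
        return ids           # no truncation configured
    return ids[:max_length]  # keep the first max_length tokens, drop the tail

prompt_plus_completion = list(range(10))
print(truncate_keep_start(prompt_plus_completion, 6))  # [0, 1, 2, 3, 4, 5]
```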

Note

Medium Risk
Removes a public configuration option and changes truncation behavior for DPO runs that previously relied on keep_end, which could alter training inputs and results. Risk is contained to DPO preprocessing/collation and is covered by updating/removing the associated regression test.

Overview
DPO no longer supports configurable truncation behavior. truncation_mode is removed from DPOConfig, DataCollatorForPreference, and the DPOTrainer wiring; when max_length is set, sequences are now always truncated by keeping the start and cutting off the end.

Documentation/help text is updated to reflect the fixed truncation behavior, the VLM-specific keep_end validation in DPOTrainer is deleted, and the corresponding test that asserted this error is removed.

Reviewed by Cursor Bugbot for commit 4bf9351. Bugbot is set up for automated code reviews on this repo.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

rejected_mask = [[0] * len(example["prompt_ids"]) + [1] * len(example["rejected_ids"]) for example in examples]

if self.max_length is not None:
    if self.truncation_mode == "keep_start":
Truncation mode removal not propagated to SFT trainer

Low Severity

The truncation_mode removal from DataCollatorForPreference and DPOConfig was not propagated to the SFT trainer, which has the identical pattern in DataCollatorForLanguageModeling (including the same keep_start/keep_end if/elif/else block) and SFTConfig. The VLM keep_end guard is also duplicated verbatim in SFTTrainer.__init__. This creates an inconsistency between the two trainers.


Member Author

As commented in the description, if there is agreement, I'd be happy to follow up and apply the same change across other trainers for consistency.

@qgallouedec qgallouedec requested review from kashif March 25, 2026 15:33
@qgallouedec
Member

qgallouedec commented Mar 25, 2026

I think removing the truncation-side option would be reasonable.

I checked a few other repos that implement DPO/SFT, and keep_end-style behavior looks more like an exception than the norm:

  • torchtune: it supports the equivalent of keep_end via truncation_type="left", but the default is still "right": link
  • OpenRLHF: I did not find any keep_end-style option for DPO/SFT. It relies on the tokenizer truncation path. Interestingly, after truncation it forces the last token to EOS: link
  • LLaMA-Factory: its preprocessing mostly preserves the beginning of the sequence, not the end: link
  • Unsloth: it mostly delegates this behavior to trl, so it is not really an independent reference point
  • Axolotl: I did not find an explicit keep_end mode either; for DPO it mostly inherits TRL behavior and drops overlong pairs

So the implementation landscape seems fairly consistent: keep_end can exist as an explicit option, but it is rarely the default, and in several repos it is not exposed at all.

For SFT the case is a bit less clear, but for DPO in particular, left truncation seems unusual. If it is supported, it is generally an opt-in behavior rather than the standard path.
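The difference between the two modes can be sketched with a hypothetical helper (not taken from any of the repos above):

```python
# Hypothetical helper contrasting the two modes. "keep_start" (right
# truncation) preserves the beginning of the sequence; "keep_end" (left
# truncation) preserves the final tokens.
def truncate(ids, max_length, mode="keep_start"):
    if len(ids) <= max_length:
        return ids
    if mode == "keep_start":
        return ids[:max_length]   # drop tokens from the end
    return ids[-max_length:]      # "keep_end": drop tokens from the start

ids = [101, 102, 103, 104, 105]
print(truncate(ids, 3, "keep_start"))  # [101, 102, 103]
print(truncate(ids, 3, "keep_end"))    # [103, 104, 105]
```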

We could also check papers, but that would likely take quite a bit more time for what is probably the same conclusion.

cc @kashif if you have some insights

@qgallouedec
Member

qgallouedec commented Mar 25, 2026

Not for this PR, but we might want to re-add this option in the future to enable dropping overlong examples, which I think makes more sense.
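The suggested "drop" mode could, hypothetically, look something like the following sketch (the helper name, the `input_ids` field, and the batch shape are all assumptions for illustration):

```python
# Hypothetical illustration of a future "drop" truncation mode: instead of
# truncating, overlong examples are filtered out of the batch entirely.
def drop_overlong(examples, max_length):
    return [ex for ex in examples if len(ex["input_ids"]) <= max_length]

batch = [{"input_ids": [1, 2, 3]}, {"input_ids": [1, 2, 3, 4, 5, 6]}]
print(drop_overlong(batch, 4))  # [{'input_ids': [1, 2, 3]}]
```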

@qgallouedec
Member

I don't want to rush this before v1. I recommend keeping truncation_mode and deciding later whether we should stop supporting "keep_end".

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).



Removed docstring:

    Truncation mode to use when the sequence exceeds `max_length`. Possible values are `"keep_end"` and `"keep_start"`.

Updated docstring:

    Maximum length of the tokenized sequence. Sequences longer than `max_length` are truncated from the end. If `None`, no truncation is applied.

Inconsistent truncation_mode removal across trainers violates consistency rule

Medium Severity

The truncation_mode parameter and its associated logic (config field, collator field, truncation branching, VLM guard) are duplicated across the DPO and SFT trainers. This PR removes truncation_mode from DPOConfig, DataCollatorForPreference, and DPOTrainer, but the identical pattern remains in SFTConfig, DataCollatorForLanguageModeling, and SFTTrainer. Per the project's AGENTS.md consistency rules, changes to duplicated logic across trainers must be propagated to all copies.



@qgallouedec
Member

qgallouedec commented Apr 6, 2026

We may want to have a "drop" truncation mode in the future, i.e., drop overlong samples instead of truncating them. Maybe keep this arg (and avoid a breaking change) and just deprecate "keep_end"?

@albertvillanova
Member Author

OK, I agree that a "drop" truncation mode would be useful. So let's keep truncation_mode and address the rest of the changes in separate PRs. I'm working on this.
