
[Trainer] Support multi-loss component logging #45270

Draft

madhav1k wants to merge 1 commit into huggingface:main from madhav1k:main

Conversation


madhav1k commented Apr 6, 2026

Introduce logging of individual loss components when models return a dict of losses.

  • Add TrainingArguments.logging_loss_components flag to enable/disable this behavior.
  • Track per-component running sums with _tr_loss_components and aggregate scalars in _total_loss_components_scalar.
  • Extend NaN/Inf filtering to handle dict-form losses and preserve previous logged averages per component.
  • When enabled, compute_loss is called with outputs returned; the Trainer extracts scalar loss-like tensors (the components and the main loss), accumulates them, and includes them in the logs.
  • Add tests (tests/trainer/test_multi_loss.py) to verify logging enabled/disabled behavior and presence/absence of loss_part_* entries in training logs.

This lets users surface and monitor auxiliary loss terms alongside the main loss during training.
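The per-component bookkeeping described above can be sketched in plain Python. This is a toy stand-in, not the actual Trainer internals: the class and method names are illustrative, and it only mirrors the idea of keeping a running sum per component (as _tr_loss_components does) and averaging at log time (as _total_loss_components_scalar aggregates).

```python
class LossComponentTracker:
    """Toy sketch of per-component running sums; names are illustrative,
    not the real Trainer attributes."""

    def __init__(self):
        self.running = {}  # per-component sum since the last log
        self.steps = 0

    def accumulate(self, loss_dict):
        # Add each scalar component to its running sum.
        for name, value in loss_dict.items():
            self.running[name] = self.running.get(name, 0.0) + float(value)
        self.steps += 1

    def log_and_reset(self):
        # Average each component over the accumulated steps, emit it under
        # a loss_part_* key, then reset for the next logging window.
        logs = {f"loss_part_{k}": v / self.steps for k, v in self.running.items()}
        self.running = {}
        self.steps = 0
        return logs
```

For example, accumulating {"ce": 1.0, "aux": 0.5} and {"ce": 3.0, "aux": 1.5} over two steps would log loss_part_ce = 2.0 and loss_part_aux = 1.0.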

What does this PR do?

This PR introduces the ability for the Trainer to automatically log individual loss components when a model returns a dictionary of losses. This is particularly useful for multi-task learning or models with auxiliary loss terms (e.g., Distillation, MoE).
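To illustrate the shape of input this feature expects, here is a hypothetical multi-loss forward pass. The function, its arguments, and the component names are invented for illustration (a real model would return tensors, e.g. a cross-entropy term plus an MoE load-balancing or distillation term):

```python
def toy_forward(logit_error, aux_penalty, aux_weight=0.1):
    """Hypothetical forward pass returning a dict of scalar loss terms."""
    loss_ce = logit_error ** 2            # stand-in for cross-entropy
    loss_aux = aux_weight * aux_penalty   # stand-in for an auxiliary term
    return {
        "loss": loss_ce + loss_aux,  # main loss used for the backward pass
        "loss_ce": loss_ce,          # component surfaced in the logs
        "loss_aux": loss_aux,
    }
```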

  • Adds TrainingArguments.logging_loss_components to toggle this behavior.
  • Extends training_step to extract and accumulate scalar loss components.
  • Handles distributed training by using nested_gather to aggregate components across processes before logging.
  • Updates the NaN/Inf filter to support dictionary-based losses.
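The dict-aware NaN/Inf filter mentioned above can be sketched as follows. This is a simplified stand-in (operating on floats rather than tensors, with invented names): a non-finite component is replaced by its previously logged average, falling back to 0.0 when no prior average exists, mirroring how the Trainer already handles a non-finite scalar loss.

```python
import math

def filter_nonfinite(loss_dict, prev_avg):
    """Replace non-finite loss components with their last logged average."""
    filtered = {}
    for name, value in loss_dict.items():
        v = float(value)
        filtered[name] = v if math.isfinite(v) else prev_avg.get(name, 0.0)
    return filtered
```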

Fixes: this work is coordinated on issue #31081 and also addresses the duplicate #30725.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Test Results
I created a dedicated test file, tests/trainer/test_multi_loss.py, and ran it in a Python 3.12 venv:

PYTHONPATH=src python -m pytest tests/trainer/test_multi_loss.py

Results:
test_multi_loss_logging: PASSED (Verified components appear in log_history)
test_multi_loss_logging_disabled: PASSED (Verified backward compatibility)
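The two assertions above boil down to checking log_history for loss_part_* keys. A toy stand-in for that check (the real tests run an actual Trainer; this helper and the sample log entries are illustrative):

```python
def has_loss_parts(log_history):
    """Return True if any log entry contains a loss_part_* key."""
    return any(
        key.startswith("loss_part_")
        for entry in log_history
        for key in entry
    )
```

With logging enabled, entries like {"loss": 1.2, "loss_part_ce": 1.0, "loss_part_aux": 0.2} should make the check pass; with it disabled, plain {"loss": 1.2} entries should not.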

Who can review?

@SunMarc @ArthurZucker (Trainer and Distributed expertise)

github-actions bot commented Apr 6, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45270&sha=eb3a15

