
Delta weight sync using Xet buckets #5417

Draft
AmineDiro wants to merge 6 commits into main from delta-weight-sync

Conversation

@AmineDiro
Member

What does this PR do?

  • Sparse weight sync between trainer and vLLM is working — encodes only the changed bf16 elements as a sparse safetensors file (indices + values) and uploads it to an HF Storage Bucket
  • ~20-35 MB per delta vs 1.2 GB for the full model (Qwen3-0.6B), sparsity >99%

Training converges correctly on the immediate-EOS sanity check 👍
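The sparse encoding above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `encode_delta` and `apply_delta` are hypothetical names, and the real patch serializes the result as safetensors and pushes it to the Bucket rather than keeping it in memory.

```python
import torch

def encode_delta(snapshot: torch.Tensor, current: torch.Tensor):
    """Return flat indices and new values of bf16 elements that changed.

    Compares the raw bf16 bit patterns so the mask is exact, including
    sign-of-zero and NaN payload changes.
    """
    changed = snapshot.view(-1).view(torch.int16) != current.view(-1).view(torch.int16)
    indices = changed.nonzero(as_tuple=True)[0]   # int64 flat indices
    values = current.view(-1)[indices]            # bf16 changed values
    return indices, values

def apply_delta(snapshot: torch.Tensor, indices: torch.Tensor,
                values: torch.Tensor) -> torch.Tensor:
    """Scatter the sparse patch onto a copy of the snapshot."""
    out = snapshot.clone().view(-1)
    out[indices] = values
    return out.view_as(snapshot)
```

With >99% sparsity the (indices, values) pair is far smaller than the dense tensor, which is where the ~20-35 MB per delta comes from.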

Still have some optimizations to implement. Both trainer and vLLM currently hold a CPU bf16 snapshot of the model:

  • I tried predicting changes from the Adam state (m, v) using bf16 ULP estimates, to skip the snapshot, but recall was only ~30% — the ground-truth difference is needed instead. So maybe I am doing something incorrectly
  • vLLM: a snapshot is needed because load_weights only accepts full tensors; there is no API for sparse in-place updates in vLLM for now, but maybe I can contribute this feature to vLLM directly
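For context on the ULP-based detection mentioned above, here is one common way to measure bf16 ULP distance (an illustrative sketch, not the `ULPChangeDetector` from this PR): reinterpret the bf16 bit patterns as integers, map the sign-magnitude encoding to a monotonic scale, and subtract.

```python
import torch

def _ordered_int(x: torch.Tensor) -> torch.Tensor:
    """Map bf16 bit patterns to integers whose ordering matches the floats."""
    u = x.view(torch.int16).to(torch.int32) & 0xFFFF  # unsigned 16-bit pattern
    # Positive floats (sign bit clear) already sort by bit pattern;
    # negative floats sort in reverse, so flip them below zero.
    return torch.where(u < 0x8000, u, 0x8000 - u)

def bf16_ulp_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Elementwise number of representable bf16 steps between a and b."""
    return (_ordered_int(a) - _ordered_int(b)).abs()
```

A predictor would compare an estimated update (e.g. derived from m and v) against a ULP threshold; the ~30% recall reported above suggests the estimate misses most actually-changed elements.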

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

- Add `huggingface-hub` as dependency
- Introduce sparse weight patching via `DeltaWeightTransferEngine`
- Add `ULPChangeDetector` for optimizer-level change tracking
- Add config parameters for delta sync control (repo, anchor interval, checksum verification)
- Support both anchor checkpoints and delta patches via HF Hub (Xet storage)
Add delta weight synchronization support to AsyncGRPO

Implements a two-phase delta sync workflow: a non-blocking upload to HF Hub while inference continues, followed by a signal to vLLM to fetch and apply. Adds ULP change detection to selectively sync only modified parameters with element-level masks. Simplifies the delta engine API by removing anchor/checksum logic; now uses HF Hub directly without intermediate configuration objects.

Remove ULP prediction logic, diagnostic logging config, and checkpoint chain reconstruction. Keep only ground-truth bf16 change detection via optimizer hooks and sparse patch metadata.
- Move anchor/delta decision from trainer to rollout worker
- Remove change detector from streaming iter; only check for validated
  masks
- Migrate from HfApi to bucket_id and HF Bucket APIs
- Simplify upload/download paths and remove revision parameter
- Refactor _send_weights_delta with clearer empty/non-empty logic
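The empty/non-empty branching mentioned in the last point can be sketched like this. This is a hypothetical stand-in, not the PR's `_send_weights_delta`: the `upload` callable abstracts away the Bucket API, and plain `torch.save` stands in for the safetensors serialization.

```python
import io
import torch

def send_weights_delta(deltas: dict, upload) -> bool:
    """Serialize sparse patches {name: (indices, values)} and upload them.

    Returns False without uploading when every patch is empty, so vLLM is
    never signaled to fetch a no-op delta.
    """
    payload = {f"{name}.indices": idx for name, (idx, _) in deltas.items()}
    payload.update({f"{name}.values": val for name, (_, val) in deltas.items()})
    if all(t.numel() == 0 for t in payload.values()):
        return False  # nothing changed this step; skip the round-trip
    buf = io.BytesIO()
    torch.save(payload, buf)  # stand-in for safetensors + Bucket upload
    upload(buf.getvalue())
    return True
```

Separating the "anything to send?" check from the serialization keeps the hot path cheap on steps where the optimizer touched nothing.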

@qgallouedec qgallouedec left a comment


That looks good, but I'm not sure I understand the big picture. Why do we push the weights (the diffs, actually) if it's only to download them again? Shouldn't we just keep the diff locally?

@qgallouedec
Member

vLLM: a snapshot is needed because load_weights only accepts full tensors; there is no API for sparse in-place updates in vLLM for now, but maybe I can contribute this feature to vLLM directly

Let's ask them directly

@AmineDiro
Member Author

That looks good, but I'm not sure I understand the big picture. Why do we push the weights (the diffs, actually) if it's only to download them again? Shouldn't we just keep the diff locally?

I think the goal of this technique is mainly to completely disaggregate the inference and trainer servers. For example, if inference is running on some HF Space you can still exchange weights. It also gives you a good checkpoint mechanism for the weights for free (although I need to check the Xet atomicity semantics)

