
Delta weight sync using Xet buckets #5417

Draft
AmineDiro wants to merge 6 commits into main from delta-weight-sync

Conversation

@AmineDiro
Member

What does this PR do?

  • Sparse weight sync between trainer and vLLM is working — encodes only the changed bf16 elements as a sparse safetensors file (indices + values) and uploads it to an HF Storage Bucket
  • ~20-35 MB per delta vs 1.2 GB for the full model (Qwen3-0.6B), sparsity >99%

Training converges correctly on the immediate-EOS sanity check 👍
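The sparse encoding above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `encode_delta` and `apply_delta` are hypothetical names, and the real patch serializes the result as safetensors and pushes it to the Bucket rather than keeping it in memory.

```python
import torch

def encode_delta(snapshot: torch.Tensor, current: torch.Tensor):
    """Return flat indices and new values of bf16 elements that changed.

    Compares the raw bf16 bit patterns so the mask is exact, including
    sign-of-zero and NaN payload changes.
    """
    changed = snapshot.view(-1).view(torch.int16) != current.view(-1).view(torch.int16)
    indices = changed.nonzero(as_tuple=True)[0]   # int64 flat indices
    values = current.view(-1)[indices]            # bf16 changed values
    return indices, values

def apply_delta(snapshot: torch.Tensor, indices: torch.Tensor,
                values: torch.Tensor) -> torch.Tensor:
    """Scatter the sparse patch onto a copy of the snapshot."""
    out = snapshot.clone().view(-1)
    out[indices] = values
    return out.view_as(snapshot)
```

With >99% sparsity the (indices, values) pair is far smaller than the dense tensor, which is where the ~20-35 MB per delta comes from.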

Still have some optimizations to implement. Both trainer and vLLM currently hold a CPU bf16 snapshot of the model:

  • I tried predicting changes from the Adam state (m, v) using bf16 ULP estimates, to skip the snapshot, but recall was only ~30% — the ground-truth difference is needed instead. So maybe I am doing something incorrectly
  • vLLM: a snapshot is needed because load_weights only accepts full tensors; there is no API for sparse in-place updates in vLLM for now, but maybe I can contribute this feature to vLLM directly
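For context on the ULP-based detection mentioned above, here is one common way to measure bf16 ULP distance (an illustrative sketch, not the `ULPChangeDetector` from this PR): reinterpret the bf16 bit patterns as integers, map the sign-magnitude encoding to a monotonic scale, and subtract.

```python
import torch

def _ordered_int(x: torch.Tensor) -> torch.Tensor:
    """Map bf16 bit patterns to integers whose ordering matches the floats."""
    u = x.view(torch.int16).to(torch.int32) & 0xFFFF  # unsigned 16-bit pattern
    # Positive floats (sign bit clear) already sort by bit pattern;
    # negative floats sort in reverse, so flip them below zero.
    return torch.where(u < 0x8000, u, 0x8000 - u)

def bf16_ulp_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Elementwise number of representable bf16 steps between a and b."""
    return (_ordered_int(a) - _ordered_int(b)).abs()
```

A predictor would compare an estimated update (e.g. derived from m and v) against a ULP threshold; the ~30% recall reported above suggests the estimate misses most actually-changed elements.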

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

- Add `huggingface-hub` as dependency
- Introduce sparse weight patching via `DeltaWeightTransferEngine`
- Add `ULPChangeDetector` for optimizer-level change tracking
- Add config parameters for delta sync control (repo, anchor interval, checksum verification)
- Support both anchor checkpoints and delta patches via HF Hub (Xet storage)
Add delta weight synchronization support to AsyncGRPO

Implements a two-phase delta sync workflow: a non-blocking upload to HF Hub while inference continues, followed by a signal to vLLM to fetch and apply. Adds ULP change detection to selectively sync only modified parameters with element-level masks. Simplifies the delta engine API by removing anchor/checksum logic; now uses HF Hub directly without intermediate configuration objects.

Remove ULP prediction logic, diagnostic logging config, and checkpoint chain reconstruction. Keep only ground-truth bf16 change detection via optimizer hooks and sparse patch metadata.
- Move anchor/delta decision from trainer to rollout worker
- Remove change detector from streaming iter; only check for validated
  masks
- Migrate from HfApi to bucket_id and HF Bucket APIs
- Simplify upload/download paths and remove revision parameter
- Refactor _send_weights_delta with clearer empty/non-empty logic
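The empty/non-empty branching mentioned in the last point can be sketched like this. This is a hypothetical stand-in, not the PR's `_send_weights_delta`: the `upload` callable abstracts away the Bucket API, and plain `torch.save` stands in for the safetensors serialization.

```python
import io
import torch

def send_weights_delta(deltas: dict, upload) -> bool:
    """Serialize sparse patches {name: (indices, values)} and upload them.

    Returns False without uploading when every patch is empty, so vLLM is
    never signaled to fetch a no-op delta.
    """
    payload = {f"{name}.indices": idx for name, (idx, _) in deltas.items()}
    payload.update({f"{name}.values": val for name, (_, val) in deltas.items()})
    if all(t.numel() == 0 for t in payload.values()):
        return False  # nothing changed this step; skip the round-trip
    buf = io.BytesIO()
    torch.save(payload, buf)  # stand-in for safetensors + Bucket upload
    upload(buf.getvalue())
    return True
```

Separating the "anything to send?" check from the serialization keeps the hot path cheap on steps where the optimizer touched nothing.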

@qgallouedec qgallouedec left a comment


That looks good, but I'm not sure I understand the big picture. Why do we push the weights (the diffs, actually) if it's only to download them again? Shouldn't we just keep the diff locally?

@qgallouedec
Member

vLLM: a snapshot is needed because load_weights only accepts full tensors; there is no API for sparse in-place updates in vLLM for now, but maybe I can contribute this feature to vLLM directly

Let's ask them directly

@AmineDiro
Member Author

That looks good, but I'm not sure I understand the big picture. Why do we push the weights (the diffs, actually) if it's only to download them again? Shouldn't we just keep the diff locally?

I think the goal of this technique is mainly to completely disaggregate the inference and trainer servers. For example, if inference is running on some HF Space you can still exchange weights. It also gives you a good checkpoint mechanism for the weights for free (although I need to check the Xet atomicity semantics)

