fix: smoother progress for xet downloads #4059
Open
tobocop2 wants to merge 1 commit into huggingface:main
Switch xet_get() from a 1-arg to 2-arg callback so xet-core reports progress frequently instead of barely at all. Fixes huggingface#4058
Fixes #4058.
Problem
Xet downloads barely report progress. For a 2.5 GB file the bar updates about 9 times total:
0% → 1.7% → 5.6% → 11% → 25% → 84% → 100%. The jump from 25% to 84% is ~1.5 GB with no update at all, so the download looks completely stuck.
The cause: xet_get() and download_bucket_files() use a 1-arg callback. xet-core supports a 2-arg signature that gives fine-grained network-level updates instead.
Fix
make_xet_progress_callback(progress_bar, file_size) returns a 2-arg callback so xet-core gives us useful progress. Transfer bytes are scaled to file size so the bar tracks 0-100% correctly (xet dedup can make transfer size differ from file size).
Wired into both xet_get() and download_bucket_files(). No public API changes.
Demo
Qwen3 4B (2.5 GB, xet-stored) with a custom tqdm_class callback. Builds on #4056, which fixes custom tqdm_class in non-TTY environments.
Before (bar appears stuck)
After (smooth progress)
Note
Both downloads finish in roughly the same time (~80s). The difference is purely in how progress is reported.
Multi-file: snapshot_download Qwen3-8B (5 xet shards, 16.4 GB)
Before (bar barely moves)
After (smooth per-file progress)
Related
tqdm_class silently broken in non-TTY environments #4056 (fixes custom tqdm_class handling, orthogonal to this)
Note
Low Risk
Low risk: changes are limited to progress reporting callbacks for Xet downloads, with new tests covering scaling/capping and edge cases. Main risk is minor regressions in progress bar updates or callback compatibility with hf_xet.download_files.
Overview
Improves progress reporting for Xet-powered downloads.
Adds make_xet_progress_callback to provide the 2-argument progress callback expected by xet-core for fine-grained network updates, while scaling transfer bytes to the expected file size (and capping at 100%) so tqdm bars advance smoothly and finish correctly.
Wires the new callback into both xet_get() and HfApi.download_bucket_files(), and adds focused tests validating the callback signature, scaling behavior (including dedup/transfer-total mismatch), and edge cases like zero increments and unknown file sizes.
Reviewed by Cursor Bugbot for commit 6a8f8a3.
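For readers who want to see the scaling and capping idea concretely, here is a minimal sketch. It is not the code from this PR: `make_scaled_callback`, `TransferUpdate`, its field names, and `FakeBar` are hypothetical stand-ins, and the real shape of the objects xet-core passes to a 2-arg callback may differ. The sketch only assumes that each call carries a network-byte increment and the expected network total.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TransferUpdate:
    """Hypothetical aggregate update; not the real xet-core type."""
    transfer_total: int      # bytes expected over the network (dedup may shrink this)
    transfer_increment: int  # network bytes received since the previous call


class FakeBar:
    """Tiny tqdm-like stand-in: only `n` and `update()` are needed here."""

    def __init__(self) -> None:
        self.n = 0

    def update(self, k: int) -> None:
        self.n += k


def make_scaled_callback(
    bar, file_size: Optional[int]
) -> Callable[[TransferUpdate, object], None]:
    transferred = 0  # cumulative network bytes seen so far
    reported = 0     # bytes already pushed to the bar

    def callback(total_update: TransferUpdate, item_updates: object) -> None:
        nonlocal transferred, reported
        inc = total_update.transfer_increment
        if inc <= 0:
            return  # zero increments do not move the bar
        transferred += inc
        if not file_size or total_update.transfer_total <= 0:
            step = inc  # unknown sizes: fall back to raw network bytes
        else:
            # Scale network bytes to file bytes, never exceeding file_size.
            scaled = transferred * file_size // total_update.transfer_total
            step = min(file_size, scaled) - reported
        if step > 0:
            bar.update(step)
            reported += step

    return callback


# Dedup case: only 400 bytes cross the network for a 1000-byte file.
bar = FakeBar()
cb = make_scaled_callback(bar, file_size=1000)
for _ in range(4):
    cb(TransferUpdate(transfer_total=400, transfer_increment=100), None)
print(bar.n)  # 1000
```

Tracking the cumulative total and pushing only the difference keeps integer rounding from accumulating, so the bar lands exactly on the file size once the last network byte arrives.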