Centralize hf:// URI parsing into utils/_hf_uri.py #3994
omkar-334 wants to merge 8 commits into huggingface:main
else:
    # First segment isn't a known type -> model type
    vol_type_str = constants.REPO_TYPE_MODEL
    remaining = source_part
Volume @revision silently dropped during parsing
High Severity
The Volume dataclass has a revision field, but the refactored _parse_volumes never populates it. The old code preserved @revision as part of the source string (e.g., source="user/repo@main"). Now parse_hf_url extracts the revision into parsed.revision and strips it from repo_id, but parsed.revision is never passed to the Volume constructor. For any volume spec using @revision syntax, the revision is silently lost.
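A minimal sketch of the fix this comment implies: forward the parsed revision when building the volume. Everything here is a simplified, hypothetical stand-in and not the actual `huggingface_hub` code: `ParsedSource`, `parse_source`, `build_volume`, and the `Volume` fields `source` and `mount_path` are assumptions made for illustration. Only the `revision` field and the `@revision` syntax come from the review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedSource:
    # Hypothetical stand-in for the result returned by the shared parser.
    repo_id: str
    revision: Optional[str] = None


@dataclass
class Volume:
    # Hypothetical, simplified Volume; only `revision` is taken from the review.
    source: str
    mount_path: str
    revision: Optional[str] = None


def parse_source(source: str) -> ParsedSource:
    """Toy parser: split an optional '@revision' suffix off the repo id."""
    repo_id, sep, revision = source.partition("@")
    return ParsedSource(repo_id=repo_id, revision=revision if sep else None)


def build_volume(source: str, mount_path: str) -> Volume:
    parsed = parse_source(source)
    # The step the review reports as missing: pass the parsed revision along
    # instead of discarding it once it has been stripped from the repo id.
    return Volume(source=parsed.repo_id, mount_path=mount_path, revision=parsed.revision)
```

With this shape, `build_volume("user/repo@main", "/data")` keeps `main` in `revision` while `source` holds the bare repo id.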
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
if not is_hf_url and len(url_segments) == 3:
    # Passed <repo_type>/<user>/<model_id> — accept singular type names
    # (e.g. "dataset/user/id") which parse_hf_url doesn't handle.
    repo_type, namespace, repo_id = url_segments
Bucket paths with 3 segments fail validation
High Severity
When repo_type_and_id_from_hf_id receives a 3-segment non-URL bucket path like "hf://buckets/ns/name", the code at the len(url_segments) == 3 branch does a simple repo_type, namespace, repo_id = url_segments, setting repo_type to "buckets" (plural). The downstream validation checks repo_type != "bucket" (singular), so "buckets" fails and raises a ValueError. The old code had explicit elif url_segments[0] == "buckets" handling that correctly set repo_type = "bucket".
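One way to address this is to normalize plural type prefixes before validating in the 3-segment branch. The sketch below is illustrative only: `split_three_segments`, the mapping table, and the set of valid types are hypothetical and not taken from the PR, and it assumes any `hf://` prefix has already been stripped from the input.

```python
from typing import Tuple

# Assumed mapping from plural path prefixes to singular repo types.
_PLURAL_TO_SINGULAR = {
    "models": "model",
    "datasets": "dataset",
    "spaces": "space",
    "buckets": "bucket",
}
_VALID_TYPES = set(_PLURAL_TO_SINGULAR.values())


def split_three_segments(hf_id: str) -> Tuple[str, str, str]:
    """Split '<type>/<namespace>/<name>' and normalize the type to singular."""
    segments = hf_id.split("/")
    if len(segments) != 3:
        raise ValueError(f"Expected 3 segments, got {len(segments)}: {hf_id!r}")
    repo_type, namespace, name = segments
    # Accept both "buckets" and "bucket"; singular names pass through unchanged.
    repo_type = _PLURAL_TO_SINGULAR.get(repo_type, repo_type)
    if repo_type not in _VALID_TYPES:
        raise ValueError(f"Unknown repo type: {repo_type!r}")
    return repo_type, namespace, name
```

Under these assumptions, `split_three_segments("buckets/ns/name")` yields the singular `"bucket"` type, so a downstream check against `"bucket"` would pass.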


Related issue - #3971
Note
Medium Risk
Touches core path/URI parsing used by HfFileSystem, buckets, CLI jobs volumes, and repo_type_and_id_from_hf_id, so subtle parsing/regression risks exist around revisions, bucket IDs, and ambiguous repo paths.
Overview
Centralizes hf:// and HF identifier parsing by adding utils/_hf_uri.py with a shared parse_hf_url implementation (including bucket support and special refs/... revision handling).
Updates buckets helpers, HfFileSystem.resolve_path, CLI hf jobs volume parsing, and hf_api.repo_type_and_id_from_hf_id to delegate to the new parser, removing duplicated split/regex logic and adjusting a few edge cases (explicit-type detection, bare bucket name handling, and mount-spec parsing around :/).
Adds tests/test_hf_uri.py to cover repo, bucket, revision, and error cases.
Written by Cursor Bugbot for commit 4c080e6.