53 changes: 52 additions & 1 deletion docs/source/en/guides/buckets.md
@@ -348,6 +348,20 @@ Use [`batch_bucket_files`] to upload files to a bucket. You can upload from loca
... )
```

You can also copy Xet-tracked files from another bucket or repository using the `copy` parameter. This is a server-side
operation, so no data is downloaded or re-uploaded:

```python
# Copy files by xet hash (source_repo_type, source_repo_id, xet_hash, destination)
>>> batch_bucket_files(
...     "username/my-bucket",
...     copy=[
...         ("bucket", "username/source-bucket", "xet-hash-abc123", "models/model.safetensors"),
...         ("model", "username/my-model", "xet-hash-def456", "models/config.safetensors"),
...     ],
... )
```
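Because each `copy` entry is a positional 4-tuple, two fields are easy to swap by mistake. The following is a minimal sketch of one way to name the fields before passing them, assuming `batch_bucket_files` accepts any tuple of that shape (a `NamedTuple` is a plain tuple at runtime). `CopyEntry` and `make_copy_entry` are hypothetical names, not part of `huggingface_hub`, and the set of accepted source types is an assumption taken from the comment on `_BucketCopyFile.source_repo_type`:

```python
from typing import NamedTuple

# Assumption: these are the accepted source types, as listed in the
# comment on `_BucketCopyFile.source_repo_type`.
VALID_SOURCE_TYPES = {"model", "dataset", "space", "bucket"}


class CopyEntry(NamedTuple):
    """Hypothetical named form of one `copy` tuple (same field order)."""

    source_repo_type: str
    source_repo_id: str
    xet_hash: str
    destination: str


def make_copy_entry(source_repo_type: str, source_repo_id: str, xet_hash: str, destination: str) -> CopyEntry:
    """Build one copy entry, rejecting unknown source types early (hypothetical helper)."""
    if source_repo_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"Unknown source_repo_type: {source_repo_type!r}")
    return CopyEntry(source_repo_type, source_repo_id, xet_hash, destination)


entries = [
    make_copy_entry("bucket", "username/source-bucket", "xet-hash-abc123", "models/model.safetensors"),
    make_copy_entry("model", "username/my-model", "xet-hash-def456", "models/config.safetensors"),
]
print(entries[0].destination)
```

The resulting list could then be passed as `copy=entries`, under the assumption stated above.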

You can also delete files while uploading others.

```python
@@ -360,7 +374,7 @@ You can also delete files while uploading others.
```

> [!WARNING]
> Calls to [`batch_bucket_files`] are non-transactional. If an error occurs during the process, some files may have been uploaded or deleted while others haven't.
> Calls to [`batch_bucket_files`] are non-transactional. If an error occurs during the process, some files may have been uploaded, copied, or deleted while others haven't.

### Upload a single file with the CLI

@@ -470,6 +484,43 @@ Use `hf buckets sync` to download all files from a bucket to a local directory:

See the [Sync directories](#sync-directories) section below for the full set of sync options.

## Copy files to a bucket

Use [`copy_files`] to copy files already hosted on the Hub to a Bucket:

```py
>>> from huggingface_hub import copy_files

# Bucket to bucket (same or different bucket)
>>> copy_files(
...     "hf://buckets/username/source-bucket/checkpoints/model.safetensors",
...     "hf://buckets/username/destination-bucket/archive/model.safetensors",
... )

# Repo to bucket
>>> copy_files(
...     "hf://datasets/username/my-dataset/processed/",
...     "hf://buckets/username/my-bucket/datasets/processed/",
... )
```

The same is available from the CLI:

```bash
# Bucket to bucket
>>> hf buckets cp hf://buckets/username/source-bucket/logs/ hf://buckets/username/destination-bucket/logs/

# Repo to bucket
>>> hf buckets cp hf://username/my-model/config.json hf://buckets/username/my-bucket/models/config.json
```

Notes:

- To copy a folder, the destination must end with `/`.
- Bucket-to-repo copy is not yet supported.
- Files tracked with Xet (in buckets or repos) are copied server-side by hash — no data is downloaded or re-uploaded.
- Small text files not tracked with Xet on repo sources are downloaded and re-uploaded to the destination bucket.
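
The rules in these notes can be checked locally before calling `copy_files`. The following is a minimal sketch, not the library's implementation: `check_copy_request` is a hypothetical helper, and treating a trailing `/` on the source as the marker of a folder copy is an assumption made for illustration.

```python
BUCKET_PREFIX = "hf://buckets/"


def check_copy_request(src: str, dst: str) -> None:
    """Raise ValueError if (src, dst) breaks one of the documented rules.

    Hypothetical helper for illustration; it performs no network calls.
    """
    if not src.startswith("hf://") or not dst.startswith("hf://"):
        raise ValueError("Both source and destination must be hf:// handles.")
    # The guide only describes copies into a bucket; bucket-to-repo is not yet supported.
    if not dst.startswith(BUCKET_PREFIX):
        raise ValueError("Destination must be a bucket (hf://buckets/...).")
    # Assumption: a source ending with "/" denotes a folder copy.
    if src.endswith("/") and not dst.endswith("/"):
        raise ValueError("Folder copy requires the destination to end with '/'.")


# Passes silently: folder source, bucket destination ending with "/".
check_copy_request(
    "hf://datasets/username/my-dataset/processed/",
    "hf://buckets/username/my-bucket/datasets/processed/",
)
```

Failing early like this avoids starting a copy that would be rejected, but the authoritative validation remains whatever `copy_files` itself performs.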

## Sync directories

The `hf buckets sync` command (and its API equivalent [`sync_bucket`]) is the most powerful way to transfer files between a local directory and a bucket. It compares source and destination, and only transfers files that have changed.
18 changes: 17 additions & 1 deletion docs/source/en/guides/cli.md
@@ -673,7 +673,8 @@ To filter by prefix, append the prefix to the bucket path:

### Copy single files

Use `hf buckets cp` to copy individual files to and from a bucket. Bucket paths use the `hf://buckets/` prefix.
Use `hf buckets cp` to copy individual files to and from a bucket, or to copy any file hosted on the Hub to a Bucket.
Bucket paths use the `hf://buckets/` prefix.

To upload a file:

@@ -703,6 +704,21 @@ You can also stream to stdout or from stdin using `-`:
>>> echo "hello" | hf buckets cp - hf://buckets/username/my-bucket/hello.txt
```

To copy from a repo or a bucket on the Hub:

```bash
# Bucket to bucket
>>> hf buckets cp hf://buckets/username/source-bucket/logs/ hf://buckets/username/archive-bucket/logs/

# Repo to bucket
>>> hf buckets cp hf://datasets/username/my-dataset/data/train/ hf://buckets/username/my-bucket/datasets/train/
```

Notes:

- To copy a folder, the destination must end with `/`.
- Bucket-to-repo copy is not yet supported.

### Sync directories

Use `hf buckets sync` to synchronize directories between your local machine and a bucket. It compares source and destination and transfers only changed files.
10 changes: 6 additions & 4 deletions docs/source/en/package_reference/cli.md
@@ -208,7 +208,7 @@ $ hf buckets [OPTIONS] COMMAND [ARGS]...

**Commands**:

* `cp`: Copy a single file to or from a bucket.
* `cp`: Copy files to or from buckets.
* `create`: Create a new bucket.
* `delete`: Delete a bucket.
* `info`: Get info about a bucket.
@@ -219,7 +219,7 @@ $ hf buckets [OPTIONS] COMMAND [ARGS]...

### `hf buckets cp`

Copy a single file to or from a bucket.
Copy files to or from buckets.

**Usage**:

@@ -229,8 +229,8 @@ $ hf buckets cp [OPTIONS] SRC [DST]

**Arguments**:

* `SRC`: Source: local file, hf://buckets/... path, or - for stdin [required]
* `[DST]`: Destination: local path, hf://buckets/... path, or - for stdout
* `SRC`: Source: local file, HF handle (hf://...), or - for stdin [required]
* `[DST]`: Destination: local path, HF handle (hf://...), or - for stdout
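
The argument kinds listed above can be told apart by simple prefix checks. The following is a minimal sketch under stated assumptions: `classify_endpoint` is a hypothetical helper, not the CLI's code. It relies only on what the documentation states, namely that HF handles start with `hf://`, that bucket paths use the `hf://buckets/` prefix, and that `-` stands for stdin or stdout.

```python
def classify_endpoint(path: str) -> str:
    """Return "stream", "bucket", "hf" or "local" for a cp argument (hypothetical helper)."""
    if path == "-":
        return "stream"  # stdin when used as SRC, stdout when used as DST
    if path.startswith("hf://buckets/"):
        return "bucket"
    if path.startswith("hf://"):
        return "hf"  # any other HF handle, e.g. a dataset repo
    return "local"


for arg in [
    "-",
    "my-config.json",
    "hf://buckets/user/my-bucket/logs/",
    "hf://datasets/user/my-dataset/processed/",
]:
    print(f"{arg!r}: {classify_endpoint(arg)}")
```

The bucket check comes before the generic `hf://` check because every bucket path is also an HF handle.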

**Options**:

@@ -247,6 +247,8 @@ Examples
$ hf buckets cp my-config.json hf://buckets/user/my-bucket/logs/
$ hf buckets cp my-config.json hf://buckets/user/my-bucket/remote-config.json
$ hf buckets cp - hf://buckets/user/my-bucket/config.json
$ hf buckets cp hf://buckets/user/my-bucket/logs/ hf://buckets/user/archive-bucket/logs/
$ hf buckets cp hf://datasets/user/my-dataset/processed/ hf://buckets/user/my-bucket/dataset/processed/

Learn more
Use `hf <command> --help` for more information about a command.
3 changes: 3 additions & 0 deletions src/huggingface_hub/__init__.py
@@ -203,6 +203,7 @@
"cancel_job",
"change_discussion_status",
"comment_discussion",
"copy_files",
"create_branch",
"create_bucket",
"create_collection",
@@ -903,6 +904,7 @@
"check_cli_update",
"close_session",
"comment_discussion",
"copy_files",
"create_branch",
"create_bucket",
"create_collection",
@@ -1327,6 +1329,7 @@ def __dir__():
cancel_job, # noqa: F401
change_discussion_status, # noqa: F401
comment_discussion, # noqa: F401
copy_files, # noqa: F401
create_branch, # noqa: F401
create_bucket, # noqa: F401
create_collection, # noqa: F401
24 changes: 21 additions & 3 deletions src/huggingface_hub/_buckets.py
@@ -114,9 +114,27 @@ def __post_init__(self) -> None:
        if self.content_type is None: # or default to destination path content type
            self.content_type = mimetypes.guess_type(self.destination)[0]

        self.mtime = int(
            os.path.getmtime(self.source) * 1000 if not isinstance(self.source, bytes) else time.time() * 1000
        )
        self.mtime = int(time.time() * 1000)
        if isinstance(self.source, str):
            try:
                self.mtime = int(os.path.getmtime(self.source) * 1000)
            except FileNotFoundError:
                pass


@dataclass
class _BucketCopyFile:
    destination: str
    xet_hash: str
    source_repo_type: str # "model", "dataset", "space", "bucket"
    source_repo_id: str
    size: int | None = field(default=None)
    mtime: int = field(init=False)
    content_type: str | None = field(init=False)

    def __post_init__(self) -> None:
        self.content_type = mimetypes.guess_type(self.destination)[0]
        self.mtime = int(time.time() * 1000)


@dataclass
33 changes: 27 additions & 6 deletions src/huggingface_hub/cli/buckets.py
@@ -57,6 +57,10 @@
buckets_cli = typer_factory(help="Commands to interact with buckets.")


def _is_hf_handle(path: str) -> bool:
    return path.startswith("hf://")


def _parse_bucket_argument(argument: str) -> tuple[str, str]:
    """Parse a bucket argument accepting both 'namespace/name(/prefix)' and 'hf://buckets/namespace/name(/prefix)'.

@@ -928,28 +932,45 @@ def sync(
        "hf buckets cp my-config.json hf://buckets/user/my-bucket/logs/",
        "hf buckets cp my-config.json hf://buckets/user/my-bucket/remote-config.json",
        "hf buckets cp - hf://buckets/user/my-bucket/config.json",
        "hf buckets cp hf://buckets/user/my-bucket/logs/ hf://buckets/user/archive-bucket/logs/",
        "hf buckets cp hf://datasets/user/my-dataset/processed/ hf://buckets/user/my-bucket/dataset/processed/",
    ],
)
def cp(
    src: Annotated[str, typer.Argument(help="Source: local file, hf://buckets/... path, or - for stdin")],
    src: Annotated[str, typer.Argument(help="Source: local file, HF handle (hf://...), or - for stdin")],
    dst: Annotated[
        str | None, typer.Argument(help="Destination: local path, hf://buckets/... path, or - for stdout")
        str | None, typer.Argument(help="Destination: local path, HF handle (hf://...), or - for stdout")
    ] = None,
    quiet: QuietOpt = False,
    token: TokenOpt = None,
) -> None:
    """Copy a single file to or from a bucket."""
    """Copy files to or from buckets."""
    api = get_hf_api(token=token)

    src_is_hf = _is_hf_handle(src)
    dst_is_hf = dst is not None and _is_hf_handle(dst)
    src_is_bucket = _is_bucket_path(src)
    dst_is_bucket = dst is not None and _is_bucket_path(dst)
    src_is_stdin = src == "-"
    dst_is_stdout = dst == "-"

    # --- Validation ---
    if src_is_bucket and dst_is_bucket:
        raise typer.BadParameter("Remote-to-remote copy not supported.")
    # Remote to remote copy
    if src_is_hf and dst_is_hf:
        assert dst is not None
        if quiet:
            disable_progress_bars()
        try:
            api.copy_files(src, dst)
        finally:
            if quiet:
                enable_progress_bars()

        if not quiet:
            print(f"Copied: {src} -> {dst}")
        return

    # Local to remote copy
    # --- Validation ---
    if not src_is_bucket and not dst_is_bucket and not src_is_stdin:
        if dst is None:
            raise typer.BadParameter("Missing destination. Provide a bucket path as DST.")