Add experimental bucket+mount transport for Jobs script upload#4025

Open
davanstrien wants to merge 4 commits into main from feat/bucket-transport-jobs

davanstrien (Member) commented Apr 2, 2026

Motivation

This is the first step in broader work to integrate buckets with Jobs, with follow-up PRs to come as the design iterates (cleanup, --data/--data-out flags, configurable bucket ID, etc.).

The current base64 transport works but requires a somewhat fragile bash -c + xargs + base64 -d decoder pipeline with manual shell quoting. The bucket path produces a plain uv run /artifacts/script.py command, which is simpler and easier to debug. It also lets jobs write output artifacts back to /artifacts/, which the base64 transport cannot do.
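
To make the contrast concrete, here is a rough sketch of the two command shapes. It is illustrative only: the environment variable name `SCRIPT_B64`, the `/tmp/script.py` path, and the exact shell pipeline are assumptions for this example, not the command the library actually builds.

```python
import base64
import shlex

script = 'print("hello from a job")\n'

# Base64 transport (illustrative): the script travels inside an environment
# variable and has to be decoded by a shell pipeline inside the job.
encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
env = {"SCRIPT_B64": encoded}  # hypothetical variable name
decode_and_run = 'echo "$SCRIPT_B64" | base64 -d > /tmp/script.py && uv run /tmp/script.py'
base64_command = ["bash", "-c", decode_and_run]

# Bucket transport: the script is already a file on the mounted volume,
# so no decoding step or shell quoting is needed.
bucket_command = ["uv", "run", "/artifacts/script.py"]

print(shlex.join(base64_command))
print(shlex.join(bucket_command))
```

The second command has no nested quoting, so what is logged is exactly what runs.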

Summary

  • Opt-in via HF_JOBS_USE_BUCKET_TRANSPORT=1 environment variable
  • Scripts uploaded to a {namespace}/jobs-artifacts bucket under _scripts/{uuid}/, mounted at /artifacts
  • Falls back to base64 if bucket creation fails, hf_xet is unavailable, or /artifacts is already taken by a user volume
  • Existing base64 path is completely untouched
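
The selection and fallback rules listed above can be sketched roughly as follows. This is a minimal sketch, not the code in the diff: `choose_transport` and its parameters are hypothetical names, and the upload step is passed in as a callable so the example stays self-contained.

```python
from typing import Callable, Iterable

ARTIFACTS_MOUNT_PATH = "/artifacts"


def choose_transport(
    opt_in: bool,
    xet_available: bool,
    user_mount_paths: Iterable[str],
    upload_to_bucket: Callable[[], None],
) -> str:
    """Return "bucket" or "base64" following the fallback rules in the summary."""
    if not opt_in:
        return "base64"  # default path, untouched
    if not xet_available:
        return "base64"  # hf_xet is unavailable
    if ARTIFACTS_MOUNT_PATH in set(user_mount_paths):
        return "base64"  # /artifacts is already taken by a user volume
    try:
        upload_to_bucket()  # bucket creation + script upload (hypothetical callable)
    except Exception:
        return "base64"  # any bucket failure falls back instead of failing the job
    return "bucket"
```

Checking the cheap conditions before attempting the upload avoids creating a bucket that would not be used.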

Open questions

  • Script subfolder naming: Currently uses a UUID per job. Alternatives: timestamp-based, job-ID-based (not available pre-submission), or fixed path (breaks concurrent jobs).
  • Cleanup: Scripts are tiny (KB each) so accumulation isn't a real storage concern, but worth considering for a follow-up.
  • Configurable bucket: Deferred — currently hardcoded to jobs-artifacts.
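
For the subfolder naming question, a UUID-per-job prefix could look like the sketch below. The helper names are hypothetical and only the bucket-side layout (`_scripts/{uuid}/`) comes from the description above; how that path maps into the container is not assumed here.

```python
import uuid
from pathlib import PurePosixPath


def new_script_prefix() -> str:
    """Build a fresh per-job prefix so concurrent jobs never share a folder."""
    return f"_scripts/{uuid.uuid4()}"


def bucket_path_for(prefix: str, local_filename: str) -> str:
    """Place a local file under the job's prefix, keeping only its base name."""
    return str(PurePosixPath(prefix) / PurePosixPath(local_filename).name)


prefix_a = new_script_prefix()
prefix_b = new_script_prefix()
print(bucket_path_for(prefix_a, "scripts/train.py"))
print(bucket_path_for(prefix_b, "scripts/train.py"))
```

Because the prefix is generated client-side, it is available before submission, unlike a job ID.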

Happy to iterate on the patterns here — this is a draft to get feedback on the direction.

Test plan

  • 6 new unit tests: happy path, fallback on failure, xet unavailable, mount path collision, default (no opt-in), multiple files
  • All existing tests pass (225/225 in test_cli.py)
  • Smoke tested end-to-end: bucket created, script uploaded, job ran from /artifacts/

Note

Medium Risk
Adds a new opt-in Jobs execution path that creates/uses buckets and mounts volumes, introducing new network/storage side effects and fallback logic that could affect job submission when enabled.

Overview
Adds an experimental, opt-in Jobs script transport controlled by HF_JOBS_USE_BUCKET_TRANSPORT, uploading local uv job scripts/configs to a {namespace}/jobs-artifacts bucket and mounting it at /artifacts so jobs run via a plain uv run command.

Updates run_uv_job/create_scheduled_uv_job to accept and merge any returned extra bucket Volumes, and extends _create_uv_command_env_and_secrets to choose between bucket transport and the existing base64 bash -c transport with explicit fallbacks (bucket upload failure, missing hf_xet, or /artifacts mount collision). Adds unit tests covering the new bucket path and fallback scenarios.
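
The volume merge described above might be sketched like this. The `Volume` dataclass is a hypothetical stand-in (the real type in the library may have different fields), and the collision check is a defensive assumption, since per the description a collision on /artifacts triggers the base64 fallback before any merge happens.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Volume:  # hypothetical stand-in, not the library's class
    source: str
    mount_path: str


def merge_volumes(
    user_volumes: Optional[List[Volume]],
    extra_volumes: Optional[List[Volume]],
) -> List[Volume]:
    """Append transport-provided volumes to the user's, refusing duplicate mount paths."""
    merged = list(user_volumes or [])
    taken = {v.mount_path for v in merged}
    for volume in extra_volumes or []:
        if volume.mount_path in taken:
            raise ValueError(f"mount path already in use: {volume.mount_path}")
        merged.append(volume)
        taken.add(volume.mount_path)
    return merged
```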

Written by Cursor Bugbot for commit 93e0070. This will update automatically on new commits.

davanstrien and others added 2 commits April 2, 2026 08:40
When `HF_JOBS_USE_BUCKET_TRANSPORT=1` is set, `run_uv_job` and
`create_scheduled_uv_job` upload local scripts to a
`{namespace}/jobs-artifacts` bucket instead of base64-encoding them
into an environment variable. The bucket is mounted at `/artifacts`
and the job runs scripts directly from disk.

Falls back to base64 transport if bucket creation fails, hf_xet is
unavailable, or the `/artifacts` mount path is already taken by a
user-provided volume. The existing base64 path is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davanstrien
Member Author

cc @Wauplin @lhoestq

bot-ci-comment (bot) commented Apr 2, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lhoestq (Member) left a comment

Awesome :)

Comment on lines +11923 to +11925
# Bucket transport constants for Jobs
_HF_JOBS_ARTIFACTS_MOUNT_PATH = "/artifacts"
_HF_JOBS_ARTIFACTS_BUCKET_NAME = "jobs-artifacts"
Member

good names !

HF_XET_HIGH_PERFORMANCE: bool = _is_true(os.environ.get("HF_XET_HIGH_PERFORMANCE"))

# Opt-in to bucket-based script transport for Jobs (experimental)
HF_JOBS_USE_BUCKET_TRANSPORT: bool = _is_true(os.environ.get("HF_JOBS_USE_BUCKET_TRANSPORT"))
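
For readers unfamiliar with this style of flag, the sketch below shows how such a boolean environment variable is typically parsed. The helper `is_true` and the set of accepted spellings are assumptions for illustration; the library's own `_is_true` may accept a different set.

```python
import os
from typing import Optional

TRUE_VALUES = {"1", "ON", "YES", "TRUE"}  # assumed truthy spellings


def is_true(value: Optional[str]) -> bool:
    """Treat an unset variable as False and compare case-insensitively otherwise."""
    if value is None:
        return False
    return value.upper() in TRUE_VALUES


# Read once at import time, like the constant in the diff above.
USE_BUCKET_TRANSPORT: bool = is_true(os.environ.get("HF_JOBS_USE_BUCKET_TRANSPORT"))
```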
Member

imo we can already set it to True by default

Member Author

probably makes sense indeed! I'll wait to see what @Wauplin thinks

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Follows lhoestq's review suggestion in #4025.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>