
Add scripts+workflow to build and upload tarballs from artifacts #4448

Open
ScottTodd wants to merge 4 commits into ROCm:main from ScottTodd:multi-arch-build-tarballs

ScottTodd (Member) commented Apr 9, 2026

Motivation

We'd like to produce tarballs as part of multi-arch release pipelines. For context, see:

This will also enable building JAX packages as part of CI pipelines, see:

Technical Details

The new script downloads artifacts from a workflow run (the current run when called from CI/CD workflows, or a prior run for testing or repackaging), packages them into tarballs, and uploads the tarballs to an artifacts bucket (e.g. therock-dev-artifacts). Release workflows (to be added) can then choose to copy these tarballs to a tarballs bucket (e.g. therock-dev-tarball).
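To make the upload destination concrete, here is a minimal sketch of how a run's upload prefix could be derived. The class name `UploadLocation` and its fields are hypothetical. The path layout is inferred from the test logs below (`s3://therock-ci-artifacts-external/ScottTodd-TheRock/24205988455-linux/tarballs`), assuming the `<owner>-<name>` prefix applies to every repository, which those logs only confirm for a fork. The platform is passed in explicitly rather than detected from the host, because a Linux runner may build Windows tarballs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadLocation:
    """Hypothetical description of where one workflow run's tarballs go."""

    bucket: str  # e.g. "therock-ci-artifacts-external"
    repo: str  # "owner/name" of the repository that ran the workflow
    run_id: str  # GitHub Actions run ID
    platform: str  # target platform ("linux", "windows"), not the host's

    def s3_uri(self) -> str:
        # Assumed layout, inferred from the test logs:
        #   s3://<bucket>/<owner>-<name>/<run_id>-<platform>/tarballs
        owner, name = self.repo.split("/", 1)
        return (
            f"s3://{self.bucket}/{owner}-{name}/"
            f"{self.run_id}-{self.platform}/tarballs"
        )
```

For example, `UploadLocation("therock-ci-artifacts-external", "ScottTodd/TheRock", "24205988455", "linux").s3_uri()` yields the prefix from the first test run.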

Important

The workflow is not yet called from any other workflows via workflow_call. For now it only runs manually via workflow_dispatch.

Tarball files use substantial storage (2GB+ per tarball), so I'd like to only run this for release builds and as an opt-in for PRs that want to build JAX, at least until KPACK_SPLIT_ARTIFACTS is enabled and we can produce a single "multiarch" tarball instead of separate tarballs per family.

Behavior with and without KPACK_SPLIT_ARTIFACTS

In this initial implementation:

| Condition | Behavior |
|---|---|
| KPACK_SPLIT_ARTIFACTS disabled | Creates one tarball per GPU family |
| KPACK_SPLIT_ARTIFACTS enabled | Creates one tarball per GPU target, plus a "multiarch" tarball containing all GPU targets |
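The behavior in the table amounts to a small planning step. This is a sketch under assumptions: `plan_tarballs` is a hypothetical helper, the family-to-target mapping is supplied by the caller, and the file-name pattern `therock-dist-{platform}-{name}-{version}.tar.gz` is taken from the test logs below.

```python
def plan_tarballs(
    family_targets: dict[str, list[str]],
    kpack_split: bool,
    platform: str,
    version: str,
) -> list[str]:
    """Return the tarball file names a build would produce (hypothetical sketch).

    family_targets maps each requested GPU family to its GPU targets.
    """
    if kpack_split:
        # One tarball per GPU target, plus one "multiarch" tarball with all targets.
        targets = sorted({t for ts in family_targets.values() for t in ts})
        names = targets + ["multiarch"]
    else:
        # One tarball per GPU family.
        names = sorted(family_targets)
    return [f"therock-dist-{platform}-{name}-{version}.tar.gz" for name in names]
```

With `kpack_split=False` and families `gfx1151` and `gfx110X-all`, this yields the two per-family names seen in the first test run; with `kpack_split=True` and targets `gfx1100` and `gfx1151`, it adds the `multiarch` tarball as in the second run.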

We may later also want to produce tarballs without test artifacts, or larger groups that cut across the current families (e.g. "all Radeon GPU targets"). Those would only require changes to the filtering and repackaging steps.

Downloading and extracting

This implementation runs the following command in a loop, once per GPU family:

```bash
# --stage=all: artifacts from all stages (foundation, math-libs, etc.) and
#   all components (lib, doc, test, etc.)
# --amdgpu-families: filter to a single family
# --flatten: extract and flatten into a "dist" directory in one command
# --download-cache-dir: reuse generic artifacts downloaded by prior calls
python build_tools/artifact_manager.py fetch \
    --stage=all \
    --amdgpu-families="${families_str}" \
    --output-dir="${output_dir}" \
    --flatten \
    --download-cache-dir="${download_cache_dir}"
```

This approach is easy to reproduce outside of the script, and the shared download cache lets repeated calls reuse already-downloaded artifacts, which helps both local debugging and CI efficiency. We also considered fetching without flattening and then repackaging with artifacts.py::ArtifactCatalog, as build_python_packages.py does (via py_packaging.py), but the approach here is simpler.
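A minimal Python sketch of that loop, assuming the script shells out to `artifact_manager.py` once per family with the flags shown above, plus the optional `--expand-family-to-targets` flag that the script passes (per the review comment further down). `fetch_command`, `fetch_all`, and the per-family output directory layout are hypothetical, and any options that select which workflow run to fetch from are omitted.

```python
import subprocess
import sys
from pathlib import Path


def fetch_command(
    family: str,
    output_dir: Path,
    download_cache_dir: Path,
    expand_to_targets: bool = False,
) -> list[str]:
    """Build one artifact_manager.py fetch invocation for a single GPU family."""
    cmd = [
        sys.executable,
        "build_tools/artifact_manager.py",
        "fetch",
        "--stage=all",
        f"--amdgpu-families={family}",
        f"--output-dir={output_dir}",
        "--flatten",
        f"--download-cache-dir={download_cache_dir}",
    ]
    if expand_to_targets:
        cmd.append("--expand-family-to-targets")
    return cmd


def fetch_all(families: list[str], work_dir: Path, dry_run: bool = False) -> list[Path]:
    """Fetch and flatten artifacts for each family into its own directory."""
    cache_dir = work_dir / "download-cache"  # shared, so generic artifacts download once
    out_dirs = []
    for family in families:
        out_dir = work_dir / family
        cmd = fetch_command(family, out_dir, cache_dir)
        if dry_run:
            print(" ".join(cmd))  # show the command that would run
        else:
            subprocess.run(cmd, check=True)  # raise if the fetch fails
        out_dirs.append(out_dir)
    return out_dirs
```

Because every iteration points at the same cache directory, only the family-specific artifacts should need downloading after the first call.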

Compression

This implementation produces .tar.gz files, matching existing tarball releases. Using .tar.zst instead would make compression faster and produce smaller files. I ran some benchmarks on my Windows dev machine:


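| Method | Time (s) | Size (MB) | Ratio | Notes |
|---|---|---|---|---|
| tar-cfz | 21.0 | 419.4 | 29.5% | current default |
| gz-1 | 12.2 | 449.8 | 31.6% | |
| gz-3 | 15.2 | 440.5 | 31.0% | |
| gz-6 | 26.4 | 420.9 | 29.6% | |
| gz-9 | 67.9 | 420.2 | 29.5% | |
| zst-1 | 3.3 | 420.2 | 29.5% | matches the tar-cfz and gz-6 ratios, ~6x faster than tar-cfz |
| zst-3 | 4.4 | 360.5 | 25.3% | sweet spot |
| zst-6 | 8.0 | 343.9 | 24.2% | |
| zst-9 | 10.0 | 317.9 | 22.3% | |
| zst-19 | 197.9 | 199.4 | 14.0% | |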
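As a rough, hypothetical sketch of how the gzip rows could be reproduced with only the standard library, the helper below compresses one directory at several levels and reports time, size, and ratio. It assumes "Ratio" means compressed size divided by uncompressed input size and that MB means 10^6 bytes; `benchmark_gzip` and `dir_size` are illustrative names, not part of the PR. Note that `tarfile`'s `"w:gz"` mode defaults to level 9, so the level is passed explicitly.

```python
import tarfile
import time
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def benchmark_gzip(
    src_dir: Path, out_dir: Path, levels: tuple[int, ...] = (1, 3, 6, 9)
) -> list[tuple[str, float, float, float]]:
    """Compress src_dir once per gzip level; return (method, seconds, MB, ratio).

    out_dir must not be inside src_dir, or earlier outputs would be re-archived.
    """
    original = dir_size(src_dir)
    rows = []
    for level in levels:
        out_path = out_dir / f"bench-gz{level}.tar.gz"
        start = time.perf_counter()
        with tarfile.open(out_path, "w:gz", compresslevel=level) as tar:
            tar.add(src_dir, arcname=src_dir.name)
        elapsed = time.perf_counter() - start
        size = out_path.stat().st_size
        # MB is assumed to be 10**6 bytes; ratio is compressed / uncompressed.
        rows.append((f"gz-{level}", elapsed, size / 1e6, size / original))
    return rows
```

Each row can then be printed in the same Method / Time / Size / Ratio layout as the table above.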

I wrapped compression in a ProcessPoolExecutor because compressing in parallel makes efficient use of CPU cores. Sample benchmarks show the speedup (so the cores are not oversubscribed):


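| Workers | Wall (s) | Avg/job (s) | Speedup | Efficiency |
|---|---|---|---|---|
| 1 | 244.2 | 24.4 | 1.0x | 103% |
| 2 | 128.3 | 25.6 | 2.0x | 98% |
| 4 | 79.4 | 26.6 | 3.2x | 79% |
| 6 | 54.4 | 27.2 | 4.6x | 77% |
| 8 | 54.0 | 27.6 | 4.7x | 58% |
| 10 | 28.8 | 28.6 | 8.8x | 88% |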
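Since each tarball is an independent job, the ProcessPoolExecutor approach can be sketched as follows. `compress_one` and `compress_all` are hypothetical names, and `compresslevel=6` is an assumption chosen because it is gzip's default level. `compress_one` is defined at module top level so worker processes can unpickle it, and `mp_context` is exposed only so callers can choose a start method.

```python
import tarfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Optional


def compress_one(src_dir: Path, out_path: Path) -> Path:
    """Create one .tar.gz from src_dir (top-level so worker processes can pickle it)."""
    with tarfile.open(out_path, "w:gz", compresslevel=6) as tar:
        tar.add(src_dir, arcname=src_dir.name)
    return out_path


def compress_all(
    jobs: list[tuple[Path, Path]],
    max_workers: Optional[int] = None,
    mp_context=None,
) -> list[Path]:
    """Compress each (src_dir, out_path) pair in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=mp_context) as pool:
        futures = [pool.submit(compress_one, src, out) for src, out in jobs]
        # result() waits for each job and re-raises any exception from its worker.
        return [f.result() for f in futures]
```

On platforms that use the `spawn` start method (such as Windows), the call to `compress_all` should sit under an `if __name__ == "__main__":` guard. With fewer jobs than workers, extra workers simply sit idle.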

Test Plan

  • Added new unit tests for some of the logic
  • Tested locally with artifacts from prior workflow runs, with and without KPACK_SPLIT_ARTIFACTS: artifacts were downloaded, packaged into the expected tarballs, and "uploaded" to a staging directory
  • Triggered the new workflow on my fork and checked that it succeeds (except for the upload step, which lacks credentials)

Test Result

Without KPACK_SPLIT_ARTIFACTS: https://github.com/ScottTodd/TheRock/actions/runs/24205988455/job/70661826987

Building tarballs for 2 families: gfx1151, gfx110X-all
  Platform: linux
  Version: 7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3
  Output: /home/runner/work/TheRock/TheRock/tarballs
...
Done. Tarballs in /home/runner/work/TheRock/TheRock/tarballs:
  therock-dist-linux-gfx110X-all-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2711.5 MB)
  therock-dist-linux-gfx1151-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2820.1 MB)
...
[INFO] Uploading to s3://therock-ci-artifacts-external/ScottTodd-TheRock/24205988455-linux/tarballs

With KPACK_SPLIT_ARTIFACTS: https://github.com/ScottTodd/TheRock/actions/runs/24217435275/job/70701188683

Building tarballs for 2 families: gfx1151, gfx1100
  Platform: linux
  Version: 7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3
  Output: /home/runner/work/TheRock/TheRock/tarballs
...
Done. Tarballs in /home/runner/work/TheRock/TheRock/tarballs:
  therock-dist-linux-gfx1100-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2891.5 MB)
  therock-dist-linux-gfx1151-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (2907.4 MB)
  therock-dist-linux-multiarch-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz (3085.2 MB)
...
[INFO] Uploading to s3://therock-ci-artifacts-external/ScottTodd-TheRock/24217435275-linux/tarballs

Submission Checklist


ScottTodd requested a review from erman-gurses April 10, 2026 01:26
erman-gurses (Contributor) left a comment

Looks good overall. I added one concern and will do one more pass tomorrow.

ScottTodd and others added 4 commits April 10, 2026 10:19
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The upload path includes the platform ({run_id}-{platform}/tarballs/),
so the script needs to know the target platform rather than
auto-detecting from the current system. This matters when building
Windows tarballs on a Linux runner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ScottTodd force-pushed the multi-arch-build-tarballs branch from 54e29a9 to a763196 April 10, 2026 17:28
ScottTodd marked this pull request as ready for review April 10, 2026 17:43
ScottTodd requested a review from erman-gurses April 10, 2026 17:43
Comment on lines +90 to +91
f"--amdgpu-families={families_str}",
"--expand-family-to-targets",

@marbre the --expand-family-to-targets flag from #4449 is working as expected now. It expands gfx110X-all to gfx1100, gfx1101, gfx1102, and gfx1103:

https://github.com/ScottTodd/TheRock/actions/runs/24255576558/job/70826158778

  python build_tools/build_tarballs.py \
    --run-id="24187929660" \
    --run-github-repo="ROCm/TheRock" \
    --dist-amdgpu-families="gfx110X-all;gfx1151" \

  ++ Downloading prim_test_gfx1100.tar.zst
  ++ Downloading prim_test_gfx1101.tar.zst
  ++ Downloading prim_test_gfx1102.tar.zst
  ++ Downloading prim_test_gfx1103.tar.zst
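The expansion in the log above can be illustrated with a small sketch. The mapping below is hypothetical and only covers `gfx110X-all` as shown in this comment (the real mapping is defined elsewhere in TheRock), names missing from it are assumed to already be single targets, and the `{artifact}_{component}_{target}.tar.zst` naming pattern is inferred from the single `prim_test_gfx1100.tar.zst` example.

```python
# Hypothetical family-to-target table; only covers the expansion shown in this comment.
FAMILY_TARGETS = {
    "gfx110X-all": ["gfx1100", "gfx1101", "gfx1102", "gfx1103"],
}


def expand_families(families_str: str) -> list[str]:
    """Expand a ';'-separated family list into individual GPU targets, in order."""
    targets: list[str] = []
    for family in filter(None, families_str.split(";")):
        # Assumption: a name not in the table (e.g. "gfx1151") is already a target.
        for target in FAMILY_TARGETS.get(family, [family]):
            if target not in targets:  # drop duplicates, keep first-seen order
                targets.append(target)
    return targets


def artifact_names(artifact: str, component: str, targets: list[str]) -> list[str]:
    """Per-target archive names, following the pattern inferred from the log."""
    return [f"{artifact}_{component}_{t}.tar.zst" for t in targets]
```

For example, `artifact_names("prim", "test", expand_families("gfx110X-all"))` gives the four `prim_test_gfx110*.tar.zst` names downloaded in the log.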

erman-gurses (Contributor) left a comment

LGTM!
