Add portable Docker build workflow using public TheRock wheels #3149

Open
ethanwee1 wants to merge 28 commits into ROCm:develop from ethanwee1:add-portable-docker-build

ethanwee1 commented Apr 13, 2026

Copied from https://github.com/AMD-ROCm-Internal/rocm-npi-dev/actions/workflows/build_portable_linux_pytorch_dockers.yml

Latest run: https://github.com/ethanwee1/pytorch/actions/runs/24362482659
Docker image generated by that run: docker.io/rocm/pytorch-private:pytorch-nightly-f8d08404-rocm7.13.0a20260413-ubuntu24.04-py3.12-gfx950-dcgpu

Bring in the Dockerfile, install scripts, and workflow from rocm-npi-dev,
adapted for the public ROCm nightlies index at rocm.nightlies.amd.com.
Pushes images to rocmshared/pytorch-private on Docker Hub.
Adds a temporary push trigger scoped to the add-portable-docker-build
branch so the workflow can be tested on a fork before merge. Also resolves
all inputs through a defaults step so that push-triggered runs use sensible
values (gfx94X-dcgpu, ROCm 7.13.0a20260413, Python 3.12).
Public nightlies use git-hash local versions for triton (e.g.
3.7.0+git20a46016.rocm7.13.0a20260413) that pip cannot match via ==.
Let torch's dependency resolver pull the correct triton instead.
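
To illustrate why an exact pin can miss a git-hash local version, here is a minimal sketch of the PEP 440 rule for `==` with local version labels. It is a simplified model, not pip's implementation: it skips version normalization, and the pin `3.7.0+rocm7.13.0a20260413` is a hypothetical example of a pin built without knowing the git hash (the PR does not show the actual pin that failed).

```python
def exact_pin_matches(pin: str, candidate: str) -> bool:
    """Simplified PEP 440 '==' check (no normalization, no wildcards).

    - Public parts must be equal.
    - If the pin has no local label, the candidate's local label is ignored.
    - If the pin has a local label, it must equal the candidate's exactly.
    """
    pin_public, _, pin_local = pin.partition("+")
    cand_public, _, cand_local = candidate.partition("+")
    if pin_public != cand_public:
        return False
    if not pin_local:
        return True
    return pin_local == cand_local


# Version string quoted in the commit message above.
published = "3.7.0+git20a46016.rocm7.13.0a20260413"

# Hypothetical pin assembled from the ROCm version alone (assumption).
guessed_pin = "3.7.0+rocm7.13.0a20260413"

print(exact_pin_matches(guessed_pin, published))  # local labels differ
print(exact_pin_matches("3.7.0", published))      # local label ignored
```

Under this model the guessed pin does not match while a bare `3.7.0` pin would, which is consistent with leaving triton to torch's own dependency metadata.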
Runs test_binary_ufuncs (CPU-only) inside the built image to validate
the PyTorch installation before pushing to Docker Hub.

rocm-repo-management-api bot commented Apr 13, 2026

Jenkins build for 0d1061bf0ad4f40123d7f76ece2d4f72969221f0 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

The previous version piped the test command through tail, which swallowed
the command's exit code, and pytest was not installed in the image.
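
The exit-code masking described above can be reproduced with a short sketch. It assumes `bash` and `tail` are available on the host, and uses `false` as a stand-in for a failing test command. `set -o pipefail` is shown as one common remedy; the PR does not state which fix was actually applied.

```python
import subprocess


def pipeline_exit_code(script: str) -> int:
    """Run a shell snippet under bash and return its exit status."""
    return subprocess.run(["bash", "-c", script]).returncode


# Without pipefail, a pipeline reports the status of its last stage (tail),
# so the failure of the first stage is hidden.
masked = pipeline_exit_code("false | tail -n 1")

# With pipefail, the pipeline fails if any stage fails.
surfaced = pipeline_exit_code("set -o pipefail; false | tail -n 1")

print("without pipefail:", masked)
print("with pipefail:", surfaced)
```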
…rsion

Tag format: pytorch-{branch}-{commit}-rocm{version}-{os}-py{python}-{gfx}
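
As a quick check of the tag format, the sketch below assembles a tag from its parts. The function name and parameter names are illustrative, not taken from the workflow; the sample values are read off the image tag quoted in the PR description.

```python
def build_image_tag(branch: str, commit: str, rocm_version: str,
                    os_name: str, python_version: str, gfx: str) -> str:
    """Assemble pytorch-{branch}-{commit}-rocm{version}-{os}-py{python}-{gfx}."""
    return (
        f"pytorch-{branch}-{commit}-rocm{rocm_version}"
        f"-{os_name}-py{python_version}-{gfx}"
    )


tag = build_image_tag(
    branch="nightly",
    commit="f8d08404",
    rocm_version="7.13.0a20260413",
    os_name="ubuntu24.04",
    python_version="3.12",
    gfx="gfx950-dcgpu",
)
print(tag)
```

Docker tags cannot contain `/`, so a branch such as release/2.11 would need sanitizing before it is used here; how the workflow handles that is not shown in this conversation.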
If the requested PYTHON_VERSION is not the system default (3.12 on
Ubuntu 24.04), install it from the deadsnakes PPA before creating
the venv.

rocm-repo-management-api bot commented Apr 13, 2026

Jenkins build for 3360f8d4c3252d9e9d9b7ee667f040b7e41f6666 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

- Nightly schedule (daily 06:00 UTC) builds 4 images: pytorch/pytorch
  main (nightly), and ROCm/pytorch release/2.11, release/2.10, release/2.9
- ROCm version is no longer hardcoded; discovered from the latest
  available torch wheel on the staging index for each matrix entry
- Manual dispatch also auto-discovers ROCm version when left empty
- Added --torch-version-prefix support to install_pytorch_wheels.py
  for filtering wheels by major.minor version
- Dockerfile supports Python 3.10-3.14 via deadsnakes PPA
- schedule (cron) → 4-build nightly matrix
- push → single build with defaults
- workflow_dispatch → single build with user inputs
- Added pytorch_repo input (default: pytorch/pytorch)
- Changed pytorch_branch default from develop to main
- Nightly matrix unchanged (already uses correct repos per entry)
- Repo and branch first, followed by python, gfx, rocm, torch prefix, index URL
- Removed rarely-used inputs (exact versions, base_image) that auto-discovery handles
- Removed torch_version_prefix input
- Prefix is now extracted from the branch: release/2.11 → "2.11", nightly → empty
- Applies to both nightly matrix and manual dispatch
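
The branch-to-prefix rule in the last bullets could look roughly like the sketch below. Both function names are hypothetical, and the wheel-matching helper is an assumption about how a major.minor prefix might be applied; install_pytorch_wheels.py itself is not shown in this conversation.

```python
def torch_version_prefix(branch: str) -> str:
    """Derive the major.minor prefix from a branch name.

    release/2.11 -> "2.11"; anything else (e.g. nightly, main) -> "".
    """
    marker = "release/"
    if branch.startswith(marker):
        return branch[len(marker):]
    return ""


def matches_prefix(torch_version: str, prefix: str) -> bool:
    """Assumed filter: an empty prefix accepts everything; otherwise the
    version must equal the prefix or continue with a dot, so that "2.1"
    does not accidentally accept "2.11.0"."""
    if not prefix:
        return True
    return torch_version == prefix or torch_version.startswith(prefix + ".")


for branch in ("release/2.11", "release/2.9", "nightly", "main"):
    print(branch, "->", repr(torch_version_prefix(branch)))
```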

rocm-repo-management-api bot commented Apr 13, 2026

Jenkins build for 14ce620d192c6a461ca44d38410f1b8cf1d78dc5 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

The key function compared only the base torch version (e.g. 2.9.1) and
ignored the ROCm version after the +. As a result it picked old 2025
wheels over the latest 2026 ones when the base versions were identical.

rocm-repo-management-api bot commented Apr 13, 2026

Jenkins build for 4b63c1c933dfad75aae068aa83ba98144bb28373 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

…ABI mismatches

torch import is required; torchaudio, torchvision, and triton are
best-effort: failures are logged but won't fail the build if there's a
version mismatch on older release branches.
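
The required versus best-effort split can be sketched as below. The function name is hypothetical and the actual verification step in the Dockerfile is not shown here. The broad `except Exception` is a deliberate assumption, since an ABI mismatch may surface as something other than `ImportError` (for example `OSError` from a shared library load).

```python
import importlib


def verify_imports(required, best_effort):
    """Import required modules strictly and best-effort modules leniently.

    A failure in `required` propagates and should fail the build.
    Failures in `best_effort` are logged and returned, not raised.
    """
    for name in required:
        importlib.import_module(name)  # let any exception propagate

    failures = {}
    for name in best_effort:
        try:
            importlib.import_module(name)
        except Exception as exc:  # log only, do not fail the build
            failures[name] = exc
            print(f"WARNING: could not import {name}: {exc!r}")
    return failures
```

In the image this would be called as `verify_imports(["torch"], ["torchaudio", "torchvision", "triton"])`.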
In auto-discovery mode, install torch and rocm[devel] with exact
versions first, then install torchaudio/torchvision/triton unpinned
so pip picks versions compatible with the installed torch.
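
The two-phase install could be expressed as a pair of pip invocations, sketched below. The function name, the use of `--index-url`, and the example index URL and version strings are assumptions for illustration; the real install script may pass different options.

```python
import sys


def build_install_commands(index_url: str, torch_version: str,
                           rocm_version: str) -> list:
    """Return two pip commands: pinned torch/rocm first, then unpinned extras."""
    pip_install = [sys.executable, "-m", "pip", "install",
                   "--index-url", index_url]

    # Phase 1: exact versions for torch and rocm[devel].
    pinned = pip_install + [
        f"torch=={torch_version}",
        f"rocm[devel]=={rocm_version}",
    ]

    # Phase 2: no pins, so pip resolves versions compatible with the
    # torch that is already installed.
    unpinned = pip_install + ["torchaudio", "torchvision", "triton"]

    return [pinned, unpinned]


# Hypothetical inputs for illustration only.
commands = build_install_commands(
    index_url="https://example.invalid/simple/",
    torch_version="2.9.1+rocm7.13.0a20260413",
    rocm_version="7.13.0a20260413",
)
for command in commands:
    print(" ".join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)` so that a failed install stops the build.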

rocm-repo-management-api bot commented Apr 13, 2026

Jenkins build for 4b63c1c933dfad75aae068aa83ba98144bb28373 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

ethanwee1 marked this pull request as ready for review on April 14, 2026 14:27
ethanwee1 force-pushed the add-portable-docker-build branch from d70128b to 4b63c1c on April 14, 2026 15:57
pruthvistony requested a review from WBobby on April 14, 2026 21:19