Add portable Docker build workflow using public TheRock wheels #3149
ethanwee1 wants to merge 28 commits into ROCm:develop
Bring in the Dockerfile, install scripts, and workflow from rocm-npi-dev, adapted for the public ROCm nightlies index at rocm.nightlies.amd.com. Pushes images to rocmshared/pytorch-private on Docker Hub.
Adds a temporary push trigger scoped to the add-portable-docker-build branch so the workflow can be tested on a fork before merging. Also resolves all inputs through a defaults step so that push-triggered runs use sensible values (gfx94X-dcgpu, ROCm 7.13.0a20260413, Python 3.12).
Public nightlies use git-hash local versions for triton (e.g. 3.7.0+git20a46016.rocm7.13.0a20260413) that pip cannot match via ==. Let torch's dependency resolver pull the correct triton instead.
Runs test_binary_ufuncs (CPU-only) inside the built image to validate the PyTorch installation before pushing to Docker Hub.
Jenkins build for commit 0d1061bf0ad4f40123d7f76ece2d4f72969221f0 finished as NOT_BUILT
The previous version piped the test output through tail, which swallowed pytest's exit code; pytest was also not installed in the image.
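A minimal illustration of why piping through tail hid failures, assuming bash and coreutils `tail` are available: a pipeline's exit status is that of its last command unless `pipefail` is set. Here `false` stands in for a failing pytest run; the helper name `pipeline_status` is hypothetical.

```python
import subprocess


def pipeline_status(command: str, pipefail: bool = False) -> int:
    """Run a shell pipeline with bash and return the pipeline's exit status."""
    # Without pipefail, bash reports only the last stage's status (here, tail's).
    # With pipefail, it reports the rightmost non-zero status of any stage.
    script = ("set -o pipefail; " if pipefail else "") + command
    return subprocess.run(["bash", "-c", script], check=False).returncode


# `false` stands in for a failing `pytest ... | tail` invocation.
swallowed = pipeline_status("false | tail -n 20")
propagated = pipeline_status("false | tail -n 20", pipefail=True)
print(swallowed, propagated)  # 0 1
```

Dropping the pipe entirely, as this PR does, also works; `set -o pipefail` is the usual alternative when trimmed output is still wanted.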
…rsion
Tag format: pytorch-{branch}-{commit}-rocm{version}-{os}-py{python}-{gfx}
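A sketch of assembling that tag, checked against the image name reported at the end of this thread. Two details are assumptions, not taken from the workflow: the commit is shortened to 8 characters (as in the published tag), and "/" in branch names such as release/2.11 is replaced with "-", because "/" is not a legal Docker tag character. `make_image_tag` is a hypothetical name.

```python
import re

# Docker tag grammar: first char [A-Za-z0-9_], then up to 127 of [A-Za-z0-9_.-].
_DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def make_image_tag(branch, commit, rocm_version, os_name, python_version, gfx):
    """Build pytorch-{branch}-{commit}-rocm{version}-{os}-py{python}-{gfx}."""
    # Assumption: "/" in branch names becomes "-" so the tag stays legal.
    safe_branch = branch.replace("/", "-")
    short_commit = commit[:8]  # the published example tag uses 8 hex characters
    tag = (f"pytorch-{safe_branch}-{short_commit}-rocm{rocm_version}"
           f"-{os_name}-py{python_version}-{gfx}")
    if not _DOCKER_TAG.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag


nightly_tag = make_image_tag("nightly", "f8d08404", "7.13.0a20260413",
                             "ubuntu24.04", "3.12", "gfx950-dcgpu")
print(nightly_tag)
# pytorch-nightly-f8d08404-rocm7.13.0a20260413-ubuntu24.04-py3.12-gfx950-dcgpu
```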
If the requested PYTHON_VERSION is not the system default (3.12 on Ubuntu 24.04), install it from the deadsnakes PPA before creating the venv.
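The interpreter selection can be sketched as a function that emits the shell steps for a given version. This is an illustration under stated assumptions: the venv path `/opt/venv` and the helper name are hypothetical, the supported range 3.10 to 3.14 comes from a later commit in this PR, and the default-version path assumes the system Python's venv module is already installed.

```python
SUPPORTED = ("3.10", "3.11", "3.12", "3.13", "3.14")


def python_setup_commands(python_version, system_default="3.12", venv_dir="/opt/venv"):
    """Return shell commands that provide python_version and create a venv."""
    if python_version not in SUPPORTED:
        raise ValueError(f"unsupported Python {python_version}")
    py = f"python{python_version}"
    commands = []
    if python_version != system_default:
        # Non-default interpreters come from the deadsnakes PPA;
        # add-apt-repository is provided by software-properties-common.
        commands += [
            "apt-get update",
            "apt-get install -y software-properties-common",
            "add-apt-repository -y ppa:deadsnakes/ppa",
            "apt-get update",
            f"apt-get install -y {py} {py}-venv {py}-dev",
        ]
    commands.append(f"{py} -m venv {venv_dir}")
    return commands


for line in python_setup_commands("3.11"):
    print(line)
```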
Jenkins build for commit 3360f8d4c3252d9e9d9b7ee667f040b7e41f6666 finished as NOT_BUILT
- Nightly schedule (daily at 06:00 UTC) builds 4 images: pytorch/pytorch main (nightly) and ROCm/pytorch release/2.11, release/2.10, and release/2.9
- ROCm version is no longer hardcoded; it is discovered from the latest available torch wheel on the staging index for each matrix entry
- Manual dispatch also auto-discovers the ROCm version when that input is left empty
- Added --torch-version-prefix support to install_pytorch_wheels.py for filtering wheels by major.minor version
- Dockerfile supports Python 3.10-3.14 via the deadsnakes PPA
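One way to implement the prefix filter and ROCm discovery, assuming torch wheel versions take the form `<torch>+rocm<rocm>` (the later key-function fix refers to the ROCm version after the +). This is a hypothetical sketch, not the code in install_pytorch_wheels.py; the "." boundary check keeps a prefix such as 2.1 from matching 2.11.0.

```python
def matches_prefix(version: str, prefix: str) -> bool:
    """True if the public part of version equals prefix or starts with prefix + '.'."""
    if not prefix:  # empty prefix (nightly): accept every wheel
        return True
    public = version.partition("+")[0]
    # The "." boundary keeps prefix "2.1" from matching "2.11.0".
    return public == prefix or public.startswith(prefix + ".")


def rocm_version_of(version: str) -> str:
    """Extract the ROCm version from a local label such as '+rocm7.13.0a20260413'."""
    local = version.partition("+")[2]
    if not local.startswith("rocm"):
        raise ValueError(f"no ROCm local version in {version!r}")
    return local[len("rocm"):]


# Hypothetical candidate versions read from the index.
candidates = ["2.10.0+rocm7.13.0a20260413", "2.11.0+rocm7.13.0a20260413"]
print([v for v in candidates if matches_prefix(v, "2.11")])
print(rocm_version_of(candidates[1]))  # 7.13.0a20260413
```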
- schedule (cron) → 4-build nightly matrix
- push → single build with defaults
- workflow_dispatch → single build with user inputs
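The event routing above can be sketched as a function from event name to build list. Assumptions, labeled here and in the code: the default values are taken from this thread (gfx94X-dcgpu, Python 3.12, pytorch/pytorch main), the pytorch/pytorch nightly entry uses the branch label "nightly" (matching the published tag), and an empty `rocm_version` means auto-discovery. `resolve_builds` is a hypothetical name; the real workflow expresses this in GitHub Actions.

```python
# Assumed defaults, collected from the commit messages in this PR.
DEFAULTS = {
    "pytorch_repo": "pytorch/pytorch",
    "pytorch_branch": "main",
    "python_version": "3.12",
    "gfx": "gfx94X-dcgpu",
    "rocm_version": "",  # empty: auto-discover from the latest torch wheel
}

NIGHTLY_MATRIX = [
    {"pytorch_repo": "pytorch/pytorch", "pytorch_branch": "nightly"},
    {"pytorch_repo": "ROCm/pytorch", "pytorch_branch": "release/2.11"},
    {"pytorch_repo": "ROCm/pytorch", "pytorch_branch": "release/2.10"},
    {"pytorch_repo": "ROCm/pytorch", "pytorch_branch": "release/2.9"},
]


def resolve_builds(event_name, inputs=None):
    """Map a GitHub Actions event name to the list of builds to run."""
    if event_name == "schedule":
        return [{**DEFAULTS, **entry} for entry in NIGHTLY_MATRIX]
    if event_name == "push":
        return [dict(DEFAULTS)]
    if event_name == "workflow_dispatch":
        # Empty dispatch inputs fall back to the defaults.
        overrides = {k: v for k, v in (inputs or {}).items() if v}
        return [{**DEFAULTS, **overrides}]
    raise ValueError(f"unexpected event: {event_name}")


for build in resolve_builds("schedule"):
    print(build["pytorch_repo"], build["pytorch_branch"])
```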
- Added pytorch_repo input (default: pytorch/pytorch)
- Changed pytorch_branch default from develop to main
- Nightly matrix unchanged (already uses correct repos per entry)
- Inputs ordered with repo and branch first, followed by python, gfx, rocm, torch prefix, and index URL
- Removed rarely used inputs (exact versions, base_image) that auto-discovery now handles
- Removed torch_version_prefix input
- Prefix is now extracted from the branch: release/2.11 → "2.11", nightly → empty
- Applies to both the nightly matrix and manual dispatch
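The branch-to-prefix rule is small enough to state directly. A hypothetical sketch: it treats any branch that is not release/X.Y as "no filter", which covers the nightly case described above; treating main the same way is an assumption.

```python
import re


def torch_prefix_from_branch(branch: str) -> str:
    """release/X.Y -> "X.Y"; any other branch (nightly, main) -> "" (no filter)."""
    match = re.fullmatch(r"release/(\d+\.\d+)", branch)
    return match.group(1) if match else ""


for b in ("release/2.11", "release/2.9", "nightly", "main"):
    print(f"{b!r} -> {torch_prefix_from_branch(b)!r}")
```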
Jenkins build for commit 14ce620d192c6a461ca44d38410f1b8cf1d78dc5 finished as NOT_BUILT
The sort key function compared only the base torch version (e.g. 2.9.1) and ignored the ROCm version after the +. As a result, when base versions were identical it could pick old 2025 wheels over the latest 2026 ones.
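A sketch of a key that breaks ties on the ROCm build, assuming versions look like `2.9.1+rocm7.13.0a20260413`. The ordering choices (alpha builds ordered by their date stamp, a final release sorting after its alphas) are assumptions, and the version strings other than the 7.13.0a20260413 build are hypothetical. With the base-only key every entry ties, so which one wins depends only on input order.

```python
import re


def _base_key(public: str) -> tuple:
    # "2.9.1" -> (2, 9, 1); a missing patch component counts as 0.
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", public)
    if m is None:
        raise ValueError(f"unparseable torch version {public!r}")
    return tuple(int(g or 0) for g in m.groups())


def _rocm_key(local: str) -> tuple:
    # "rocm7.13.0a20260413" -> (7, 13, 0, 0, 20260413): alphas ordered by date.
    # "rocm7.13.0" -> (7, 13, 0, 1, 0): a final release sorts after its alphas.
    m = re.fullmatch(r"rocm(\d+)\.(\d+)\.(\d+)(?:a(\d+))?", local)
    if m is None:
        return (-1,)  # no recognizable ROCm label: sort before everything else
    major, minor, patch, alpha = m.groups()
    if alpha is None:
        return (int(major), int(minor), int(patch), 1, 0)
    return (int(major), int(minor), int(patch), 0, int(alpha))


def wheel_sort_key(version: str) -> tuple:
    public, _, local = version.partition("+")
    return (_base_key(public), _rocm_key(local))


# Hypothetical versions: identical base torch version, different ROCm builds.
versions = [
    "2.9.1+rocm7.10.0a20251120",
    "2.9.1+rocm7.13.0a20260413",
    "2.9.1+rocm7.12.0a20260105",
]
old = max(versions, key=lambda v: _base_key(v.partition("+")[0]))
new = max(versions, key=wheel_sort_key)
print(old)  # 2.9.1+rocm7.10.0a20251120 (first of the tied entries)
print(new)  # 2.9.1+rocm7.13.0a20260413
```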
Jenkins build for commit 4b63c1c933dfad75aae068aa83ba98144bb28373 finished as NOT_BUILT
…ABI mismatches
The torch import is required; torchaudio, torchvision, and triton are best-effort: import failures are logged but do not fail the build when there is a version mismatch on older release branches.
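The required versus best-effort split can be sketched with `importlib`. This is a hypothetical helper, not the PR's script; it catches any exception for best-effort modules on the assumption that an ABI mismatch may surface as something other than ImportError.

```python
import importlib
import logging


def validate_imports(required, best_effort):
    """Import required modules (errors propagate) and best-effort ones (errors logged)."""
    for name in required:
        importlib.import_module(name)  # any exception here fails the build
    failures = {}
    for name in best_effort:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ABI mismatches may not raise ImportError
            logging.warning("best-effort import of %s failed: %r", name, exc)
            failures[name] = exc
    return failures


# In the image this would be:
# validate_imports(["torch"], ["torchaudio", "torchvision", "triton"])
```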
In auto-discovery mode, install torch and rocm[devel] with exact versions first, then install torchaudio/torchvision/triton unpinned so pip picks versions compatible with the installed torch.
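The two-phase install can be sketched as two pip invocations. The index URL and version strings in the usage line are placeholders, and `install_commands` is a hypothetical name; the pinned `rocm[devel]` requirement mirrors the description above.

```python
import sys


def install_commands(torch_version, rocm_version, index_url):
    """Two pip invocations: pinned torch + rocm[devel], then unpinned companions."""
    pip = [sys.executable, "-m", "pip", "install", "--index-url", index_url]
    pinned = pip + [f"torch=={torch_version}", f"rocm[devel]=={rocm_version}"]
    # No pins here: pip's resolver chooses versions consistent with the installed torch.
    companions = pip + ["torchaudio", "torchvision", "triton"]
    return [pinned, companions]


# Placeholder values for illustration only.
for cmd in install_commands("2.11.0+rocm7.13.0a20260413", "7.13.0a20260413",
                            "https://example.invalid/simple"):
    print(" ".join(cmd))
```

pip may still replace the installed torch if no companion wheel accepts it; that is resolver behavior this sketch does not guard against.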
Jenkins build for commit 4b63c1c933dfad75aae068aa83ba98144bb28373 finished as FAILURE
d70128b to 4b63c1c
Copied from https://github.com/AMD-ROCm-Internal/rocm-npi-dev/actions/workflows/build_portable_linux_pytorch_dockers.yml
Latest run and the Docker image it generated: docker.io/rocm/pytorch-private:pytorch-nightly-f8d08404-rocm7.13.0a20260413-ubuntu24.04-py3.12-gfx950-dcgpu
https://github.com/ethanwee1/pytorch/actions/runs/24362482659