Pass --gpu-targets to artifact splitter #4464

Draft
BrianHarrisonAMD wants to merge 1 commit into main from users/bharriso/gpu-targets-split-arg
@BrianHarrisonAMD
Contributor

@BrianHarrisonAMD BrianHarrisonAMD commented Apr 10, 2026

Summary

  • Passes THEROCK_AMDGPU_TARGETS to the artifact splitter via --gpu-targets so only per-arch artifacts for the current build's GPU targets are created

Problem

When multiple CI arch jobs (gfx94X, gfx110X, gfx120X) build MIOpen, the splitter creates per-arch artifacts for ALL architectures found in database filenames. All jobs upload to the same flat S3 prefix, so a later arch job's miopen_lib_gfx942.tar.zst (containing only db files) overwrites the earlier gfx94X job's version (which had both CK .so files AND db files).
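The overwrite can be modeled with a small sketch. This is not the splitter's real logic: `split_job`, `upload_all`, the `"so"`/`"db"` content labels, and the exact-match filter are hypothetical stand-ins, and `gfx1100`/`gfx1201` are illustrative targets. It only shows why a flat prefix with last-writer-wins semantics loses the binaries, and how limiting each job to its own targets avoids that.

```python
from typing import Dict, Iterable, List, Optional, Set


def split_job(
    db_archs: Iterable[str],
    built_archs: Set[str],
    gpu_targets: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    """Hypothetical model of one CI job's split step.

    db_archs:    architectures discovered from database filenames.
    built_archs: architectures this job actually compiled binaries for.
    gpu_targets: optional filter standing in for --gpu-targets
                 (exact-match semantics are an assumption).
    """
    artifacts: Dict[str, List[str]] = {}
    for arch in db_archs:
        if gpu_targets is not None and arch not in gpu_targets:
            continue  # skip architectures outside this build's targets
        contents = ["db"]
        if arch in built_archs:
            contents.insert(0, "so")  # only the owning job has binaries
        artifacts[f"miopen_lib_{arch}.tar.zst"] = contents
    return artifacts


def upload_all(jobs: Iterable[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Model a flat prefix: for each key, the last upload wins."""
    bucket: Dict[str, List[str]] = {}
    for job in jobs:
        bucket.update(job)
    return bucket


DB_ARCHS = ["gfx942", "gfx1100", "gfx1201"]

# Without a filter, the second job re-creates a db-only gfx942 artifact.
unfiltered = upload_all([
    split_job(DB_ARCHS, built_archs={"gfx942"}),
    split_job(DB_ARCHS, built_archs={"gfx1100"}),
])

# With a filter, each job only produces artifacts for its own targets.
filtered = upload_all([
    split_job(DB_ARCHS, built_archs={"gfx942"}, gpu_targets={"gfx942"}),
    split_job(DB_ARCHS, built_archs={"gfx1100"}, gpu_targets={"gfx1100"}),
])
```

In this model, `unfiltered["miopen_lib_gfx942.tar.zst"]` ends up holding only the db entry, while the filtered run keeps both the binaries and the db files.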

Changes

Single change in cmake/therock_artifacts.cmake: append --gpu-targets ${THEROCK_AMDGPU_TARGETS} to the split command args, guarded by the THEROCK_AMDGPU_TARGETS-NOTFOUND sentinel.

Merge ordering

This PR must land AFTER ROCm/rocm-systems#4916 is merged and rocm-systems is bumped in TheRock. The --gpu-targets flag is added to split_artifacts.py in that PR. Without it, argparse will reject the unknown argument and the split command will fail.
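The ordering constraint follows from standard argparse behavior, which a minimal sketch can illustrate. The parser below is hypothetical and is not the real `split_artifacts.py` parser; in particular, `nargs="*"` is an assumption about how the flag accepts its values.

```python
import argparse
from typing import List, Union


def build_parser(with_gpu_targets: bool) -> argparse.ArgumentParser:
    """Build a stand-in parser, with or without the new flag."""
    parser = argparse.ArgumentParser(prog="split_artifacts_sketch")
    if with_gpu_targets:
        # Assumed shape of the flag; the real definition may differ.
        parser.add_argument("--gpu-targets", nargs="*", default=None)
    return parser


def try_parse(
    parser: argparse.ArgumentParser, argv: List[str]
) -> Union[argparse.Namespace, int, str, None]:
    """Return the parsed namespace, or the exit code if argparse bails out."""
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        # argparse reports unrecognized arguments via parser.error(),
        # which prints usage to stderr and raises SystemExit.
        return exc.code


argv = ["--gpu-targets", "gfx942"]

old_result = try_parse(build_parser(with_gpu_targets=False), argv)
new_result = try_parse(build_parser(with_gpu_targets=True), argv)
```

Here `old_result` is the exit code from the rejected invocation rather than a namespace, which is the failure the split command would hit if this change landed before the rocm-systems bump.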

Dependencies

Test plan

  • Dry-run CI with TheRock branch users/bharriso/dry-run-miopen-multi-arch (which includes both this change and the rocm-systems bump)

🤖 Generated with Claude Code

Pass THEROCK_AMDGPU_TARGETS to the artifact splitter so only per-arch
artifacts matching the current build's GPU targets are created. This
prevents later CI arch jobs from overwriting earlier jobs' per-arch
artifacts with versions that lack compiled binaries.

Requires rocm-systems ROCm/rocm-systems#4916 for the --gpu-targets
CLI flag. Older rocm-systems versions do not recognize the flag, and
argparse will reject it with an error, so this should land after the
rocm-systems change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@BrianHarrisonAMD BrianHarrisonAMD self-assigned this Apr 10, 2026
@BrianHarrisonAMD BrianHarrisonAMD marked this pull request as ready for review April 10, 2026 16:23
@BrianHarrisonAMD BrianHarrisonAMD marked this pull request as draft April 10, 2026 16:26
@BrianHarrisonAMD
Contributor Author

Leaving as a draft until this can actually land.
Waiting on rocm-systems PR to be merged, and bump to happen.

Collaborator

@stellaraccident stellaraccident left a comment

Good catch. I don't think other components have had this kind of overlap/race before.
