Build RCCL in a single generic stage for all Instinct/CDNA targets #4450
Draft
RCCL's host code can vary depending on which GPU targets are compiled in. Building per-arch shards risks producing inconsistent host code across shards. This change builds RCCL with USE_DIST_AMDGPU_TARGETS in a single comm-libs CI stage, yielding a single consistent artifact.
Changes:
- Add USE_DIST_AMDGPU_TARGETS to therock_cmake_subproject_declare(rccl)
- Add TARGET_NEUTRAL to therock_provide_artifact(rccl), producing a single rccl_dev_generic.tar.xz artifact
- Change artifact_groups.comm-libs type from per-arch to generic
- Remove type = per-arch from build_stages.comm-libs
- Replace the matrix CI job with a single generic job
The [artifacts.rccl] entry in BUILD_TOPOLOGY.toml intentionally remains target-specific so the kpack splitter can later split the monolithic artifact by architecture.
Co-Authored-By: Claude <noreply@anthropic.com>
rccl is a multi-GPU collective communications library requiring high-bandwidth GPU-to-GPU interconnects (xGMI on Instinct/CDNA). With USE_DIST_AMDGPU_TARGETS, building for all dist families spawns one device-link lld invocation per arch, each running O3 LTO over ~160 bitcode objects, which pushes CI toward the 2-hour timeout.
Add restrict_dist_families_regex to ArtifactGroup in BUILD_TOPOLOGY.toml. configure_stage.py reads this field and filters dist_amdgpu_families before generating THEROCK_DIST_AMDGPU_FAMILIES, limiting comm-libs to Instinct families (dcgpu|^gfx9).
Also add exclude_family to the rccl test configuration to prevent test failures when RDNA runners gain full test coverage.
Co-Authored-By: Claude <noreply@anthropic.com>
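The filtering step that configure_stage.py performs before generating THEROCK_DIST_AMDGPU_FAMILIES can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: filter_dist_families and dist_families_cmake_value are hypothetical helpers, the family names are examples only, and the ';'-separated output format is an assumption. re.search is used so the unanchored alternative dcgpu matches anywhere in a family name, while ^gfx9 only matches at the start.

```python
import re

# Illustrative family names only; the real dist family list comes from
# TheRock's build configuration and may differ.
EXAMPLE_DIST_FAMILIES = [
    "gfx90X-dcgpu",
    "gfx94X-dcgpu",
    "gfx950-dcgpu",
    "gfx101X-dgpu",
    "gfx103X-dgpu",
    "gfx110X-dgpu",
    "gfx1151",
    "gfx120X-all",
]


def filter_dist_families(families, restrict_regex=None):
    """Keep only the families matched by restrict_regex (hypothetical helper).

    With no regex configured, every family is kept unchanged.
    """
    if not restrict_regex:
        return list(families)
    pattern = re.compile(restrict_regex)
    return [family for family in families if pattern.search(family)]


def dist_families_cmake_value(families):
    # Assumption: the value is passed to CMake as a ';'-separated list.
    return ";".join(families)


if __name__ == "__main__":
    kept = filter_dist_families(EXAMPLE_DIST_FAMILIES, "dcgpu|^gfx9")
    print(f"-DTHEROCK_DIST_AMDGPU_FAMILIES={dist_families_cmake_value(kept)}")
```

With the example list, only the three dcgpu families survive; since the device link runs once per arch, dropping the RDNA families removes their lld invocations from the comm-libs stage.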
Force-pushed from 7b08402 to 5e84d5f.
rccl-tests is bundled into the same artifact as rccl. Using USE_TEST_AMDGPU_TARGETS would build rccl-tests for all available targets, producing an artifact inconsistent with rccl, which only contains Instinct/CDNA device code. USE_DIST_AMDGPU_TARGETS keeps both in sync and ensures that the restrict_dist_families_regex on the comm-libs artifact group applies to rccl-tests as well. Co-Authored-By: Claude <noreply@anthropic.com>
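The consistency argument above can be illustrated with a small set comparison. This is a minimal sketch: the target lists are hypothetical examples (gfx908/gfx90a/gfx942/gfx950 standing in for the filtered Instinct targets, gfx1100/gfx1201 for extra RDNA targets), and targets_without_library_code is an illustrative helper, not part of TheRock.

```python
# Hypothetical target lists for illustration only; the real lists are derived
# from TheRock's family definitions.
DIST_TARGETS = ["gfx908", "gfx90a", "gfx942", "gfx950"]  # after comm-libs filtering
TEST_TARGETS = DIST_TARGETS + ["gfx1100", "gfx1201"]      # "all available targets"

TARGETS_BY_OPTION = {
    "USE_DIST_AMDGPU_TARGETS": DIST_TARGETS,
    "USE_TEST_AMDGPU_TARGETS": TEST_TARGETS,
}


def targets_without_library_code(lib_option, tests_option):
    """Targets rccl-tests would carry device code for but rccl would not."""
    lib = set(TARGETS_BY_OPTION[lib_option])
    tests = set(TARGETS_BY_OPTION[tests_option])
    return sorted(tests - lib)


if __name__ == "__main__":
    for tests_option in TARGETS_BY_OPTION:
        extra = targets_without_library_code("USE_DIST_AMDGPU_TARGETS", tests_option)
        print(f"rccl-tests with {tests_option}: extra targets {extra or 'none'}")
```

Selecting both projects' targets from the same option makes the difference empty by construction, so any future change to the regex automatically applies to both.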
Motivation
RCCL's host code can vary depending on which GPU targets are compiled in. Building per-arch shards risks producing inconsistent host code across shards. This change builds RCCL with USE_DIST_AMDGPU_TARGETS in a single comm-libs CI stage, yielding a single consistent artifact.
Building for all dist families in one pass would spawn one device-link lld invocation per arch, pushing the multi-arch CI stage close to the 2-hour timeout. The comm-libs stage is therefore restricted to Instinct/CDNA families only.
Technical Details
- Add restrict_dist_families_regex = "dcgpu|^gfx9" to the comm-libs artifact group to filter dist families to Instinct/CDNA only
- Add exclude_family to the rccl test configuration to prevent test failures if RDNA runners gain further test coverage in the future
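How an exclude_family entry could gate test scheduling is sketched below. The configuration shape, the family names, and the should_run_tests/runnable_families helpers are all hypothetical; the real test configuration schema in TheRock's CI may differ.

```python
# Hypothetical shape of the rccl test configuration entry; the real schema
# used by TheRock's CI may differ.
RCCL_TEST_CONFIG = {
    "job_name": "rccl",
    "exclude_family": ["gfx110X-dgpu", "gfx1151", "gfx120X-all"],
}


def should_run_tests(test_config, family):
    """Return False when the runner's family is listed in exclude_family."""
    return family not in test_config.get("exclude_family", [])


def runnable_families(test_config, runner_families):
    """Keep only the runner families the test configuration allows."""
    return [f for f in runner_families if should_run_tests(test_config, f)]


if __name__ == "__main__":
    runners = ["gfx94X-dcgpu", "gfx950-dcgpu", "gfx110X-dgpu", "gfx1151"]
    print(runnable_families(RCCL_TEST_CONFIG, runners))
```

Because the rccl artifact carries no RDNA device code after this change, excluding those families up front avoids scheduling tests that could only fail.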
The [artifacts.rccl] entry in BUILD_TOPOLOGY.toml intentionally remains target-specific so the kpack splitter can later split the monolithic artifact by architecture.
Test Plan
CI run on PR.
Test Result
Pending.
Submission Checklist