Add support to multi-plane NICs and multi-NIC per PE to DeepEP v1 by aahouzi · Pull Request #650 · deepseek-ai/DeepEP

aahouzi · 2026-05-28T16:18:01Z

Description

On multi-plane NICs (e.g. CX-8 dual-plane) or with multiple NICs per PE, setting NVSHMEM's NVSHMEM_IBGDA_ENABLE_MULTI_PORT=1 was not enough for DeepEP to distribute traffic across planes and reach the full CX-8 line rate, even though the shmem_put_bw perftest did reach line rate on CX-8 NICs.

After some investigation, it appears that the DeepEP v1 kernels only addressed the first half of the QP pool, so all RDMA traffic stayed on a single plane even when NVSHMEM had correctly allocated QPs on all available planes. On a DGX B300 NVL8 + CX-8 dual-plane cluster, the result is that DeepEP uses only plane p0 of each of the 8 NICs.

Solution

This PR fixes the QP selection logic in both legacy DeepEP v1 kernels (Internode + LL), so traffic is distributed equally across all planes/NICs available to each PE. The same logic should also work for multi-NIC per PE scenarios.

With multi-port enabled, NVSHMEM allocates num_rc_per_pe × num_devs QPs per peer, laid out as [NIC 0 QPs | NIC 1 QPs | ... | NIC N-1 QPs]. The new QP selection logic:

Internode: Different channels bind to different NICs for their whole lifetime, which distributes traffic across all NICs assigned to the PE while preserving RC ordering.
LL: The QP choice is bound to expert_local_idx. Since different warp groups handle different expert indices, traffic naturally spreads across all NICs assigned to the PE. This requires num_local_experts >= num_devs; with fewer local experts than NICs, some NICs stay unused.
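
The layout and selection rules above can be illustrated with a small host-side sketch. This is not the PR's kernel code: `select_qp`, `nics_used`, and the round-robin formula are hypothetical, chosen only so that the index lands in the `[NIC 0 QPs | NIC 1 QPs | ...]` layout and collapses to `work_id % num_rc_per_pe` when `num_devs` is 1. Here `work_id` stands for the channel id (Internode) or `expert_local_idx` (LL).

```python
def select_qp(work_id, num_rc_per_pe, num_devs):
    """Map a channel id or expert_local_idx to an index in the per-peer QP pool.

    Hypothetical mapping for illustration; the actual kernels may differ.
    """
    nic = work_id % num_devs                       # round-robin over NICs/planes
    slot = (work_id // num_devs) % num_rc_per_pe   # QP inside that NIC's block
    return nic * num_rc_per_pe + slot              # pool is [NIC 0 | NIC 1 | ...]


def nics_used(num_local_experts, num_rc_per_pe, num_devs):
    """Return the set of NIC indices reached by the LL-style binding."""
    return {
        select_qp(e, num_rc_per_pe, num_devs) // num_rc_per_pe
        for e in range(num_local_experts)
    }


# With one NIC the mapping reduces to the single-NIC form.
print(select_qp(5, num_rc_per_pe=4, num_devs=1))
# With two NICs, four local experts reach both NICs; one expert reaches only one.
print(nics_used(4, num_rc_per_pe=4, num_devs=2))
print(nics_used(1, num_rc_per_pe=4, num_devs=2))
```

Under this assumed mapping, a given `work_id` always resolves to the same QP, and a PE with fewer local experts than NICs leaves some NICs idle, which matches the `num_local_experts >= num_devs` requirement stated for LL.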

The fix is also fully backward compatible: if NVSHMEM_IBGDA_ENABLE_MULTI_PORT=1 is not set, qp_id falls back to the value it originally had with 1 NIC per PE.

Users are also not required to provide a NIC-to-PE mapping via NVSHMEM_HCA_PE_MAPPING. When NVSHMEM_IBGDA_ENABLE_MULTI_PORT=1 is set, NVSHMEM's default PCI-path detection can identify when 2 NICs are closest to the same PE and assign them to it correctly.

Results

Setup: DGX B300 NVL8 with CX-8 dual-plane
[Figure: Internode test results for 2N and 8N]

[Figure: LL test results for 4N and 8N across num_local_experts={4,8}]

NIC counters during the run show that all planes are fully utilized and balanced.

Requirements

NVSHMEM ≥ 3.7, which is now released
Multi-port feature enabled via: NVSHMEM_IBGDA_ENABLE_MULTI_PORT=1
This PR builds on top of #288 to actually distribute traffic across the num_rc_per_pe × num_devs QP pool.
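
As a minimal sketch of the second requirement, the flag can be placed in the environment handed to the launched processes so that it is present before NVSHMEM initializes. The helper name `with_multi_port` is hypothetical; only the variable name and the value `1` come from this PR.

```python
import os

FLAG = "NVSHMEM_IBGDA_ENABLE_MULTI_PORT"


def with_multi_port(env=None):
    """Return a copy of env (default: the current environment) with the flag set to "1"."""
    new_env = dict(os.environ if env is None else env)
    new_env[FLAG] = "1"
    return new_env


# The caller's mapping is left untouched; only the copy carries the flag.
base = {"PATH": "/usr/bin"}
print(with_multi_port(base)[FLAG])
print(FLAG in base)
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the test or training job.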

aahouzi and others added 2 commits May 25, 2026 17:45

Add support for multi-plane NICs / multi-NIC per PE

20aa88b

Merge branch 'deepseek-ai:main' into multi-nic-fix

532d7e2

aahouzi wants to merge 2 commits into deepseek-ai:main from aahouzi:multi-nic-fix
