Fix batch invariant mode: use NCCL tree-based all-reduce #2994
Conversation
Align torchtitan's batch-invariant NCCL settings with vLLM's (vllm/model_executor/layers/batch_invariant.py) to achieve bitwise identity between trainer and generator. Key changes:
- NCCL_ALGO: Ring -> allreduce:tree
- Add NCCL_LAUNCH_MODE=GROUP, NCCL_P2P_NET_DISABLE=1, NCCL_NTHREADS=1, NCCL_SOCKET_NTHREADS=1
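For reference, the resulting environment configuration amounts to roughly the sketch below. The dict and function names are illustrative, not torchtitan's actual code; the values simply mirror the list above plus the channel/protocol settings already present in the diff, and they must be set before the NCCL communicator is created.

```python
import os

# Illustrative only: names are not torchtitan's; values mirror the PR description
# and the existing channel/protocol settings. Must be set before NCCL init.
_BATCH_INVARIANT_NCCL_ENV = {
    "NCCL_ALGO": "allreduce:tree",   # was "Ring"; Tree for cross-node determinism
    "NCCL_LAUNCH_MODE": "GROUP",
    "NCCL_P2P_NET_DISABLE": "1",
    "NCCL_NTHREADS": "1",
    "NCCL_SOCKET_NTHREADS": "1",
    "NCCL_MIN_NCHANNELS": "1",       # single channel to avoid split interleaving
    "NCCL_MAX_NCHANNELS": "1",
    "NCCL_PROTO": "Simple",          # LL/LL128 may reorder reductions
}

def _set_batch_invariant_nccl_env() -> None:
    for key, value in _BATCH_INVARIANT_NCCL_ENV.items():
        os.environ[key] = value
```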
tianyu-l
left a comment
shall we error out if SP + batch invariance are used together?
SP uses reduce-scatter, which only supports the Ring algorithm in NCCL. Unlike allreduce (pinned to Tree for cross-node determinism), reduce-scatter's Ring has not been validated for cross-node bitwise determinism. We disable SP in the batch-invariant config and error out if both are enabled together.
| os.environ["NCCL_MIN_NCHANNELS"] = "1" # Single channel to avoid split interleaving | ||
| os.environ["NCCL_MAX_NCHANNELS"] = "1" | ||
| os.environ["NCCL_PROTO"] = "Simple" # LL/LL128 may reorder reductions | ||
| # Reference: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/batch_invariant.py |
For batch invariance, do we need to set os.environ["VLLM_ALLREDUCE_USE_SYMM_MEM"] = "0"? https://github.com/vllm-project/vllm/blob/219bb5b8c0dcc6a5d5f894e9168fa5b8c2f8255a/vllm/model_executor/layers/batch_invariant.py#L1031
No, that controls vLLM's all-reduce kernel, which we are not relying on.
When enabled, batch-invariant mode:
- Replaces `mm`, `addmm`, `log_softmax`, and `mean.dim` with Triton kernels that use a fixed tile iteration order (via [batch_invariant_ops](https://github.com/thinking-machines-lab/batch_invariant_ops))
- Forces deterministic NCCL collectives (single channel, Simple protocol, Tree allreduce) matching vLLM's settings (previously: Ring all-reduce with a single channel)
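As a quick illustration of what the first bullet buys you, the check below compares a row computed as part of a batch against the same row computed alone; with the batch-invariant kernels active the two should match bit-for-bit. This is a minimal sketch assuming the `set_batch_invariant_mode` context manager shown in the batch_invariant_ops README; it is not code from this PR.

```python
import torch
from batch_invariant_ops import set_batch_invariant_mode  # assumed API from the repo's README

# 16-row batch vs. a single row: with batch-invariant kernels, row 0 must be
# bitwise identical in both cases.
a = torch.linspace(-1, 1, 16 * 4096, device="cuda").reshape(16, 4096)
b = torch.linspace(-1, 1, 4096 * 4096, device="cuda").reshape(4096, 4096)

with set_batch_invariant_mode(True):
    full = torch.mm(a, b)[:1]     # row 0 computed inside the full batch
    single = torch.mm(a[:1], b)   # row 0 computed on its own
print("bitwise identical:", torch.equal(full, single))
```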
Curious: why did Ring previously appear deterministic as well?
Because previously we only tested TP=2 with a smaller sequence length, within a single node. For larger-scale tests, e.g. cross-node tests, the Tree algorithm is deterministic to the best of my knowledge.
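A minimal cross-node repeatability check along these lines could look as follows. This is a sketch, not part of the PR: the script layout and reference-file path are made up, and it assumes it is launched twice via torchrun with the same world size and topology.

```python
# Run this script twice under torchrun with identical topology: with
# NCCL_ALGO=allreduce:tree the saved tensors should match bit-for-bit across
# runs; the previous Ring setting was not validated for this cross-node.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_ALGO", "allreduce:tree")
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

torch.manual_seed(1234 + rank)            # same per-rank input on every run
x = torch.randn(1 << 22, device="cuda")
dist.all_reduce(x)                        # uses the NCCL_ALGO set above

ref_path = f"allreduce_ref_rank{rank}.pt"  # hypothetical reference file
if os.path.exists(ref_path):
    ref = torch.load(ref_path).cuda()
    print(f"rank {rank}: bitwise identical = {torch.equal(x, ref)}")
else:
    torch.save(x.cpu(), ref_path)

dist.destroy_process_group()
```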
if self.trainer.parallelism.enable_sequence_parallel:
    raise ValueError(
        "batch_invariant mode doesn't support SP now. "
        "SP uses reduce-scatter which only supports Ring in NCCL "
Doesn't FSDP also need reduce-scatter?
For the bit-wise identity check, we compare forward results between the trainer and the generator. When FSDP is enabled, the reduce-scatter is applied during backward to sync gradients, which won't affect the forward results.
Batch-invariant mode is about forward only; nothing backward is required:
- inference is forward only
- there is no hope of achieving batch invariance for backward, because the batch dim will be reduced too
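Concretely, the identity check described above boils down to something like this sketch (function and variable names are illustrative, not torchtitan's actual APIs):

```python
import torch

@torch.no_grad()
def check_bitwise_identity(trainer_model, generator_model, input_ids):
    # Forward only on both sides; FSDP's reduce-scatter runs in backward and
    # therefore never enters this comparison.
    trainer_logits = trainer_model(input_ids)
    generator_logits = generator_model(input_ids)
    # torch.equal is exact equality: every element must match bit-for-bit.
    return torch.equal(trainer_logits, generator_logits)
```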
As titled. Previously we used the Ring-based algorithm, which is not deterministic.