Fix DTensor attr handling in make_fx tracer #2998
Conversation
@tugsbayasgalan For CooR we do need device meshes as inputs to the graph. The way CooR works is that we hoist device meshes as graph inputs and then use custom ops to extract process groups, so that the graph is identical across all ranks while each rank uses its own process group at runtime. I wonder if a more principled approach would be a DCE pass? cc @aorenste
@bobrenjc93 Specifically for this PR, the bug was that we were lifting DeviceMesh objects as graph inputs because we incorrectly treated them as tensors. But yes, I do think we should find a more principled way to handle this, probably something like this (https://github.com/pytorch/pytorch/blob/665a8750269104209a9e0f1ce35e642db0c31b4f/torch/_functorch/_aot_autograd/subclass_utils.py#L256). Basically, we should reuse the subclass wrapping/unwrapping from AOTAutograd as much as possible; it has custom logic to handle opaque objects like DeviceMesh separately.
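A minimal sketch of that wrapping/unwrapping split (illustrative helper names, not the actual subclass_utils API): the subclass is separated into tensor leaves, which may become graph inputs, and an opaque context, which for DTensor carries the mesh/placement info and is replayed verbatim when the subclass is rebuilt.

```python
import torch


def unwrap_subclass(t: torch.Tensor):
    # __tensor_flatten__ is the traceable-subclass protocol: it returns the
    # names of the inner tensor attrs plus an opaque context (for DTensor the
    # context carries the spec, i.e. the DeviceMesh and placements).
    attrs, ctx = t.__tensor_flatten__()
    leaves = [getattr(t, name) for name in attrs]
    spec = (type(t), attrs, ctx, t.shape, t.stride())
    return leaves, spec


def wrap_subclass(leaves, spec):
    subclass_type, attrs, ctx, outer_size, outer_stride = spec
    inner = dict(zip(attrs, leaves))
    # __tensor_unflatten__ rebuilds the subclass from the tensor leaves and
    # the opaque context, so the DeviceMesh never shows up as a graph input.
    return subclass_type.__tensor_unflatten__(inner, ctx, outer_size, outer_stride)
```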
Fix graph_trainer make_fx tracing for DTensor-backed module state by keeping non-tensor DTensor attrs out of the traced graph inputs.
Previously, our subclass unwrap logic flattened every attribute returned by tensor_flatten(). For DTensor
this included non-tensor attrs like device_mesh, which leaked into the graph signature as untyped placeholders.
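A hedged sketch of the fix's intent (helper names are illustrative, not the exact graph_trainer code): when collecting placeholders from module state, lift only objects that are actually `torch.Tensor`, and keep anything else produced by the subclass flatten (DeviceMesh, placements, plain Python metadata) out of the graph signature.

```python
import torch
from torch.utils._python_dispatch import is_traceable_wrapper_subclass


def collect_graph_inputs(state):
    """Illustrative: pick only real tensors from module state as placeholders."""
    placeholders = []
    for name, value in state.items():
        if is_traceable_wrapper_subclass(value):
            # For subclasses like DTensor, recurse into the flattened tensor
            # attrs instead of trusting every attribute to be a tensor.
            attrs, _ctx = value.__tensor_flatten__()
            for attr in attrs:
                inner = getattr(value, attr)
                if isinstance(inner, torch.Tensor):
                    placeholders.append((f"{name}.{attr}", inner))
        elif isinstance(value, torch.Tensor):
            placeholders.append((name, value))
        # Non-tensor objects (e.g. a DeviceMesh) are deliberately skipped so
        # they never become untyped placeholders in the traced graph.
    return placeholders
```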