
[GraphTrainer][AutoDev] Remove compile_with_inductor annotation from qwen3 FlexAttention #3019

Merged

SherlockNoMad merged 1 commit into main from graph_trainer/align-flexattention-annotations on Apr 20, 2026

[GraphTrainer][AutoDev] Remove compile_with_inductor annotation from qwen3 FlexAttention#3019
SherlockNoMad merged 1 commit intomainfrom
graph_trainer/align-flexattention-annotations

Conversation

@SherlockNoMad
Contributor

Summary

  • Remove the compile_with_inductor annotation on FlexAttention.forward in the qwen3 graph_trainer parallelize module to align with llama3 and deepseek_v3 variants, which do not have this annotation.

Why

The qwen3 annotate_qwen3 function tagged FlexAttention.forward with {"compile_with_inductor": "flex_attention"} metadata, but the llama3 and deepseek_v3 graph_trainer variants do not annotate FlexAttention this way. Since FlexAttention is a shared component (torchtitan/models/common/attention.py), annotating its forward method in one model variant but not others mutates global state that persists across all models in the same process, which can lead to subtle behavioral divergence depending on model initialization order.
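The hazard can be sketched in a few lines. The class and helper below are illustrative stand-ins, not the actual torchtitan annotation API:

```python
# Hedged sketch (names are illustrative, not the torchtitan API):
# annotating a method on a *shared* class mutates process-global state,
# so every model variant built afterwards in the same process sees it.

class FlexAttention:
    """Stand-in for the shared attention module in
    torchtitan/models/common/attention.py."""
    def forward(self, x):
        return x

def annotate_qwen3_style(cls):
    # Tagging the function object on the class is a global mutation:
    cls.forward.custom_meta = {"compile_with_inductor": "flex_attention"}

# Before qwen3's annotation runs, no metadata is present.
assert getattr(FlexAttention.forward, "custom_meta", None) is None

annotate_qwen3_style(FlexAttention)

# A "llama3" model instantiated later in the same process now also sees
# the tagged forward, even though llama3 never asked for it.
llama3_attn = FlexAttention()
assert type(llama3_attn).forward.custom_meta == {
    "compile_with_inductor": "flex_attention"
}
```

Because the tag lives on the shared class rather than on a per-model instance, whichever model annotates first wins for the whole process; removing the annotation sidesteps the ordering dependency entirely.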

This PR removes the annotation from qwen3 so all three graph_trainer model variants are consistent.

Test plan

  • Verified module imports successfully (from torchtitan.experiments.graph_trainer.qwen3.parallelize import annotate_qwen3, parallelize_qwen3)
  • Pre-commit linting passes (flake8, ufmt, codespell, pydoclint)
  • Self-reviewed diff: only the FlexAttention annotation, its import, and its docstring entry were removed; EP and AC annotations are untouched
  • GPU integration tests (requires H100 cluster)

…xAttention

The qwen3 graph_trainer parallelize.py annotated FlexAttention.forward
with compile_with_inductor metadata, but the llama3 and deepseek_v3
variants do not have this annotation. This divergence could cause subtle
issues when FlexAttention is shared across models.

Remove the annotation from qwen3 to align all graph_trainer model
variants.
@meta-cla (bot) added the CLA Signed label (managed by the Meta Open Source bot) on Apr 17, 2026
@SherlockNoMad marked this pull request as ready for review on April 20, 2026 at 16:42
@SherlockNoMad merged commit dd0cbe6 into main on Apr 20, 2026
14 of 17 checks passed

Labels

ciflow/8gpu, CLA Signed (managed by the Meta Open Source bot)


2 participants