
[ignore-for-now][llm_trainer] Add experiment for LLM-driven model optimization #3006

Closed

bobrenjc93 wants to merge 9 commits into gh/bobrenjc93/43/base from gh/bobrenjc93/43/head

Conversation


@bobrenjc93 (Contributor) commented on Apr 17, 2026

Stack from ghstack (oldest at bottom):

Adds the llm_trainer experiment which traces a model's full
forward+backward training step into a flat sequence of ATen ops,
then provides benchmarking infra for an LLM to iteratively optimize
the generated code while maintaining bitwise correctness.

Key components:

  • flattener: traces via make_fx, writes standalone Python files
    per rank, verifies bitwise equivalence, copies baseline to
    optimized_models/ (see the tracing sketch after this list)
  • benchmarker: compares optimized vs candidate models for bitwise
    correctness and MFU, promotes only if >=1% faster on N consecutive
    runs (default 3); the promotion rule is sketched further below
  • Shell script wrappers (run_flattener.sh, run_benchmarker.sh) for
    ergonomic torchrun invocation
  • INSTRUCTIONS.md guide for LLMs
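
A minimal sketch of the flattening step, assuming a toy two-parameter model and default make_fx tracing; the actual flattener's helpers and file layout in this PR may differ:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def train_step(w, b, x, y):
    # Functional forward + loss; calling autograd.grad inside the traced
    # function makes the backward ATen ops part of the same flat graph.
    loss = ((x @ w + b - y) ** 2).mean()
    gw, gb = torch.autograd.grad(loss, (w, b))
    return loss, gw, gb

w = torch.randn(16, 16, requires_grad=True)
b = torch.zeros(16, requires_grad=True)
x, y = torch.randn(4, 16), torch.randn(4, 16)

# make_fx records every dispatched ATen op into one flat FX graph.
gm = make_fx(train_step)(w, b, x, y)

# Bitwise check: the flattened graph must reproduce eager exactly.
ref = train_step(w, b, x, y)
flat = gm(w, b, x, y)
assert all(torch.equal(r, f) for r, f in zip(ref, flat))

# gm.code is roughly the standalone Python a flattener would write per rank.
print(gm.code)
```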

Directory structure uses targets/<fingerprint>/ where fingerprint
encodes both hardware label and parallelism config (e.g.
h100-sm90_tp2_fsdp4). Promoted files get an MFU comment header for
self-documenting optimization history.
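
One possible reading of the fingerprint naming, promotion rule, and MFU header described above, with hypothetical helper names (not the PR's actual code):

```python
from pathlib import Path

def fingerprint(hw_label: str, tp: int, fsdp: int) -> str:
    # Hardware label plus parallelism config, e.g. "h100-sm90_tp2_fsdp4".
    return f"{hw_label}_tp{tp}_fsdp{fsdp}"

def should_promote(candidate_mfu, baseline_mfu,
                   min_gain: float = 0.01, consecutive: int = 3) -> bool:
    # Promote only if the candidate beats the current optimized model by
    # >= 1% (read here as higher MFU) on N consecutive runs (default 3).
    if len(candidate_mfu) < consecutive or len(baseline_mfu) < consecutive:
        return False
    pairs = zip(candidate_mfu[-consecutive:], baseline_mfu[-consecutive:])
    return all(c >= b * (1.0 + min_gain) for c, b in pairs)

def promote(candidate: Path, target_dir: Path, mfu: float) -> None:
    # Promoted files carry an MFU comment header, so each file documents
    # its own optimization history.
    dest = target_dir / "optimized_models" / candidate.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(f"# MFU: {mfu:.2%}\n" + candidate.read_text())

# Example target directory: targets/h100-sm90_tp2_fsdp4/
target = Path("targets") / fingerprint("h100-sm90", tp=2, fsdp=4)
```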

[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: b011200
Pull-Request: #3006

@meta-cla meta-cla bot added the CLA Signed label on Apr 17, 2026

bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: e5feea0
Pull-Request: #3006

bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: b817ccb
Pull-Request: #3006

bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: 90f76c6
Pull-Request: #3006

bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: 3a2e782
Pull-Request: #3006

bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: c3775b1
Pull-Request: #3006

bobrenjc93 added a commit that referenced this pull request on Apr 17, 2026
…imization

ghstack-source-id: e8114a9
Pull-Request: #3006

bobrenjc93 added a commit that referenced this pull request on Apr 19, 2026
…imization

ghstack-source-id: 2c324d2
Pull-Request: #3006

Labels

ciflow/8gpu, CLA Signed
