Add ExportRecipe support for Arm targets#19527

Draft
rascani wants to merge 1 commit into pytorch:main from rascani:arm-export-recipes

Conversation

@rascani
Contributor

@rascani rascani commented May 12, 2026

Summary

Introduces ArmRecipeProvider and ArmRecipeType so callers can use the existing ExportRecipe abstraction to target Ethos-U, TOSA, and VGF instead of going through aot_arm_compiler.py. Shape mirrors the sibling XNNPACK / QNN providers; the provider auto-registers on import of backends/arm/recipes/.

Eight recipes ship: Ethos-U55/U65/U85 INT8 (with macs, system_config, memory_mode, extra_flags, config_ini kwargs), TOSA FP / INT8 / A16W8, and VGF FP / INT8. Cortex-M is not yet supported via recipes — its no-partitioner flow needs a different pipeline shape and is left for a follow-up.

Faithfulness to the CLI: INT8 and A16W8 paths wire ReplaceQuantNodesPass through LoweringRecipe.edge_manager_transform_passes and override pipeline_stages to insert EDGE_PROGRAM_MANAGER_TRANSFORM after TO_EDGE_TRANSFORM_AND_LOWER, matching aot_arm_compiler.py:200-201. The pass is skipped for VGF and FP, also matching the CLI gate. For Ethos-U, --verbose-operators --verbose-cycle-estimate are prepended to extra_flags to mirror aot_arm_compiler.py:479-484. Unknown kwargs raise ValueError, whereas the XNNPACK and QNN providers warn. This is intentional for a new provider, so that typos like mac=128 fail fast rather than silently producing a wrong-target binary.
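
A minimal sketch of the strict-kwargs and flag-prepending behavior described above. The kwarg names and the two verbose flags come from this description; the helper names `validate_kwargs` and `build_extra_flags` are hypothetical and are not the provider's actual functions.

```python
from typing import Any, Optional, Sequence

# Kwarg names listed in the PR description for the Ethos-U recipes.
ALLOWED_ETHOSU_KWARGS = frozenset(
    {"macs", "system_config", "memory_mode", "extra_flags", "config_ini"}
)
# Flags the description says are prepended to mirror the CLI.
DEFAULT_VERBOSE_FLAGS = ["--verbose-operators", "--verbose-cycle-estimate"]


def validate_kwargs(kwargs: dict[str, Any]) -> None:
    """Hypothetical helper: reject any kwarg that is not in the allow-list."""
    unknown = sorted(set(kwargs) - ALLOWED_ETHOSU_KWARGS)
    if unknown:
        raise ValueError(
            f"Unknown recipe kwargs {unknown}; "
            f"expected a subset of {sorted(ALLOWED_ETHOSU_KWARGS)}"
        )


def build_extra_flags(user_flags: Optional[Sequence[str]] = None) -> list[str]:
    """Hypothetical helper: put the verbose flags in front of user flags."""
    return DEFAULT_VERBOSE_FLAGS + list(user_flags or [])
```

With this shape, `validate_kwargs({"mac": 128})` raises instead of silently ignoring the misspelled key, while `validate_kwargs({"macs": 128})` passes.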

Enabling the post-partition hook required uncommenting the existing TODO at EdgeProgramManagerTransformStage.valid_predecessor_stages to also accept TO_EDGE_TRANSFORM_AND_LOWER. The stage's run() method already handles a partitioned EdgeProgramManager correctly.
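
The predecessor check can be pictured with a small self-contained model. This is only a sketch: the two stage names TO_EDGE_TRANSFORM_AND_LOWER and EDGE_PROGRAM_MANAGER_TRANSFORM come from this description, while the other stages, the `VALID_PREDECESSORS` table, and both helper functions are assumptions made for illustration and do not reproduce the ExecuTorch pipeline code.

```python
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    # These two names appear in the PR description.
    TO_EDGE_TRANSFORM_AND_LOWER = auto()
    EDGE_PROGRAM_MANAGER_TRANSFORM = auto()
    # Hypothetical neighbours, included only to make the pipeline complete.
    TORCH_EXPORT = auto()
    TO_EDGE = auto()
    TO_EXECUTORCH = auto()


# Assumed table: which stage may directly precede each stage (None = start).
VALID_PREDECESSORS: dict[Stage, set[Optional[Stage]]] = {
    Stage.TORCH_EXPORT: {None},
    Stage.TO_EDGE: {Stage.TORCH_EXPORT},
    Stage.TO_EDGE_TRANSFORM_AND_LOWER: {Stage.TORCH_EXPORT},
    # The change described above: also accept the lowered (partitioned) stage.
    Stage.EDGE_PROGRAM_MANAGER_TRANSFORM: {
        Stage.TO_EDGE,
        Stage.TO_EDGE_TRANSFORM_AND_LOWER,
    },
    Stage.TO_EXECUTORCH: {
        Stage.TO_EDGE,
        Stage.TO_EDGE_TRANSFORM_AND_LOWER,
        Stage.EDGE_PROGRAM_MANAGER_TRANSFORM,
    },
}


def validate_pipeline(stages: list[Stage]) -> None:
    """Raise ValueError if any stage follows a predecessor it does not accept."""
    previous: Optional[Stage] = None
    for stage in stages:
        if previous not in VALID_PREDECESSORS[stage]:
            before = previous.name if previous else "the pipeline start"
            raise ValueError(f"{stage.name} cannot follow {before}")
        previous = stage


def insert_after(stages: list[Stage], anchor: Stage, new: Stage) -> list[Stage]:
    """Return a copy of `stages` with `new` placed right after `anchor`."""
    index = stages.index(anchor)
    return stages[: index + 1] + [new] + stages[index + 1 :]
```

In this model, inserting EDGE_PROGRAM_MANAGER_TRANSFORM after TO_EDGE_TRANSFORM_AND_LOWER validates only because the predecessor set includes the lowered stage.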

A pre-existing circular import between tosa.backend and ethosu.backend surfaces when executorch.backends.arm.vgf is loaded without ethosu already in sys.modules. The provider primes ethosu before importing vgf, the same workaround aot_arm_compiler.py uses implicitly through its module-level import order.
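
The priming workaround amounts to forcing an import order. A minimal sketch, assuming a hypothetical helper `import_with_primer`; the module path `executorch.backends.arm.ethosu` in the usage comment is an assumption, since only `executorch.backends.arm.vgf` is spelled out in full above.

```python
import importlib
from types import ModuleType


def import_with_primer(primer: str, target: str) -> ModuleType:
    """Import `primer` first so it is in sys.modules, then import `target`."""
    importlib.import_module(primer)
    return importlib.import_module(target)


# Hypothetical usage ("executorch.backends.arm.ethosu" is an assumed path):
# vgf = import_with_primer(
#     "executorch.backends.arm.ethosu", "executorch.backends.arm.vgf"
# )
```

This only orders the imports; it does not remove the underlying cycle.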

Test plan

Tests live in backends/arm/test/recipes/test_arm_recipes.py:

  • Registration suite runs anywhere (no Arm SDK deps).
  • TOSA / VGF / Ethos-U construction suites skip cleanly if the corresponding SDK piece isn't installed.
  • AOT round-trip suite exports _AddModule (TOSA FP) and _ConvReluModule (TOSA INT8) and asserts the right delegation shape — full delegation for FP; for INT8, ≥1 DelegateCall plus cortex_m::quantize_per_tensor / cortex_m::dequantize_per_tensor boundary kernels, which verifies ReplaceQuantNodesPass actually ran.
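
The "skip cleanly" behavior in the construction suites can be sketched with a standard-library guard. This is an illustration under assumptions, not the test file's code: `has_module` and the test class are hypothetical, and `ethosu.vela` is assumed to be the importable module name for Vela.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if `name` is importable, without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package of a dotted name is missing.
        return False


# Assumption: Vela is importable as "ethosu.vela" when the SDK is installed.
@unittest.skipUnless(has_module("ethosu.vela"), "Vela is not installed")
class EthosURecipeConstructionTest(unittest.TestCase):
    def test_placeholder(self) -> None:
        # A real test would construct an Ethos-U recipe here.
        self.assertTrue(has_module("ethosu.vela"))
```

When the guard is false, the whole class is reported as skipped rather than failed, which is what lets the same test file run in both the no-deps and full-SDK CI jobs.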

CI hookup adds a test_pytest_recipes matrix entry to unittest-arm-backend-with-no-deps in pull.yml (Ethos-U tests skip via the Vela guard) and to test-arm-backend-ethos-u in trunk.yml (full SDK available; all tests run).

Authored with Claude Code.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

@pytorch-bot

pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19527

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 565b95f with merge base 9e4e497:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 12, 2026
@github-actions github-actions Bot added ciflow/trunk module: arm Issues related to arm backend labels May 12, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
