Add a16w8 MHA softmax FVP coverage for Ethos-U85 (#19493)
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19493
Note: links to docs will display an error until the docs builds have completed.
❗ 1 active SEV. If your PR is affected, please view it below.
❌ As of commit db06d0b with merge base f1062a7: 3 new failures, 1 cancelled job, 1 pending, 8 unrelated failures.
- NEW FAILURES: the following jobs have failed.
- CANCELLED JOB: the following job was cancelled; please retry.
- FLAKY: the following jobs failed, likely due to flakiness present on trunk.
- BROKEN TRUNK: the following jobs failed but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103734699.
Pull request overview
Adds new Arm backend test coverage for torch.softmax in an MHA-like shape under a16w8 quantization, including a sweep over realistic attention-logit ranges and an expected-failure annotation for a known Ethos-U85/Vela numerics issue.
Changes:
- Add `ops/test_softmax.py` to the Arm Bazel test target list.
- Introduce an MHA-shaped softmax module and range-sweep INT tests for Ethos-U55 and Ethos-U85 (U85 marked xfail with `strict=False`).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| backends/arm/test/targets.bzl | Adds softmax tests to the Arm test suite target list. |
| backends/arm/test/ops/test_softmax.py | Adds a16w8 MHA-shaped softmax sweep tests for U55/U85, with U85 xfail tied to Vela issue #23. |
Force-pushed dc79f31 to 3f89f16.
Summary:
Adds a16w8 (int16 IO + int8 weights) coverage for `torch.softmax` in a multi-head-attention shape. Sweeps pre-softmax input ranges to surface a known Ethos-U85 numerics issue: int16 ReduceSum produces silent zero output, which propagates through the standard softmax decomposition (`amax → sub → exp → sum → reciprocal → mul`).
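For reference, the decomposition named above can be written out in float PyTorch. This is a sketch of the math only, not the quantized kernel that runs on the NPU; on U85 the `sum` step is the int16 ReduceSum that produces the silent zero output.

```python
import torch

def decomposed_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax via the decomposition: amax -> sub -> exp -> sum -> reciprocal -> mul."""
    m = torch.amax(x, dim=dim, keepdim=True)   # amax: max-subtraction for stability
    e = torch.exp(x - m)                       # sub + exp
    s = torch.sum(e, dim=dim, keepdim=True)    # sum (the failing int16 ReduceSum on U85)
    return e * torch.reciprocal(s)             # reciprocal + mul

# MHA-like shape from this PR: H=4 heads, M=1 query token, W=16 K/V window
x = torch.randn(4, 1, 16)
assert torch.allclose(decomposed_softmax(x), torch.softmax(x, dim=-1), atol=1e-6)
```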
## What's added
`MultiHeadAttentionSoftmax` is a generic MHA-shaped softmax (H=4 heads, M=1 query token, W=16 K/V window). `test_mha_softmax_a16w8_{u55,u85}_INT` sweeps 7 pre-softmax input ranges from `[-0.01, 0.01]` to `[-30, 30]`, covering realistic post-`1/√d` attention logits.
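A minimal sketch of such a module and sweep follows. Only the `(4, 1, 16)` shape, the count of 7 ranges, and the endpoint ranges come from this summary; the intermediate ranges are illustrative.

```python
import torch

class MultiHeadAttentionSoftmax(torch.nn.Module):
    """Softmax over an MHA-shaped input: (H, M, W) = (4, 1, 16)."""

    def __init__(self, dim: int = -1):
        super().__init__()
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(x, dim=self.dim)

# Seven pre-softmax (low, high) input ranges; endpoints per the summary,
# the five intermediate ranges are made up for illustration.
input_ranges = [(-0.01, 0.01), (-0.1, 0.1), (-1.0, 1.0), (-5.0, 5.0),
                (-10.0, 10.0), (-20.0, 20.0), (-30.0, 30.0)]

module = MultiHeadAttentionSoftmax()
for lo, hi in input_ranges:
    x = torch.empty(4, 1, 16).uniform_(lo, hi)
    y = module(x)
    # Softmax rows must sum to 1 along the window dimension.
    assert torch.allclose(y.sum(dim=-1), torch.ones(4, 1), atol=1e-5)
```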
## Tolerances
`atol=0.003` (a single value used across the sweep), calibrated from the observed FVP max-abs softmax error at `qtol=0` and sized at ~1.5× the worst case observed across the sweep. `rtol` and `qtol` use framework defaults.
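As a hypothetical illustration of that calibration (the error values below are made up, not the actual FVP measurements):

```python
# Worst-case max-abs softmax errors observed per swept range (illustrative numbers).
observed_max_abs_errors = [0.0011, 0.0017, 0.0020]

# Size atol at ~1.5x the worst observed error, per the calibration described above.
atol = round(1.5 * max(observed_max_abs_errors), 4)
print(atol)  # 0.003
```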
## XFAIL handling
The U85 a16w8 cases are wrapped with `pytest.mark.xfail(strict=False)` referencing the upstream Vela report:
https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23
`strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream fix lands (cases XPASS).
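The annotation pattern might look like the following sketch; the test body and the exact reason string are illustrative, only the marker and URL come from this summary.

```python
import pytest

u85_xfails = pytest.mark.xfail(
    reason=(
        "int16 ReduceSum yields silent zero output on Ethos-U85; see "
        "https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23"
    ),
    strict=False,  # XPASS does not fail the target once the Vela fix lands
)

@u85_xfails
def test_mha_softmax_a16w8_u85_INT():
    ...  # placeholder for the actual FVP pipeline run
```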
Differential Revision: D103734699
Force-pushed 3f89f16 to f3370d3.
Force-pushed f3370d3 to 7908d4e.
Force-pushed 7908d4e to db06d0b.
digantdesai left a comment:
Review automatically exported from Phabricator review in Meta.
Differential Revision: D103734699
Pull Request resolved: pytorch#19493