Add a16w8 MHA softmax FVP coverage for Ethos-U85 (#19493)
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19493
Note: links to docs will display an error until the docs builds have completed.
❗ 1 active SEV. If your PR is affected, please view it below.
❌ As of commit db06d0b with merge base f1062a7: 3 new failures, 1 cancelled job, 1 pending, 8 unrelated failures.
- NEW FAILURES: the following jobs have failed.
- CANCELLED JOB: the following job was cancelled; please retry.
- FLAKY: the following jobs failed, likely due to flakiness present on trunk.
- BROKEN TRUNK: the following jobs failed but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103734699.
Pull request overview
Adds new Arm backend test coverage for torch.softmax in an MHA-like shape under a16w8 quantization, including a sweep over realistic attention-logit ranges and an expected-failure annotation for a known Ethos-U85/Vela numerics issue.
Changes:
- Add `ops/test_softmax.py` to the Arm Bazel test target list.
- Introduce an MHA-shaped softmax module and range-sweep INT tests for Ethos-U55 and Ethos-U85 (U85 marked xfail with `strict=False`).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| backends/arm/test/targets.bzl | Adds softmax tests to the Arm test suite target list. |
| backends/arm/test/ops/test_softmax.py | Adds a16w8 MHA-shaped softmax sweep tests for U55/U85, with U85 xfail tied to Vela issue #23. |
Force-pushed dc79f31 to 3f89f16.
Summary:
Adds a16w8 (int16 IO + int8 weights) coverage for `torch.softmax` in a multi-head-attention shape. Sweeps pre-softmax input ranges to surface a known Ethos-U85 numerics issue: int16 ReduceSum produces silent zero output, which propagates through the standard softmax decomposition (`amax → sub → exp → sum → reciprocal → mul`).
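For reference, the decomposition named above can be written out in float PyTorch. This is a sketch of the math only, not the quantized kernel that runs on the NPU; on U85 the `sum` step is the int16 ReduceSum that produces the silent zero output.

```python
import torch

def decomposed_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax via the decomposition: amax -> sub -> exp -> sum -> reciprocal -> mul."""
    m = torch.amax(x, dim=dim, keepdim=True)   # amax: max-subtraction for stability
    e = torch.exp(x - m)                       # sub + exp
    s = torch.sum(e, dim=dim, keepdim=True)    # sum (the failing int16 ReduceSum on U85)
    return e * torch.reciprocal(s)             # reciprocal + mul

# MHA-like shape from this PR: H=4 heads, M=1 query token, W=16 K/V window
x = torch.randn(4, 1, 16)
assert torch.allclose(decomposed_softmax(x), torch.softmax(x, dim=-1), atol=1e-6)
```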
## What's added
`MultiHeadAttentionSoftmax` is a generic MHA-shaped softmax (H=4 heads, M=1 query token, W=16 K/V window). `test_mha_softmax_a16w8_{u55,u85}_INT` sweeps 7 pre-softmax input ranges from `[-0.01, 0.01]` to `[-30, 30]`, covering realistic post-`1/√d` attention logits.
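A minimal sketch of such a module and sweep follows. Only the `(4, 1, 16)` shape, the count of 7 ranges, and the endpoint ranges come from this summary; the intermediate ranges are illustrative.

```python
import torch

class MultiHeadAttentionSoftmax(torch.nn.Module):
    """Softmax over an MHA-shaped input: (H, M, W) = (4, 1, 16)."""

    def __init__(self, dim: int = -1):
        super().__init__()
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(x, dim=self.dim)

# Seven pre-softmax (low, high) input ranges; endpoints per the summary,
# the five intermediate ranges are made up for illustration.
input_ranges = [(-0.01, 0.01), (-0.1, 0.1), (-1.0, 1.0), (-5.0, 5.0),
                (-10.0, 10.0), (-20.0, 20.0), (-30.0, 30.0)]

module = MultiHeadAttentionSoftmax()
for lo, hi in input_ranges:
    x = torch.empty(4, 1, 16).uniform_(lo, hi)
    y = module(x)
    # Softmax rows must sum to 1 along the window dimension.
    assert torch.allclose(y.sum(dim=-1), torch.ones(4, 1), atol=1e-5)
```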
## Tolerances
`atol=0.003` (a single value used across the sweep), calibrated from the observed FVP max-abs softmax error at `qtol=0` and sized at ~1.5× the worst case observed across the sweep. `rtol` and `qtol` use framework defaults.
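As a hypothetical illustration of that calibration (the error values below are made up, not the actual FVP measurements):

```python
# Worst-case max-abs softmax errors observed per swept range (illustrative numbers).
observed_max_abs_errors = [0.0011, 0.0017, 0.0020]

# Size atol at ~1.5x the worst observed error, per the calibration described above.
atol = round(1.5 * max(observed_max_abs_errors), 4)
print(atol)  # 0.003
```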
## XFAIL handling
The U85 a16w8 cases are wrapped with `pytest.mark.xfail(strict=False)` referencing the upstream Vela report:
https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23
`strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream fix lands (cases XPASS).
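The annotation pattern might look like the following sketch; the test body and the exact reason string are illustrative, only the marker and URL come from this summary.

```python
import pytest

u85_xfails = pytest.mark.xfail(
    reason=(
        "int16 ReduceSum yields silent zero output on Ethos-U85; see "
        "https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23"
    ),
    strict=False,  # XPASS does not fail the target once the Vela fix lands
)

@u85_xfails
def test_mha_softmax_a16w8_u85_INT():
    ...  # placeholder for the actual FVP pipeline run
```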
Differential Revision: D103734699
Force-pushed 3f89f16 to f3370d3.
Force-pushed f3370d3 to 7908d4e.
Force-pushed 7908d4e to db06d0b.
digantdesai left a comment:
Review automatically exported from Phabricator review in Meta.
Differential Revision: D103734699
Pull Request resolved: pytorch#19493