Fix ConvertMmToBmmPass for quantized (int8/int16) mm ops (#18974)
apullin wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18974
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 Awaiting Approval, 2 New Failures, 5 Cancelled Jobs, 2 Pending, 1 Unrelated Failure as of commit 82b6c7e with merge base 9207001.
AWAITING APPROVAL - The following workflows need approval before CI can run.
NEW FAILURES - The following jobs have failed.
CANCELLED JOBS - The following jobs were cancelled. Please retry.
BROKEN TRUNK - The following jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
digantdesai left a comment:
Review automatically exported from Phabricator review in Meta.
This PR needs a
Force-pushed 316e474 to 7802809 (Compare)
Force-pushed b4a1625 to 5439a12 (Compare)
Force-pushed 5439a12 to a28c6cc (Compare)
Force-pushed a28c6cc to 82b6c7e (Compare)
Summary:
This diff is experimental, but appears to address incomplete support for integer (INT) pathways for BMM. TBD.
The pass converts rank-2 `mm` to rank-3 `bmm` (required by the TOSA spec) via an unsqueeze/bmm/squeeze sequence. Previously it called `super().call()` to re-trace the graph on FakeTensors for shape propagation, but `aten.bmm` rejects int8/int16 FakeTensors, causing failures for any quantized `mm` op.
Since mm→bmm is a pure shape transformation (adding a batch dim of 1), we can set the output metadata directly: unsqueeze the mm node's FakeTensor for the bmm node, and reuse the original for the squeeze. There is no need to re-execute the op.
Reviewed By: digantdesai
Differential Revision: D99857137
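
The two ideas in the summary can be sketched in plain PyTorch. This is a hypothetical illustration, not the actual pass code: `mm_as_bmm` shows the unsqueeze/bmm/squeeze rewrite, and `derive_bmm_meta` shows how the bmm node's output value can be derived from the mm node's existing (FakeTensor-style) value by shape manipulation alone, sidestepping the `aten.bmm` int8/int16 restriction because `bmm` is never actually invoked.

```python
# Hypothetical sketch of the mm -> bmm rewrite described in the summary.
# Names (mm_as_bmm, derive_bmm_meta) are illustrative, not from the pass.
import torch


def mm_as_bmm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Rank-2 mm expressed as rank-3 bmm: add a batch dim of 1,
    # run bmm, then drop the batch dim again.
    out = torch.bmm(a.unsqueeze(0), b.unsqueeze(0))
    return out.squeeze(0)


def derive_bmm_meta(mm_val: torch.Tensor) -> torch.Tensor:
    # Shape propagation without executing bmm: the bmm node's output
    # value is just the mm node's value with a leading batch dim of 1.
    # This works for int8/int16 values, where running aten.bmm would fail.
    return mm_val.unsqueeze(0)


# Numerics of the rewrite match plain mm for float inputs.
a, b = torch.randn(3, 4), torch.randn(4, 5)
assert torch.allclose(mm_as_bmm(a, b), torch.mm(a, b))

# Metadata can still be derived for an int8 mm output, no bmm call needed.
mm_val_int8 = torch.empty(3, 5, dtype=torch.int8)
assert derive_bmm_meta(mm_val_int8).shape == (1, 3, 5)
```

The squeeze node then simply reuses the original mm FakeTensor as its metadata, since squeezing the unsqueezed value round-trips back to the rank-2 shape.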