
Arm backend: add VGF PT2E linear quantization modes for LLM export#19029

Open
xingguo01 wants to merge 3 commits into pytorch:main from xingguo01:arm-backend-llm-export

Conversation

@xingguo01
Collaborator

@xingguo01 xingguo01 commented Apr 21, 2026

  • add vgf_16a8w/8a8w PT2E quantization modes
  • add backend.vgf.quantize_scope for full vs linear VGF quantization
  • wire the VGF config through the LLM export and quantizer selection path
  • add coverage in export_llama_lib tests for the new VGF PT2E modes

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

- add vgf_16a8w PT2E quantization modes
- add backend.vgf.quantize_scope for full
  vs linear VGF quantization
- wire the VGF config through the LLM export
  and quantizer selection path
- add coverage in export_llama_lib tests
  for the new VGF PT2E modes

Signed-off-by: Xingguo Li <xingguo.li@arm.com>
Change-Id: Ie8fe849b4856321308d6d526248a7a4760ddc573
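The "full vs linear" quantize-scope dispatch described in the bullets above can be sketched as follows. This is an illustrative stand-in, not the actual ExecuTorch API: the real logic lives in extension/llm/export/quantizer_lib.py, and `FakeVgfQuantizer`/`apply_scope` are hypothetical names chosen for the example.

```python
# Hypothetical sketch of scope-based quantizer configuration: "full" sets a
# global config, "linear" restricts quantization to torch.nn.Linear modules.
from dataclasses import dataclass, field


@dataclass
class FakeVgfQuantizer:
    """Stand-in for the VGF quantizer: tracks global vs per-module configs."""

    global_config: object = None
    module_type_config: dict = field(default_factory=dict)

    def set_global(self, cfg):
        self.global_config = cfg
        return self

    def set_module_type(self, module_type, cfg):
        self.module_type_config[module_type] = cfg
        return self


def apply_scope(quantizer, scope, cfg):
    """Apply cfg to the whole model ('full') or only Linear layers ('linear')."""
    if scope == "full":
        return quantizer.set_global(cfg)
    if scope == "linear":
        return quantizer.set_module_type("torch.nn.Linear", cfg)
    raise ValueError(f"Unknown quantize_scope: {scope!r}")
```

With the "linear" scope, only the module-type config is populated and the global config stays unset, which is what the new tests assert below.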
Copilot AI review requested due to automatic review settings April 21, 2026 17:02
@pytorch-bot

pytorch-bot Bot commented Apr 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19029

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 2 Cancelled Jobs, 10 Unrelated Failures

As of commit 7d65ac8 with merge base d6f1625:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 21, 2026
@xingguo01 xingguo01 added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: arm Changes to the ARM backend delegate labels Apr 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds Arm VGF backend PT2E quantization support for LLM export, including a new 16a8w mode gated on INT16 TOSA extension support and a configurable quantization scope (full-model vs Linear-only), plus test coverage for the new behavior.

Changes:

  • Add vgf_16a8w PT2E quantization mode and enforce INT16 compile spec extension when selected.
  • Introduce backend.vgf.quantize_scope (full vs linear) and apply it when constructing the VGF quantizer.
  • Wire new VGF settings through llama export CLI/config and add unit tests for scope + INT16 gating.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
extension/llm/export/quantizer_lib.py Extends VGF quantizer selection for vgf_16a8w and adds scope-based application (global vs Linear-only).
extension/llm/export/config/llm_config.py Adds vgf_16a8w enum value and introduces VgfQuantizeScope + config wiring from CLI args.
examples/models/llama/export_llama_lib.py Exposes VGF PT2E modes and VGF scope/compile-spec CLI flags; passes scope into VGF quantizer creation.
examples/models/llama/tests/test_export_llama_lib.py Adds coverage for VGF linear-only scope and INT16 compile spec enforcement for vgf_16a8w.


"vgf_8a8w",
"vgf_16a8w",
],
help="Use PT2E quantization. Comma separated options. e.g. xnnpack_dynamic (for per channel 8 bit weight), xnnpack_dynamic_qc4 (for per channel 4 bit weight), embedding.",

Copilot AI Apr 21, 2026


The --pt2e_quantize argparse option is defined with a fixed set of choices, so it only accepts a single value, but the help text says it supports "Comma separated options" (and even mentions embedding, which is not a valid choice). This is user-facing and likely to confuse; either update the help text to reflect single-choice behavior, or switch the argument parsing to accept a comma-separated list (and adjust LlmConfig/Pt2eQuantize parsing accordingly).

Suggested change
help="Use PT2E quantization. Comma separated options. e.g. xnnpack_dynamic (for per channel 8 bit weight), xnnpack_dynamic_qc4 (for per channel 4 bit weight), embedding.",
help="Use a single PT2E quantization mode, e.g. xnnpack_dynamic (per-channel 8-bit weight) or xnnpack_dynamic_qc4 (per-channel 4-bit weight).",

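The two argparse behaviors Copilot contrasts above can be demonstrated side by side. This is a minimal sketch, not the export_llama_lib code: `choices=` accepts exactly one value, while a custom `type=` callable implements the comma-separated usage the current help text implies. The mode names below are taken from the quoted diff; the helper name is illustrative.

```python
# Sketch: parsing --pt2e_quantize as a comma-separated list of known modes,
# instead of the single-value `choices=` behavior.
import argparse

VALID_MODES = {"xnnpack_dynamic", "xnnpack_dynamic_qc4", "vgf_8a8w", "vgf_16a8w"}


def comma_separated_modes(value):
    """Split a comma-separated string and validate each mode name."""
    modes = [m.strip() for m in value.split(",") if m.strip()]
    bad = [m for m in modes if m not in VALID_MODES]
    if bad:
        raise argparse.ArgumentTypeError(f"invalid mode(s): {bad}")
    return modes


parser = argparse.ArgumentParser()
parser.add_argument("--pt2e_quantize", type=comma_separated_modes)
args = parser.parse_args(["--pt2e_quantize", "xnnpack_dynamic,vgf_8a8w"])
```

Switching to this style would also require LlmConfig/Pt2eQuantize to accept a list, which is why fixing the help text is the smaller change.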
@xingguo01
Collaborator Author

@digantdesai @SS-JIA it would be highly appreciated if you could have a look

Collaborator

@zingo zingo left a comment


Mostly touches Arm backend stuff, and no need for Buck2 fixes

@zingo zingo added the help wanted Extra attention is needed label May 12, 2026
)
parser.add_argument("-V", "--vulkan", action="store_true")
parser.add_argument("--vulkan-force-fp16", action="store_true")
parser.add_argument("--vgf", action="store_true")
Contributor


nit

```suggestion
parser.add_argument("--arm-vgf", action="store_true")
```

rationale: not everyone knows what VGF is

@digantdesai
Contributor

@claude review this PR

@claude

claude Bot commented May 13, 2026

Claude finished @digantdesai's task in 1m 59s


Review of PR #19029: Arm backend VGF PT2E linear quantization modes for LLM export

  • Read PR diff and changed files
  • Analyze code for correctness, style, and potential issues
  • Post review feedback

Overall: Clean, well-structured PR. The new quantization modes are properly gated, the scope abstraction is sensible, and the tests cover the key paths (full vs linear scope, INT16 enforcement). A few observations below:


1. Reviewer feedback: rename --vgf to --arm-vgf (digantdesai's comment)

I agree with @digantdesai's suggestion at export_llama_lib.py:461. "VGF" is Arm-specific terminology and not widely recognizable. Prefixing with --arm-vgf (and correspondingly --arm-vgf-compile-spec, --arm-vgf-quantize-scope) would be more discoverable for users scanning --help. The other Arm backends (TOSA, Ethos-U) don't have this prefix either, but those are more established names — vgf is new enough that the disambiguation helps.

2. from_args silently accepts vgf_compile_spec even when --vgf is not set

In llm_config.py:828-829:

if hasattr(args, "vgf_compile_spec"):
    llm_config.backend.vgf.compile_spec = args.vgf_compile_spec

Because the argparse default is "TOSA-1.0+INT", this unconditionally overwrites the config's compile spec whether or not --vgf is passed. It's harmless today (both defaults match), but fragile if the defaults diverge. Consider gating on args.vgf or at least args.vgf_compile_spec != parser.get_default(...). Low priority since it's consistent with how other backends handle this pattern in this file.
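The gating suggested above can be sketched with a minimal, self-contained example. The parser/config objects here are illustrative stand-ins for the ones in export_llama_lib.py and llm_config.py, not the real code.

```python
# Sketch: only override the config's compile spec when --vgf was passed or
# the spec flag was explicitly changed from its argparse default.
import argparse
from types import SimpleNamespace

parser = argparse.ArgumentParser()
parser.add_argument("--vgf", action="store_true")
parser.add_argument("--vgf-compile-spec", default="TOSA-1.0+INT")

config = SimpleNamespace(compile_spec="TOSA-1.0+INT")

args = parser.parse_args([])  # user passed neither flag
if args.vgf or args.vgf_compile_spec != parser.get_default("vgf_compile_spec"):
    config.compile_spec = args.vgf_compile_spec
```

With this guard the config is left untouched when neither flag is given, so the two defaults can diverge safely later.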

3. Copilot's note on --pt2e_quantize help text is valid but pre-existing

The help text says "Comma separated options" but the argument uses choices= which only accepts a single value. This is a pre-existing issue unrelated to this PR — mentioning for awareness but not blocking.

4. Test coverage is good

The three new tests cover:

  • vgf_8a8w + linear scope → verifies module_type_config is set
  • vgf_16a8w without INT16 → verifies ValueError raised
  • vgf_16a8w with INT16 → verifies success

The existing test for vgf_8a8w + full scope is also correctly extended with assertions on global_config and empty module_type_config. This gives solid boundary coverage.
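The INT16 gating those tests exercise can be reduced to a small sketch. The function name below is hypothetical; the error message is quoted from quantizer_lib.py as shown later in this review.

```python
# Minimal sketch of the INT16 gate: vgf_16a8w is rejected unless the compile
# spec enables the int16 TOSA extension.
def check_vgf_mode(mode, compile_spec):
    if mode == "vgf_16a8w" and "int16" not in compile_spec:
        raise ValueError(
            "vgf_16a8w requires a VGF compile spec with INT16 support, "
            "for example TOSA-1.0+INT+int16."
        )
    return True


check_vgf_mode("vgf_16a8w", "TOSA-1.0+INT+int16")  # accepted

rejected = False
try:
    check_vgf_mode("vgf_16a8w", "TOSA-1.0+INT")
except ValueError:
    rejected = True  # missing int16 extension is rejected
```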

5. Minor: error message could be more actionable

At quantizer_lib.py:388-389:

"vgf_16a8w requires a VGF compile spec with INT16 support, "
"for example TOSA-1.0+INT+int16."

This is clear and actionable — good.

6. Code quality

  • The VgfQuantizeScope enum is a clean str, Enum which serializes nicely via .value
  • The scope dispatch in get_vgf_quantizer handles invalid values with a clear error
  • Import of get_symmetric_a16w8_quantization_config is properly deferred inside the function (matching existing pattern)
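The `str, Enum` pattern praised above is worth a two-line illustration: members compare equal to plain strings and round-trip cleanly through `.value`. The member names here are assumed for the example, not taken from llm_config.py.

```python
# Sketch of a str-mixin enum: members are also strings, so they serialize
# and compare naturally with config/CLI string values.
from enum import Enum


class VgfQuantizeScope(str, Enum):
    FULL = "full"
    LINEAR = "linear"
```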

Summary: This is a solid, focused change. The only actionable feedback is the --arm-vgf naming suggestion from digantdesai. The code is correct and well-tested. LGTM once the naming nit is addressed.


@zingo zingo removed the help wanted Extra attention is needed label May 13, 2026
