Add LogitProcessor interface for pre-sampling logit transforms (#19517) by kirklandsign · Pull Request #19517 · pytorch/executorch

kirklandsign · 2026-05-12T18:04:44Z

Summary:

Introduces a LogitProcessor abstract interface that allows callers to mutate logits in place between the model forward pass and the sampler. This enables grammar-constrained decoding, logit biasing, repetition penalties, and similar pre-sampling transforms without modifying the core generation loop.

Changes:

LogitProcessor (new): pure virtual interface with a single process(float*, int32_t) method, placed in extension/llm/sampler/.
TextTokenGenerator: gains add_logit_processor(), clear_logit_processors(), and num_logit_processors(). The processor chain runs after the model step and before logits_to_token(). When no processors are registered, behavior is identical to before.
apply_logit_processors_(): private helper that validates Float dtype, advances to the last-position logits for 3D tensors (mirroring logits_to_token), and invokes each processor in order.
Buck: logit_processor.h exported from the sampler target; text_token_generator gains a direct dep on sampler; test target added.
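
To make the shape of the interface concrete, here is a minimal sketch of one possible processor. The `LogitProcessor` class below is only a stand-in that mirrors the `process(float*, int32_t)` signature quoted in this PR; the real header is not reproduced here. `LogitBiasProcessor` and `example_logit_bias` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Stand-in that mirrors the process(float*, int32_t) signature quoted in this
// PR. It is not the real header from extension/llm/sampler/.
class LogitProcessor {
 public:
  virtual ~LogitProcessor() = default;
  virtual void process(float* logits, int32_t vocab_size) = 0;
};

// Hypothetical processor: adds a fixed bias to selected token ids.
class LogitBiasProcessor : public LogitProcessor {
 public:
  explicit LogitBiasProcessor(std::vector<std::pair<int32_t, float>> biases)
      : biases_(std::move(biases)) {}

  void process(float* logits, int32_t vocab_size) override {
    for (const auto& entry : biases_) {
      const int32_t token_id = entry.first;
      // Ignore ids outside the vocabulary rather than writing out of bounds.
      if (token_id >= 0 && token_id < vocab_size) {
        logits[token_id] += entry.second;
      }
    }
  }

 private:
  std::vector<std::pair<int32_t, float>> biases_;
};

// Usage: boost token 1, ban token 3, and pass an out-of-range id (9).
void example_logit_bias() {
  const float neg_inf = -std::numeric_limits<float>::infinity();
  std::vector<float> logits = {0.0f, 1.0f, 2.0f, 3.0f};
  std::vector<std::pair<int32_t, float>> biases = {
      {1, 5.0f}, {3, neg_inf}, {9, 1.0f}};
  LogitBiasProcessor processor(biases);
  processor.process(logits.data(), static_cast<int32_t>(logits.size()));
  assert(logits[0] == 0.0f);     // untouched
  assert(logits[1] == 6.0f);     // 1.0 + 5.0
  assert(logits[2] == 2.0f);     // untouched
  assert(logits[3] == neg_inf);  // banned
}
```

A bias of negative infinity effectively bans a token, assuming the sampler treats such logits as zero probability.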

Processors must be configured before calling generate() — concurrent modification during generation is not safe.

Differential Revision: D104767967

pytorch-bot · 2026-05-12T18:04:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19517

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull jobs on OSDC in pull requests shadow mode

❌ 4 New Failures, 2 Unrelated Failures, 1 Unclassified Failure

As of commit 6ebfdf6 with merge base 9e36d62:

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph
pull / unittest / macos / macos-job (gh)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph
pull / unittest-editable / linux / linux-job (gh)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph
pull / unittest-editable / macos / macos-job (gh)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

Check Labels / Check labels (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: GraphQL query

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
exir/tests/test_joint_graph.py::TestJointGraph::test_joint_graph

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-12T18:04:54Z

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104767967.

github-actions · 2026-05-12T18:05:40Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

kirklandsign · 2026-05-12T18:06:50Z

@claude review

claude · 2026-05-12T18:07:38Z

Claude finished @kirklandsign's task in 1m 31s

Code Review: Add LogitProcessor interface for pre-sampling logit transforms

Overall this is a clean, well-scoped addition. The interface is minimal, the integration point is correct, and tests cover the key behaviors. A few observations:

Design — looks good

The LogitProcessor interface (extension/llm/sampler/logit_processor.h) is appropriately minimal — a single pure virtual process(float*, int32_t) method. The ET_EXPERIMENTAL annotation is a sensible choice for a new API surface.
Using std::shared_ptr for processor ownership in TextTokenGenerator is reasonable given processors may be shared across generators or managed externally.

Potential issues

apply_logit_processors_ handles dim==3 but not dim>3 (text_token_generator.h:217-219): The guard checks dim >= 2, but the last-position offset is applied only when dim==3. If a model produced a 4D tensor, the logits pointer would stay at the start of the buffer, so the processors would see the first vocab_size values rather than the final position's logits. This mirrors sample_from_logits in util.h:48-50, which also only handles dim==3, so it is consistent with existing behavior, but it is worth noting.
Float-only restriction (text_token_generator.h:209-213): The processor chain enforces ScalarType::Float, but sample_from_logits supports Half, BFloat16, and UInt16. If a model produces non-Float logits, the processor chain will return an error rather than silently skip. This is the safer choice (fail loudly), but callers whose models emit Half, BFloat16, or UInt16 logits won't be able to use processors without an explicit cast to Float. A comment noting this design decision would be helpful for future maintainers.
No test for the integration path (apply_logit_processors_): The unit tests in test_logit_processor.cpp test the LogitProcessor interface in isolation (which is good), but there's no integration test verifying apply_logit_processors_ correctly extracts the last-position logits from a 3D tensor and passes them to the chain. This is the trickiest logic in the PR. Consider adding a test that constructs a mock Tensor with shape [1, seq_len, vocab] and verifies only the final position's logits are modified.
Minor: prev_token assigned but not used (text_token_generator.h:128): prev_token = cur_token is set before the processor chain runs but isn't used by the processors. This is pre-existing code (used later for decode), just noting it's not a new issue.
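
As a rough illustration of the integration test suggested above, the sketch below models the last-position offset with a plain shape vector and a contiguous float buffer instead of the real Tensor type. `last_position_logits` and `example_last_position_only` are hypothetical helpers, and the offset arithmetic is copied from the hunk quoted in this PR; an actual test would need to go through `apply_logit_processors_` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standalone model of the offset logic. A shape vector plus a contiguous
// row-major buffer stands in for the real Tensor type (an assumption).
float* last_position_logits(float* data, const std::vector<int64_t>& sizes) {
  const int64_t vocab_size = sizes.back();
  if (sizes.size() == 3) {
    // Shape [batch, seq_len, vocab]: skip the first seq_len - 1 rows.
    const int64_t num_tokens = sizes[1];
    data += (num_tokens - 1) * vocab_size;
  }
  return data;
}

// Checks that a transform applied through the returned pointer touches only
// the final position of a [1, seq_len, vocab] buffer.
void example_last_position_only() {
  const std::vector<int64_t> sizes = {1, 3, 4};
  std::vector<float> buffer(12, 0.0f);
  float* logits = last_position_logits(buffer.data(), sizes);
  for (int64_t i = 0; i < sizes.back(); ++i) {
    logits[i] += 1.0f;  // stand-in for a processor call
  }
  for (std::size_t i = 0; i < 8; ++i) {
    assert(buffer[i] == 0.0f);  // earlier positions untouched
  }
  for (std::size_t i = 8; i < 12; ++i) {
    assert(buffer[i] == 1.0f);  // last position modified
  }
}
```

For a 2D shape such as [1, vocab] the helper returns the buffer start unchanged, which matches the quoted hunk.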

Buck targets

logit_processor.h is exported from the sampler target (sampler/targets.bzl:10) — correct placement.
text_token_generator gains a direct dep on sampler (runner/targets.bzl:71) — already present, so the new #include is covered.
Test target (sampler/test/targets.bzl:27-34) uses the non-aten sampler dep, which is appropriate since the test doesn't need torch tensors.

Nits

text_token_generator.h:45: The null check (if (processor)) silently drops null shared_ptrs. This is fine as defensive programming, but an ET_CHECK_MSG would surface misuse earlier during development. Matter of taste.
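
The registration behavior discussed in this nit can be sketched as follows. `ProcessorChain` is a hypothetical stand-in, not the real `TextTokenGenerator`: the method names follow the PR description, but the bodies, return types, and the `apply` helper are assumptions. `ScaleProcessor` and `ShiftProcessor` exist only to show that registration order matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in mirroring the process(float*, int32_t) signature from this PR.
class LogitProcessor {
 public:
  virtual ~LogitProcessor() = default;
  virtual void process(float* logits, int32_t vocab_size) = 0;
};

// Hypothetical holder; method names follow the PR, bodies are assumptions.
class ProcessorChain {
 public:
  void add_logit_processor(std::shared_ptr<LogitProcessor> processor) {
    if (processor) {  // null pointers are dropped silently
      processors_.push_back(std::move(processor));
    }
  }
  void clear_logit_processors() { processors_.clear(); }
  std::size_t num_logit_processors() const { return processors_.size(); }

  // Runs every registered processor in registration order.
  void apply(float* logits, int32_t vocab_size) const {
    for (const auto& processor : processors_) {
      processor->process(logits, vocab_size);
    }
  }

 private:
  std::vector<std::shared_ptr<LogitProcessor>> processors_;
};

class ScaleProcessor : public LogitProcessor {
 public:
  explicit ScaleProcessor(float factor) : factor_(factor) {}
  void process(float* logits, int32_t vocab_size) override {
    for (int32_t i = 0; i < vocab_size; ++i) logits[i] *= factor_;
  }
 private:
  float factor_;
};

class ShiftProcessor : public LogitProcessor {
 public:
  explicit ShiftProcessor(float offset) : offset_(offset) {}
  void process(float* logits, int32_t vocab_size) override {
    for (int32_t i = 0; i < vocab_size; ++i) logits[i] += offset_;
  }
 private:
  float offset_;
};

void example_chain_order() {
  ProcessorChain chain;
  chain.add_logit_processor(nullptr);  // ignored by the null check
  chain.add_logit_processor(std::make_shared<ScaleProcessor>(2.0f));
  chain.add_logit_processor(std::make_shared<ShiftProcessor>(1.0f));
  assert(chain.num_logit_processors() == 2);

  std::vector<float> logits = {1.0f, 2.0f};
  chain.apply(logits.data(), 2);
  assert(logits[0] == 3.0f);  // (1 * 2) + 1, scale runs before shift
  assert(logits[1] == 5.0f);  // (2 * 2) + 1
}
```

Replacing the silent drop with a hard check would surface a null registration at the call site, which is the trade-off the nit describes.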

Summary

Clean PR, well-designed interface. The main suggestion is adding an integration test for the 3D tensor offset logic in apply_logit_processors_. The Float-only restriction is a reasonable starting point that can be relaxed later if needed.
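
One way a caller might work around the Float-only restriction today is to widen reduced-precision logits into a float copy before running any transform. The sketch below does this for bfloat16 only, assuming the buffer holds raw bfloat16 bit patterns as `uint16_t`. All helper names are hypothetical, none of this is ExecuTorch API, and narrowing back uses plain truncation rather than round-to-nearest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bfloat16 is the upper 16 bits of an IEEE-754 binary32 value, so widening
// is a 16-bit left shift of the raw bits.
float bf16_bits_to_float(uint16_t bits) {
  const uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &widened, sizeof(out));
  return out;
}

// Narrows by dropping the low 16 bits (truncation, not round-to-nearest).
uint16_t float_to_bf16_bits_truncate(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Widens n bfloat16 values so a Float-only transform can run on the copy.
std::vector<float> widen_bf16(const uint16_t* src, int32_t n) {
  std::vector<float> out(static_cast<std::size_t>(n));
  for (int32_t i = 0; i < n; ++i) {
    out[static_cast<std::size_t>(i)] = bf16_bits_to_float(src[i]);
  }
  return out;
}

void example_bf16_round_trip() {
  const std::vector<uint16_t> raw = {0x3F80, 0x4000, 0xC000};
  std::vector<float> logits = widen_bf16(raw.data(), 3);
  assert(logits[0] == 1.0f);
  assert(logits[1] == 2.0f);
  assert(logits[2] == -2.0f);

  logits[0] += 1.0f;  // stand-in for a processor call on the float copy
  assert(float_to_bf16_bits_truncate(logits[0]) == 0x4000);  // 2.0 in bf16
}
```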

Copilot

Pull request overview

This PR introduces a LogitProcessor interface to enable in-place mutation of model logits between the decoder forward pass and sampling, allowing features like constrained decoding and logit biasing without changing the core generation loop.

Changes:

Adds LogitProcessor interface under extension/llm/sampler/.
Extends TextTokenGenerator with a configurable processor chain applied pre-sampling.
Updates Buck targets to export the new header and adds a unit test for the interface.
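
As a simplified picture of constrained decoding on top of this interface, the sketch below masks every token outside a fixed allow-list. The `LogitProcessor` class is again a stand-in for the signature quoted in this PR, and `AllowedTokensProcessor` is a hypothetical name. A real grammar-constrained decoder would presumably recompute the allowed set at every step from its grammar state, which this example does not attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Stand-in mirroring the process(float*, int32_t) signature from this PR.
class LogitProcessor {
 public:
  virtual ~LogitProcessor() = default;
  virtual void process(float* logits, int32_t vocab_size) = 0;
};

// Hypothetical processor: keeps only an allow-list of token ids and sets
// every other logit to negative infinity.
class AllowedTokensProcessor : public LogitProcessor {
 public:
  explicit AllowedTokensProcessor(std::vector<int32_t> allowed)
      : allowed_(std::move(allowed)) {}

  void process(float* logits, int32_t vocab_size) override {
    if (vocab_size <= 0) {
      return;
    }
    // Allocated per call for clarity; a real processor might cache this.
    std::vector<bool> keep(static_cast<std::size_t>(vocab_size), false);
    for (int32_t id : allowed_) {
      if (id >= 0 && id < vocab_size) {
        keep[static_cast<std::size_t>(id)] = true;
      }
    }
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (int32_t i = 0; i < vocab_size; ++i) {
      if (!keep[static_cast<std::size_t>(i)]) {
        logits[i] = neg_inf;
      }
    }
  }

 private:
  std::vector<int32_t> allowed_;
};

void example_allowed_tokens() {
  const float neg_inf = -std::numeric_limits<float>::infinity();
  std::vector<float> logits = {1.0f, 2.0f, 3.0f, 4.0f};
  AllowedTokensProcessor processor({0, 2, 7});  // id 7 is out of range
  processor.process(logits.data(), static_cast<int32_t>(logits.size()));
  assert(logits[0] == 1.0f);
  assert(logits[1] == neg_inf);
  assert(logits[2] == 3.0f);
  assert(logits[3] == neg_inf);
  // Greedy selection now lands on token 2 instead of token 3.
  const auto best = std::max_element(logits.begin(), logits.end());
  assert(best - logits.begin() == 2);
}
```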

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| extension/llm/sampler/test/test_logit_processor.cpp | Adds unit tests validating basic `LogitProcessor` behavior and ordering semantics. |
| extension/llm/sampler/test/targets.bzl | Adds a Buck test target for the new logit processor tests. |
| extension/llm/sampler/targets.bzl | Exports `logit_processor.h` from the sampler library target. |
| extension/llm/sampler/logit_processor.h | Introduces the `LogitProcessor` pure virtual interface. |
| extension/llm/runner/text_token_generator.h | Adds processor registration APIs and applies processor chain to logits before sampling. |
| extension/llm/runner/targets.bzl | Adds runner dependency on the sampler target (for `LogitProcessor`). |

+    auto* logits = logits_tensor.mutable_data_ptr<float>();
+    const auto vocab_size = logits_tensor.size(logits_tensor.dim() - 1);
+    if (logits_tensor.dim() == 3) {
+      const auto num_tokens = logits_tensor.size(1);
+      logits += (num_tokens - 1) * vocab_size;
+    }
+    for (auto& processor : logit_processors_) {
+      processor->process(logits, static_cast<int32_t>(vocab_size));
+    }


+    ET_CHECK_OR_RETURN_ERROR(
+        logits_tensor.scalar_type() == ::executorch::aten::ScalarType::Float,
+        InvalidArgument,
+        "LogitProcessor chain only supports Float logits; got dtype %d",
+        static_cast<int>(logits_tensor.scalar_type()));


+      if (!logit_processors_.empty()) {
+        ET_CHECK_OK_OR_RETURN_ERROR(apply_logit_processors_(logits_tensor));
+      }


+   * @param vocab_size  Number of logits in the buffer (size of the model's
+   *                    output vocabulary for the current step).
+   */
+  virtual void process(float* logits, int32_t vocab_size) = 0;


Copilot AI review requested due to automatic review settings May 12, 2026 18:04

kirklandsign requested review from larryliu0820 and mergennachin as code owners May 12, 2026 18:04

meta-cla Bot added the CLA Signed label May 12, 2026

meta-codesync Bot added the fb-exported and meta-exported labels May 12, 2026

Copilot started reviewing on behalf of kirklandsign May 12, 2026 18:05

Copilot AI reviewed May 12, 2026

meta-codesync Bot changed the title ~~Add LogitProcessor interface for pre-sampling logit transforms~~ Add LogitProcessor interface for pre-sampling logit transforms (#19517) May 12, 2026

meta-codesync Bot force-pushed the export-D104767967 branch from 3b3862f to 6ebfdf6 May 12, 2026 19:15

kirklandsign wants to merge 1 commit into main from export-D104767967
