
Add LogitProcessor interface for pre-sampling logit transforms (#19517)

Open

kirklandsign wants to merge 1 commit into main from export-D104767967

Conversation

Contributor

@kirklandsign kirklandsign commented May 12, 2026

Summary:

Introduces a LogitProcessor abstract interface that allows callers to mutate logits in place between the model forward pass and the sampler. This enables grammar-constrained decoding, logit biasing, repetition penalties, and similar pre-sampling transforms without modifying the core generation loop.

Changes:

  • LogitProcessor (new): pure virtual interface with a single process(float*, int32_t) method, placed in extension/llm/sampler/.
  • TextTokenGenerator: gains add_logit_processor(), clear_logit_processors(), and num_logit_processors(). The processor chain runs after the model step and before logits_to_token(). When no processors are registered, behavior is identical to before.
  • apply_logit_processors_(): private helper that validates Float dtype, advances to the last-position logits for 3D tensors (mirroring logits_to_token), and invokes each processor in order.
  • Buck: logit_processor.h exported from the sampler target; text_token_generator gains a direct dep on sampler; test target added.

Processors must be configured before calling generate() — concurrent modification during generation is not safe.
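
The interface and a toy processor can be sketched as follows. This is a minimal standalone sketch based on the summary above; `BanTokenProcessor` and `run_ban_demo` are illustrative examples, not part of the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sketch of the interface described above (the real header lives in
// extension/llm/sampler/logit_processor.h; this standalone copy is for
// illustration only).
class LogitProcessor {
 public:
  virtual ~LogitProcessor() = default;
  // Mutates `logits` (length `vocab_size`) in place before sampling.
  virtual void process(float* logits, int32_t vocab_size) = 0;
};

// Hypothetical example processor: ban one token by forcing its logit to
// -inf so the sampler can never select it.
class BanTokenProcessor : public LogitProcessor {
 public:
  explicit BanTokenProcessor(int32_t token_id) : token_id_(token_id) {}
  void process(float* logits, int32_t vocab_size) override {
    if (token_id_ >= 0 && token_id_ < vocab_size) {
      logits[token_id_] = -std::numeric_limits<float>::infinity();
    }
  }

 private:
  int32_t token_id_;
};

// Usage sketch: run the processor over a 4-entry logit buffer.
inline std::vector<float> run_ban_demo() {
  std::vector<float> logits = {0.1f, 2.5f, 0.3f, 1.0f};
  BanTokenProcessor ban(1);
  ban.process(logits.data(), static_cast<int32_t>(logits.size()));
  return logits;
}
```

A real processor registered via `add_logit_processor()` would run once per generated token, after the model step and before `logits_to_token()`.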

Differential Revision: D104767967

Copilot AI review requested due to automatic review settings May 12, 2026 18:04

pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19517

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 4 New Failures, 2 Unrelated Failures, 1 Unclassified Failure

As of commit 6ebfdf6 with merge base 9e36d62 (image):

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

  • Check Labels / Check labels (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
    RuntimeError: GraphQL query

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the "CLA Signed" label May 12, 2026
Contributor

meta-codesync Bot commented May 12, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104767967.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@kirklandsign
Contributor Author

@claude review


claude Bot commented May 12, 2026

Claude finished @kirklandsign's task in 1m 31s


Code Review: Add LogitProcessor interface for pre-sampling logit transforms

  • Read changed files
  • Analyze interface design
  • Check integration in TextTokenGenerator
  • Review tests
  • Post review feedback

Overall this is a clean, well-scoped addition. The interface is minimal, the integration point is correct, and tests cover the key behaviors. A few observations:

Design — looks good

  • The LogitProcessor interface (extension/llm/sampler/logit_processor.h) is appropriately minimal — a single pure virtual process(float*, int32_t) method. The ET_EXPERIMENTAL annotation is a sensible choice for a new API surface.
  • Using std::shared_ptr for processor ownership in TextTokenGenerator is reasonable given processors may be shared across generators or managed externally.

Potential issues

  1. apply_logit_processors_ handles dim==3 but not dim>3 (text_token_generator.h:217-219): The guard checks dim >= 2 but only has special-case offset logic for dim==3. If a model produced a 4D tensor, the logits pointer would point to the beginning of the last dimension without the correct sequence offset. This mirrors sample_from_logits in util.h:48-50 which also only handles dim==3, so this is consistent with existing behavior — but worth noting.

  2. Float-only restriction (text_token_generator.h:209-213): The processor chain enforces ScalarType::Float, but sample_from_logits supports Half, BFloat16, and UInt16. If a model produces non-Float logits, the processor chain will return an error rather than silently skip. This is the safer choice (fail loudly), but callers using quantized logit outputs won't be able to use processors without an explicit cast. A comment noting this design decision would be helpful for future maintainers.

  3. No test for the integration path (apply_logit_processors_): The unit tests in test_logit_processor.cpp test the LogitProcessor interface in isolation (which is good), but there's no integration test verifying apply_logit_processors_ correctly extracts the last-position logits from a 3D tensor and passes them to the chain. This is the trickiest logic in the PR. Consider adding a test that constructs a mock Tensor with shape [1, seq_len, vocab] and verifies only the final position's logits are modified.

  4. Minor: prev_token assigned but not used (text_token_generator.h:128): prev_token = cur_token is set before the processor chain runs but isn't used by the processors. This is pre-existing code (used later for decode), just noting it's not a new issue.
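
The last-position offset logic flagged in (1) and the shape of the integration test suggested in (3) can be sketched together with plain buffers. `apply_to_last_position` and `add_one` are hypothetical stand-ins for the PR's `apply_logit_processors_` and a registered processor; the real code operates on an executorch Tensor:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy processor: add 1.0 to every logit it sees.
inline void add_one(float* logits, int32_t vocab_size) {
  for (int32_t i = 0; i < vocab_size; ++i) {
    logits[i] += 1.0f;
  }
}

// Stand-in for apply_logit_processors_: given logits of shape
// [1, num_tokens, vocab_size] flattened into `logits`, advance past the
// first (num_tokens - 1) rows so the chain only sees the logits for the
// last position, mirroring logits_to_token.
inline void apply_to_last_position(
    float* logits,
    int64_t num_tokens,
    int64_t vocab_size,
    const std::vector<void (*)(float*, int32_t)>& chain) {
  float* last = logits + (num_tokens - 1) * vocab_size;
  for (auto* fn : chain) {
    fn(last, static_cast<int32_t>(vocab_size));
  }
}

// The suggested integration test, in miniature: only the final position's
// logits should be modified.
inline std::vector<float> run_offset_demo() {
  const int64_t num_tokens = 3;
  const int64_t vocab = 4;
  std::vector<float> buf(num_tokens * vocab, 0.0f);  // shape [1, 3, 4]
  apply_to_last_position(buf.data(), num_tokens, vocab, {add_one});
  return buf;
}
```

After the call, the first `(num_tokens - 1) * vocab` entries remain zero and only the last row is incremented, which is exactly the property the proposed test would assert on a real 3D tensor.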

Buck targets

  • logit_processor.h is exported from the sampler target (sampler/targets.bzl:10) — correct placement.
  • text_token_generator gains a direct dep on sampler (runner/targets.bzl:71) — already present, so the new #include is covered.
  • Test target (sampler/test/targets.bzl:27-34) uses the non-aten sampler dep, which is appropriate since the test doesn't need torch tensors.

Nits

  • text_token_generator.h:45: The null check (if (processor)) silently drops null shared_ptrs. This is fine as defensive programming, but an ET_CHECK_MSG would surface misuse earlier during development. Matter of taste.

Summary

Clean PR, well-designed interface. The main suggestion is adding an integration test for the 3D tensor offset logic in apply_logit_processors_. The Float-only restriction is a reasonable starting point that can be relaxed later if needed.


Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a LogitProcessor interface to enable in-place mutation of model logits between the decoder forward pass and sampling, allowing features like constrained decoding and logit biasing without changing the core generation loop.

Changes:

  • Adds LogitProcessor interface under extension/llm/sampler/.
  • Extends TextTokenGenerator with a configurable processor chain applied pre-sampling.
  • Updates Buck targets to export the new header and adds a unit test for the interface.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file:

  • extension/llm/sampler/test/test_logit_processor.cpp: Adds unit tests validating basic LogitProcessor behavior and ordering semantics.
  • extension/llm/sampler/test/targets.bzl: Adds a Buck test target for the new logit processor tests.
  • extension/llm/sampler/targets.bzl: Exports logit_processor.h from the sampler library target.
  • extension/llm/sampler/logit_processor.h: Introduces the LogitProcessor pure virtual interface.
  • extension/llm/runner/text_token_generator.h: Adds processor registration APIs and applies the processor chain to logits before sampling.
  • extension/llm/runner/targets.bzl: Adds runner dependency on the sampler target (for LogitProcessor).


Comment on lines +216 to +218:

```cpp
const auto vocab_size = logits_tensor.size(logits_tensor.dim() - 1);
if (logits_tensor.dim() == 3) {
  const auto num_tokens = logits_tensor.size(1);
```

Comment on lines +215 to +223:

```cpp
auto* logits = logits_tensor.mutable_data_ptr<float>();
const auto vocab_size = logits_tensor.size(logits_tensor.dim() - 1);
if (logits_tensor.dim() == 3) {
  const auto num_tokens = logits_tensor.size(1);
  logits += (num_tokens - 1) * vocab_size;
}
for (auto& processor : logit_processors_) {
  processor->process(logits, static_cast<int32_t>(vocab_size));
}
```

Comment on lines +209 to +213:

```cpp
ET_CHECK_OR_RETURN_ERROR(
    logits_tensor.scalar_type() == ::executorch::aten::ScalarType::Float,
    InvalidArgument,
    "LogitProcessor chain only supports Float logits; got dtype %d",
    static_cast<int>(logits_tensor.scalar_type()));
```

Comment on lines +130 to +132:

```cpp
if (!logit_processors_.empty()) {
  ET_CHECK_OK_OR_RETURN_ERROR(apply_logit_processors_(logits_tensor));
}
```

Comment on lines +43 to +46:

```cpp
 * @param vocab_size Number of logits in the buffer (size of the model's
 *   output vocabulary for the current step).
 */
virtual void process(float* logits, int32_t vocab_size) = 0;
```
@meta-codesync meta-codesync Bot changed the title Add LogitProcessor interface for pre-sampling logit transforms Add LogitProcessor interface for pre-sampling logit transforms (#19517) May 12, 2026
@meta-codesync meta-codesync Bot force-pushed the export-D104767967 branch from 3b3862f to 6ebfdf6 Compare May 12, 2026 19:15

Labels

  • CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
  • fb-exported
  • meta-exported
