
Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961)

Open
kirklandsign wants to merge 1 commit into main from export-D100892465

Conversation


@kirklandsign (Contributor) commented Apr 17, 2026

Summary:

Restructure extension/asr/runner/ to support both Seq2Seq and Transducer ASR architectures:

  1. Rename AsrRunner → Seq2SeqRunner (with backward-compat alias in runner.h)

    • runner.cpp → seq2seq_runner.cpp, runner.h → seq2seq_runner.h
    • Old runner.h kept as a thin include redirect for backward compatibility
  2. Add TransducerRunner for RNN-T/TDT models

    • Extracts the ~190-line greedy decode loop from examples/models/parakeet/main.cpp
    • Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
    • Supports both standard RNN-T (duration=1) and TDT (variable durations)
    • Returns vector<Token> with frame offsets for downstream timestamp computation
    • Includes preprocess() method for models with bundled preprocessor
    • Exposes tokenizer via tokenizer() getter for downstream use
  3. Unify Token type: parakeet::Token is now an alias for asr::Token

  4. Migrate consumers

    • Whisper main.cpp: AsrRunner → Seq2SeqRunner
    • Parakeet main.cpp: inline decode → TransducerRunner::transcribe()
    • Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in examples/models/parakeet/.
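For reference, the greedy transducer decode described in item 2 boils down to a loop like the following. This is a self-contained sketch with hypothetical stub callbacks standing in for the model's encoder/decoder/joint methods; it is not the actual runner API:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Illustrative Token mirroring the fields described above (not the
// actual ExecuTorch asr::Token definition).
struct Token {
  uint64_t id;
  int64_t start_offset;  // frame index in the encoder output
  int64_t duration;      // frames advanced after this emission
};

// Greedy transducer decode over `num_frames` encoder frames. `joint`
// stands in for the model's joint network: (frame, decoder_state) ->
// (token_id, dur), where dur > 0 only for TDT models. `step` advances
// the decoder state after a non-blank emission.
inline std::vector<Token> greedy_decode(
    int64_t num_frames,
    int64_t blank_id,
    int64_t max_symbols_per_step,
    const std::function<std::pair<int64_t, int64_t>(int64_t, int64_t)>& joint,
    const std::function<int64_t(int64_t, int64_t)>& step) {
  std::vector<Token> tokens;
  int64_t state = 0;  // opaque decoder-state handle
  int64_t t = 0;
  while (t < num_frames) {
    int64_t symbols = 0;
    bool advanced = false;
    while (!advanced && symbols < max_symbols_per_step) {
      auto [k, dur] = joint(t, state);
      if (k == blank_id) {
        t += (dur > 0 ? dur : 1);  // blank always advances time
        advanced = true;
      } else {
        tokens.push_back({static_cast<uint64_t>(k), t, dur > 0 ? dur : 1});
        state = step(state, k);
        ++symbols;
        if (dur > 0) {  // TDT: non-blank emissions can also skip frames
          t += dur;
          advanced = true;
        }
      }
    }
    if (!advanced) {
      ++t;  // max_symbols_per_step guard: force progress on pathological input
    }
  }
  return tokens;
}
```

With dur fixed at 1 this behaves as standard RNN-T frame advancement; with model-supplied durations it covers the TDT case.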

Differential Revision: D100892465

Copilot AI review requested due to automatic review settings April 17, 2026 00:10

pytorch-bot Bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18961

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 2 Unrelated Failures

As of commit 828bce5 with merge base 35b9054:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 17, 2026

meta-codesync Bot commented Apr 17, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100892465.

@kirklandsign

@claude review

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


claude Bot commented Apr 17, 2026

Claude finished @kirklandsign's task in 1m 28s


Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

  • Read all changed files and gather context
  • Review TransducerRunner implementation
  • Review Seq2SeqRunner rename and backward compatibility
  • Review consumer migrations (Whisper, Parakeet, Android JNI)
  • Review build system changes
  • Post review feedback

Summary

Well-structured refactoring that cleanly separates Seq2Seq (Whisper) and Transducer (RNN-T/TDT) ASR architectures. The extraction of ~190 lines of greedy decode logic from examples/models/parakeet/main.cpp into a reusable TransducerRunner is a good move. The backward-compat layer for AsrRunner is handled correctly.

Issues

1. stats_ is never reset between calls in TransducerRunner::transcribe()

transducer_runner.cpp:245-271 — The stats_ member accumulates across multiple transcribe() calls without being reset. If a user calls transcribe() twice, model_load_start_ms / model_load_end_ms from the first load() persist, but inference_start_ms gets overwritten. This produces misleading statistics for the second call. The Seq2SeqRunner has the same pre-existing pattern, so this is not a regression, but worth noting for both runners.
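A minimal fix would be to reset only the per-call fields at the top of transcribe(). Sketched below with a hypothetical stats struct whose field names are taken from the description above, not the real ExecuTorch API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the runner's stats, based only on the fields
// named above; not the actual ExecuTorch Stats struct.
struct TranscribeStats {
  int64_t model_load_start_ms = 0;
  int64_t model_load_end_ms = 0;
  int64_t inference_start_ms = 0;
  int64_t inference_end_ms = 0;

  // Clear only the per-call fields at the top of transcribe(), so a
  // second call reports its own inference timings while the one-time
  // load() timings are preserved.
  void reset_for_new_call() {
    inference_start_ms = 0;
    inference_end_ms = 0;
  }
};
```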

2. Redundant model loading for timestamp metadata in parakeet/main.cpp

examples/models/parakeet/main.cpp:195-227 — A second Module is created just to read sample_rate, window_stride, and encoder_subsampling_factor constant methods. These are read via execute() on an empty input vector, which requires loading the full module again. Since TransducerRunner already loads the same module and reads similar metadata (blank_id, num_rnn_layers, pred_hidden), consider either:

  • Exposing a module() accessor on TransducerRunner for downstream constant_method queries, or
  • Having TransducerRunner::load_model_metadata() read these extra fields too (perhaps as optional metadata).

This duplicates the module load cost at runtime.
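The first option could look roughly like this. It is a self-contained sketch with a stand-in Module type and a hypothetical accessor name, not the real runner interface:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for ::executorch::extension::Module; illustrative only.
struct Module {
  explicit Module(std::string p) : path(std::move(p)) {}
  std::string path;
};

// Sketch of exposing the already-loaded module so downstream code can
// query extra constant_methods (sample_rate, window_stride,
// encoder_subsampling_factor) without constructing a second Module.
class RunnerSketch {
 public:
  explicit RunnerSketch(const std::string& model_path)
      : module_(std::make_shared<Module>(model_path)) {}

  // Accessor proposed above; the name is hypothetical.
  std::shared_ptr<Module> module() const { return module_; }

 private:
  std::shared_ptr<Module> module_;
};
```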

3. BUCK file has duplicated load statement

extension/asr/runner/BUCK:8,13 — load(":targets.bzl", "define_common_targets") appears twice. This appears to be intentional (one for non_fbcode_target, one for fbcode_target), but the duplicate load on line 13 is redundant since Starlark load is idempotent. Not a bug, but slightly unusual.

Minor / Nits

4. preprocess() return type constructs TensorPtr manually

transducer_runner.cpp:234-235:

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));

This works but is inconsistent with how TensorPtr is typically created elsewhere in the codebase (e.g., via from_blob or the tensor_ptr_maker utilities). It also moves from a local Tensor into a shared_ptr, which is fine functionally.

5. Potential out-of-bounds in encoder output access

transducer_runner.cpp:268:

int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];

No bounds check on the tensor having at least one element. If the encoder returns a 0-element length tensor, this would be undefined behavior. The check at line 261-265 verifies both outputs exist and are tensors, but not that the length tensor is non-empty.
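A self-contained sketch of the missing guard, where a plain vector stands in for the length tensor's int64 buffer (the real code would surface an Error rather than an optional):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for reading a scalar length out of a model output tensor.
// Returning nullopt on an empty tensor avoids the undefined behavior
// described above.
std::optional<int64_t> read_scalar_len(const std::vector<int64_t>& len_tensor) {
  if (len_tensor.empty()) {
    return std::nullopt;  // 0-element length tensor: report an error upstream
  }
  return len_tensor.front();
}
```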

6. g_proj shape assumes it matches proj_dim

transducer_runner.cpp:372-375 — The g_proj tensor is created with shape {1, 1, proj_dim} where proj_dim comes from the encoder's output. If the decoder's g_proj output has a different hidden dimension than the encoder's f_proj, this would silently produce incorrect results. In practice these likely always match for transducer architectures, but the assumption is implicit.

7. Missing #include <optional> in transducer_runner.h

transducer_runner.h:79 uses std::optional<std::string> but doesn't include <optional>. It likely gets pulled in transitively through other headers, but for correctness it should be explicitly included.

Looks Good

  • Backward-compat alias strategy (runner.h → thin redirect, AsrRunner = Seq2SeqRunner) is clean and low-risk.
  • Android JNI layer correctly uses the alias (asr::AsrRunner) so no API breakage.
  • Token type unification (parakeet::Token → alias for asr::Token) avoids duplication.
  • Build system (CMake and Buck/Bazel targets) properly separates the two runners as independent compilation units.
  • The greedy decode loop in TransducerRunner::transcribe() faithfully reproduces the original logic from parakeet/main.cpp with proper error handling via ET_CHECK_OK_OR_RETURN_ERROR.
  • TDT duration handling with bounds checking (transducer_runner.cpp:389-396) is correct.
  • The max_symbols_per_step guard (transducer_runner.cpp:446-449) prevents infinite loops on pathological inputs.
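The backward-compat alias called out above is the standard type-alias pattern; a minimal self-contained sketch (the class body is a placeholder, not the real Seq2SeqRunner interface):

```cpp
#include <cassert>
#include <string>

namespace asr {

// Minimal stand-in for the renamed runner.
class Seq2SeqRunner {
 public:
  explicit Seq2SeqRunner(std::string model_path)
      : model_path_(std::move(model_path)) {}
  const std::string& model_path() const { return model_path_; }

 private:
  std::string model_path_;
};

// Backward-compat alias kept in the old runner.h so existing callers
// (e.g. the Android JNI layer) keep compiling with the old name.
using AsrRunner = Seq2SeqRunner;

}  // namespace asr
```

Because the alias is a `using` declaration rather than a subclass, old and new names are the same type, so no overloads or ABI change are introduced.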


Copilot AI left a comment


Pull request overview

This PR restructures extension/asr/runner/ to support both encoder-decoder (Seq2Seq, e.g. Whisper) and Transducer (RNN‑T/TDT) ASR architectures by introducing separate runner implementations and migrating example/JNI consumers.

Changes:

  • Renames AsrRunner to Seq2SeqRunner (with backward-compatible runner.h redirect and type aliases).
  • Adds TransducerRunner implementing greedy transducer decoding and optional bundled-preprocessor execution.
  • Migrates Whisper/Parakeet examples and Android JNI includes to the new runner headers, and updates build files (CMake/Buck).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
extension/asr/runner/transducer_runner.h Defines new TransducerRunner API, Token type, and TransducerConfig.
extension/asr/runner/transducer_runner.cpp Implements model loading, optional preprocessor invocation, and transducer greedy decode loop.
extension/asr/runner/seq2seq_runner.h Defines Seq2SeqRunner API and backward-compatible AsrRunner aliases.
extension/asr/runner/seq2seq_runner.cpp Renames implementation from AsrRunner to Seq2SeqRunner and updates includes.
extension/asr/runner/runner.h Becomes a thin backward-compat include redirect to seq2seq_runner.h.
extension/asr/runner/targets.bzl Adds Buck targets for seq2seq_runner and transducer_runner libraries.
extension/asr/runner/CMakeLists.txt Builds ASR runner library from seq2seq_runner.cpp + transducer_runner.cpp.
extension/asr/runner/BUCK Adds migration BUCK wrapper for shared targets.
extension/android/jni/jni_layer_asr.cpp Updates include to the new Seq2Seq runner header.
examples/models/whisper/main.cpp Migrates usage from AsrRunner/AsrTranscribeConfig to Seq2SeqRunner/Seq2SeqTranscribeConfig.
examples/models/parakeet/types.h Switches Parakeet Token to alias shared asr::Token.
examples/models/parakeet/main.cpp Replaces inline greedy decode with TransducerRunner preprocess()+transcribe().
examples/models/parakeet/CMakeLists.txt Links Parakeet example against extension_asr_runner.



#include <cstdint>
#include <functional>
#include <memory>

Copilot AI Apr 17, 2026


transducer_runner.h uses std::optional/std::nullopt in the TransducerRunner constructor but does not include <optional>. Add the missing include so this header compiles when included on its own (including from transducer_runner.cpp).

Suggested change
#include <memory>
#include <memory>
#include <optional>

Comment thread extension/asr/runner/BUCK Outdated
Comment on lines +13 to +15
load(":targets.bzl", "define_common_targets")



Copilot AI Apr 17, 2026


This second load(":targets.bzl", "define_common_targets") appears redundant (the symbol is already loaded above). Consider removing the duplicate to simplify the BUCK file.

Suggested change
load(":targets.bzl", "define_common_targets")

Comment on lines +117 to +121
ET_CHECK_OR_RETURN_ERROR(
blank_id_result.ok(),
Internal,
"Model must export 'blank_id' as a constant_method.");
blank_id_ = blank_id_result.get()[0].toInt();

Copilot AI Apr 17, 2026


load_model_metadata() indexes blank_id_result.get()[0] without verifying the constant_method returned at least one value (and the expected scalar type). If the model returns an empty vector or a tensor scalar, this can crash. Consider validating output count/type before calling toInt().

Comment on lines +247 to +251
// --- Encode ---
int64_t mel_len_value = preprocessed_features->size(1);
std::vector<int64_t> mel_len_data = {mel_len_value};
auto mel_len = ::executorch::extension::from_blob(
mel_len_data.data(), {1}, ::executorch::aten::ScalarType::Long);

Copilot AI Apr 17, 2026


transcribe() derives mel_len_value from preprocessed_features->size(1). If the preprocessor pads the mel tensor and provides an explicit mel_len, this will pass the padded length to the encoder. Consider threading the true mel_len through (e.g., from preprocess()) instead of inferring it from the tensor shape.

std::vector<::executorch::runtime::EValue>{sos_token, h, c});
ET_CHECK_OK_OR_RETURN_ERROR(decoder_init_result.error());

auto& init_outputs = decoder_init_result.get();

Copilot AI Apr 17, 2026


decoder_init_result.get() is indexed at [0..2] without checking the output count/types. If decoder_step returns fewer outputs (or non-tensors), this will read out of bounds. Add an explicit output validation similar to the encoder check before indexing.

Suggested change
auto& init_outputs = decoder_init_result.get();
auto& init_outputs = decoder_init_result.get();
if (init_outputs.size() < 3 || !init_outputs[0].isTensor() ||
!init_outputs[1].isTensor() || !init_outputs[2].isTensor()) {
ET_LOG(
Error,
"Method %s returned invalid outputs: expected at least 3 tensor outputs, got %zu",
kDecoderStepMethodName,
init_outputs.size());
return Error::InvalidArgument;
}

struct Token {
uint64_t id;
int64_t start_offset; // Frame index in the encoder output
int64_t duration; // TDT duration value (0 for standard RNN-T)

Copilot AI Apr 17, 2026


Token::duration is documented as "0 for standard RNN-T", but the implementation uses dur=1 when durations is empty (and TransducerConfig also describes standard RNN-T as duration==1). Please reconcile the docs/contract so downstream timestamp code can interpret duration consistently.

Suggested change
int64_t duration; // TDT duration value (0 for standard RNN-T)
int64_t duration; // Duration in encoder frames (1 for standard RNN-T)

Comment on lines +49 to +51
// TDT duration values. Empty means standard RNN-T (duration always 1).
// If the model exports a "durations" constant_method, it will be used
// instead of this field.

Copilot AI Apr 17, 2026


TransducerConfig says a model-exported "durations" constant_method will override config.durations, but TransducerRunner::load()/load_model_metadata() never reads such a constant_method. Either implement that override or update the comment to match actual behavior.

Suggested change
// TDT duration values. Empty means standard RNN-T (duration always 1).
// If the model exports a "durations" constant_method, it will be used
// instead of this field.
// TDT duration values used by the decode loop. Empty means standard
// RNN-T (duration always 1).

#pragma once

#include <functional>
#include <memory>

Copilot AI Apr 17, 2026


seq2seq_runner.h uses std::optional in the Seq2SeqRunner constructor signature but doesn't include <optional>. It currently relies on transitive includes; please add <optional> to keep the header self-contained and avoid include-order fragility.

Suggested change
#include <memory>
#include <memory>
#include <optional>

Comment on lines +231 to +235
Internal,
"Preprocessor returned unexpected output.");

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));

Copilot AI Apr 17, 2026


preprocess() only consumes outputs[0] from the preprocessor, but the header documents the method as returning (mel, mel_len). If mel_len is provided (or mel is padded), dropping it can cause transcribe() to pass an incorrect length to the encoder. Consider consuming outputs[1] when present and using it to compute/override mel_len.

Comment on lines +382 to +386
int64_t k = joint_result.get()[0].toTensor().const_data_ptr<int64_t>()[0];

// Compute frame advance duration
int64_t dur = 1; // default for standard RNN-T
if (is_tdt) {

Copilot AI Apr 17, 2026


joint_result.get() is indexed at [0] (and [1] when is_tdt) without validating the number/type of outputs. Add a guard that checks output count and that the expected outputs are tensors/scalars before reading const_data_ptr<int64_t>().

kirklandsign added a commit that referenced this pull request Apr 17, 2026
Summary:
Pull Request resolved: #18961

Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures:

1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h)
   - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h`
   - Old `runner.h` kept as a thin include redirect for backward compatibility

2. **Add** `TransducerRunner` for RNN-T/TDT models
   - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp`
   - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
   - Supports both standard RNN-T (duration=1) and TDT (variable durations)
   - Returns `vector<Token>` with frame offsets for downstream timestamp computation
   - Includes `preprocess()` method for models with bundled preprocessor
   - Exposes tokenizer via `tokenizer()` getter for downstream use

3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token`

4. **Migrate consumers**
   - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner`
   - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()`
   - Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`.

Differential Revision: D100892465
@meta-codesync Bot changed the title from "Add TransducerRunner and rename AsrRunner to Seq2SeqRunner" to "Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961)" Apr 17, 2026
kirklandsign added a commit that referenced this pull request Apr 17, 2026
(Same commit message as above.)
kirklandsign added a commit that referenced this pull request Apr 21, 2026
(Same commit message as above.)
Copilot AI review requested due to automatic review settings April 21, 2026 19:28

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.



Comment on lines 24 to +31
set(runner_deps executorch_core extension_module extension_tensor
tokenizers::tokenizers
)

# Define runner library
add_library(extension_asr_runner STATIC runner.cpp)
add_library(
extension_asr_runner STATIC seq2seq_runner.cpp transducer_runner.cpp
)

Copilot AI Apr 21, 2026


extension_asr_runner depends on non-header-only LLM runner utilities (e.g., load_tokenizer/get_eos_ids from llm_runner_helper.cpp and Sampler from sampler.cpp). The CMake target currently links only executorch_core/extension_module/extension_tensor/tokenizers, so consumers that link just extension_asr_runner may hit undefined references. Consider adding extension_llm_runner (and extension_llm_sampler for Seq2SeqRunner) to runner_deps/target_link_libraries to match the BUCK deps in extension/asr/runner/targets.bzl.

* Required module methods:
* - "encoder": (mel, mel_len) -> (f_proj, encoded_len)
* - "decoder_step": (token, h, c) -> (g_proj, h', c')
* - "joint": (f_t, g_proj) -> (token_id, duration_idx)

Copilot AI Apr 21, 2026


transducer_runner.h docs say the "joint" method returns (token_id, duration_idx), but transducer_runner.cpp treats duration_idx as optional (it only requires a second output when config.durations is non-empty). Please update the required method signature comment to match actual behavior (e.g., (token_id) for standard RNN-T, and (token_id, duration_idx) for TDT).

Suggested change
* - "joint": (f_t, g_proj) -> (token_id, duration_idx)
* - "joint": (f_t, g_proj) -> (token_id) for standard RNN-T,
* or (token_id, duration_idx) for TDT

Comment on lines +216 to +234
int64_t audio_len = raw_audio->numel();
std::vector<int64_t> audio_len_data = {audio_len};
auto audio_len_tensor = ::executorch::extension::from_blob(
audio_len_data.data(), {1}, ::executorch::aten::ScalarType::Long);

auto result = module_->execute(
kPreprocessorMethodName,
std::vector<::executorch::runtime::EValue>{raw_audio, audio_len_tensor});
ET_CHECK_OK_OR_RETURN_ERROR(result.error());

auto& outputs = result.get();
ET_CHECK_OR_RETURN_ERROR(
!outputs.empty() && outputs[0].isTensor(),
Internal,
"Preprocessor returned unexpected output.");

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));
}

Copilot AI Apr 21, 2026


preprocess() calls the model's "preprocessor" which (per the header comment) returns (mel, mel_len), but the implementation only validates/returns outputs[0] and discards mel_len. transcribe() then synthesizes mel_len from features->size(1), which can be wrong if the preprocessor pads to a fixed length. Consider returning both (features, features_len) from preprocess(), or accepting an explicit mel_len/length tensor in transcribe(), and validate the preprocessor output arity accordingly.

enc_outputs.size());

auto f_proj = enc_outputs[0].toTensor(); // [B, T, joint_hidden]
int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];

Copilot AI Apr 21, 2026


encoded_len is read from the encoder's second output and then used to index into f_proj as if it were <= f_proj.size(1). If encoded_len is larger than the actual time dimension (e.g., corrupted model output), the decode loop will read past f_proj's buffer. Add a sanity check that encoded_len is within [0, f_proj.size(1)] before using it.

Suggested change
int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];
int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];
ET_CHECK_OR_RETURN_ERROR(
encoded_len >= 0 && encoded_len <= f_proj.size(1),
InvalidArgument,
"Encoder returned invalid encoded_len=%ld for f_proj time dimension=%ld",
static_cast<long>(encoded_len),
static_cast<long>(f_proj.size(1)));

init_outputs.size());
auto g_proj_init = init_outputs[0].toTensor();
auto new_h_init = init_outputs[1].toTensor();
auto new_c_init = init_outputs[2].toTensor();

Copilot AI Apr 21, 2026


These memcpy() calls assume the decoder_step outputs (h, c) have exactly the expected byte sizes derived from num_rnn_layers_/pred_hidden_ and the input dtypes. If the model exports different shapes/dtypes, this can read/write out of bounds. Please validate output tensor numel()/nbytes() matches the destination buffers before memcpy (and return an error if mismatched).

Suggested change
auto new_c_init = init_outputs[2].toTensor();
auto new_c_init = init_outputs[2].toTensor();
size_t new_h_init_num_bytes =
static_cast<size_t>(new_h_init.numel()) *
::executorch::runtime::elementSize(new_h_init.scalar_type());
size_t new_c_init_num_bytes =
static_cast<size_t>(new_c_init.numel()) *
::executorch::runtime::elementSize(new_c_init.scalar_type());
ET_CHECK_OR_RETURN_ERROR(
new_h_init.scalar_type() == h_dtype,
InvalidArgument,
"Method %s returned h with unexpected dtype.",
kDecoderStepMethodName);
ET_CHECK_OR_RETURN_ERROR(
new_c_init.scalar_type() == c_dtype,
InvalidArgument,
"Method %s returned c with unexpected dtype.",
kDecoderStepMethodName);
ET_CHECK_OR_RETURN_ERROR(
new_h_init_num_bytes == h_data.size(),
InvalidArgument,
"Method %s returned h with %zu bytes, expected %zu.",
kDecoderStepMethodName,
new_h_init_num_bytes,
h_data.size());
ET_CHECK_OR_RETURN_ERROR(
new_c_init_num_bytes == c_data.size(),
InvalidArgument,
"Method %s returned c with %zu bytes, expected %zu.",
kDecoderStepMethodName,
new_c_init_num_bytes,
c_data.size());

Comment on lines +460 to +466
auto new_g_proj = outputs[0].toTensor();
auto new_h = outputs[1].toTensor();
auto new_c = outputs[2].toTensor();
std::memcpy(h_data.data(), new_h.const_data_ptr(), h_data.size());
std::memcpy(c_data.data(), new_c.const_data_ptr(), c_data.size());
std::memcpy(
g_proj_data.data(), new_g_proj.const_data_ptr(), g_proj_data.size());

Copilot AI Apr 21, 2026


These memcpy() calls assume new_h/new_c/new_g_proj have byte sizes matching the preallocated h_data/c_data/g_proj_data buffers. If the model returns tensors with different shapes (or if proj_dim differs), this can cause out-of-bounds reads/writes. Please check tensor nbytes() (or numel()*elem_size) against the destination buffer sizes and return an error on mismatch.

Comment on lines +6 to 12
#include <executorch/extension/asr/runner/transducer_runner.h>

namespace parakeet {

// Matches output type of tokenizers::Tokenizer methods
// Use the shared Token type from the ASR runner.
using Token = ::executorch::extension::asr::Token;
using TokenId = uint64_t;

Copilot AI Apr 21, 2026


examples/models/parakeet/types.h now includes transducer_runner.h just to alias the Token struct. That header pulls in Module, tokenizer, and stats headers, which couples this example's basic types to the full runner implementation and can noticeably increase rebuild costs. Consider moving executorch::extension::asr::Token into a small dedicated header (e.g., token.h) with minimal includes, and include that here instead.


@larryliu0820 left a comment


Thank you for consolidating parakeet runner!

@kirklandsign

@claude review


claude Bot commented Apr 24, 2026

Claude finished @kirklandsign's task in 1m 33s


Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

  • Read all changed files
  • Review TransducerRunner implementation (transducer_runner.h/cpp)
  • Review Seq2SeqRunner rename and backward compatibility
  • Review consumer migrations (Whisper, Parakeet, Android JNI)
  • Review build system changes (CMake, Buck, targets.bzl)
  • Post review feedback

Good restructuring overall. The separation of Seq2Seq and Transducer runners is clean, and the extraction from parakeet/main.cpp into a reusable runner is a meaningful improvement. The backward-compat alias strategy is low-risk. Noting a few items worth addressing:

Issues

1. Unsafe memcpy from model outputs without size validation — transducer_runner.cpp:332-333, 463-466

The memcpy calls copy decoder outputs (h, c, g_proj) into pre-allocated buffers using the buffer size (h_data.size()) as the byte count, but read from the model tensor via const_data_ptr() without verifying the model tensor's actual byte size matches. If the model returns tensors with fewer elements than expected (different num_rnn_layers or pred_hidden than metadata claims), this reads out of bounds from the source tensor. A bounds check on new_h.nbytes() >= h_data.size() before each memcpy would prevent this.
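A standalone sketch of the suggested guard, with illustrative names (copy_checked is not the actual ExecuTorch API; it stands in for the memcpy call sites with the nbytes() check added):

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy a model output into a pre-sized state buffer,
// refusing to read past the source tensor's storage. src_nbytes would come
// from new_h.nbytes() at the real call sites.
void copy_checked(std::vector<char>& dst, const void* src, size_t src_nbytes) {
  if (src_nbytes < dst.size()) {
    throw std::runtime_error("model tensor smaller than expected state buffer");
  }
  std::memcpy(dst.data(), src, dst.size());
}
```

With this shape, a model whose actual num_rnn_layers or pred_hidden disagrees with the metadata fails loudly instead of reading out of bounds.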

2. Standard RNN-T decode loop advances time on non-blank emissions — transducer_runner.cpp:468

In the non-blank branch, t += dur is executed unconditionally (where dur = 1 for standard RNN-T). Standard RNN-T greedy decoding typically holds the time index fixed on label emissions and only advances on blank, with a max_symbols_per_step guard to prevent infinite loops on a single frame. The current logic advances time on every non-blank token in standard RNN-T mode, which means the decoder can only emit one token per encoder frame. This is correct for TDT but may not match expected RNN-T behavior. If Parakeet-TDT is the only intended consumer, this is fine — but the class name and docs say "RNN-T, TDT, HAT" generically.
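For reference, the conventional standard RNN-T greedy structure holds the frame on labels and advances only on blank. A self-contained sketch under those semantics (joint() stands in for the decoder + joint network call; all names are assumptions, not the runner's API):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative standard RNN-T greedy loop: t advances only on blank,
// labels are emitted while staying on the same frame, and
// max_symbols_per_step caps emissions per frame.
std::vector<int64_t> greedy_rnnt_decode(
    int64_t num_frames,
    int64_t blank_id,
    int64_t max_symbols_per_step,
    const std::function<int64_t(int64_t, const std::vector<int64_t>&)>& joint) {
  std::vector<int64_t> tokens;
  for (int64_t t = 0; t < num_frames; ++t) {
    int64_t symbols = 0;
    while (symbols < max_symbols_per_step) {
      const int64_t tok = joint(t, tokens);
      if (tok == blank_id) {
        break;  // blank: advance to the next encoder frame
      }
      tokens.push_back(tok);  // label: stay on this frame, query joint again
      ++symbols;
    }
  }
  return tokens;
}
```

Under the current t += dur behavior, the three-label case below would emit at most one token per frame; the structure above emits all three on frame 0.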

3. g_proj shape assumes encoder and decoder hidden dims match — transducer_runner.cpp:377-380

g_proj is created with shape {1, 1, proj_dim} where proj_dim comes from the encoder's output dimension (f_proj.sizes()[2]). If the decoder's projection output has a different hidden dimension, the from_blob will wrap the wrong number of bytes. This assumption holds for standard transducer architectures but is implicit — a shape mismatch would silently produce incorrect joint network inputs.
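One way to make the assumption explicit is to validate byte counts before the from_blob wrap. A toy sketch (checked_proj_dim and its parameters are illustrative, not the actual runner code):

```cpp
#include <cstdint>
#include <stdexcept>

// Sketch: before reusing the encoder's hidden dim for g_proj's shape,
// verify the decoder projection's storage actually has that many elements.
int64_t checked_proj_dim(
    int64_t encoder_proj_dim,    // from f_proj.sizes()[2]
    int64_t decoder_out_nbytes,  // from the decoder output's nbytes()
    int64_t elem_size) {         // e.g. sizeof(float) for fp32 outputs
  if (decoder_out_nbytes != encoder_proj_dim * elem_size) {
    throw std::runtime_error(
        "decoder projection width does not match encoder hidden dim");
  }
  return encoder_proj_dim;
}
```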

4. preprocess() discards mel_len from preprocessor output — transducer_runner.cpp:226-233

The header documents the preprocessor as returning (mel, mel_len), but preprocess() only returns outputs[0] (the mel tensor). transcribe() then infers mel_len from preprocessed_features->size(1) at line 246. If the preprocessor pads the mel tensor to a fixed length, the inferred length will be wrong, causing the encoder to process padding frames. Consider returning or threading through the actual mel_len.
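A toy illustration of why inferring the length from the padded shape goes wrong (the struct and pad_to_budget are stand-ins for the preprocessor's (mel, mel_len) outputs, not the real API):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch: return both outputs instead of dropping outputs[1].
struct PreprocessResult {
  std::vector<float> mel;  // feature frames, possibly padded
  int64_t mel_len;         // valid frame count before padding
};

// Toy preprocessor that pads to a fixed frame budget but reports the
// true pre-padding length.
PreprocessResult pad_to_budget(std::vector<float> frames, int64_t budget) {
  PreprocessResult r{std::move(frames), 0};
  r.mel_len = static_cast<int64_t>(r.mel.size());
  if (static_cast<int64_t>(r.mel.size()) < budget) {
    r.mel.resize(static_cast<std::size_t>(budget), 0.0f);
  }
  return r;
}
```

In this toy, inferring the length from the padded size would report 8 frames for a 3-frame input, so the encoder would process 5 frames of padding.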

5. Redundant module load for timestamp metadata in parakeet/main.cpp:194-226

A second Module is constructed and fully loaded just to read sample_rate, window_stride, and encoder_subsampling_factor constant methods. Since TransducerRunner already loads the same .pte and reads similar metadata, this doubles the module load cost. Exposing a module() accessor or a generic read_constant_method() on the runner would avoid this.
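A sketch of the suggested accessor shape, so callers reuse the runner's already-loaded module (the class and method names are assumptions; the std::map stands in for the model's constant_methods):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sketch: expose a generic metadata lookup on the runner instead of having
// the example construct and load a second Module for three constants.
class RunnerMetadata {
 public:
  explicit RunnerMetadata(std::map<std::string, int64_t> constants)
      : constants_(std::move(constants)) {}

  // Hypothetical read_constant_method(): one lookup path for all metadata,
  // backed by the module the runner already loaded.
  int64_t read_constant_method(const std::string& name) const {
    return constants_.at(name);
  }

 private:
  std::map<std::string, int64_t> constants_;
};
```

parakeet/main.cpp could then read sample_rate, window_stride, and encoder_subsampling_factor through the runner without a second load.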

6. CMake runner_deps missing LLM runner helper dependency — CMakeLists.txt:24-26

The Buck targets in targets.bzl list //executorch/extension/llm/runner:runner_lib as a dependency (which provides load_tokenizer, get_eos_ids, time_in_ms, etc.), and seq2seq_runner.cpp also needs sampler. The CMake target only links executorch_core extension_module extension_tensor tokenizers::tokenizers. Depending on how consumers link, this could cause undefined reference errors for load_tokenizer, time_in_ms, logits_to_token, Sampler, etc. The Parakeet CMakeLists works around this by also linking extension_llm_runner, but that shouldn't be required as a transitive dependency workaround.
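The gap could be closed in the runner's own CMake target rather than in each consumer. A sketch, treating the exact target names as assumptions taken from the review text:

```cmake
# Sketch: let the ASR runner target carry its helper deps itself, mirroring
# the Buck dependency on //executorch/extension/llm/runner:runner_lib.
# Target names here are assumptions based on the review discussion.
target_link_libraries(
  extension_asr_runner
  PUBLIC executorch_core extension_module extension_tensor
         extension_llm_runner tokenizers::tokenizers
)
```

With PUBLIC linkage, symbols like load_tokenizer and Sampler propagate to consumers automatically, so Parakeet's CMakeLists would not need the workaround link.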

Minor

  • transducer_runner.cpp:266 — encoded_len is read from enc_outputs[1] without checking the tensor is non-empty. A zero-element tensor would be UB via const_data_ptr<int64_t>()[0].
  • transducer_runner.h:39-40 — The Token::duration doc says "For standard RNN-T: always 1" but TransducerConfig doc and the decode loop produce dur=1 for non-blank and variable dur for blank. This is consistent in practice but the comment on Token::duration could be clearer that it represents frames consumed by that emission.
  • transducer_runner.h:66 — The docstring says joint returns (token_id, duration_idx), but duration_idx is only present for TDT. The doc should note this is TDT-only.

Looks Good

  • Backward-compat runner.h redirect and AsrRunner/AsrTranscribeConfig aliases are clean.
  • Android JNI correctly uses the alias — no API breakage.
  • Token type unification with parakeet::Token aliasing asr::Token avoids duplication.
  • load_model_metadata() with the read_int_constant lambda is well-structured with proper validation.
  • Joint output validation at transducer_runner.cpp:388-397 correctly varies the expected output count for TDT vs standard.
  • max_symbols_per_step guard at transducer_runner.cpp:472-474 prevents infinite loops.
  • Buck/Bazel targets properly separate the two runners as independent compilation units.


Summary:
Pull Request resolved: #18961

Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures:

1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h)
   - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h`
   - Old `runner.h` kept as a thin include redirect for backward compatibility

2. **Add** `TransducerRunner` for RNN-T/TDT models
   - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp`
   - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
   - Supports both standard RNN-T (duration=1) and TDT (variable durations)
   - Returns `vector<Token>` with frame offsets for downstream timestamp computation
   - Includes `preprocess()` method for models with bundled preprocessor
   - Exposes tokenizer via `tokenizer()` getter for downstream use

3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token`

4. **Migrate consumers**
   - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner`
   - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()`
   - Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`.

Differential Revision: D100892465

Labels: CLA Signed, fb-exported, meta-exported