
Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961)

Open
kirklandsign wants to merge 1 commit into main from export-D100892465

Conversation


@kirklandsign (Contributor) commented Apr 17, 2026

Summary:

Restructure extension/asr/runner/ to support both Seq2Seq and Transducer ASR architectures:

  1. Rename AsrRunner → Seq2SeqRunner (with backward-compat alias in runner.h)

    • runner.cpp → seq2seq_runner.cpp, runner.h → seq2seq_runner.h
    • Old runner.h kept as a thin include redirect for backward compatibility
  2. Add TransducerRunner for RNN-T/TDT models

    • Extracts the ~190-line greedy decode loop from examples/models/parakeet/main.cpp
    • Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
    • Supports both standard RNN-T (duration=1) and TDT (variable durations)
    • Returns vector<Token> with frame offsets for downstream timestamp computation
    • Includes preprocess() method for models with bundled preprocessor
    • Exposes tokenizer via tokenizer() getter for downstream use
  3. Unify Token type: parakeet::Token is now an alias for asr::Token

  4. Migrate consumers

    • Whisper main.cpp: AsrRunner → Seq2SeqRunner
    • Parakeet main.cpp: inline decode → TransducerRunner::transcribe()
    • Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in examples/models/parakeet/.
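For reference, the greedy transducer decode described in item 2 boils down to a loop like the following. This is a self-contained sketch with hypothetical stub callbacks standing in for the model's encoder/decoder/joint methods; it is not the actual runner API:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Illustrative Token mirroring the fields described above (not the
// actual ExecuTorch asr::Token definition).
struct Token {
  uint64_t id;
  int64_t start_offset;  // frame index in the encoder output
  int64_t duration;      // frames advanced after this emission
};

// Greedy transducer decode over `num_frames` encoder frames. `joint`
// stands in for the model's joint network: (frame, decoder_state) ->
// (token_id, dur), where dur > 0 only for TDT models. `step` advances
// the decoder state after a non-blank emission.
inline std::vector<Token> greedy_decode(
    int64_t num_frames,
    int64_t blank_id,
    int64_t max_symbols_per_step,
    const std::function<std::pair<int64_t, int64_t>(int64_t, int64_t)>& joint,
    const std::function<int64_t(int64_t, int64_t)>& step) {
  std::vector<Token> tokens;
  int64_t state = 0;  // opaque decoder-state handle
  int64_t t = 0;
  while (t < num_frames) {
    int64_t symbols = 0;
    bool advanced = false;
    while (!advanced && symbols < max_symbols_per_step) {
      auto [k, dur] = joint(t, state);
      if (k == blank_id) {
        t += (dur > 0 ? dur : 1);  // blank always advances time
        advanced = true;
      } else {
        tokens.push_back({static_cast<uint64_t>(k), t, dur > 0 ? dur : 1});
        state = step(state, k);
        ++symbols;
        if (dur > 0) {  // TDT: non-blank emissions can also skip frames
          t += dur;
          advanced = true;
        }
      }
    }
    if (!advanced) {
      ++t;  // max_symbols_per_step guard: force progress on pathological input
    }
  }
  return tokens;
}
```

With dur fixed at 1 this behaves as standard RNN-T frame advancement; with model-supplied durations it covers the TDT case.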

Differential Revision: D100892465

Copilot AI review requested due to automatic review settings April 17, 2026 00:10

pytorch-bot Bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18961

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 2 Unrelated Failures

As of commit 828bce5 with merge base 35b9054:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 17, 2026

meta-codesync Bot commented Apr 17, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100892465.

@kirklandsign

@claude review

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


claude Bot commented Apr 17, 2026

Claude finished @kirklandsign's task in 1m 28s


Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

  • Read all changed files and gather context
  • Review TransducerRunner implementation
  • Review Seq2SeqRunner rename and backward compatibility
  • Review consumer migrations (Whisper, Parakeet, Android JNI)
  • Review build system changes
  • Post review feedback

Summary

Well-structured refactoring that cleanly separates Seq2Seq (Whisper) and Transducer (RNN-T/TDT) ASR architectures. The extraction of ~190 lines of greedy decode logic from examples/models/parakeet/main.cpp into a reusable TransducerRunner is a good move. The backward-compat layer for AsrRunner is handled correctly.

Issues

1. stats_ is never reset between calls in TransducerRunner::transcribe()

transducer_runner.cpp:245-271 — The stats_ member accumulates across multiple transcribe() calls without being reset. If a user calls transcribe() twice, model_load_start_ms / model_load_end_ms from the first load() persist, but inference_start_ms gets overwritten. This produces misleading statistics for the second call. The Seq2SeqRunner has the same pre-existing pattern, so this is not a regression, but worth noting for both runners.
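A minimal fix would be to reset only the per-call fields at the top of transcribe(). Sketched below with a hypothetical stats struct whose field names are taken from the description above, not the real ExecuTorch API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the runner's stats, based only on the fields
// named above; not the actual ExecuTorch Stats struct.
struct TranscribeStats {
  int64_t model_load_start_ms = 0;
  int64_t model_load_end_ms = 0;
  int64_t inference_start_ms = 0;
  int64_t inference_end_ms = 0;

  // Clear only the per-call fields at the top of transcribe(), so a
  // second call reports its own inference timings while the one-time
  // load() timings are preserved.
  void reset_for_new_call() {
    inference_start_ms = 0;
    inference_end_ms = 0;
  }
};
```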

2. Redundant model loading for timestamp metadata in parakeet/main.cpp

examples/models/parakeet/main.cpp:195-227 — A second Module is created just to read sample_rate, window_stride, and encoder_subsampling_factor constant methods. These are read via execute() on an empty input vector, which requires loading the full module again. Since TransducerRunner already loads the same module and reads similar metadata (blank_id, num_rnn_layers, pred_hidden), consider either:

  • Exposing a module() accessor on TransducerRunner for downstream constant_method queries, or
  • Having TransducerRunner::load_model_metadata() read these extra fields too (perhaps as optional metadata).

This duplicates the module load cost at runtime.
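The first option could look roughly like this. It is a self-contained sketch with a stand-in Module type and a hypothetical accessor name, not the real runner interface:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for ::executorch::extension::Module; illustrative only.
struct Module {
  explicit Module(std::string p) : path(std::move(p)) {}
  std::string path;
};

// Sketch of exposing the already-loaded module so downstream code can
// query extra constant_methods (sample_rate, window_stride,
// encoder_subsampling_factor) without constructing a second Module.
class RunnerSketch {
 public:
  explicit RunnerSketch(const std::string& model_path)
      : module_(std::make_shared<Module>(model_path)) {}

  // Accessor proposed above; the name is hypothetical.
  std::shared_ptr<Module> module() const { return module_; }

 private:
  std::shared_ptr<Module> module_;
};
```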

3. BUCK file has duplicated load statement

extension/asr/runner/BUCK:8,13 — load(":targets.bzl", "define_common_targets") appears twice. This appears to be intentional (one for non_fbcode_target, one for fbcode_target), but the duplicate load on line 13 is redundant since Starlark load is idempotent. Not a bug, but slightly unusual.

Minor / Nits

4. preprocess() return type constructs TensorPtr manually

transducer_runner.cpp:234-235:

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));

This works but is inconsistent with how TensorPtr is typically created elsewhere in the codebase (e.g., via from_blob or the tensor_ptr_maker utilities). It also moves from a local Tensor into a shared_ptr, which is fine functionally.

5. Potential out-of-bounds in encoder output access

transducer_runner.cpp:268:

int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];

No bounds check on the tensor having at least one element. If the encoder returns a 0-element length tensor, this would be undefined behavior. The check at line 261-265 verifies both outputs exist and are tensors, but not that the length tensor is non-empty.
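A self-contained sketch of the missing guard, where a plain vector stands in for the length tensor's int64 buffer (the real code would surface an Error rather than an optional):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for reading a scalar length out of a model output tensor.
// Returning nullopt on an empty tensor avoids the undefined behavior
// described above.
std::optional<int64_t> read_scalar_len(const std::vector<int64_t>& len_tensor) {
  if (len_tensor.empty()) {
    return std::nullopt;  // 0-element length tensor: report an error upstream
  }
  return len_tensor.front();
}
```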

6. g_proj shape assumes it matches proj_dim

transducer_runner.cpp:372-375 — The g_proj tensor is created with shape {1, 1, proj_dim} where proj_dim comes from the encoder's output. If the decoder's g_proj output has a different hidden dimension than the encoder's f_proj, this would silently produce incorrect results. In practice these likely always match for transducer architectures, but the assumption is implicit.

7. Missing #include <optional> in transducer_runner.h

transducer_runner.h:79 uses std::optional<std::string> but doesn't include <optional>. It likely gets pulled in transitively through other headers, but for correctness it should be explicitly included.

Looks Good

  • Backward-compat alias strategy (runner.h → thin redirect, AsrRunner = Seq2SeqRunner) is clean and low-risk.
  • Android JNI layer correctly uses the alias (asr::AsrRunner) so no API breakage.
  • Token type unification (parakeet::Token → alias for asr::Token) avoids duplication.
  • Build system (CMake and Buck/Bazel targets) properly separates the two runners as independent compilation units.
  • The greedy decode loop in TransducerRunner::transcribe() faithfully reproduces the original logic from parakeet/main.cpp with proper error handling via ET_CHECK_OK_OR_RETURN_ERROR.
  • TDT duration handling with bounds checking (transducer_runner.cpp:389-396) is correct.
  • The max_symbols_per_step guard (transducer_runner.cpp:446-449) prevents infinite loops on pathological inputs.
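The backward-compat alias called out above is the standard type-alias pattern; a minimal self-contained sketch (the class body is a placeholder, not the real Seq2SeqRunner interface):

```cpp
#include <cassert>
#include <string>

namespace asr {

// Minimal stand-in for the renamed runner.
class Seq2SeqRunner {
 public:
  explicit Seq2SeqRunner(std::string model_path)
      : model_path_(std::move(model_path)) {}
  const std::string& model_path() const { return model_path_; }

 private:
  std::string model_path_;
};

// Backward-compat alias kept in the old runner.h so existing callers
// (e.g. the Android JNI layer) keep compiling with the old name.
using AsrRunner = Seq2SeqRunner;

}  // namespace asr
```

Because the alias is a `using` declaration rather than a subclass, old and new names are the same type, so no overloads or ABI change are introduced.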


Copilot AI left a comment


Pull request overview

This PR restructures extension/asr/runner/ to support both encoder-decoder (Seq2Seq, e.g. Whisper) and Transducer (RNN‑T/TDT) ASR architectures by introducing separate runner implementations and migrating example/JNI consumers.

Changes:

  • Renames AsrRunner to Seq2SeqRunner (with backward-compatible runner.h redirect and type aliases).
  • Adds TransducerRunner implementing greedy transducer decoding and optional bundled-preprocessor execution.
  • Migrates Whisper/Parakeet examples and Android JNI includes to the new runner headers, and updates build files (CMake/Buck).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
extension/asr/runner/transducer_runner.h Defines new TransducerRunner API, Token type, and TransducerConfig.
extension/asr/runner/transducer_runner.cpp Implements model loading, optional preprocessor invocation, and transducer greedy decode loop.
extension/asr/runner/seq2seq_runner.h Defines Seq2SeqRunner API and backward-compatible AsrRunner aliases.
extension/asr/runner/seq2seq_runner.cpp Renames implementation from AsrRunner to Seq2SeqRunner and updates includes.
extension/asr/runner/runner.h Becomes a thin backward-compat include redirect to seq2seq_runner.h.
extension/asr/runner/targets.bzl Adds Buck targets for seq2seq_runner and transducer_runner libraries.
extension/asr/runner/CMakeLists.txt Builds ASR runner library from seq2seq_runner.cpp + transducer_runner.cpp.
extension/asr/runner/BUCK Adds migration BUCK wrapper for shared targets.
extension/android/jni/jni_layer_asr.cpp Updates include to the new Seq2Seq runner header.
examples/models/whisper/main.cpp Migrates usage from AsrRunner/AsrTranscribeConfig to Seq2SeqRunner/Seq2SeqTranscribeConfig.
examples/models/parakeet/types.h Switches Parakeet Token to alias shared asr::Token.
examples/models/parakeet/main.cpp Replaces inline greedy decode with TransducerRunner preprocess()+transcribe().
examples/models/parakeet/CMakeLists.txt Links Parakeet example against extension_asr_runner.



#include <cstdint>
#include <functional>
#include <memory>

Copilot AI Apr 17, 2026


transducer_runner.h uses std::optional/std::nullopt in the TransducerRunner constructor but does not include <optional>. Add the missing include so this header compiles when included on its own (including from transducer_runner.cpp).

Suggested change
#include <memory>
#include <memory>
#include <optional>

Comment thread extension/asr/runner/BUCK Outdated
Comment on lines +13 to +15
load(":targets.bzl", "define_common_targets")



Copilot AI Apr 17, 2026


This second load(":targets.bzl", "define_common_targets") appears redundant (the symbol is already loaded above). Consider removing the duplicate to simplify the BUCK file.

Suggested change
load(":targets.bzl", "define_common_targets")

Comment on lines +117 to +121
ET_CHECK_OR_RETURN_ERROR(
blank_id_result.ok(),
Internal,
"Model must export 'blank_id' as a constant_method.");
blank_id_ = blank_id_result.get()[0].toInt();

Copilot AI Apr 17, 2026


load_model_metadata() indexes blank_id_result.get()[0] without verifying the constant_method returned at least one value (and the expected scalar type). If the model returns an empty vector or a tensor scalar, this can crash. Consider validating output count/type before calling toInt().

Comment on lines +247 to +251
// --- Encode ---
int64_t mel_len_value = preprocessed_features->size(1);
std::vector<int64_t> mel_len_data = {mel_len_value};
auto mel_len = ::executorch::extension::from_blob(
mel_len_data.data(), {1}, ::executorch::aten::ScalarType::Long);

Copilot AI Apr 17, 2026


transcribe() derives mel_len_value from preprocessed_features->size(1). If the preprocessor pads the mel tensor and provides an explicit mel_len, this will pass the padded length to the encoder. Consider threading the true mel_len through (e.g., from preprocess()) instead of inferring it from the tensor shape.

std::vector<::executorch::runtime::EValue>{sos_token, h, c});
ET_CHECK_OK_OR_RETURN_ERROR(decoder_init_result.error());

auto& init_outputs = decoder_init_result.get();

Copilot AI Apr 17, 2026


decoder_init_result.get() is indexed at [0..2] without checking the output count/types. If decoder_step returns fewer outputs (or non-tensors), this will read out of bounds. Add an explicit output validation similar to the encoder check before indexing.

Suggested change
auto& init_outputs = decoder_init_result.get();
auto& init_outputs = decoder_init_result.get();
if (init_outputs.size() < 3 || !init_outputs[0].isTensor() ||
!init_outputs[1].isTensor() || !init_outputs[2].isTensor()) {
ET_LOG(
Error,
"Method %s returned invalid outputs: expected at least 3 tensor outputs, got %zu",
kDecoderStepMethodName,
init_outputs.size());
return Error::InvalidArgument;
}

struct Token {
uint64_t id;
int64_t start_offset; // Frame index in the encoder output
int64_t duration; // TDT duration value (0 for standard RNN-T)

Copilot AI Apr 17, 2026


Token::duration is documented as "0 for standard RNN-T", but the implementation uses dur=1 when durations is empty (and TransducerConfig also describes standard RNN-T as duration==1). Please reconcile the docs/contract so downstream timestamp code can interpret duration consistently.

Suggested change
int64_t duration; // TDT duration value (0 for standard RNN-T)
int64_t duration; // Duration in encoder frames (1 for standard RNN-T)

Comment on lines +49 to +51
// TDT duration values. Empty means standard RNN-T (duration always 1).
// If the model exports a "durations" constant_method, it will be used
// instead of this field.

Copilot AI Apr 17, 2026


TransducerConfig says a model-exported "durations" constant_method will override config.durations, but TransducerRunner::load()/load_model_metadata() never reads such a constant_method. Either implement that override or update the comment to match actual behavior.

Suggested change
// TDT duration values. Empty means standard RNN-T (duration always 1).
// If the model exports a "durations" constant_method, it will be used
// instead of this field.
// TDT duration values used by the decode loop. Empty means standard
// RNN-T (duration always 1).

#pragma once

#include <functional>
#include <memory>

Copilot AI Apr 17, 2026


seq2seq_runner.h uses std::optional in the Seq2SeqRunner constructor signature but doesn't include <optional>. It currently relies on transitive includes; please add <optional> to keep the header self-contained and avoid include-order fragility.

Suggested change
#include <memory>
#include <memory>
#include <optional>

Comment on lines +231 to +235
Internal,
"Preprocessor returned unexpected output.");

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));

Copilot AI Apr 17, 2026


preprocess() only consumes outputs[0] from the preprocessor, but the header documents the method as returning (mel, mel_len). If mel_len is provided (or mel is padded), dropping it can cause transcribe() to pass an incorrect length to the encoder. Consider consuming outputs[1] when present and using it to compute/override mel_len.

Comment on lines +382 to +386
int64_t k = joint_result.get()[0].toTensor().const_data_ptr<int64_t>()[0];

// Compute frame advance duration
int64_t dur = 1; // default for standard RNN-T
if (is_tdt) {

Copilot AI Apr 17, 2026


joint_result.get() is indexed at [0] (and [1] when is_tdt) without validating the number/type of outputs. Add a guard that checks output count and that the expected outputs are tensors/scalars before reading const_data_ptr<int64_t>().

kirklandsign added a commit that referenced this pull request Apr 17, 2026
Summary:
Pull Request resolved: #18961

Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures:

1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h)
   - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h`
   - Old `runner.h` kept as a thin include redirect for backward compatibility

2. **Add** `TransducerRunner` for RNN-T/TDT models
   - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp`
   - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
   - Supports both standard RNN-T (duration=1) and TDT (variable durations)
   - Returns `vector<Token>` with frame offsets for downstream timestamp computation
   - Includes `preprocess()` method for models with bundled preprocessor
   - Exposes tokenizer via `tokenizer()` getter for downstream use

3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token`

4. **Migrate consumers**
   - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner`
   - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()`
   - Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`.

Differential Revision: D100892465
@meta-codesync Bot changed the title from "Add TransducerRunner and rename AsrRunner to Seq2SeqRunner" to "Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961)" Apr 17, 2026
kirklandsign added a commit that referenced this pull request Apr 17, 2026
(Same commit message as above.)
kirklandsign added a commit that referenced this pull request Apr 21, 2026
(Same commit message as above.)
Copilot AI review requested due to automatic review settings April 21, 2026 19:28

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.



Comment on lines 24 to +31
set(runner_deps executorch_core extension_module extension_tensor
tokenizers::tokenizers
)

# Define runner library
add_library(extension_asr_runner STATIC runner.cpp)
add_library(
extension_asr_runner STATIC seq2seq_runner.cpp transducer_runner.cpp
)

Copilot AI Apr 21, 2026


extension_asr_runner depends on non-header-only LLM runner utilities (e.g., load_tokenizer/get_eos_ids from llm_runner_helper.cpp and Sampler from sampler.cpp). The CMake target currently links only executorch_core/extension_module/extension_tensor/tokenizers, so consumers that link just extension_asr_runner may hit undefined references. Consider adding extension_llm_runner (and extension_llm_sampler for Seq2SeqRunner) to runner_deps/target_link_libraries to match the BUCK deps in extension/asr/runner/targets.bzl.

* Required module methods:
* - "encoder": (mel, mel_len) -> (f_proj, encoded_len)
* - "decoder_step": (token, h, c) -> (g_proj, h', c')
* - "joint": (f_t, g_proj) -> (token_id, duration_idx)

Copilot AI Apr 21, 2026


transducer_runner.h docs say the "joint" method returns (token_id, duration_idx), but transducer_runner.cpp treats duration_idx as optional (it only requires a second output when config.durations is non-empty). Please update the required method signature comment to match actual behavior (e.g., (token_id) for standard RNN-T, and (token_id, duration_idx) for TDT).

Suggested change
* - "joint": (f_t, g_proj) -> (token_id, duration_idx)
* - "joint": (f_t, g_proj) -> (token_id) for standard RNN-T,
* or (token_id, duration_idx) for TDT

Comment on lines +216 to +234
int64_t audio_len = raw_audio->numel();
std::vector<int64_t> audio_len_data = {audio_len};
auto audio_len_tensor = ::executorch::extension::from_blob(
audio_len_data.data(), {1}, ::executorch::aten::ScalarType::Long);

auto result = module_->execute(
kPreprocessorMethodName,
std::vector<::executorch::runtime::EValue>{raw_audio, audio_len_tensor});
ET_CHECK_OK_OR_RETURN_ERROR(result.error());

auto& outputs = result.get();
ET_CHECK_OR_RETURN_ERROR(
!outputs.empty() && outputs[0].isTensor(),
Internal,
"Preprocessor returned unexpected output.");

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));
}

Copilot AI Apr 21, 2026


preprocess() calls the model's "preprocessor" which (per the header comment) returns (mel, mel_len), but the implementation only validates/returns outputs[0] and discards mel_len. transcribe() then synthesizes mel_len from features->size(1), which can be wrong if the preprocessor pads to a fixed length. Consider returning both (features, features_len) from preprocess(), or accepting an explicit mel_len/length tensor in transcribe(), and validate the preprocessor output arity accordingly.

enc_outputs.size());

auto f_proj = enc_outputs[0].toTensor(); // [B, T, joint_hidden]
int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];

Copilot AI Apr 21, 2026


encoded_len is read from the encoder's second output and then used to index into f_proj as if it were <= f_proj.size(1). If encoded_len is larger than the actual time dimension (e.g., corrupted model output), the decode loop will read past f_proj's buffer. Add a sanity check that encoded_len is within [0, f_proj.size(1)] before using it.

Suggested change
int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];
int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];
ET_CHECK_OR_RETURN_ERROR(
encoded_len >= 0 && encoded_len <= f_proj.size(1),
InvalidArgument,
"Encoder returned invalid encoded_len=%ld for f_proj time dimension=%ld",
static_cast<long>(encoded_len),
static_cast<long>(f_proj.size(1)));

init_outputs.size());
auto g_proj_init = init_outputs[0].toTensor();
auto new_h_init = init_outputs[1].toTensor();
auto new_c_init = init_outputs[2].toTensor();

Copilot AI Apr 21, 2026


These memcpy() calls assume the decoder_step outputs (h, c) have exactly the expected byte sizes derived from num_rnn_layers_/pred_hidden_ and the input dtypes. If the model exports different shapes/dtypes, this can read/write out of bounds. Please validate output tensor numel()/nbytes() matches the destination buffers before memcpy (and return an error if mismatched).

Suggested change
auto new_c_init = init_outputs[2].toTensor();
auto new_c_init = init_outputs[2].toTensor();
size_t new_h_init_num_bytes =
static_cast<size_t>(new_h_init.numel()) *
::executorch::runtime::elementSize(new_h_init.scalar_type());
size_t new_c_init_num_bytes =
static_cast<size_t>(new_c_init.numel()) *
::executorch::runtime::elementSize(new_c_init.scalar_type());
ET_CHECK_OR_RETURN_ERROR(
new_h_init.scalar_type() == h_dtype,
InvalidArgument,
"Method %s returned h with unexpected dtype.",
kDecoderStepMethodName);
ET_CHECK_OR_RETURN_ERROR(
new_c_init.scalar_type() == c_dtype,
InvalidArgument,
"Method %s returned c with unexpected dtype.",
kDecoderStepMethodName);
ET_CHECK_OR_RETURN_ERROR(
new_h_init_num_bytes == h_data.size(),
InvalidArgument,
"Method %s returned h with %zu bytes, expected %zu.",
kDecoderStepMethodName,
new_h_init_num_bytes,
h_data.size());
ET_CHECK_OR_RETURN_ERROR(
new_c_init_num_bytes == c_data.size(),
InvalidArgument,
"Method %s returned c with %zu bytes, expected %zu.",
kDecoderStepMethodName,
new_c_init_num_bytes,
c_data.size());

Comment on lines +460 to +466
auto new_g_proj = outputs[0].toTensor();
auto new_h = outputs[1].toTensor();
auto new_c = outputs[2].toTensor();
std::memcpy(h_data.data(), new_h.const_data_ptr(), h_data.size());
std::memcpy(c_data.data(), new_c.const_data_ptr(), c_data.size());
std::memcpy(
g_proj_data.data(), new_g_proj.const_data_ptr(), g_proj_data.size());

Copilot AI Apr 21, 2026


These memcpy() calls assume new_h/new_c/new_g_proj have byte sizes matching the preallocated h_data/c_data/g_proj_data buffers. If the model returns tensors with different shapes (or if proj_dim differs), this can cause out-of-bounds reads/writes. Please check tensor nbytes() (or numel()*elem_size) against the destination buffer sizes and return an error on mismatch.

Comment on lines +6 to 12
#include <executorch/extension/asr/runner/transducer_runner.h>

namespace parakeet {

// Matches output type of tokenizers::Tokenizer methods
// Use the shared Token type from the ASR runner.
using Token = ::executorch::extension::asr::Token;
using TokenId = uint64_t;

Copilot AI Apr 21, 2026


examples/models/parakeet/types.h now includes transducer_runner.h just to alias the Token struct. That header pulls in Module, tokenizer, and stats headers, which couples this example's basic types to the full runner implementation and can noticeably increase rebuild costs. Consider moving executorch::extension::asr::Token into a small dedicated header (e.g., token.h) with minimal includes, and include that here instead.


@larryliu0820 left a comment


Thank you for consolidating parakeet runner!

@kirklandsign

@claude review


claude Bot commented Apr 24, 2026

Claude finished @kirklandsign's task in 1m 33s


Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

  • Read all changed files
  • Review TransducerRunner implementation (transducer_runner.h/cpp)
  • Review Seq2SeqRunner rename and backward compatibility
  • Review consumer migrations (Whisper, Parakeet, Android JNI)
  • Review build system changes (CMake, Buck, targets.bzl)
  • Post review feedback

Good restructuring overall. The separation of Seq2Seq and Transducer runners is clean, and the extraction from parakeet/main.cpp into a reusable runner is a meaningful improvement. The backward-compat alias strategy is low-risk. Noting a few items worth addressing:

Issues

1. Unsafe memcpy from model outputs without size validation — transducer_runner.cpp:332-333, 463-466

The memcpy calls copy decoder outputs (h, c, g_proj) into pre-allocated buffers using the buffer size (h_data.size()) as the byte count, but read from the model tensor via const_data_ptr() without verifying the model tensor's actual byte size matches. If the model returns tensors with fewer elements than expected (different num_rnn_layers or pred_hidden than metadata claims), this reads out of bounds from the source tensor. A bounds check on new_h.nbytes() >= h_data.size() before each memcpy would prevent this.
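A standalone sketch of the suggested guard, with illustrative names (copy_checked is not the actual ExecuTorch API; it stands in for the memcpy call sites with the nbytes() check added):

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy a model output into a pre-sized state buffer,
// refusing to read past the source tensor's storage. src_nbytes would come
// from new_h.nbytes() at the real call sites.
void copy_checked(std::vector<char>& dst, const void* src, size_t src_nbytes) {
  if (src_nbytes < dst.size()) {
    throw std::runtime_error("model tensor smaller than expected state buffer");
  }
  std::memcpy(dst.data(), src, dst.size());
}
```

With this shape, a model whose actual num_rnn_layers or pred_hidden disagrees with the metadata fails loudly instead of reading out of bounds.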

2. Standard RNN-T decode loop advances time on non-blank emissions — transducer_runner.cpp:468

In the non-blank branch, t += dur is executed unconditionally (where dur = 1 for standard RNN-T). Standard RNN-T greedy decoding typically holds the time index fixed on label emissions and only advances on blank, with a max_symbols_per_step guard to prevent infinite loops on a single frame. The current logic advances time on every non-blank token in standard RNN-T mode, which means the decoder can only emit one token per encoder frame. This is correct for TDT but may not match expected RNN-T behavior. If Parakeet-TDT is the only intended consumer, this is fine — but the class name and docs say "RNN-T, TDT, HAT" generically.
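For reference, the conventional standard RNN-T greedy structure holds the frame on labels and advances only on blank. A self-contained sketch under those semantics (joint() stands in for the decoder + joint network call; all names are assumptions, not the runner's API):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative standard RNN-T greedy loop: t advances only on blank,
// labels are emitted while staying on the same frame, and
// max_symbols_per_step caps emissions per frame.
std::vector<int64_t> greedy_rnnt_decode(
    int64_t num_frames,
    int64_t blank_id,
    int64_t max_symbols_per_step,
    const std::function<int64_t(int64_t, const std::vector<int64_t>&)>& joint) {
  std::vector<int64_t> tokens;
  for (int64_t t = 0; t < num_frames; ++t) {
    int64_t symbols = 0;
    while (symbols < max_symbols_per_step) {
      const int64_t tok = joint(t, tokens);
      if (tok == blank_id) {
        break;  // blank: advance to the next encoder frame
      }
      tokens.push_back(tok);  // label: stay on this frame, query joint again
      ++symbols;
    }
  }
  return tokens;
}
```

Under the current t += dur behavior, the three-label case below would emit at most one token per frame; the structure above emits all three on frame 0.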

3. g_proj shape assumes encoder and decoder hidden dims match — transducer_runner.cpp:377-380

g_proj is created with shape {1, 1, proj_dim} where proj_dim comes from the encoder's output dimension (f_proj.sizes()[2]). If the decoder's projection output has a different hidden dimension, the from_blob will wrap the wrong number of bytes. This assumption holds for standard transducer architectures but is implicit — a shape mismatch would silently produce incorrect joint network inputs.
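One way to make the assumption explicit is to validate byte counts before the from_blob wrap. A toy sketch (checked_proj_dim and its parameters are illustrative, not the actual runner code):

```cpp
#include <cstdint>
#include <stdexcept>

// Sketch: before reusing the encoder's hidden dim for g_proj's shape,
// verify the decoder projection's storage actually has that many elements.
int64_t checked_proj_dim(
    int64_t encoder_proj_dim,    // from f_proj.sizes()[2]
    int64_t decoder_out_nbytes,  // from the decoder output's nbytes()
    int64_t elem_size) {         // e.g. sizeof(float) for fp32 outputs
  if (decoder_out_nbytes != encoder_proj_dim * elem_size) {
    throw std::runtime_error(
        "decoder projection width does not match encoder hidden dim");
  }
  return encoder_proj_dim;
}
```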

4. preprocess() discards mel_len from preprocessor output — transducer_runner.cpp:226-233

The header documents the preprocessor as returning (mel, mel_len), but preprocess() only returns outputs[0] (the mel tensor). transcribe() then infers mel_len from preprocessed_features->size(1) at line 246. If the preprocessor pads the mel tensor to a fixed length, the inferred length will be wrong, causing the encoder to process padding frames. Consider returning or threading through the actual mel_len.
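A toy illustration of why inferring the length from the padded shape goes wrong (the struct and pad_to_budget are stand-ins for the preprocessor's (mel, mel_len) outputs, not the real API):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch: return both outputs instead of dropping outputs[1].
struct PreprocessResult {
  std::vector<float> mel;  // feature frames, possibly padded
  int64_t mel_len;         // valid frame count before padding
};

// Toy preprocessor that pads to a fixed frame budget but reports the
// true pre-padding length.
PreprocessResult pad_to_budget(std::vector<float> frames, int64_t budget) {
  PreprocessResult r{std::move(frames), 0};
  r.mel_len = static_cast<int64_t>(r.mel.size());
  if (static_cast<int64_t>(r.mel.size()) < budget) {
    r.mel.resize(static_cast<std::size_t>(budget), 0.0f);
  }
  return r;
}
```

In this toy, inferring the length from the padded size would report 8 frames for a 3-frame input, so the encoder would process 5 frames of padding.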

5. Redundant module load for timestamp metadata in parakeet/main.cpp:194-226

A second Module is constructed and fully loaded just to read sample_rate, window_stride, and encoder_subsampling_factor constant methods. Since TransducerRunner already loads the same .pte and reads similar metadata, this doubles the module load cost. Exposing a module() accessor or a generic read_constant_method() on the runner would avoid this.
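A sketch of the suggested accessor shape, so callers reuse the runner's already-loaded module (the class and method names are assumptions; the std::map stands in for the model's constant_methods):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sketch: expose a generic metadata lookup on the runner instead of having
// the example construct and load a second Module for three constants.
class RunnerMetadata {
 public:
  explicit RunnerMetadata(std::map<std::string, int64_t> constants)
      : constants_(std::move(constants)) {}

  // Hypothetical read_constant_method(): one lookup path for all metadata,
  // backed by the module the runner already loaded.
  int64_t read_constant_method(const std::string& name) const {
    return constants_.at(name);
  }

 private:
  std::map<std::string, int64_t> constants_;
};
```

parakeet/main.cpp could then read sample_rate, window_stride, and encoder_subsampling_factor through the runner without a second load.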

6. CMake runner_deps missing LLM runner helper dependency — CMakeLists.txt:24-26

The Buck targets in targets.bzl list //executorch/extension/llm/runner:runner_lib as a dependency (which provides load_tokenizer, get_eos_ids, time_in_ms, etc.), and seq2seq_runner.cpp also needs sampler. The CMake target only links executorch_core extension_module extension_tensor tokenizers::tokenizers. Depending on how consumers link, this could cause undefined reference errors for load_tokenizer, time_in_ms, logits_to_token, Sampler, etc. The Parakeet CMakeLists works around this by also linking extension_llm_runner, but that shouldn't be required as a transitive dependency workaround.
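The gap could be closed in the runner's own CMake target rather than in each consumer. A sketch, treating the exact target names as assumptions taken from the review text:

```cmake
# Sketch: let the ASR runner target carry its helper deps itself, mirroring
# the Buck dependency on //executorch/extension/llm/runner:runner_lib.
# Target names here are assumptions based on the review discussion.
target_link_libraries(
  extension_asr_runner
  PUBLIC executorch_core extension_module extension_tensor
         extension_llm_runner tokenizers::tokenizers
)
```

With PUBLIC linkage, symbols like load_tokenizer and Sampler propagate to consumers automatically, so Parakeet's CMakeLists would not need the workaround link.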

Minor

  • transducer_runner.cpp:266 — encoded_len is read from enc_outputs[1] without checking the tensor is non-empty. A zero-element tensor would be UB via const_data_ptr<int64_t>()[0].
  • transducer_runner.h:39-40 — The Token::duration doc says "For standard RNN-T: always 1" but TransducerConfig doc and the decode loop produce dur=1 for non-blank and variable dur for blank. This is consistent in practice but the comment on Token::duration could be clearer that it represents frames consumed by that emission.
  • transducer_runner.h:66 — The docstring says joint returns (token_id, duration_idx), but duration_idx is only present for TDT. The doc should note this is TDT-only.

Looks Good

  • Backward-compat runner.h redirect and AsrRunner/AsrTranscribeConfig aliases are clean.
  • Android JNI correctly uses the alias — no API breakage.
  • Token type unification with parakeet::Token aliasing asr::Token avoids duplication.
  • load_model_metadata() with the read_int_constant lambda is well-structured with proper validation.
  • Joint output validation at transducer_runner.cpp:388-397 correctly varies the expected output count for TDT vs standard.
  • max_symbols_per_step guard at transducer_runner.cpp:472-474 prevents infinite loops.
  • Buck/Bazel targets properly separate the two runners as independent compilation units.


Summary:
Pull Request resolved: #18961

Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures:

1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h)
   - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h`
   - Old `runner.h` kept as a thin include redirect for backward compatibility

2. **Add** `TransducerRunner` for RNN-T/TDT models
   - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp`
   - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
   - Supports both standard RNN-T (duration=1) and TDT (variable durations)
   - Returns `vector<Token>` with frame offsets for downstream timestamp computation
   - Includes `preprocess()` method for models with bundled preprocessor
   - Exposes tokenizer via `tokenizer()` getter for downstream use

3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token`

4. **Migrate consumers**
   - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner`
   - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()`
   - Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`.

Differential Revision: D100892465

Labels: CLA Signed, fb-exported, meta-exported