feat: voice-activity streaming mode & inner-vad for speech-to-text module by IgorSwat · Pull Request #1160 · software-mansion/react-native-executorch

IgorSwat · 2026-05-20T13:09:00Z

Description

This PR introduces changes focused on the voice-activity-detection (VAD) module and its use within the library:

Native-side VAD streaming: introduces a continuous voice-activity-detection mechanism with a user-friendly callback system. Example usage from the demo app:

  await model.stream({
    onSpeechBegin: () => {...},
    onSpeechEnd: () => {...},
    options: {...},
  });

VAD x STT integration: adds an option to use voice-activity detection within the speech-to-text (STT) module, significantly improving the effective performance of the STT.
Demo apps: introduces a new screen in the speech demo app, VoiceActivityDetectionScreen, and changes the behavior of SpeechToTextScreen by adding a toggle that switches the VAD submodule for STT on or off.
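
To make the callback mechanism described above more concrete, here is a minimal sketch of how per-frame speech probabilities could be turned into onSpeechBegin / onSpeechEnd events. It is not taken from this PR: the SpeechGate class, the threshold, and the frame counts are hypothetical, and only the two callback names mirror the demo usage.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper for illustration; not part of react-native-executorch.
class SpeechGate {
public:
  using Callback = std::function<void()>;

  SpeechGate(float threshold, std::size_t minSpeechFrames,
             std::size_t minSilenceFrames, Callback onSpeechBegin,
             Callback onSpeechEnd)
      : threshold_(threshold), minSpeechFrames_(minSpeechFrames),
        minSilenceFrames_(minSilenceFrames),
        onSpeechBegin_(std::move(onSpeechBegin)),
        onSpeechEnd_(std::move(onSpeechEnd)) {
    // Assumption: the model emits a probability in [0, 1] per frame.
    assert(threshold_ >= 0.0f && threshold_ <= 1.0f);
  }

  // Feed the speech probability of one frame.
  void push(float speechProbability) {
    const bool voiced = speechProbability >= threshold_;
    if (!speaking_) {
      // Count consecutive voiced frames; an unvoiced frame resets the run.
      run_ = voiced ? run_ + 1 : 0;
      if (run_ >= minSpeechFrames_) {
        speaking_ = true;
        run_ = 0;
        if (onSpeechBegin_) onSpeechBegin_();
      }
    } else {
      // Count consecutive unvoiced frames; a voiced frame resets the run.
      run_ = voiced ? 0 : run_ + 1;
      if (run_ >= minSilenceFrames_) {
        speaking_ = false;
        run_ = 0;
        if (onSpeechEnd_) onSpeechEnd_();
      }
    }
  }

  bool speaking() const { return speaking_; }

private:
  float threshold_;
  std::size_t minSpeechFrames_;
  std::size_t minSilenceFrames_;
  Callback onSpeechBegin_;
  Callback onSpeechEnd_;
  std::size_t run_ = 0;
  bool speaking_ = false;
};
```

Requiring several consecutive frames before switching state keeps a single noisy frame from firing a callback. An STT pipeline could use the same gate to forward audio to the transcriber only while speaking() is true.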

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

To test VAD streaming: run the VoiceActivityDetectionScreen in the Speech demo app.
To test the VAD and STT integration: run the SpeechToTextScreen in the Speech demo app with the VAD toggle on.

Screenshots

Related issues

#1118

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

msluszniak · 2026-05-20T15:37:41Z

+inline constexpr size_t kMinSpeechDuration = 25;                 // 250 ms
+inline constexpr size_t kMinSilenceDuration = 10;                // 100 ms
+inline constexpr size_t kSpeechPad = 3;                          // 30 ms


Why are these in tens of ms while the other constants above are in milliseconds?

IgorSwat · May 21, 2026

All the constants in this file are in tens of ms.

msluszniak · May 21, 2026

Then why are these named kWindowSizeMs and kHopLengthMs? The Ms suffix almost screams at me: "This value is in milliseconds". If it isn't, I would be very surprised ;p

IgorSwat · May 21, 2026

I mean, constants like kWindowSizeMs are indeed in milliseconds, but the other ones, like kModelInputMin, are in tens of milliseconds.

msluszniak · May 21, 2026 (edited)

Yeah, exactly, and that was my original question: why are these in tens of ms while the other constants above are in milliseconds? Can't we unify this?
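
One possible way to unify the units raised in this thread is to declare every duration in milliseconds and derive frame counts from a single hop length. The sketch below is only an illustration under stated assumptions: kHopLengthMs is named in the thread but its value is not shown, so the 10 ms used here is inferred from the diff comments (25 corresponds to 250 ms). The `...Ms` and `...Frames` constant names and the msToFrames helper are hypothetical.

```cpp
#include <cstddef>

// Assumed value, inferred from the diff comments (25 units == 250 ms).
inline constexpr std::size_t kHopLengthMs = 10;

// Hypothetical millisecond-based counterparts of the constants in the diff.
inline constexpr std::size_t kMinSpeechDurationMs = 250;
inline constexpr std::size_t kMinSilenceDurationMs = 100;
inline constexpr std::size_t kSpeechPadMs = 30;

// Convert a duration in milliseconds to a whole number of hops (frames).
constexpr std::size_t msToFrames(std::size_t ms,
                                 std::size_t hopMs = kHopLengthMs) {
  return ms / hopMs;
}

// Catch durations that would silently lose precision when divided.
static_assert(kMinSpeechDurationMs % kHopLengthMs == 0);
static_assert(kMinSilenceDurationMs % kHopLengthMs == 0);
static_assert(kSpeechPadMs % kHopLengthMs == 0);

// Frame counts derived at compile time; these match the values in the diff.
inline constexpr std::size_t kMinSpeechFrames =
    msToFrames(kMinSpeechDurationMs); // 25
inline constexpr std::size_t kMinSilenceFrames =
    msToFrames(kMinSilenceDurationMs); // 10
inline constexpr std::size_t kSpeechPadFrames =
    msToFrames(kSpeechPadMs); // 3
```

With this layout the unit is visible in every name, and changing the hop length updates all derived frame counts at compile time.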

IgorSwat requested review from chmjkb and msluszniak May 20, 2026 13:09

IgorSwat force-pushed the @is/vad-streaming branch from 694fe4f to 1c2411e Compare May 20, 2026 13:15

IgorSwat changed the base branch from main to @is/speech-to-text-ultimate May 20, 2026 13:26

chmjkb requested changes May 20, 2026

msluszniak reviewed May 20, 2026

IgorSwat force-pushed the @is/speech-to-text-ultimate branch from 02113ff to 6bba141 Compare May 20, 2026 15:46

msluszniak reviewed May 20, 2026

Comment thread ...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp

chmjkb requested changes May 21, 2026

Comment thread apps/speech/screens/SpeechToTextScreen.tsx

Comment thread apps/speech/screens/VoiceActivityDetectionScreen.tsx

Base automatically changed from @is/speech-to-text-ultimate to main May 21, 2026 08:20

IgorSwat added 8 commits May 21, 2026 10:34

Add CoreML whisper models

c1a1e97

Update urls & audio-api

8780a78

Add CoreML whisper models

3a97a75

Implement VAD streaming

b94d5f8

Integrate VAD with STT

0d2dbf1

Fix wrong include issue

b8cf8fa

Rebase with other PR changes

4782eda

Bump audio-api version

0ea858d

IgorSwat force-pushed the @is/vad-streaming branch from 1c2411e to 0ea858d Compare May 21, 2026 08:55

IgorSwat added 2 commits May 21, 2026 11:38

Apply review suggestions

790fb9c

Fix demo app keyboard behavior

dc5113d

msluszniak assigned IgorSwat May 21, 2026

msluszniak added the "feature" label (PRs that implement a new feature) May 21, 2026

Update demos & change default STT model for iOS simulator

177ce98

IgorSwat wants to merge 11 commits into main from @is/vad-streaming



3 participants
