feat: voice-activity streaming mode & inner-vad for speech-to-text module #1160
IgorSwat wants to merge 11 commits into
694fe4f to 1c2411e
inline constexpr size_t kMinSpeechDuration = 25; // 250 ms
inline constexpr size_t kMinSilenceDuration = 10; // 100 ms
inline constexpr size_t kSpeechPad = 3; // 30 ms
Why are these in tens of ms while the other constants above are in milliseconds?
All the constants in this file are in tens of ms.
Then why are these named kWindowSizeMs and kHopLengthMs? The Ms suffix almost screams to me: "This value is in milliseconds". If it isn't, I would be very surprised ;p
I mean, constants like kWindowSizeMs are indeed in milliseconds, but the other ones like kModelInputMin are in tens of milliseconds.
Yeah, exactly, and that was my original question: why are these in tens of ms while the other constants above are in milliseconds? Can't we unify this?
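One possible way to unify the units raised in this thread is to declare every duration in milliseconds and derive frame counts in a single place. The sketch below is illustrative only and is not the PR's code: the `vad_sketch` namespace, `kFrameDuration`, and `toFrames` are hypothetical names, and the 10 ms frame length is an assumption inferred from the `// 250 ms` style comments in the diff.

```cpp
#include <chrono>
#include <cstddef>

namespace vad_sketch {  // hypothetical namespace, not part of the PR

using std::chrono::milliseconds;

// Assumption: one VAD frame covers 10 ms, as implied by "25; // 250 ms".
inline constexpr milliseconds kFrameDuration{10};

// Every duration is declared in the same unit, enforced by the type.
inline constexpr milliseconds kMinSpeechDuration{250};
inline constexpr milliseconds kMinSilenceDuration{100};
inline constexpr milliseconds kSpeechPad{30};

// Converts a duration to a whole number of frames (integer division truncates).
constexpr std::size_t toFrames(milliseconds d) {
  return static_cast<std::size_t>(d / kFrameDuration);
}

// The derived frame counts match the literals in the diff.
static_assert(toFrames(kMinSpeechDuration) == 25);
static_assert(toFrames(kMinSilenceDuration) == 10);
static_assert(toFrames(kSpeechPad) == 3);

}  // namespace vad_sketch
```

With this shape, a `Ms` suffix becomes unnecessary because the unit lives in the type, and code that needs frame counts calls `toFrames` explicitly.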
02113ff to 6bba141
1c2411e to 0ea858d
Description
This PR introduces changes focused on the voice-activity-detection module and its use within the library:
VoiceActivityDetectionScreen and changes the behavior of SpeechToTextScreen, adding a toggle to switch the VAD submodule for STT on/off.
Introduces a breaking change?
Type of change
Tested on
Testing instructions
VoiceActivityDetectionScreen within the Speech demo app.
SpeechToTextScreen within the Speech demo app, with VAD toggle on.
Screenshots
Related issues
#1118
Checklist
Additional notes