
[Examples] Enhance real-time transcription with VAD, word timestamps, and CLI options #2701

Open
vasanthrpjan1-boop wants to merge 2 commits into openai:main from vasanthrpjan1-boop:examples/usage


@vasanthrpjan1-boop


This PR builds on #2696 and extends the real-time transcription example with voice activity detection, word-level timestamps, speaker-change hints, audio device selection, and additional command-line options.

New Features

Voice Activity Detection (VAD)

  • Transcribes only when speech is detected, so no compute is spent on silence
  • Configurable energy threshold (--energy-threshold)
  • Automatic speech segmentation based on silence detection
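For reviewers unfamiliar with energy-based VAD, here is a minimal sketch of the idea. It is not the script's actual code. It assumes mono audio already split into frames of float samples in [-1.0, 1.0]. The names `rms_energy` and `segment_speech`, the 0.01 default threshold, and the "N quiet frames end a segment" rule are all illustrative assumptions.

```python
import math
from typing import Iterable, Iterator, List, Sequence


def rms_energy(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(
    frames: Iterable[Sequence[float]],
    energy_threshold: float = 0.01,  # hypothetical default, not the script's
    max_silent_frames: int = 3,      # hypothetical: quiet frames in a row that end a segment
) -> Iterator[List[float]]:
    """Yield speech segments, closing each one after a run of quiet frames."""
    segment: List[float] = []
    silent_run = 0
    for frame in frames:
        if rms_energy(frame) >= energy_threshold:
            # Loud frame: start or continue a segment.
            segment.extend(frame)
            silent_run = 0
        elif segment:
            # Quiet frame inside a segment: keep it so short pauses don't split words.
            segment.extend(frame)
            silent_run += 1
            if silent_run >= max_silent_frames:
                yield segment  # enough silence: hand this segment to the transcriber
                segment = []
                silent_run = 0
        # Quiet frames with no open segment are dropped, so silence is never transcribed.
    if segment:
        yield segment  # flush whatever is left when the stream ends
```

Each yielded segment would then be passed to the Whisper model. Raising the threshold makes the detector ignore more background noise, at the risk of clipping quiet speech.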

Word-Level Timestamps

  • --word-timestamps flag shows timing for each word
  • Useful for subtitling and precise audio alignment
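In the openai-whisper package, `transcribe()` accepts `word_timestamps=True`. That option adds a `"words"` list to every segment, and each entry has `"word"`, `"start"`, and `"end"` keys. The sketch below formats that structure for display. The helper name `format_word_timestamps` and the output layout are assumptions, not the script's actual output format.

```python
from typing import Any, Dict, List


def format_word_timestamps(result: Dict[str, Any]) -> List[str]:
    """Turn a Whisper-style result with per-word timings into printable lines."""
    lines: List[str] = []
    for segment in result.get("segments", []):
        # With word_timestamps=True, each segment carries a "words" list.
        for word in segment.get("words", []):
            lines.append(
                f"[{word['start']:6.2f}s -> {word['end']:6.2f}s] {word['word'].strip()}"
            )
    return lines


# With openai-whisper installed, the input could come from ("clip.wav" is hypothetical):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("clip.wav", word_timestamps=True)
#   print("\n".join(format_word_timestamps(result)))
```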

Speaker Change Detection (Experimental)

  • --detect-speakers provides hints when speaker changes are detected
  • Based on pause pattern analysis
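A rough sketch of a pause-based hint follows. It is a heuristic, not speaker diarization. It assumes segments with `"start"` and `"end"` times in seconds, as Whisper returns. The function name and the 1.5 s pause threshold are hypothetical and are not taken from the script.

```python
from typing import Dict, List, Sequence


def speaker_change_hints(
    segments: Sequence[Dict[str, float]],
    min_pause: float = 1.5,  # hypothetical pause length (seconds) that suggests a new speaker
) -> List[float]:
    """Return start times of segments that follow an unusually long pause."""
    hints: List[float] = []
    for prev, cur in zip(segments, segments[1:]):
        # A long gap between one segment's end and the next one's start is only
        # a hint: one speaker pausing looks the same as two speakers taking turns.
        if cur["start"] - prev["end"] >= min_pause:
            hints.append(cur["start"])
    return hints
```

Because pauses alone cannot tell speakers apart, this kind of check produces both missed changes and false alarms. That fits the feature's "experimental" label.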

Audio Device Selection

  • --list-devices to show available microphones
  • --device-id to select a specific input device
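One way to implement `--list-devices` is to keep only the devices that have input channels. The sketch below assumes the `sounddevice` package, whose `query_devices()` returns one dict per device with `"name"` and `"max_input_channels"` keys. The script may use a different audio backend, and the helper name `input_devices` is hypothetical.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def input_devices(devices: Sequence[Mapping[str, Any]]) -> List[Tuple[int, str]]:
    """Return (device_id, name) for every device that has input channels."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device.get("max_input_channels", 0) > 0  # skip output-only devices
    ]


# With sounddevice (an assumption; the script may use another backend):
#   import sounddevice as sd
#   for device_id, name in input_devices(sd.query_devices()):
#       print(f"{device_id:3d}  {name}")
# A chosen id can then be passed as sd.InputStream(device=..., channels=1, samplerate=16000).
```

The list position is used as the id because `sounddevice` identifies devices by their index in that list.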

Enhanced User Experience

  • Live audio level visualization with color-coded bar
  • Terminal UI with box-drawn headers
  • Duplicate transcript filtering using similarity scoring
  • Transcript saving with optional timestamps (--output, --timestamps)
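Similarity-based duplicate filtering can be sketched with the standard library's `difflib.SequenceMatcher`. This is an assumption: the script's actual scoring method is not described here. The 0.85 threshold and both helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def is_duplicate(new_text: str, previous_text: str, threshold: float = 0.85) -> bool:
    """True when two transcripts are near-identical (threshold is a hypothetical default)."""
    a = new_text.strip().lower()
    b = previous_text.strip().lower()
    if not a or not b:
        return False
    # ratio() is 2*M/T: M matching characters, T total characters in both strings.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def filter_duplicates(transcripts: Iterable[str], threshold: float = 0.85) -> List[str]:
    """Drop each transcript that nearly repeats the one kept just before it."""
    kept: List[str] = []
    for text in transcripts:
        if kept and is_duplicate(text, kept[-1], threshold):
            continue
        kept.append(text)
    return kept
```

For example, `filter_duplicates(["Hello world", "hello world.", "Next sentence"])` keeps `"Hello world"` and `"Next sentence"`. The second string scores about 0.96 against the first and is dropped.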

Usage Examples

Basic usage

python examples/real_time_transcription.py

With word timestamps

python examples/real_time_transcription.py --word-timestamps

Save transcript with timestamps

python examples/real_time_transcription.py --output notes.txt --timestamps

Use specific model and language

python examples/real_time_transcription.py --model small --language es

Changes

  • examples/real_time_transcription.py - Complete rewrite with new features
  • README.md - Updated documentation with usage examples
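The flags used in this PR could be declared with `argparse` roughly as follows. This is a sketch only. The flag names come from the description above. The defaults, help strings, and the `build_parser` name are assumptions, not the script's actual values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the example's CLI flags (defaults here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Real-time transcription with Whisper (sketch)")
    parser.add_argument("--model", default="base", help="Whisper model name, e.g. tiny, base, small")
    parser.add_argument("--language", default=None, help="Language code such as 'es'; auto-detect if omitted")
    parser.add_argument("--list-devices", action="store_true", help="List input devices and exit")
    parser.add_argument("--device-id", type=int, default=None, help="Input device index from --list-devices")
    parser.add_argument("--energy-threshold", type=float, default=0.01, help="RMS level treated as speech")
    parser.add_argument("--word-timestamps", action="store_true", help="Show timing for each word")
    parser.add_argument("--detect-speakers", action="store_true", help="Print pause-based speaker-change hints")
    parser.add_argument("--output", default=None, help="File to save the transcript to")
    parser.add_argument("--timestamps", action="store_true", help="Prefix saved lines with timestamps")
    return parser


if __name__ == "__main__":
    # Dashes in flag names become underscores: --device-id -> args.device_id.
    args = build_parser().parse_args()
    print(args)
```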

Aditi-M007 and others added 2 commits November 29, 2025 20:57
…amps, and more

Enhanced real_time_transcription.py:
- Voice Activity Detection (VAD) for efficient speech segmentation
- CLI arguments for model, language, device, output file
- Audio device selection with --list-devices
- Live audio level visualization with color-coded bar
- Word-level timestamps (--word-timestamps)
- Speaker change detection hints (--detect-speakers)
- Duplicate transcript filtering using similarity scoring
- Beautiful terminal UI with box-drawn headers

Updated README.md:
- Documented new real-time transcription features
- Added quick start examples for common use cases