Speech-to-text dictation tool for Linux/Wayland, designed to run via keyboard shortcut. Press your hotkey to start recording, press again to stop. Your speech is transcribed locally using faster-whisper, optionally cleaned up by Claude, and copied to your clipboard.
- Toggle-based recording: First invocation starts recording, second stops and processes
- Local transcription: Uses faster-whisper (Whisper implementation in CTranslate2) - no cloud API for transcription
- AI text cleanup: Optional grammar and punctuation cleanup via Claude CLI
- Direct typing: Optionally type text directly at cursor position (in addition to clipboard)
- Multi-language support: Transcribe in any language Whisper supports
- Desktop notifications: Progress updates via system notifications
- Configurable: Customize Whisper model, audio device, and prompt template
Install these via your package manager:
# Fedora
sudo dnf install portaudio-devel wl-clipboard libnotify
# For --type flag (optional)
sudo dnf install wtype
# Ubuntu/Debian
sudo apt install portaudio19-dev wl-clipboard libnotify-bin
# For --type flag (optional)
sudo apt install wtype
The Claude CLI is required for text cleanup (unless using --raw mode):
# Install via npm
npm install -g @anthropic-ai/claude-code
# Authenticate
claude login
Python 3.11+ is required. Using uv is recommended.
Clone the repository:
git clone https://github.com/yourusername/dictate.git
cd dictate
No additional installation steps are needed: uv handles Python dependencies automatically.
This tool is designed to be triggered via a keyboard shortcut (see Setting Up a Keyboard Shortcut below). The same shortcut both starts and stops recording.
- First press: Starts recording (shows notification)
- Second press: Stops recording → transcribes → processes → copies to clipboard
When testing from a terminal, run in the background with & (otherwise Ctrl+C would kill the recording process):
# Start recording
uv run ./dictate.py &
# Stop and process (run again in any terminal)
uv run ./dictate.py

| Argument | Short | Description |
|---|---|---|
| --language | -l | Language code for transcription (e.g., de, fr, es). Default: en |
| --raw | -r | Skip Claude processing, output raw transcription only |
| --type | -t | Type text at cursor position via wtype (in addition to clipboard) |
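
The flags in the table above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_parser and the attribute name type_text are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="dictate.py")
    parser.add_argument(
        "-l", "--language", default="en",
        help="Language code for transcription (e.g., de, fr, es)",
    )
    parser.add_argument(
        "-r", "--raw", action="store_true",
        help="Skip Claude processing, output raw transcription only",
    )
    # dest avoids an attribute called `type`, which shadows the builtin name.
    parser.add_argument(
        "-t", "--type", action="store_true", dest="type_text",
        help="Type text at cursor position via wtype (in addition to clipboard)",
    )
    return parser


# Equivalent to: uv run ./dictate.py -l de -r -t
args = build_parser().parse_args(["-l", "de", "-r", "-t"])
print(args.language, args.raw, args.type_text)
```

Using store_true for --raw and --type keeps both flags off unless they are passed explicitly, which matches the optional behavior described in the table.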
# Transcribe in German
uv run ./dictate.py -l de
# Raw transcription without AI cleanup
uv run ./dictate.py --raw
# Type directly at cursor AND copy to clipboard
uv run ./dictate.py --type
# Combine flags
uv run ./dictate.py -l de -r -t
Add to your hyprland.conf or keybinds config:
# Dictate - English (Super+Ctrl+Alt+E to toggle)
bind = $mainMod CTRL ALT, e, exec, uv run ~/code/dictate/dictate.py --type
# Dictate - German
bind = $mainMod CTRL ALT, d, exec, uv run ~/code/dictate/dictate.py --type -l de
# Raw mode (no Claude processing)
bind = $mainMod SHIFT CTRL ALT, e, exec, uv run ~/code/dictate/dictate.py --type --raw
Add to your ~/.config/sway/config:
# Dictate - English (Super+Ctrl+Alt+E to toggle)
bindsym $mod+Ctrl+Alt+e exec uv run ~/code/dictate/dictate.py --type
# Dictate - German
bindsym $mod+Ctrl+Alt+d exec uv run ~/code/dictate/dictate.py --type -l de
- Open Settings → Keyboard → Keyboard Shortcuts → View and Customize Shortcuts
- Scroll to Custom Shortcuts and click +
- Set:
  - Name: Dictate
  - Command: uv run /full/path/to/dictate.py --type
  - Shortcut: Your preferred key (e.g., Super+Ctrl+Alt+E)
Configure your DE's keybinding system to run the script. The same shortcut toggles recording on/off.
Create or edit config.toml in the project directory:
[whisper]
model_size = "base" # tiny, base, small, medium, large-v2, large-v3
device = "cpu" # cpu or cuda
compute_type = "int8" # int8, int16, float16, float32
language = "en" # Default language
[audio]
# Uncomment to specify a specific audio input device index
# device = 0
[agent]
# Custom prompt template (uses {text} placeholder)
# prompt_template = """Your custom prompt here: {text}"""

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | 75 MB | Fastest | Lower |
| base | 142 MB | Fast | Good |
| small | 466 MB | Medium | Better |
| medium | 1.5 GB | Slower | High |
| large-v3 | 3 GB | Slowest | Highest |

For CUDA acceleration, set device = "cuda" and use compute_type = "float16".
- First run: Creates a PID file (/tmp/dictate.pid) and starts recording audio
- Second run: Detects running instance via PID file, sends SIGUSR1 signal to stop
- Processing pipeline:
  - Audio is resampled to 16kHz
  - faster-whisper transcribes the audio locally
  - Claude CLI cleans up grammar/punctuation (unless --raw)
  - Result is copied to clipboard (and typed if --type)
- Notifications: Desktop notifications show progress at each stage
List available audio devices and set the correct index in config.toml:
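import pyaudio
p = pyaudio.PyAudio()
# Print the index and name of every device that can capture audio
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:
        print(f"{i}: {info['name']}")
p.terminate()  # release PortAudio resources
Ensure you're authenticated: claude login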
Use --raw mode to bypass Claude processing entirely.
Install the CUDA 12 builds of cuBLAS and cuDNN:
uv pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
MIT