Dictate

Speech-to-text dictation tool for Linux/Wayland, designed to run via keyboard shortcut. Press your hotkey to start recording, press again to stop. Your speech is transcribed locally using faster-whisper, optionally cleaned up by Claude, and copied to your clipboard.

Features

Toggle-based recording: First invocation starts recording, second stops and processes
Local transcription: Uses faster-whisper (Whisper implementation in CTranslate2) - no cloud API for transcription
AI text cleanup: Optional grammar and punctuation cleanup via Claude CLI
Direct typing: Optionally type text directly at cursor position (in addition to clipboard)
Multi-language support: Transcribe in any language Whisper supports
Desktop notifications: Progress updates via system notifications
Configurable: Customize Whisper model, audio device, and prompt template

Requirements

System Dependencies

Install these via your package manager:

# Fedora
sudo dnf install portaudio-devel wl-clipboard libnotify

# For --type flag (optional)
sudo dnf install wtype

# Ubuntu/Debian
sudo apt install portaudio19-dev wl-clipboard libnotify-bin

# For --type flag (optional)
sudo apt install wtype

Claude CLI

The Claude CLI is required for text cleanup (unless using --raw mode):

# Install via npm
npm install -g @anthropic-ai/claude-code

# Authenticate
claude login

Python

Python 3.11+ is required. Using uv is recommended.

Installation

Clone the repository:

git clone https://github.com/yourusername/dictate.git
cd dictate

No additional installation steps needed - uv handles Python dependencies automatically.

Usage

This tool is designed to be triggered via a keyboard shortcut (see Setting Up a Keyboard Shortcut below). The same shortcut both starts and stops recording.

How the Toggle Works

First press: Starts recording (shows notification)
Second press: Stops recording → transcribes → processes → copies to clipboard

Testing from the Shell

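When testing from a terminal, run the first invocation in the background with & so the shell stays free for the second one (in the foreground, Ctrl+C would kill the recording process instead of stopping it cleanly):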

# Start recording
uv run ./dictate.py &

# Stop and process (run again in any terminal)
uv run ./dictate.py

Command-Line Arguments

Argument	Short	Description
`--language`	`-l`	Language code for transcription (e.g., `de`, `fr`, `es`). Default: `en`
`--raw`	`-r`	Skip Claude processing, output raw transcription only
`--type`	`-t`	Type text at cursor position via `wtype` (in addition to clipboard)
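
The three flags above map naturally onto Python's argparse. The following is a minimal sketch of how such a parser could look, not the actual code in dictate.py; the function name build_parser and the attribute name type_text are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags documented in the table above."""
    parser = argparse.ArgumentParser(
        prog="dictate.py",
        description="Toggle speech-to-text dictation.",
    )
    parser.add_argument(
        "-l", "--language", default="en",
        help="language code for transcription (e.g. de, fr, es)",
    )
    parser.add_argument(
        "-r", "--raw", action="store_true",
        help="skip Claude processing and output the raw transcription",
    )
    # dest is a hypothetical name chosen to avoid shadowing the builtin `type`
    parser.add_argument(
        "-t", "--type", action="store_true", dest="type_text",
        help="also type the text at the cursor position via wtype",
    )
    return parser


# Parse an explicit argument list, equivalent to: dictate.py -l de -r -t
args = build_parser().parse_args(["-l", "de", "-r", "-t"])
print(args.language, args.raw, args.type_text)
```

Passing an explicit list to parse_args keeps the example independent of the real command line. A script would normally call parse_args() with no arguments.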

Examples

# Transcribe in German
uv run ./dictate.py -l de

# Raw transcription without AI cleanup
uv run ./dictate.py --raw

# Type directly at cursor AND copy to clipboard
uv run ./dictate.py --type

# Combine flags
uv run ./dictate.py -l de -r -t

Setting Up a Keyboard Shortcut

Hyprland

Add to your hyprland.conf or keybinds config:

# Dictate - English (Super+Ctrl+Alt+E to toggle)
bind = $mainMod CTRL ALT, e, exec, uv run ~/code/dictate/dictate.py --type

# Dictate - German
bind = $mainMod CTRL ALT, d, exec, uv run ~/code/dictate/dictate.py --type -l de

# Raw mode (no Claude processing)
bind = $mainMod SHIFT CTRL ALT, e, exec, uv run ~/code/dictate/dictate.py --type --raw

Sway

Add to your ~/.config/sway/config:

# Dictate - English (Super+Ctrl+Alt+E to toggle)
bindsym $mod+Ctrl+Alt+e exec uv run ~/code/dictate/dictate.py --type

# Dictate - German
bindsym $mod+Ctrl+Alt+d exec uv run ~/code/dictate/dictate.py --type -l de

GNOME

Open Settings → Keyboard → Keyboard Shortcuts → View and Customize Shortcuts
Scroll to Custom Shortcuts and click +
Set:
- Name: Dictate
- Command: uv run /full/path/to/dictate.py --type
- Shortcut: Your preferred key (e.g., Super+Ctrl+Alt+E)

Other Desktop Environments

Configure your DE's keybinding system to run the script. The same shortcut toggles recording on/off.

Configuration

Create or edit config.toml in the project directory:

[whisper]
model_size = "base"    # tiny, base, small, medium, large-v2, large-v3
device = "cpu"         # cpu or cuda
compute_type = "int8"  # int8, int16, float16, float32
language = "en"        # Default language

[audio]
# Uncomment to specify a specific audio input device index
# device = 0

[agent]
# Custom prompt template (uses {text} placeholder)
# prompt_template = """Your custom prompt here: {text}"""

Whisper Models

Model	Size	Speed	Accuracy
`tiny`	75 MB	Fastest	Lower
`base`	142 MB	Fast	Good
`small`	466 MB	Medium	Better
`medium`	1.5 GB	Slower	High
`large-v3`	3 GB	Slowest	Highest

For CUDA acceleration, set device = "cuda" and use compute_type = "float16".

How It Works

First run: Creates a PID file (/tmp/dictate.pid) and starts recording audio
Second run: Detects running instance via PID file, sends SIGUSR1 signal to stop
Processing pipeline:
- Audio is resampled to 16kHz
- faster-whisper transcribes the audio locally
- Claude CLI cleans up grammar/punctuation (unless --raw)
- Result is copied to clipboard (and typed if --type)
Notifications: Desktop notifications show progress at each stage
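
The PID-file handshake described in the first two steps can be sketched as follows. This is a simplified illustration, not the code in dictate.py: the function names running_pid and toggle and the returned status strings are assumptions, and only the PID file path and the SIGUSR1 signal come from the description above.

```python
import os
import signal
from pathlib import Path
from typing import Optional

PID_FILE = Path("/tmp/dictate.pid")  # path taken from the description above


def running_pid(pid_file: Path = PID_FILE) -> Optional[int]:
    """Return the PID stored in pid_file if that process is alive, else None."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    if pid <= 0:
        return None  # never signal a process group by accident
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        pid_file.unlink(missing_ok=True)  # stale file left by a crashed run
        return None
    except PermissionError:
        return pid  # process exists but belongs to another user
    return pid


def toggle(pid_file: Path = PID_FILE) -> str:
    """Stop a running recorder if there is one, otherwise claim the PID file."""
    pid = running_pid(pid_file)
    if pid is not None:
        os.kill(pid, signal.SIGUSR1)  # ask the first instance to stop recording
        return "stop-requested"
    pid_file.write_text(str(os.getpid()))
    return "recording"
```

In a design like this, the recording instance would register a SIGUSR1 handler with signal.signal that sets a flag to end the capture loop, and would delete the PID file once processing finishes.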

Troubleshooting

No audio input detected

List available audio devices and set the correct index in config.toml:

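import pyaudio

p = pyaudio.PyAudio()
try:
    # Print the index and name of every device that can capture audio
    for i in range(p.get_device_count()):
        info = p.get_device_info_by_index(i)
        if info["maxInputChannels"] > 0:
            print(f"{i}: {info['name']}")
finally:
    p.terminate()  # release PortAudio resources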

Claude CLI errors

Ensure you're authenticated: claude login

Use --raw mode to bypass Claude processing entirely.

CUDA not detected

Install the CUDA 12 builds of cuBLAS and cuDNN, which faster-whisper needs for GPU inference:

uv pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

License

MIT
