VOCIX is a voice dictation application designed to capture speech, transcribe it offline, optionally transform it using large language models (LLMs), and insert the result at the cursor position in any application.
Key Features:
Push-to-talk via global hotkey: Use the Pause key as the default hotkey for hands-free operation across all applications.
Three processing modes: Choose between Clean mode for standard transcription, Business mode for professional language transformation, or Rage mode to de-escalate aggressive text.
Multi-provider LLM integration: Supports Anthropic Claude, OpenAI-compatible APIs, and local Ollama models, allowing customization per mode.
System-wide insertion: Automatically inserts processed text into any application at the cursor position.
User-friendly interface: Features a system tray with a color-coded microphone icon for quick access and mode switching.
Audience & Benefit:
Ideal for professionals who require efficient transcription across various applications. VOCIX enhances productivity by automating dictation and smart text transformation, ensuring seamless integration into daily workflows in browsers, Word, Outlook, or IDEs.
Local voice dictation app for Windows 11 with a global hotkey. Capture speech, transcribe it, transform it intelligently, and insert it system-wide at the cursor position — in any application (browser, Word, Outlook, IDEs, etc.).
Features
Push-to-Talk via global hotkey (default: Pause)
Three modes:
A — Clean: Clean transcription; strips filler words (um, uh, like, ...) with light corrections
B — Business: Rewrites speech into professional business language (LLM-powered)
C — Rage: De-escalates aggressive language into polite phrasing (LLM-powered)
Multi-provider LLM for modes B and C — pick your backend in the settings dialog: Anthropic Claude, any OpenAI-compatible API (OpenAI, Groq, OpenRouter, LM Studio, llama.cpp-server, vLLM via base_url) or local Ollama models. Per-mode override (e.g. Business on cloud Claude, Rage on local Llama). Provider failures fall back to Clean mode and surface an orange toast — no more silent degradation.
Settings dialog in the tray menu — four tabs (Basics / Advanced / Expert / AI Provider) with Test buttons, hotkey capture and per-mode validation
System tray with a colour-coded microphone icon and mode switching
Status overlay with a live VU meter while recording — instant visual feedback that the mic is picking up signal
History of the last 20 dictations in the tray — click an entry to re-insert it (saves your text when the target window has changed)
Usage statistics — words per day/week/total, estimated typing time saved (200 keystrokes/min), distribution across modes
Snippet expansion — your own shortcuts (, , …) inside the dictation are replaced with full text before insertion; Whisper transcripts like "slash sig" are normalised automatically
Auto-update from the tray — new releases are detected in the background; one click downloads the Win-x64 ZIP, verifies the SHA256 and swaps the files automatically
Local processing — speech-to-text runs fully offline (faster-whisper)
Configurable Whisper model — switch between tiny / base / small (default) / medium / large-v3 / large-v3-turbo at runtime via the tray menu (larger = more accurate, slower)
Optional NVIDIA GPU acceleration — auto-detects CUDA when available, falls back to CPU; switchable in the tray menu (Auto / GPU / CPU). Source install only (the packaged ZIP stays CPU-only)
Multilingual UI (German / English) — switchable at runtime via the tray menu, also drives Claude prompts and the Whisper STT language
Optional offline translation to English — toggle in the tray menu: speak in any of ~50 Whisper-supported languages and VOCIX inserts native English text at the cursor, fully offline (no API key needed)
Pulls cuBLAS + cuDNN (~600 MB) and lets ctranslate2 use the GPU. After the
install, pick Beschleunigung → GPU (CUDA) in the tray menu (or set
VOCIX_WHISPER_ACCELERATION=gpu). The packaged Win-x64 ZIP does not include
these libraries — GPU is opt-in for source installs only.
Build the .exe yourself
pip install pyinstaller
build_exe.bat
The result lives in dist\VOCIX\ — the whole folder is portable.
Configuration
The recommended way to configure VOCIX is the Settings dialog (tray icon → Settings). The AI Provider tab carries three slots — Anthropic, OpenAI-compatible and Ollama — each with its own Test button. Pick a default and optionally override per mode (Business / Rage).
For headless setups everything is also available via .env:
# --- LLM providers (modes B and C) -----------------------------------------
# Pick a default provider and optionally override per mode.
VOCIX_LLM_DEFAULT=anthropic # anthropic | openai | ollama
VOCIX_LLM_BUSINESS= # leave empty to use the default
VOCIX_LLM_RAGE=
# Anthropic Claude
VOCIX_LLM_ANTHROPIC_API_KEY=sk-ant-your-key-here
VOCIX_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6
VOCIX_LLM_ANTHROPIC_TIMEOUT=15
# OpenAI-compatible (OpenAI, Groq, OpenRouter, LM Studio, llama.cpp, vLLM …)
VOCIX_LLM_OPENAI_API_KEY=
VOCIX_LLM_OPENAI_BASE_URL=https://api.openai.com/v1
VOCIX_LLM_OPENAI_MODEL=gpt-4o-mini
VOCIX_LLM_OPENAI_TIMEOUT=15
# Ollama (local, no API key)
VOCIX_LLM_OLLAMA_BASE_URL=http://localhost:11434
VOCIX_LLM_OLLAMA_MODEL=llama3.1
VOCIX_LLM_OLLAMA_TIMEOUT=30
# --- App ------------------------------------------------------------------
# Language — controls UI, LLM prompts and Whisper STT (de, en)
# The tray selection (stored in state.json) overrides this value.
VOCIX_LANGUAGE=en
# Whisper model — tiny | base | small (default) | medium | large-v3 | large-v3-turbo
# Tray selection (state.json) overrides this value.
VOCIX_WHISPER_MODEL=small
# Acceleration — auto (GPU if available) | gpu (force CUDA) | cpu (force CPU)
# GPU mode needs `pip install -r requirements-gpu.txt` and is source-install only.
VOCIX_WHISPER_ACCELERATION=auto
# Hotkeys — push-to-talk requires a single key, mode switchers may be combos
VOCIX_HOTKEY_RECORD=pause
VOCIX_HOTKEY_MODE_A=ctrl+shift+1
VOCIX_HOTKEY_MODE_B=ctrl+shift+2
VOCIX_HOTKEY_MODE_C=ctrl+shift+3
# Logging (DEBUG, INFO, WARNING, ERROR)
VOCIX_LOG_LEVEL=INFO
VOCIX_LOG_FILE=vocix.log
# RDP mode (longer clipboard delays)
VOCIX_RDP_MODE=true
Without any configured provider, modes B and C automatically fall back to mode A (Clean). Configurations from VOCIX 1.3.x (ANTHROPIC_API_KEY, ANTHROPIC_MODEL, ANTHROPIC_TIMEOUT) keep working unchanged — saving once in the new settings tab migrates them.
Env precedence: variables already present in the process environment are not overridden by the .env file (default behaviour of python-dotenv). To temporarily override a value, export it before launching the app.
Usage
Shortcut
Action
Pause (hold)
Push-to-talk — speak, release to process
Ctrl+Shift+1
Mode A: Clean transcription
Ctrl+Shift+2
Mode B: Business mode
Ctrl+Shift+3
Mode C: Rage mode
Workflow:
Place the cursor in the target field (e-mail, chat, text editor, …)
Hold Pause and speak
Release — the text is transcribed, transformed and automatically inserted
Tray menu: right-click the tray icon → mode switch, Language / Sprache (English / Deutsch — switches UI, Claude prompts and Whisper STT), Whisper-Modell (tiny … large-v3-turbo at runtime), Beschleunigung (Auto / GPU / CPU — GPU greyed out when no CUDA detected), About (version + repo link), Quit
> Note: tray-menu choices (mode, language, Whisper model, acceleration) are persisted in state.json and override the corresponding .env values on next launch.
Troubleshooting
Problem
Solution
SmartScreen: "Windows protected your PC" on first launch
Click More info → Run anyway. VOCIX is open source and the release ZIP is reproducible from main via build_exe.bat. Code signing is tracked in #12.
Tray icon not visible
Check hidden icons in the taskbar (arrow pointing up)
"VOCIX requires a CPU with AVX support" on startup
Your CPU is older than ~2012 and cannot run CTranslate2. VOCIX will not work on this machine.
Hotkey doesn't respond
Run the app as administrator
Laptop without a Pause key
Set VOCIX_HOTKEY_RECORD=scroll lock (or f7) in .env
"Microphone unavailable"
Check microphone permissions in Windows settings
Modes B/C only return Clean results
Open Settings → AI Provider, configure at least one slot and hit Test
Whisper download fails
Check your internet connection; configure proxy/firewall if needed
MIT License — free to use, including commercially. No warranty.
VOCIX bundles third-party Python libraries in the portable distribution. See
THIRD_PARTY_LICENSES.md for the required copyright
and permission notices (MIT / BSD / HPND / LGPL-3.0).