VoiceMode Configuration Guide¶
VoiceMode provides flexible configuration through environment variables and configuration files, following standard precedence rules while maintaining sensible defaults.
Note: The Python package is called voice-mode, but the command-line tool is voicemode.
Quick Start¶
VoiceMode works out of the box with minimal configuration:
Cloud Setup (Easiest)¶
# Use OpenAI's cloud services for TTS and STT
export OPENAI_API_KEY="your-api-key"
Local Setup¶
# Run local services (Whisper + Kokoro)
voicemode kokoro start
voicemode whisper start
# VoiceMode auto-detects them!
Hybrid Setup (Recommended)¶
# Use local services with cloud fallback
export OPENAI_API_KEY="your-api-key" # Fallback
# Local services auto-detected when running
Configuration System¶
Configuration Precedence¶
VoiceMode follows standard configuration precedence (highest to lowest):
- Environment variables - Always win
- Project config - ./voicemode.env in current directory
- User config - ~/.voicemode/voicemode.env
- Auto-discovered services - Running local services
- Built-in defaults - Sensible fallbacks
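The precedence order above can be approximated in a few lines of POSIX shell. This is a sketch under stated assumptions, not VoiceMode's implementation:

- It sources each env file in a subshell, so any shell code in the file runs.
- It skips the auto-discovery step.
- The helper names `resolve_setting`, `read_from_file`, `is_set` and `get_var` are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximates the documented precedence order.
# All helper names are hypothetical; VoiceMode does not ship these functions.

# is_set NAME: succeed if variable NAME is set (even to an empty string).
is_set() { eval "[ -n \"\${$1+x}\" ]"; }

# get_var NAME: print the value of variable NAME.
get_var() { eval "printf '%s\n' \"\${$1}\""; }

# read_from_file FILE NAME: print NAME as set by FILE, or fail if FILE is
# unreadable or does not set NAME. A subshell keeps the caller's env untouched.
read_from_file() {
  [ -r "$1" ] || return 1
  (
    unset "$2"
    . "$1" >/dev/null 2>&1
    is_set "$2" && get_var "$2"
  )
}

# resolve_setting NAME DEFAULT: print the effective value of NAME.
resolve_setting() {
  # Only accept valid shell variable names, since they are passed to eval.
  case "$1" in ''|[0-9]*|*[!A-Za-z0-9_]*) return 2 ;; esac

  # 1. Environment variables always win.
  if is_set "$1"; then get_var "$1"; return 0; fi

  # 2. Project config, then 3. user config.
  for _rs_file in ./voicemode.env "$HOME/.voicemode/voicemode.env"; do
    if read_from_file "$_rs_file" "$1"; then return 0; fi
  done

  # 4. Auto-discovered services are out of scope for this sketch.
  # 5. Built-in default supplied by the caller.
  printf '%s\n' "$2"
}
```

For example, `resolve_setting VOICEMODE_VOICES alloy` prints the environment value if one is set. Otherwise it prints the project file's value, then the user file's value, and finally `alloy`.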
Configuration Files¶
VoiceMode automatically creates ~/.voicemode/voicemode.env on first run with basic settings. This file uses shell export format:
# ~/.voicemode/voicemode.env example
export OPENAI_API_KEY="sk-..."
export VOICEMODE_VOICES="af_sky,nova"
export VOICEMODE_DEBUG=false
MCP Configuration¶
When using VoiceMode as an MCP server, add it to the configuration of Claude or another MCP client:
{
"mcpServers": {
"voicemode": {
"command": "uvx",
"args": ["voice-mode"],
"env": {
"OPENAI_API_KEY": "your-key-here",
"VOICEMODE_VOICES": "nova,shimmer"
}
}
}
}
Configuration Reference¶
API Keys and Authentication¶
# OpenAI API Key (for cloud TTS/STT)
OPENAI_API_KEY=sk-...
# LiveKit credentials (for room-based voice)
LIVEKIT_API_KEY=devkey # Default for local dev
LIVEKIT_API_SECRET=secret # Default for local dev
Voice Services¶
Text-to-Speech (TTS)¶
# TTS Service URLs (comma-separated, tried in order)
VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1,https://api.openai.com/v1
# Voice preferences (comma-separated)
# OpenAI: alloy, echo, fable, onyx, nova, shimmer
# Kokoro: af_sky, af_sarah, am_adam, bf_emma, etc.
VOICEMODE_VOICES=af_sky,nova,alloy
# TTS Models (comma-separated)
# OpenAI: tts-1, tts-1-hd, gpt-4o-mini-tts
VOICEMODE_TTS_MODELS=tts-1-hd,tts-1
# Default TTS voice and model
VOICEMODE_TTS_VOICE=nova
VOICEMODE_TTS_MODEL=tts-1-hd
# Speech speed (0.25 to 4.0)
VOICEMODE_TTS_SPEED=1.0
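Two of the TTS settings above lend themselves to a quick shell check: the ordered URL list and the speed range. The sketch below uses hypothetical helpers `first_reachable`, `curl_probe` and `valid_tts_speed`.

- `first_reachable` approximates "tried in order" with an up-front reachability probe. VoiceMode's own failover may differ, for example by falling back only when a request fails.
- `valid_tts_speed` checks the documented 0.25 to 4.0 range.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for the TTS settings above.

# first_reachable URLS PROBE: walk a comma-separated URL list in order and
# print the first URL for which the command PROBE succeeds.
first_reachable() {
  _fr_rest="$1,"
  while [ -n "$_fr_rest" ]; do
    _fr_url="${_fr_rest%%,*}"   # next entry
    _fr_rest="${_fr_rest#*,}"   # remaining entries
    [ -n "$_fr_url" ] || continue
    if "$2" "$_fr_url"; then
      printf '%s\n' "$_fr_url"
      return 0
    fi
  done
  return 1
}

# curl_probe URL: treat any HTTP response within 2 seconds as "reachable".
# Without --fail, curl exits 0 even for 4xx/5xx replies.
curl_probe() {
  curl --silent --max-time 2 --output /dev/null "$1"
}

# valid_tts_speed VALUE: succeed if VALUE is a number from 0.25 to 4.0.
valid_tts_speed() {
  awk -v s="$1" 'BEGIN { exit !(s ~ /^[0-9]*\.?[0-9]+$/ && s + 0 >= 0.25 && s + 0 <= 4.0) }'
}
```

Usage:

- `first_reachable "$VOICEMODE_TTS_BASE_URLS" curl_probe` prints the first endpoint that answers. The same helper works for `VOICEMODE_STT_BASE_URLS`.
- `valid_tts_speed "${VOICEMODE_TTS_SPEED:-1.0}"` flags an out-of-range speed before it reaches a TTS service.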
Speech-to-Text (STT)¶
# STT Service URLs
VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1,https://api.openai.com/v1
# Whisper configuration
VOICEMODE_WHISPER_MODEL=large-v2 # Model size
VOICEMODE_WHISPER_LANGUAGE=auto # Language detection
VOICEMODE_WHISPER_PORT=2022 # Server port
Audio Configuration¶
# Audio formats
VOICEMODE_AUDIO_FORMAT=pcm # Global default
VOICEMODE_TTS_AUDIO_FORMAT=pcm # TTS-specific
VOICEMODE_STT_AUDIO_FORMAT=mp3 # STT-specific
# Supported formats: pcm, opus, mp3, wav, flac, aac
# Quality settings
VOICEMODE_OPUS_BITRATE=32000 # Opus bitrate (bps)
VOICEMODE_MP3_BITRATE=64k # MP3 bitrate
VOICEMODE_AAC_BITRATE=64k # AAC bitrate
VOICEMODE_SAMPLE_RATE=24000 # Sample rate (Hz)
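The "Global default" and "TTS-specific"/"STT-specific" comments suggest that the per-service format overrides the global one. The sketch below assumes exactly that and also checks the result against the supported list.

- `effective_audio_format` is a hypothetical helper.
- The final `pcm` fallback is an assumption taken from the example value above.

```shell
#!/bin/sh
# Sketch only: assumes the service-specific format overrides the global one.

# effective_audio_format SERVICE: print the format for "tts" or "stt".
effective_audio_format() {
  case "$1" in
    tts) _eaf_fmt="${VOICEMODE_TTS_AUDIO_FORMAT:-${VOICEMODE_AUDIO_FORMAT:-pcm}}" ;;
    stt) _eaf_fmt="${VOICEMODE_STT_AUDIO_FORMAT:-${VOICEMODE_AUDIO_FORMAT:-pcm}}" ;;
    *) printf 'unknown service: %s\n' "$1" >&2; return 2 ;;
  esac
  # Reject anything outside the documented list of formats.
  case "$_eaf_fmt" in
    pcm|opus|mp3|wav|flac|aac) printf '%s\n' "$_eaf_fmt" ;;
    *) printf 'unsupported audio format: %s\n' "$_eaf_fmt" >&2; return 1 ;;
  esac
}
```

Under this assumption the example settings above (global `pcm`, STT `mp3`) would give `pcm` for TTS and `mp3` for STT.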
Audio Feedback¶
# Chimes when recording starts/stops
VOICEMODE_AUDIO_FEEDBACK=true
VOICEMODE_FEEDBACK_STYLE=whisper # or "shout"
# Silence around chimes (for Bluetooth)
VOICEMODE_CHIME_PRE_DELAY=1.0 # Seconds before
VOICEMODE_CHIME_POST_DELAY=0.5 # Seconds after
Voice Activity Detection¶
# VAD Aggressiveness (0-3)
# 0: Least aggressive (captures more)
# 3: Most aggressive (filters more)
VOICEMODE_VAD_AGGRESSIVENESS=2
# Silence detection
VOICEMODE_SILENCE_THRESHOLD=3.0 # Seconds of silence
VOICEMODE_MIN_RECORDING_TIME=0.5 # Minimum recording
VOICEMODE_MAX_RECORDING_TIME=120.0 # Maximum recording
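One plausible reading of the three silence settings is the following:

- Recording always stops at the maximum duration.
- Otherwise it stops once the configured stretch of silence is reached, but never before the minimum recording time.

The sketch below encodes that reading only. It is an interpretation, not VoiceMode's recording loop.

- `should_stop_recording` is a hypothetical helper.
- The fallback defaults are simply the example values above.

```shell
#!/bin/sh
# Sketch only: one interpretation of how the silence settings interact.

# should_stop_recording ELAPSED SILENCE: succeed if recording should stop,
# given seconds recorded so far and seconds of trailing silence.
should_stop_recording() {
  awk -v t="$1" -v s="$2" \
      -v min="${VOICEMODE_MIN_RECORDING_TIME:-0.5}" \
      -v max="${VOICEMODE_MAX_RECORDING_TIME:-120.0}" \
      -v thr="${VOICEMODE_SILENCE_THRESHOLD:-3.0}" \
      'BEGIN { exit !((t + 0 >= max + 0) || (t + 0 >= min + 0 && s + 0 >= thr + 0)) }'
}
```

With the example values, 10 seconds of speech followed by 3.5 seconds of silence ends the recording. One second of silence does not, and neither does 3.5 seconds of silence before the 0.5-second minimum has elapsed.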
LiveKit Configuration¶
# Server settings
LIVEKIT_URL=ws://127.0.0.1:7880
LIVEKIT_PORT=7880
# Room settings
VOICEMODE_LIVEKIT_ROOM_PREFIX=voicemode
VOICEMODE_LIVEKIT_AUTO_CREATE=true
Local Service Paths¶
# Kokoro TTS
VOICEMODE_KOKORO_PORT=8880
VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro
VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro
# Service directories
VOICEMODE_DATA_DIR=~/.voicemode
VOICEMODE_LOG_DIR=~/.voicemode/logs
VOICEMODE_CACHE_DIR=~/.voicemode/cache
Debugging and Logging¶
# Debug mode (verbose logging, saves all files)
VOICEMODE_DEBUG=true
# Logging levels
VOICEMODE_LOG_LEVEL=info # debug, info, warning, error
VOICEMODE_SAVE_ALL=false # Save all audio files
VOICEMODE_SAVE_RECORDINGS=false # Save input recordings
VOICEMODE_SAVE_TTS=false # Save TTS output
# Event logging
VOICEMODE_EVENT_LOG=false # Log all events
VOICEMODE_CONVERSATION_LOG=false # Log conversations
Development Settings¶
# Skip TTS for faster development
VOICEMODE_SKIP_TTS=false
# Disable specific features
VOICEMODE_DISABLE_SILENCE_DETECTION=false
VOICEMODE_DISABLE_VAD=false
Voice Preferences System¶
VoiceMode supports project-specific voice preferences. Create a .voicemode.env file in your project root:
# Project-specific voices for a game
export VOICEMODE_VOICES="onyx,fable"
export VOICEMODE_TTS_SPEED=0.9
This allows different projects to have different voice settings without changing global configuration.
Service Auto-Discovery¶
VoiceMode automatically discovers running local services:
- Whisper STT: Checks http://127.0.0.1:2022/v1
- Kokoro TTS: Checks http://127.0.0.1:8880/v1
- LiveKit: Checks ws://127.0.0.1:7880
No configuration needed when services run on default ports!
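To see by hand what auto-discovery would find, you can probe the default endpoints listed above. This sketch uses hypothetical helpers `discover_services` and `probe_url`; it is not a VoiceMode command. It assumes each service returns some HTTP reply on its port, so LiveKit's `ws://` endpoint is probed over plain `http://`. A reply only shows that something is listening, not that it is the expected service.

```shell
#!/bin/sh
# Sketch only: check the default local endpoints listed above.

# probe_url URL: succeed if anything answers with an HTTP reply within 2 s.
probe_url() {
  curl --silent --max-time 2 --output /dev/null "$1"
}

discover_services() {
  for _ds_entry in \
    "whisper http://127.0.0.1:2022/v1" \
    "kokoro http://127.0.0.1:8880/v1" \
    "livekit http://127.0.0.1:7880"; do
    _ds_name="${_ds_entry%% *}"   # text before the first space
    _ds_url="${_ds_entry#* }"     # text after the first space
    if probe_url "$_ds_url"; then
      printf '%s: found at %s\n' "$_ds_name" "$_ds_url"
    else
      printf '%s: not detected\n' "$_ds_name"
    fi
  done
}
```

Running `discover_services` prints one line per service, for example `kokoro: found at http://127.0.0.1:8880/v1` after `voicemode kokoro start`.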
Configuration Philosophy¶
VoiceMode balances MCP compliance with user convenience:
- Host configuration is authoritative - Environment variables always win
- Reasonable defaults - Works without any configuration
- Progressive disclosure - Simple configs for basic use, advanced options available
- File-based convenience - Edit familiar config files instead of multiple host configs
Common Configurations¶
Privacy-Focused Local Setup¶
# No cloud services, everything local
export VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1
export VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1
export VOICEMODE_VOICES=af_sky
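Before relying on a privacy-focused setup, it helps to confirm that no cloud URL slipped into either list. This sketch uses a hypothetical `all_local` helper. It only treats `127.0.0.1` and `localhost` as local, which is an assumption you may want to widen.

```shell
#!/bin/sh
# Sketch only: fail if any URL in a comma-separated list is not local.

# all_local URLS: succeed if every entry points at 127.0.0.1 or localhost.
all_local() {
  _al_rest="$1,"
  while [ -n "$_al_rest" ]; do
    _al_url="${_al_rest%%,*}"
    _al_rest="${_al_rest#*,}"
    [ -n "$_al_url" ] || continue
    _al_host="${_al_url#*://}"   # drop the scheme
    _al_host="${_al_host%%/*}"   # drop the path
    _al_host="${_al_host%:*}"    # drop the port
    case "$_al_host" in
      127.0.0.1|localhost) ;;
      *) printf 'non-local endpoint: %s\n' "$_al_url" >&2; return 1 ;;
    esac
  done
  return 0
}
```

For example, `all_local "$VOICEMODE_TTS_BASE_URLS" && all_local "$VOICEMODE_STT_BASE_URLS"` succeeds with the settings above. It fails, naming the offending URL, if `https://api.openai.com/v1` is still listed.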
High-Quality Cloud Setup¶
# Best quality with OpenAI
export OPENAI_API_KEY=sk-...
export VOICEMODE_TTS_MODEL=tts-1-hd
export VOICEMODE_VOICES=nova,alloy
Troubleshooting Configuration¶
Check Active Configuration¶
# List all configuration keys
voicemode config list
# Get specific settings
voicemode config get VOICEMODE_TTS_VOICE
voicemode config get OPENAI_API_KEY
Configuration Not Working?¶
- Check precedence: Environment variables override files
- Verify syntax: Use export VAR=value format in files
- Check permissions: Ensure config files are readable
- Test services: Verify local services are running
- Enable debug: Set VOICEMODE_DEBUG=true for details
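For the syntax check in the list above, a short awk script can flag lines that are neither blank, comments, nor `export VAR=value`. `check_env_syntax` is a hypothetical helper that mirrors the format this guide describes. It does not validate values, and VoiceMode's own parser may accept more than this.

```shell
#!/bin/sh
# Sketch only: report lines that do not follow "export VAR=value".

# check_env_syntax FILE: print offending lines to stderr; fail if any exist.
check_env_syntax() {
  awk '
    /^[ \t]*$/ || /^[ \t]*#/ { next }   # skip blank lines and comments
    !/^[ \t]*export[ \t]+[A-Za-z_][A-Za-z0-9_]*=/ {
      printf "%s:%d: expected \"export VAR=value\": %s\n", FILENAME, FNR, $0 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Run it as `check_env_syntax ~/.voicemode/voicemode.env`. A line such as `VOICEMODE_VOICES=nova` (missing `export`) is reported with its line number.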
Reset Configuration¶
# Back up the current config so VoiceMode can recreate a default one
mv ~/.voicemode/voicemode.env ~/.voicemode/voicemode.env.backup
# Open the configuration file in your editor
voicemode config edit
Security Considerations¶
- Never commit API keys to version control
- Use environment variables for sensitive data in production
- Restrict file permissions: chmod 600 ~/.voicemode/voicemode.env
- Rotate keys regularly, and immediately if one is exposed
- Use local services for sensitive audio data
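To confirm the permission advice above, a small check can compare the config file's mode against 600. `check_private` is a hypothetical helper. It relies on `find -perm` with an exact octal mode, which POSIX `find` supports, so it should behave the same on Linux and macOS. A stricter mode such as 400 is also reported, since it does not match 600 exactly.

```shell
#!/bin/sh
# Sketch only: warn unless a file's permission bits are exactly 600.

# check_private FILE: succeed if FILE exists and has mode 600.
check_private() {
  [ -f "$1" ] || { printf 'missing: %s\n' "$1" >&2; return 2; }
  # With no leading "-", -perm 600 matches only an exact mode of 600.
  if [ -n "$(find "$1" -prune -perm 600)" ]; then
    return 0
  fi
  printf 'warning: %s is not mode 600; run: chmod 600 %s\n' "$1" "$1" >&2
  return 1
}
```

For example, `check_private ~/.voicemode/voicemode.env` is silent when the file is locked down and prints the `chmod` fix otherwise.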