Skip to content

VoiceMode Configuration Guide

VoiceMode provides flexible configuration through environment variables and configuration files, following standard precedence rules while maintaining sensible defaults.

*Note: The python package is called voice-mode but

Quick Start

VoiceMode works out of the box with minimal configuration:

Cloud Setup (Easiest)

# Just need an OpenAI API key
export OPENAI_API_KEY="your-api-key"

Local Setup

# Run local services (Whisper + Kokoro)
voicemode kokoro start
voicemode whisper start
# VoiceMode auto-detects them!
# Use local services with cloud fallback
export OPENAI_API_KEY="your-api-key"  # Fallback
# Local services auto-detected when running

Configuration System

Configuration Precedence

VoiceMode follows standard configuration precedence (highest to lowest):

  1. Environment variables - Always win
  2. Project config - ./voicemode.env in current directory
  3. User config - ~/.voicemode/voicemode.env
  4. Auto-discovered services - Running local services
  5. Built-in defaults - Sensible fallbacks

Configuration Files

VoiceMode automatically creates ~/.voicemode/voicemode.env on first run with basic settings. This file uses shell export format:

# ~/.voicemode/voicemode.env example
export OPENAI_API_KEY="sk-..."
export VOICEMODE_VOICES="af_sky,nova"
export VOICEMODE_DEBUG=false

MCP Configuration

When used as an MCP server, add to your Claude or other MCP client configuration:

{
  "mcpServers": {
    "voicemode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-key-here",
        "VOICEMODE_VOICES": "nova,shimmer"
      }
    }
  }
}

Configuration Reference

API Keys and Authentication

# OpenAI API Key (for cloud TTS/STT)
OPENAI_API_KEY=sk-...

# LiveKit credentials (for room-based voice)
LIVEKIT_API_KEY=devkey          # Default for local dev
LIVEKIT_API_SECRET=secret        # Default for local dev

Voice Services

Text-to-Speech (TTS)

# TTS Service URLs (comma-separated, tried in order)
VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1,https://api.openai.com/v1

# Voice preferences (comma-separated)
# OpenAI: alloy, echo, fable, onyx, nova, shimmer
# Kokoro: af_sky, af_sarah, am_adam, bf_emma, etc.
VOICEMODE_VOICES=af_sky,nova,alloy

# TTS Models (comma-separated)
# OpenAI: tts-1, tts-1-hd, gpt-4o-mini-tts
VOICEMODE_TTS_MODELS=tts-1-hd,tts-1

# Default TTS voice and model
VOICEMODE_TTS_VOICE=nova
VOICEMODE_TTS_MODEL=tts-1-hd

# Speech speed (0.25 to 4.0)
VOICEMODE_TTS_SPEED=1.0

Speech-to-Text (STT)

# STT Service URLs
VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1,https://api.openai.com/v1

# Whisper configuration
VOICEMODE_WHISPER_MODEL=large-v2    # Model size
VOICEMODE_WHISPER_LANGUAGE=auto     # Language detection
VOICEMODE_WHISPER_PORT=2022         # Server port

Audio Configuration

# Audio formats
VOICEMODE_AUDIO_FORMAT=pcm          # Global default
VOICEMODE_TTS_AUDIO_FORMAT=pcm      # TTS-specific
VOICEMODE_STT_AUDIO_FORMAT=mp3      # STT-specific

# Supported formats: pcm, opus, mp3, wav, flac, aac

# Quality settings
VOICEMODE_OPUS_BITRATE=32000        # Opus bitrate (bps)
VOICEMODE_MP3_BITRATE=64k           # MP3 bitrate
VOICEMODE_AAC_BITRATE=64k           # AAC bitrate
VOICEMODE_SAMPLE_RATE=24000         # Sample rate (Hz)

Audio Feedback

# Chimes when recording starts/stops
VOICEMODE_AUDIO_FEEDBACK=true
VOICEMODE_FEEDBACK_STYLE=whisper    # or "shout"

# Silence around chimes (for Bluetooth)
VOICEMODE_CHIME_PRE_DELAY=1.0   # Seconds before
VOICEMODE_CHIME_POST_DELAY=0.5  # Seconds after

Voice Activity Detection

# VAD Aggressiveness (0-3)
# 0: Least aggressive (captures more)
# 3: Most aggressive (filters more)
VOICEMODE_VAD_AGGRESSIVENESS=2

# Silence detection
VOICEMODE_SILENCE_THRESHOLD=3.0     # Seconds of silence
VOICEMODE_MIN_RECORDING_TIME=0.5    # Minimum recording
VOICEMODE_MAX_RECORDING_TIME=120.0  # Maximum recording

LiveKit Configuration

# Server settings
LIVEKIT_URL=ws://127.0.0.1:7880
LIVEKIT_PORT=7880

# Room settings
VOICEMODE_LIVEKIT_ROOM_PREFIX=voicemode
VOICEMODE_LIVEKIT_AUTO_CREATE=true

Local Service Paths

# Kokoro TTS
VOICEMODE_KOKORO_PORT=8880
VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro
VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro

# Service directories
VOICEMODE_DATA_DIR=~/.voicemode
VOICEMODE_LOG_DIR=~/.voicemode/logs
VOICEMODE_CACHE_DIR=~/.voicemode/cache

Debugging and Logging

# Debug mode (verbose logging, saves all files)
VOICEMODE_DEBUG=true

# Logging levels
VOICEMODE_LOG_LEVEL=info           # debug, info, warning, error
VOICEMODE_SAVE_ALL=false           # Save all audio files
VOICEMODE_SAVE_RECORDINGS=false    # Save input recordings
VOICEMODE_SAVE_TTS=false           # Save TTS output

# Event logging
VOICEMODE_EVENT_LOG=false          # Log all events
VOICEMODE_CONVERSATION_LOG=false   # Log conversations

Development Settings

# Skip TTS for faster development
VOICEMODE_SKIP_TTS=false

# Disable specific features
VOICEMODE_DISABLE_SILENCE_DETECTION=false
VOICEMODE_DISABLE_VAD=false

Voice Preferences System

VoiceMode supports project-specific voice preferences. Create a .voicemode.env file in your project root:

# Project-specific voices for a game
export VOICEMODE_VOICES="onyx,fable"
export VOICEMODE_TTS_SPEED=0.9

This allows different projects to have different voice settings without changing global configuration.

Service Auto-Discovery

VoiceMode automatically discovers running local services:

  1. Whisper STT: Checks http://127.0.0.1:2022/v1
  2. Kokoro TTS: Checks http://127.0.0.1:8880/v1
  3. LiveKit: Checks ws://127.0.0.1:7880

No configuration needed when services run on default ports!

Configuration Philosophy

VoiceMode balances MCP compliance with user convenience:

  • Host configuration is authoritative - Environment variables always win
  • Reasonable defaults - Works without any configuration
  • Progressive disclosure - Simple configs for basic use, advanced options available
  • File-based convenience - Edit familiar config files instead of multiple host configs

Common Configurations

Privacy-Focused Local Setup

# No cloud services, everything local
export VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1
export VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1
export VOICEMODE_VOICES=af_sky

High-Quality Cloud Setup

# Best quality with OpenAI
export OPENAI_API_KEY=sk-...
export VOICEMODE_TTS_MODEL=tts-1-hd
export VOICEMODE_VOICES=nova,alloy

Troubleshooting Configuration

Check Active Configuration

# List all configuration keys
voicemode config list

# Get specific settings
voicemode config get VOICEMODE_TTS_VOICE
voicemode config get OPENAI_API_KEY

Configuration Not Working?

  1. Check precedence: Environment variables override files
  2. Verify syntax: Use export VAR=value format in files
  3. Check permissions: Ensure config files are readable
  4. Test services: Verify local services are running
  5. Enable debug: Set VOICEMODE_DEBUG=true for details

Reset Configuration

# Backup and recreate default config
mv ~/.voicemode/voicemode.env ~/.voicemode/voicemode.env.backup
# Edit the configuration file to reset
voicemode config edit

Security Considerations

  • Never commit API keys to version control
  • Use environment variables for sensitive data in production
  • Restrict file permissions: chmod 600 ~/.voicemode/voicemode.env
  • Rotate keys regularly if exposed
  • Use local services for sensitive audio data