Voice Mode Installation Tools¶

Voice Mode now includes MCP tools to automatically install and configure whisper.cpp and kokoro-fastapi, making it easier to set up free, private, open-source voice services.

Overview¶

These tools handle:

System detection (macOS/Linux)
Dependency installation
GPU support configuration
Model downloads
Service configuration

Available Tools¶

install_whisper_cpp¶

Installs whisper.cpp for speech-to-text (STT) functionality.

Features¶

Automatic OS detection (macOS/Linux)
GPU acceleration (Metal on macOS, CUDA on Linux)
Model download management
Build optimization
Service configuration (launchd on macOS, systemd on Linux)
Environment variable support for model selection

Usage¶

# Basic installation with defaults
result = await install_whisper_cpp()

# Custom installation
result = await install_whisper_cpp(
    install_dir="~/my-whisper",
    model="large-v3",
    use_gpu=True,
    force_reinstall=False
)

Parameters¶

install_dir (str, optional): Installation directory (default: ~/.voicemode/whisper.cpp)
model (str, optional): Whisper model to download (default: large-v2)
    Available models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v2, large-v3
    Note: large-v2 is the default for best accuracy (requires ~3GB RAM)
use_gpu (bool, optional): Enable GPU support (default: auto-detect)
force_reinstall (bool, optional): Force reinstallation (default: false)
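
Checking the model name before starting a long build can save time. The following is a minimal sketch, not part of the installer itself: `WHISPER_MODELS` and `validate_model` are hypothetical names, and the list is copied from the parameter description above.

```python
# Model names copied from the parameter list above.
WHISPER_MODELS = (
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v2", "large-v3",
)


def validate_model(name: str = "large-v2") -> str:
    """Return the model name unchanged, or raise ValueError if it is not listed."""
    if name not in WHISPER_MODELS:
        raise ValueError(
            f"Unknown Whisper model {name!r}; expected one of: "
            + ", ".join(WHISPER_MODELS)
        )
    return name
```

A caller could pass the validated name straight through, for example `install_whisper_cpp(model=validate_model("base.en"))`.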

Return Value¶

{
    "success": True,
    "install_path": "/Users/user/.voicemode/whisper.cpp",
    "model_path": "/Users/user/.voicemode/whisper.cpp/models/ggml-large-v2.bin",
    "gpu_enabled": True,
    "gpu_type": "metal",  # or "cuda" or "cpu"
    "performance_info": {
        "system": "Darwin",
        "gpu_acceleration": "metal",
        "model": "large-v2",
        "binary_path": "/Users/user/.voicemode/whisper.cpp/main",
        "server_port": 2022,
        "server_url": "http://localhost:2022"
    },
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.whisper-server.plist",  # macOS
    "systemd_service": "/home/user/.config/systemd/user/whisper-server.service",  # Linux
    "start_script": "/Users/user/.voicemode/whisper.cpp/start-whisper-server.sh"
}
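
The example above shows both the macOS and the Linux service keys in one dictionary. A caller that wants the service file for the current platform could select it as sketched below. This assumes that `performance_info["system"]` holds the value of Python's `platform.system()` ("Darwin" or "Linux") and that a key may be absent on the other platform; `service_file` is a hypothetical helper.

```python
from typing import Optional


def service_file(result: dict) -> Optional[str]:
    """Pick the service definition path that matches the reported platform."""
    system = result.get("performance_info", {}).get("system")
    if system == "Darwin":
        # macOS: launchd agent plist
        return result.get("launchagent")
    if system == "Linux":
        # Linux: systemd user unit
        return result.get("systemd_service")
    # Unknown platform, or the installer reported no system
    return None
```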

install_kokoro_fastapi¶

Installs kokoro-fastapi for text-to-speech (TTS) functionality.

Features¶

Python environment management with UV
Automatic model downloads
Service configuration (launchd on macOS, systemd on Linux)
Auto-start capability

Usage¶

# Basic installation with defaults
result = await install_kokoro_fastapi()

# Custom installation
result = await install_kokoro_fastapi(
    install_dir="~/my-kokoro",
    models_dir="~/my-models",
    port=8881,
    auto_start=True,
    install_models=True,
    force_reinstall=False
)

Parameters¶

install_dir (str, optional): Installation directory (default: ~/.voicemode/kokoro-fastapi)
models_dir (str, optional): Models directory (default: ~/.voicemode/kokoro-models)
port (int, optional): Service port (default: 8880)
auto_start (bool, optional): Start service after installation (default: true)
install_models (bool, optional): Download Kokoro models (default: true)
force_reinstall (bool, optional): Force reinstallation (default: false)

Return Value¶

{
    "success": True,
    "install_path": "/home/user/.voicemode/kokoro-fastapi",
    "service_url": "http://127.0.0.1:8880",
    "service_status": "managed_by_systemd",  # Linux; "managed_by_launchd" on macOS
    "systemd_service": "/home/user/.config/systemd/user/kokoro-fastapi-8880.service",  # Linux
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.kokoro-8880.plist",  # macOS
    "start_script": "/home/user/.voicemode/kokoro-fastapi/start-cpu.sh",
    "message": "Kokoro-fastapi installed. Run: cd /home/user/.voicemode/kokoro-fastapi && ./start-cpu.sh"
}
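
With `auto_start` enabled, the service may still need a few seconds before it accepts connections. One way to wait for it is to poll the host and port from `service_url`, as in this sketch. `wait_for_service` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that the API is fully ready.

```python
import socket
import time
from urllib.parse import urlparse


def wait_for_service(service_url: str, timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Poll the host/port in service_url until a TCP connection succeeds."""
    parsed = urlparse(service_url)
    host = parsed.hostname or "127.0.0.1"
    port = parsed.port or 80
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A short connect timeout keeps each attempt cheap.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_service(result["service_url"])` returns `True` once the port accepts connections and `False` if the timeout expires first.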

System Requirements¶

whisper.cpp¶

macOS¶

Xcode Command Line Tools
Homebrew (for cmake)
Metal support (built-in)

Linux¶

Build essentials (gcc, g++, make)
CMake
CUDA toolkit (optional, for NVIDIA GPU support)

kokoro-fastapi¶

All Systems¶

Python 3.10+
Git
~5GB disk space for models
UV package manager (installed automatically if missing)

Integration with Voice Mode¶

After installation, the services integrate automatically with Voice Mode:

whisper.cpp:
    Runs automatically on boot (port 2022)
    OpenAI-compatible API endpoint
    Model selection via VOICEMODE_WHISPER_MODEL environment variable
    View installed models: claude resource read whisper://models
kokoro-fastapi:
    Automatically detected by Voice Mode's provider registry when running
    67 voices available
    OpenAI-compatible API endpoint
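
Because both services are described as OpenAI-compatible, a client can in principle address them with OpenAI-style audio requests. The sketch below only builds a text-to-speech request and does not send it. Several things here are assumptions: that kokoro-fastapi serves the OpenAI speech path `/v1/audio/speech` at the default URL, and that `"kokoro"` and `"af_sky"` are accepted model and voice names (both are placeholders). `build_speech_request` is a hypothetical helper.

```python
import json
import urllib.request

KOKORO_BASE = "http://127.0.0.1:8880"  # default service_url from the docs above
SPEECH_PATH = "/v1/audio/speech"       # OpenAI audio API path (assumed to be mirrored)


def build_speech_request(text: str, voice: str = "af_sky",
                         model: str = "kokoro",
                         base_url: str = KOKORO_BASE) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech POST request."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        base_url.rstrip("/") + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` would return audio bytes if the assumptions hold for your installation.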

Examples¶

Complete Setup¶

# Install both services with defaults
whisper_result = await install_whisper_cpp()  # Uses large-v2 by default
kokoro_result = await install_kokoro_fastapi()

# Check installation status
if whisper_result["success"] and kokoro_result["success"]:
    print("Voice services installed successfully!")
    print(f"Whisper: {whisper_result['install_path']}")
    print(f"Whisper server: {whisper_result['performance_info']['server_url']}")
    print(f"Kokoro API: {kokoro_result['service_url']}")

Upgrade Existing Installation¶

# Force a reinstall and switch to the large-v3 model
result = await install_whisper_cpp(
    model="large-v3",
    force_reinstall=True
)

Custom Configuration¶

# Install kokoro-fastapi on different port
result = await install_kokoro_fastapi(
    port=9000,
    models_dir="/opt/models/kokoro"
)

Troubleshooting¶

Common Issues¶

Missing Dependencies
    The tools will report missing dependencies with installation instructions
    Follow the provided commands to install required packages
Port Conflicts
    If port 8880 is in use, specify a different port for kokoro-fastapi
    Check running services: lsof -i :8880
GPU Not Detected
    On Linux, ensure NVIDIA drivers and CUDA are installed
    Use nvidia-smi to verify GPU availability
    Force CPU mode with use_gpu=False if needed
Model Download Failures
    Check internet connection
    Verify sufficient disk space
    Try smaller models first (tiny, base)
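
Several of the checks above can be run from Python before calling the installers. This is a rough preflight sketch using only the standard library. The function names are hypothetical, and the 5 GB threshold is taken from the kokoro-fastapi disk requirement listed under System Requirements.

```python
import shutil
import socket
from typing import List


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def free_disk_gb(path: str = ".") -> float:
    """Free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def nvidia_smi_available() -> bool:
    """Return True if the nvidia-smi executable is on PATH."""
    return shutil.which("nvidia-smi") is not None


def preflight(port: int = 8880, path: str = ".",
              required_gb: float = 5.0) -> List[str]:
    """Collect human-readable problems; an empty list means no issue was found."""
    problems = []
    if port_in_use(port):
        problems.append(f"port {port} is already in use")
    if free_disk_gb(path) < required_gb:
        problems.append(f"less than {required_gb} GB free at {path}")
    return problems
```

If `preflight()` reports a port conflict, passing a different `port` to `install_kokoro_fastapi` avoids it. If `nvidia_smi_available()` is false on Linux, `use_gpu=False` is the safer choice.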

Manual Service Management¶

macOS (launchd)¶

# Whisper
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Kokoro (the plist name includes the port, matching the "launchagent" path in the return value)
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro-8880.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro-8880.plist

Linux (systemd)¶

# Whisper
systemctl --user start whisper-server
systemctl --user stop whisper-server
systemctl --user status whisper-server

# Kokoro
systemctl --user start kokoro-fastapi-8880
systemctl --user stop kokoro-fastapi-8880
systemctl --user status kokoro-fastapi-8880
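
A script that must work on both platforms can build the matching command from the platform name. This sketch only constructs the argument list and leaves execution to the caller, for example via `subprocess.run`. `service_command` is a hypothetical helper; on macOS the target is the plist path and on Linux it is the unit name, as in the commands above.

```python
from pathlib import Path
from typing import List

# launchctl has no direct start/stop pair for agents here; the docs use load/unload.
_LAUNCHCTL_ACTIONS = {"start": "load", "stop": "unload"}
_SYSTEMCTL_ACTIONS = {"start", "stop", "status", "restart"}


def service_command(system: str, action: str, target: str) -> List[str]:
    """Return the launchctl or systemctl argument list for a service action."""
    if system == "Linux":
        if action not in _SYSTEMCTL_ACTIONS:
            raise ValueError(f"unsupported action for systemd: {action}")
        return ["systemctl", "--user", action, target]
    if system == "Darwin":
        if action not in _LAUNCHCTL_ACTIONS:
            raise ValueError(f"unsupported action for launchd: {action}")
        return ["launchctl", _LAUNCHCTL_ACTIONS[action],
                str(Path(target).expanduser())]
    raise ValueError(f"unsupported system: {system}")
```

The first argument can come from `platform.system()`, so the same call works on both platforms once the right target is supplied.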

Change Whisper Model¶

# Set environment variable before restarting
export VOICEMODE_WHISPER_MODEL=base.en  # or tiny, small, medium, large-v2, large-v3

# Restart service to apply change
# macOS
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Linux
systemctl --user restart whisper-server
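
To see which model file a given setting would point at, the variable can be combined with the file layout shown in the install_whisper_cpp return value (`models/ggml-<model>.bin`). This is a sketch under that assumption. `resolve_model_path` is a hypothetical helper, and it does not check that the file has actually been downloaded.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODEL = "large-v2"  # documented default


def resolve_model_path(install_dir: str = "~/.voicemode/whisper.cpp",
                       env: Optional[Mapping[str, str]] = None) -> Path:
    """Map VOICEMODE_WHISPER_MODEL (or the default) to the expected model file."""
    env = os.environ if env is None else env
    model = env.get("VOICEMODE_WHISPER_MODEL", DEFAULT_MODEL)
    return Path(install_dir).expanduser() / "models" / f"ggml-{model}.bin"
```

Calling `resolve_model_path().exists()` before restarting the service is a quick way to confirm that the selected model is present.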

Testing¶

Run the test suite to verify installation tools:

cd /path/to/voicemode
python -m pytest tests/test_installers.py -v

# Skip integration tests (no actual installation)
SKIP_INTEGRATION_TESTS=1 python -m pytest tests/test_installers.py -v

Contributing¶

When adding new installation tools:

Create a new function in voice_mode/tools/installers.py
Use the @mcp.tool() decorator
Follow the existing pattern for error handling and return values
Add comprehensive tests in tests/test_installers.py
Update this documentation
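
As an illustration of the steps above, a new installer might look like the following sketch. It is not taken from the repository: `install_example_service`, its default directory, and the `already_installed` and `error` keys are hypothetical. In voice_mode/tools/installers.py the function would also carry the `@mcp.tool()` decorator, which is omitted here so the example runs on its own.

```python
import asyncio
from pathlib import Path
from typing import Any, Dict, Optional


async def install_example_service(install_dir: Optional[str] = None,
                                  force_reinstall: bool = False) -> Dict[str, Any]:
    """Hypothetical installer that mirrors the success/install_path result shape."""
    target = Path(install_dir or "~/.voicemode/example-service").expanduser()
    try:
        if target.exists() and not force_reinstall:
            # Nothing to do: report the existing installation.
            return {"success": True, "install_path": str(target),
                    "already_installed": True}
        target.mkdir(parents=True, exist_ok=True)
        # Real download, build, and service-configuration steps would go here.
        return {"success": True, "install_path": str(target),
                "already_installed": False}
    except OSError as exc:
        # Return failures in the result instead of raising.
        return {"success": False, "error": str(exc)}
```

A test in tests/test_installers.py could call it with `asyncio.run(install_example_service("/tmp/example"))` and assert on the `success` key.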

Security Notes¶

All installations are performed in user space (no sudo required)
Models are downloaded from official sources
Services bind to localhost only by default
The services are not exposed to external networks without explicit configuration