Voice Mode Installation Tools

Voice Mode now includes MCP tools to automatically install and configure whisper.cpp and kokoro-fastapi, making it easier to set up free, private, open-source voice services.

Overview

These tools handle:

  • System detection (macOS/Linux)
  • Dependency installation
  • GPU support configuration
  • Model downloads
  • Service configuration

Available Tools

install_whisper_cpp

Installs whisper.cpp for speech-to-text (STT) functionality.

Features

  • Automatic OS detection (macOS/Linux)
  • GPU acceleration (Metal on macOS, CUDA on Linux)
  • Model download management
  • Build optimization
  • Service configuration (launchd on macOS, systemd on Linux)
  • Environment variable support for model selection

Usage

# Basic installation with defaults
result = await install_whisper_cpp()

# Custom installation
result = await install_whisper_cpp(
    install_dir="~/my-whisper",
    model="large-v3",
    use_gpu=True,
    force_reinstall=False
)

Parameters

  • install_dir (str, optional): Installation directory (default: ~/.voicemode/whisper.cpp)
  • model (str, optional): Whisper model to download (default: large-v2)
      • Available models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v2, large-v3
      • Note: large-v2 is default for best accuracy (requires ~3GB RAM)
  • use_gpu (bool, optional): Enable GPU support (default: auto-detect)
  • force_reinstall (bool, optional): Force reinstallation (default: false)
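
Because the model parameter only accepts the names listed above, it can help to check a name before starting a long build and download. The following is a minimal sketch, not part of Voice Mode: the helper normalize_model and the constants AVAILABLE_MODELS and DEFAULT_MODEL are hypothetical names introduced here for illustration, and the model names are copied from the parameter list.

```python
# Model names copied from the parameter list; everything else here is a
# hypothetical helper, not part of the Voice Mode API.
AVAILABLE_MODELS = (
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v2", "large-v3",
)
DEFAULT_MODEL = "large-v2"


def normalize_model(name=None):
    """Return a valid model name, falling back to the documented default."""
    if name is None or not name.strip():
        return DEFAULT_MODEL
    candidate = name.strip()
    if candidate not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unknown model {candidate!r}; expected one of: "
            + ", ".join(AVAILABLE_MODELS)
        )
    return candidate
```

The checked name can then be passed on, for example install_whisper_cpp(model=normalize_model("base.en")).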

Return Value

{
    "success": True,
    "install_path": "/Users/user/.voicemode/whisper.cpp",
    "model_path": "/Users/user/.voicemode/whisper.cpp/models/ggml-large-v2.bin",
    "gpu_enabled": True,
    "gpu_type": "metal",  # or "cuda" or "cpu"
    "performance_info": {
        "system": "Darwin",
        "gpu_acceleration": "metal",
        "model": "large-v2",
        "binary_path": "/Users/user/.voicemode/whisper.cpp/main",
        "server_port": 2022,
        "server_url": "http://localhost:2022"
    },
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.whisper-server.plist",  # macOS
    "systemd_service": "/home/user/.config/systemd/user/whisper-server.service",  # Linux
    "start_script": "/Users/user/.voicemode/whisper.cpp/start-whisper-server.sh"
}
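
The return value lists a launchagent path on macOS and a systemd_service path on Linux. A minimal sketch of picking the right one, assuming that only the key for the current platform is present in the result (that is an assumption drawn from the comments in the example; the helper name service_definition_path is hypothetical):

```python
import platform


def service_definition_path(result, system=None):
    """Return the launchd plist or systemd unit path from an installer result.

    `system` defaults to platform.system(); it can be passed explicitly,
    which is mainly useful for testing.
    """
    system = system or platform.system()
    # "Darwin" is what platform.system() reports on macOS.
    key = "launchagent" if system == "Darwin" else "systemd_service"
    # .get() returns None when the installer did not write a service file.
    return result.get(key)
```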

install_kokoro_fastapi

Installs kokoro-fastapi for text-to-speech (TTS) functionality.

Features

  • Python environment management with UV
  • Automatic model downloads
  • Service configuration (launchd on macOS, systemd on Linux)
  • Auto-start capability

Usage

# Basic installation with defaults
result = await install_kokoro_fastapi()

# Custom installation
result = await install_kokoro_fastapi(
    install_dir="~/my-kokoro",
    models_dir="~/my-models",
    port=8881,
    auto_start=True,
    install_models=True,
    force_reinstall=False
)

Parameters

  • install_dir (str, optional): Installation directory (default: ~/.voicemode/kokoro-fastapi)
  • models_dir (str, optional): Models directory (default: ~/.voicemode/kokoro-models)
  • port (int, optional): Service port (default: 8880)
  • auto_start (bool, optional): Start service after installation (default: true)
  • install_models (bool, optional): Download Kokoro models (default: true)
  • force_reinstall (bool, optional): Force reinstallation (default: false)

Return Value

{
    "success": True,
    "install_path": "/home/user/.voicemode/kokoro-fastapi",
    "service_url": "http://127.0.0.1:8880",
    "service_status": "managed_by_systemd",  # Linux
    "service_status": "managed_by_launchd",  # macOS
    "systemd_service": "/home/user/.config/systemd/user/kokoro-fastapi-8880.service",  # Linux
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.kokoro-8880.plist",  # macOS
    "start_script": "/home/user/.voicemode/kokoro-fastapi/start-cpu.sh",
    "message": "Kokoro-fastapi installed. Run: cd /home/user/.voicemode/kokoro-fastapi && ./start-cpu.sh"
}

System Requirements

whisper.cpp

macOS

  • Xcode Command Line Tools
  • Homebrew (for cmake)
  • Metal support (built-in)

Linux

  • Build essentials (gcc, g++, make)
  • CMake
  • CUDA toolkit (optional, for NVIDIA GPU support)

kokoro-fastapi

All Systems

  • Python 3.10+
  • Git
  • ~5GB disk space for models
  • UV package manager (installed automatically if missing)

Integration with Voice Mode

After installation, the services integrate automatically with Voice Mode:

  1. whisper.cpp:
       • Runs automatically on boot (port 2022)
       • OpenAI-compatible API endpoint
       • Model selection via VOICEMODE_WHISPER_MODEL environment variable
       • View installed models: claude resource read whisper://models

  2. kokoro-fastapi:
       • Automatically detected by Voice Mode's provider registry when running
       • 67 voices available
       • OpenAI-compatible API endpoint
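
Since both services expose OpenAI-compatible endpoints, a client can talk to them with plain HTTP. The sketch below builds (but does not send) a text-to-speech request for kokoro-fastapi. It assumes the OpenAI-style path /v1/audio/speech and the OpenAI request fields model, input, and voice; the model name "tts-1" and the voice name "af_sky" are assumptions too and may need adjusting for your installation. The helper build_speech_request is hypothetical.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://127.0.0.1:8880", voice="af_sky"):
    """Build a POST request for an OpenAI-style speech endpoint.

    The path and payload fields follow OpenAI's speech API; the default
    voice and model names are assumptions, not confirmed by this document.
    """
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the service is running, the request can be sent with urllib.request.urlopen(request) and the response bytes written to an audio file.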

Examples

Complete Setup

# Install both services with defaults
whisper_result = await install_whisper_cpp()  # Uses large-v2 by default
kokoro_result = await install_kokoro_fastapi()

# Check installation status
if whisper_result["success"] and kokoro_result["success"]:
    print("Voice services installed successfully!")
    print(f"Whisper: {whisper_result['install_path']}")
    print(f"Whisper server: {whisper_result['performance_info']['server_url']}")
    print(f"Kokoro API: {kokoro_result['service_url']}")

Upgrade Existing Installation

# Force reinstall with a different model
result = await install_whisper_cpp(
    model="large-v3",
    force_reinstall=True
)

Custom Configuration

# Install kokoro-fastapi on different port
result = await install_kokoro_fastapi(
    port=9000,
    models_dir="/opt/models/kokoro"
)

Troubleshooting

Common Issues

  1. Missing Dependencies
       • The tools will report missing dependencies with installation instructions
       • Follow the provided commands to install required packages

  2. Port Conflicts
       • If port 8880 is in use, specify a different port for kokoro-fastapi
       • Check running services: lsof -i :8880

  3. GPU Not Detected
       • On Linux, ensure NVIDIA drivers and CUDA are installed
       • Use nvidia-smi to verify GPU availability
       • Force CPU mode with use_gpu=False if needed

  4. Model Download Failures
       • Check internet connection
       • Verify sufficient disk space
       • Try smaller models first (tiny, base)
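
For the port conflict case, the same check that lsof performs can be done from Python before calling install_kokoro_fastapi. This is a minimal sketch using only the standard library; the helpers port_is_free and first_free_port are hypothetical and not part of Voice Mode.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=8880, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")
```

The result can be passed straight to the installer, for example install_kokoro_fastapi(port=first_free_port()). Another process could still claim the port between the check and the service start, so treat this as a convenience rather than a guarantee.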

Manual Service Management

macOS (launchd)

# Whisper
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Kokoro
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro-8880.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro-8880.plist

Linux (systemd)

# Whisper
systemctl --user start whisper-server
systemctl --user stop whisper-server
systemctl --user status whisper-server

# Kokoro
systemctl --user start kokoro-fastapi-8880
systemctl --user stop kokoro-fastapi-8880
systemctl --user status kokoro-fastapi-8880

Change Whisper Model

# Set environment variable before restarting
export VOICEMODE_WHISPER_MODEL=base.en  # or tiny, small, medium, large-v2, large-v3

# Restart service to apply change
# macOS
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Linux
systemctl --user restart whisper-server
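
Before switching models it is worth confirming that the model file has actually been downloaded. The sketch below derives the expected file path from VOICEMODE_WHISPER_MODEL, using the models/ggml-<model>.bin layout shown in the install_whisper_cpp return value. Falling back to large-v2 when the variable is unset is an assumption based on the documented default, and the helper selected_model_path is hypothetical.

```python
import os
from pathlib import Path


def selected_model_path(install_dir="~/.voicemode/whisper.cpp", env=None):
    """Return the expected path of the model named by VOICEMODE_WHISPER_MODEL."""
    env = os.environ if env is None else env
    # Assumption: an unset variable means the documented default model.
    model = env.get("VOICEMODE_WHISPER_MODEL", "large-v2")
    return Path(install_dir).expanduser() / "models" / f"ggml-{model}.bin"
```

Calling selected_model_path().exists() before restarting the service shows whether the chosen model is already on disk.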

Testing

Run the test suite to verify the installation tools:

cd /path/to/voicemode
python -m pytest tests/test_installers.py -v

# Skip integration tests (no actual installation)
SKIP_INTEGRATION_TESTS=1 python -m pytest tests/test_installers.py -v

Contributing

When adding new installation tools:

  1. Create a new function in voice_mode/tools/installers.py
  2. Use the @mcp.tool() decorator
  3. Follow the existing pattern for error handling and return values
  4. Add comprehensive tests in tests/test_installers.py
  5. Update this documentation
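
The steps above can be sketched as a skeleton like the following. It is illustrative only: the function install_example_tool, its parameters, and the "error" key are hypothetical, and the real conventions in voice_mode/tools/installers.py may differ. The @mcp.tool() decorator is left out so the sketch runs on its own.

```python
import asyncio
import shutil


async def install_example_tool(required_commands=("git",), force_reinstall=False):
    """Hypothetical installer skeleton; not part of Voice Mode."""
    # In the real module this function would carry the @mcp.tool() decorator.
    missing = [cmd for cmd in required_commands if shutil.which(cmd) is None]
    if missing:
        # Report the failure as data instead of raising, so the caller
        # can show the user what to install.
        return {
            "success": False,
            "error": "Missing dependencies: " + ", ".join(missing),
        }
    # Real installation steps (clone, build, write service file) go here.
    return {"success": True, "force_reinstall": force_reinstall}
```

Returning a dictionary with a "success" flag mirrors the return values documented for install_whisper_cpp and install_kokoro_fastapi, which keeps new tools consistent for callers.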

Security Notes

  • All installations are performed in user space (no sudo required)
  • Models are downloaded from official sources
  • Services bind to localhost only by default
  • No external network access without explicit configuration