Kokoro Text-to-Speech Setup

Kokoro is a high-quality local text-to-speech service that provides natural-sounding voices in multiple languages. It offers an OpenAI-compatible API that VoiceMode can use as an alternative to cloud-based TTS services.

Quick Start

# Install kokoro service
voice-mode kokoro install

# Start the service
voice-mode kokoro start

# Check status
voice-mode kokoro status

Default endpoint: http://127.0.0.1:8880/v1
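
The service may need a moment after `voice-mode kokoro start` before it answers requests. The following is a minimal sketch for waiting on it, assuming `curl` is installed and that the `/v1/models` route used under Troubleshooting responds once the server is up. The `wait_for_kokoro` helper and the `KOKORO_URL` and `KOKORO_PROBE` variables are hypothetical names for illustration, not part of VoiceMode.

```shell
# Hypothetical helper: poll the Kokoro endpoint until it answers or we give up.
# KOKORO_URL and KOKORO_PROBE are illustrative names, not VoiceMode settings.
KOKORO_URL="${KOKORO_URL:-http://127.0.0.1:8880/v1}"

# Default probe: succeed when GET <url>/models returns an HTTP success status.
kokoro_probe() {
  curl --silent --fail --max-time 2 --output /dev/null "$1/models"
}

# wait_for_kokoro [attempts] [delay_seconds]
wait_for_kokoro() {
  attempts="${1:-10}"
  delay="${2:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "${KOKORO_PROBE:-kokoro_probe}" "$KOKORO_URL"; then
      echo "Kokoro is up at $KOKORO_URL"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Kokoro did not respond at $KOKORO_URL" >&2
  return 1
}
```

For example, `wait_for_kokoro 30 2` tries for up to about a minute before reporting failure.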

Installation Methods

VoiceMode includes an installation tool that handles everything:

# Install kokoro with default settings
voice-mode kokoro install

# Or using Claude Code
claude converse "Please install kokoro-fastapi"

This will:

  • Clone the kokoro-fastapi repository to ~/.voicemode/kokoro-fastapi
  • Install the uv package manager if needed
  • Set up automatic startup (systemd on Linux, launchd on macOS)
  • Start the service on port 8880
  • Download models automatically on first use

Manual Installation

Prerequisites

# Ensure Python 3.8+ is installed
python3 --version

# Install uv, which provides the uvx command
pip install uv

Download and Run

# Create models directory
mkdir -p ~/Models/kokoro

# Run kokoro-fastapi with uvx (the quotes stop zsh from globbing the brackets)
uvx 'kokoro-fastapi[cpu]' serve \
  --host 127.0.0.1 \
  --port 8880 \
  --models-dir ~/Models/kokoro

Models download automatically from Hugging Face on first use.

Available Voices

English Voices

  • American Female: af_sky (default), af_sarah
  • American Male: am_adam, am_michael
  • British Female: bf_emma, bf_isabella
  • British Male: bm_george, bm_lewis

International Voices

  • Spanish: ef_dora (female), em_alex (male)
  • French: ff_siwis (female), fm_gabriel (male)
  • Italian: if_sara (female), im_nicola (male)
  • Portuguese: pf_dora (female), pm_alex (male)
  • Chinese: zf_xiaobei (female), zm_yunjian (male)
  • Japanese: jf_alpha (female), jm_kumo (male)
  • Hindi: hf_alpha (female), hm_omega (male)
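
The voice IDs listed above follow a visible pattern: the first letter encodes the language and the second the gender. The following sketch decodes an ID on that basis. The `describe_voice` helper is hypothetical, and the mapping is inferred only from the voices listed here, so IDs outside this list may not follow it.

```shell
# Decode a Kokoro voice ID such as "af_sky" using the naming pattern visible
# in the lists above: first letter = language, second letter = gender.
# describe_voice is a hypothetical helper for illustration.
describe_voice() {
  voice="$1"
  case "$voice" in
    a?_*) language="American English" ;;
    b?_*) language="British English" ;;
    e?_*) language="Spanish" ;;
    f?_*) language="French" ;;
    i?_*) language="Italian" ;;
    p?_*) language="Portuguese" ;;
    z?_*) language="Chinese" ;;
    j?_*) language="Japanese" ;;
    h?_*) language="Hindi" ;;
    *) echo "unknown language code: $voice" >&2; return 1 ;;
  esac
  case "$voice" in
    ?f_*) gender="female" ;;
    ?m_*) gender="male" ;;
    *) echo "unknown gender code: $voice" >&2; return 1 ;;
  esac
  echo "$language, $gender"
}
```

Calling `describe_voice bm_george` prints `British English, male`.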

Service Configuration

Environment Variables

Configure in ~/.voicemode/voicemode.env:

VOICEMODE_KOKORO_PORT=8880
VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro
VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro
VOICEMODE_KOKORO_DEFAULT_VOICE=af_sky
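
To check which value a setting currently has, a small reader for this file can help. This is a sketch that assumes the file contains plain `KEY=VALUE` lines without quoting; the `voicemode_env_get` helper is a hypothetical name and is not provided by VoiceMode.

```shell
# Hypothetical helper: print the value of KEY from a simple KEY=VALUE env file.
# Assumes one assignment per line and no quoting around values.
voicemode_env_get() {
  key="$1"
  file="${2:-$HOME/.voicemode/voicemode.env}"
  [ -r "$file" ] || return 1
  # Keep the last matching assignment; comment lines never match the pattern.
  value="$(sed -n "s/^[[:space:]]*$key=//p" "$file" | tail -n 1)"
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}
```

For example, `voicemode_env_get VOICEMODE_KOKORO_PORT` prints the configured port. A leading `~` in a value is returned literally, since this helper does not perform shell expansion.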

Service Management

macOS (LaunchAgent)

# Start/stop service
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro.plist

# Enable/disable at startup
launchctl load -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist

# Check status
launchctl list | grep kokoro

Linux (systemd)

# Start/stop service
systemctl --user start kokoro
systemctl --user stop kokoro

# Enable/disable at startup
systemctl --user enable kokoro
systemctl --user disable kokoro

# Check status and logs
systemctl --user status kokoro
journalctl --user -u kokoro -f

Integration with VoiceMode

VoiceMode automatically detects Kokoro when available:

  1. First: Checks for Kokoro on http://127.0.0.1:8880/v1
  2. Fallback: Uses OpenAI API (requires OPENAI_API_KEY)
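
The detection order above can be sketched as a small shell function. This is an illustration of the described behavior, not VoiceMode's actual implementation. The `pick_tts_base_url` and `tts_probe` helpers and the `TTS_PROBE` override are hypothetical names, and probing `/models` is an assumption about how availability might be checked.

```shell
# Illustrative sketch of the detection order described above; this is not
# VoiceMode's actual implementation. TTS_PROBE is a hypothetical override.
KOKORO_TTS_URL="http://127.0.0.1:8880/v1"
OPENAI_TTS_URL="https://api.openai.com/v1"

# Succeed when the server answers GET <url>/models with a success status.
tts_probe() {
  curl --silent --fail --max-time 2 --output /dev/null "$1/models"
}

pick_tts_base_url() {
  if "${TTS_PROBE:-tts_probe}" "$KOKORO_TTS_URL"; then
    # Kokoro answered, so prefer the local service.
    echo "$KOKORO_TTS_URL"
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    # Fall back to the cloud API only when a key is available.
    echo "$OPENAI_TTS_URL"
  else
    echo "no TTS backend: Kokoro is down and OPENAI_API_KEY is unset" >&2
    return 1
  fi
}
```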

Custom Configuration

To use a different endpoint or specify a voice:

export TTS_BASE_URL=http://127.0.0.1:8880/v1
export TTS_VOICE=af_sky  # Optional: specify voice

Or in MCP configuration:

"voice-mode": {
  "env": {
    "TTS_BASE_URL": "http://127.0.0.1:8880/v1",
    "TTS_VOICE": "af_sky"
  }
}

Fully Local Setup

For completely offline voice processing, combine Kokoro with Whisper:

export TTS_BASE_URL=http://127.0.0.1:8880/v1  # Kokoro for TTS
export STT_BASE_URL=http://127.0.0.1:2022/v1  # Whisper for STT
export TTS_VOICE=af_sky                       # Kokoro voice

Service Files

macOS LaunchAgent

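Create ~/Library/LaunchAgents/com.voicemode.kokoro.plist, replacing YOUR_USERNAME with your account name and adjusting the uvx path if which uvx reports a location other than /usr/local/bin/uvx: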

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.voicemode.kokoro</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/uvx</string>
        <string>kokoro-fastapi[cpu]</string>
        <string>serve</string>
        <string>--host</string>
        <string>127.0.0.1</string>
        <string>--port</string>
        <string>8880</string>
        <string>--models-dir</string>
        <string>/Users/YOUR_USERNAME/Models/kokoro</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>

Linux systemd Service

Create ~/.config/systemd/user/kokoro.service, adjusting the uvx path if which uvx reports a location other than /usr/local/bin/uvx:

[Unit]
Description=Kokoro Text-to-Speech Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/uvx kokoro-fastapi[cpu] serve \
    --host 127.0.0.1 \
    --port 8880 \
    --models-dir %h/Models/kokoro
Restart=always
RestartSec=10
Environment="PATH=/usr/local/bin:/usr/bin:/bin"

[Install]
WantedBy=default.target
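
A hand-written unit file only takes effect after systemd rereads its configuration. The following is a minimal sketch of that step, assuming the unit was saved as `kokoro.service` in the path above. The `activate_kokoro_unit` helper and the `SYSTEMCTL` override are hypothetical names, the latter added only so the commands can be previewed without running them.

```shell
# Hypothetical helper: register and start a hand-written user unit.
# SYSTEMCTL is an illustrative override that makes dry runs possible.
activate_kokoro_unit() {
  # Make systemd pick up the new or edited unit file.
  "${SYSTEMCTL:-systemctl}" --user daemon-reload &&
  # Enable at login and start immediately in one step.
  "${SYSTEMCTL:-systemctl}" --user enable --now kokoro
}
```

Running `SYSTEMCTL=echo activate_kokoro_unit` prints the arguments that would be passed to systemctl instead of executing them.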

Performance

Kokoro runs locally on your machine:

  • Generation time: 1-3 seconds for short phrases
  • CPU usage: Moderate, depends on text length
  • Memory: ~500MB-1GB depending on loaded models
  • Disk space: ~300MB per language model

For better performance:

  • Use CPU version for most systems: kokoro-fastapi[cpu]
  • GPU version available for CUDA-enabled systems
  • Adjust cache directory to SSD for faster access

Troubleshooting

Service Won't Start

  • Check if port 8880 is already in use: lsof -i :8880
  • Verify uvx is installed: which uvx
  • Check Python version: python3 --version (requires 3.8+)
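
The three checks above can be combined into one preflight function. This is a sketch under the assumption that `lsof` is available for the port check. The `kokoro_preflight` and `version_at_least` helpers are hypothetical names for illustration.

```shell
# Hypothetical preflight helpers mirroring the checks above.

# version_at_least VERSION MIN_MAJOR MIN_MINOR: true if VERSION >= MIN.
version_at_least() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

kokoro_preflight() {
  status=0
  command -v uvx >/dev/null 2>&1 || { echo "uvx not found on PATH"; status=1; }
  if command -v python3 >/dev/null 2>&1; then
    # "python3 --version" prints e.g. "Python 3.11.4"; keep the second field.
    pyver="$(python3 --version 2>&1 | awk '{print $2}')"
    version_at_least "$pyver" 3 8 || { echo "Python 3.8+ required, found $pyver"; status=1; }
  else
    echo "python3 not found on PATH"; status=1
  fi
  # The port check is skipped silently when lsof is not installed.
  if command -v lsof >/dev/null 2>&1 && lsof -i :8880 >/dev/null 2>&1; then
    echo "port 8880 is already in use"; status=1
  fi
  return "$status"
}
```

`kokoro_preflight` prints one line per failed check and returns non-zero if any check fails.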

Models Not Found

  • Ensure models directory exists and has correct permissions
  • Models download automatically on first request
  • Manual download: https://huggingface.co/hexgrad/Kokoro-82M

Voice Not Working

  • Verify service is running: curl http://127.0.0.1:8880/v1/models
  • Check logs for errors (see service management commands)
  • Try a different voice to rule out model issues
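
To test a voice without going through VoiceMode, a request can be sent straight to the server. This sketch assumes the server implements the OpenAI-style `/audio/speech` route, accepts `kokoro` as the model name, and returns MP3 by default; none of these are confirmed by this page. The `json_escape`, `speech_payload`, and `kokoro_say` helpers are hypothetical names.

```shell
# Hypothetical helpers for testing a voice directly against the server.
# Assumes the OpenAI-style /audio/speech route and a model named "kokoro".

# Escape backslashes and double quotes so text can sit inside a JSON string.
# (Newlines and control characters are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# speech_payload TEXT [VOICE]: print the JSON request body.
speech_payload() {
  printf '{"model":"kokoro","input":"%s","voice":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "${2:-af_sky}")"
}

# kokoro_say TEXT [VOICE] [OUTFILE]: write synthesized audio to OUTFILE.
kokoro_say() {
  curl --silent --fail \
    --header 'Content-Type: application/json' \
    --data "$(speech_payload "$1" "${2:-af_sky}")" \
    --output "${3:-kokoro-test.mp3}" \
    "${TTS_BASE_URL:-http://127.0.0.1:8880/v1}/audio/speech"
}
```

For example, `kokoro_say "Testing one two three" bf_emma` writes `kokoro-test.mp3` if the request succeeds. Repeating the call with another voice ID helps separate a server problem from a problem with one voice.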

Performance Issues

  • Ensure adequate CPU resources are available
  • Consider using a smaller text chunk size
  • Check disk I/O if models are on slow storage

File Locations

  • Models: ~/Models/kokoro/ or ~/.voicemode/services/kokoro/models/
  • Cache: ~/.voicemode/cache/kokoro/
  • Service Files:
      • macOS: ~/Library/LaunchAgents/com.voicemode.kokoro.plist
      • Linux: ~/.config/systemd/user/kokoro.service
  • Installation: ~/.voicemode/kokoro-fastapi/
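
To see which of these locations exist on a given machine, a short loop is enough. The `kokoro_locations` helper below is a hypothetical name; it only checks the paths listed above and makes no assumption about which ones a particular install method creates.

```shell
# Hypothetical helper: report which of the documented locations exist.
kokoro_locations() {
  for loc in \
    "$HOME/Models/kokoro" \
    "$HOME/.voicemode/services/kokoro/models" \
    "$HOME/.voicemode/cache/kokoro" \
    "$HOME/Library/LaunchAgents/com.voicemode.kokoro.plist" \
    "$HOME/.config/systemd/user/kokoro.service" \
    "$HOME/.voicemode/kokoro-fastapi"
  do
    if [ -e "$loc" ]; then
      echo "found   $loc"
    else
      echo "missing $loc"
    fi
  done
}
```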