Kokoro Text-to-Speech Setup¶

Kokoro is a high-quality local text-to-speech service that provides natural-sounding voices in multiple languages. It offers an OpenAI-compatible API that VoiceMode can use as an alternative to cloud-based TTS services.

Quick Start¶

# Install kokoro service
voice-mode kokoro install

# Start the service
voice-mode kokoro start

# Check status
voice-mode kokoro status

Default endpoint: http://127.0.0.1:8880/v1

Installation Methods¶

Automatic Installation (Recommended)¶

VoiceMode includes an installation tool that handles everything:

# Install kokoro with default settings
voice-mode kokoro install

# Or using Claude Code
claude converse "Please install kokoro-fastapi"

This will: - Clone the kokoro-fastapi repository to ~/.voicemode/kokoro-fastapi - Install UV package manager if needed - Set up automatic startup (systemd on Linux, launchd on macOS) - Start the service on port 8880 - Download models automatically on first use

Manual Installation¶

Prerequisites¶

# Ensure Python 3.8+ is installed
python3 --version

# Install uvx
pip install uvx

Download and Run¶

# Create models directory
mkdir -p ~/Models/kokoro

# Run kokoro-fastapi with uvx
uvx kokoro-fastapi[cpu] serve \
  --host 127.0.0.1 \
  --port 8880 \
  --models-dir ~/Models/kokoro

Models download automatically from Hugging Face on first use.

Available Voices¶

English Voices¶

American Female: af_sky (default), af_sarah
American Male: am_adam, am_michael
British Female: bf_emma, bf_isabella
British Male: bm_george, bm_lewis

International Voices¶

Spanish: ef_dora (female), em_alex (male)
French: ff_siwis (female), fm_gabriel (male)
Italian: if_sara (female), im_nicola (male)
Portuguese: pf_dora (female), pm_alex (male)
Chinese: zf_xiaobei (female), zm_yunjian (male)
Japanese: jf_alpha (female), jm_kumo (male)
Hindi: hf_alpha (female), hm_omega (male)

Service Configuration¶

Environment Variables¶

Configure in ~/.voicemode/voicemode.env:

VOICEMODE_KOKORO_PORT=8880
VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro
VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro
VOICEMODE_KOKORO_DEFAULT_VOICE=af_sky

Service Management¶

macOS (LaunchAgent)¶

# Start/stop service
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro.plist

# Enable/disable at startup
launchctl load -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist

# Check status
launchctl list | grep kokoro

Linux (Systemd)¶

# Start/stop service
systemctl --user start kokoro
systemctl --user stop kokoro

# Enable/disable at startup
systemctl --user enable kokoro
systemctl --user disable kokoro

# Check status and logs
systemctl --user status kokoro
journalctl --user -u kokoro -f

Integration with VoiceMode¶

VoiceMode automatically detects Kokoro when available:

First: Checks for Kokoro on http://127.0.0.1:8880/v1
Fallback: Uses OpenAI API (requires OPENAI_API_KEY)

Custom Configuration¶

To use a different endpoint or specify a voice:

export TTS_BASE_URL=http://127.0.0.1:8880/v1
export TTS_VOICE=af_sky  # Optional: specify voice

Or in MCP configuration:

"voice-mode": {
  "env": {
    "TTS_BASE_URL": "http://127.0.0.1:8880/v1",
    "TTS_VOICE": "af_sky"
  }
}

Fully Local Setup¶

For completely offline voice processing, combine Kokoro with Whisper:

export TTS_BASE_URL=http://127.0.0.1:8880/v1  # Kokoro for TTS
export STT_BASE_URL=http://127.0.0.1:2022/v1  # Whisper for STT
export TTS_VOICE=af_sky                       # Kokoro voice

Service Files¶

macOS LaunchAgent¶

Create ~/Library/LaunchAgents/com.voicemode.kokoro.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.voicemode.kokoro</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/uvx</string>
        <string>kokoro-fastapi[cpu]</string>
        <string>serve</string>
        <string>--host</string>
        <string>127.0.0.1</string>
        <string>--port</string>
        <string>8880</string>
        <string>--models-dir</string>
        <string>/Users/YOUR_USERNAME/Models/kokoro</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>

Linux Systemd Service¶

Create ~/.config/systemd/user/kokoro.service:

[Unit]
Description=Kokoro Text-to-Speech Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/uvx kokoro-fastapi[cpu] serve \
    --host 127.0.0.1 \
    --port 8880 \
    --models-dir %h/Models/kokoro
Restart=always
RestartSec=10
Environment="PATH=/usr/local/bin:/usr/bin:/bin"

[Install]
WantedBy=default.target

Performance¶

Kokoro runs locally on your machine: - Generation time: 1-3 seconds for short phrases - CPU usage: Moderate, depends on text length - Memory: ~500MB-1GB depending on loaded models - Disk space: ~300MB per language model

For better performance: - Use CPU version for most systems: kokoro-fastapi[cpu] - GPU version available for CUDA-enabled systems - Adjust cache directory to SSD for faster access

Troubleshooting¶

Service Won't Start¶

Check if port 8880 is already in use: lsof -i :8880
Verify uvx is installed: which uvx
Check Python version: python3 --version (requires 3.8+)

Models Not Found¶

Ensure models directory exists and has correct permissions
Models download automatically on first request
Manual download: https://huggingface.co/hexgrad/Kokoro-82M

Voice Not Working¶

Verify service is running: curl http://127.0.0.1:8880/v1/models
Check logs for errors (see service management commands)
Try a different voice to rule out model issues

Performance Issues¶

Ensure adequate CPU resources are available
Consider using a smaller text chunk size
Check disk I/O if models are on slow storage

File Locations¶

Models: ~/Models/kokoro/ or ~/.voicemode/services/kokoro/models/
Cache: ~/.voicemode/cache/kokoro/
Service Files:
macOS: ~/Library/LaunchAgents/com.voicemode.kokoro.plist
Linux: ~/.config/systemd/user/kokoro.service
Installation: ~/.voicemode/kokoro-fastapi/