Kokoro Text-to-Speech Setup
Kokoro is a high-quality local text-to-speech service that provides natural-sounding voices in multiple languages. It offers an OpenAI-compatible API that VoiceMode can use as an alternative to cloud-based TTS services.
Quick Start
# Install kokoro service
voice-mode kokoro install
# Start the service
voice-mode kokoro start
# Check status
voice-mode kokoro status
Default endpoint: http://127.0.0.1:8880/v1
Installation Methods
Automatic Installation (Recommended)
VoiceMode includes an installation tool that handles everything:
# Install kokoro with default settings
voice-mode kokoro install
# Or using Claude Code
claude converse "Please install kokoro-fastapi"
This will:
- Clone the kokoro-fastapi repository to ~/.voicemode/kokoro-fastapi
- Install UV package manager if needed
- Set up automatic startup (systemd on Linux, launchd on macOS)
- Start the service on port 8880
- Download models automatically on first use
Manual Installation
Prerequisites
Download and Run
# Create models directory
mkdir -p ~/Models/kokoro
# Run kokoro-fastapi with uvx (the extra is quoted so zsh does not treat [cpu] as a glob)
uvx 'kokoro-fastapi[cpu]' serve \
  --host 127.0.0.1 \
  --port 8880 \
  --models-dir ~/Models/kokoro
Models download automatically from Hugging Face on first use.
Available Voices
English Voices
- American Female: af_sky (default), af_sarah
- American Male: am_adam, am_michael
- British Female: bf_emma, bf_isabella
- British Male: bm_george, bm_lewis
International Voices
- Spanish: ef_dora (female), em_alex (male)
- French: ff_siwis (female), fm_gabriel (male)
- Italian: if_sara (female), im_nicola (male)
- Portuguese: pf_dora (female), pm_alex (male)
- Chinese: zf_xiaobei (female), zm_yunjian (male)
- Japanese: jf_alpha (female), jm_kumo (male)
- Hindi: hf_alpha (female), hm_omega (male)
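The voice IDs listed above appear to follow a two-letter prefix pattern: the first letter marks the language or accent and the second marks the gender. The sketch below decodes an ID on that basis. The mapping is inferred only from this list, and `describe_voice` is a hypothetical helper, not part of Kokoro or VoiceMode.

```shell
# Hypothetical helper: decode a voice ID such as "af_sky".
# The prefix mapping is inferred from the voice list in this document.
describe_voice() {
    voice=$1
    prefix=${voice%%_*}          # text before the first underscore, e.g. "af"
    case ${prefix%?} in          # first letter: language or accent
        a) lang="American English" ;;
        b) lang="British English" ;;
        e) lang="Spanish" ;;
        f) lang="French" ;;
        i) lang="Italian" ;;
        p) lang="Portuguese" ;;
        z) lang="Chinese" ;;
        j) lang="Japanese" ;;
        h) lang="Hindi" ;;
        *) lang="unknown language" ;;
    esac
    case ${prefix#?} in          # second letter: gender
        f) gender="female" ;;
        m) gender="male" ;;
        *) gender="unknown gender" ;;
    esac
    printf '%s: %s, %s\n' "$voice" "$lang" "$gender"
}

describe_voice af_sky
describe_voice jm_kumo
```

Any ID whose prefix is not exactly two recognized letters is reported as unknown rather than guessed.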
Service Configuration
Environment Variables
Configure in ~/.voicemode/voicemode.env:
VOICEMODE_KOKORO_PORT=8880
VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro
VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro
VOICEMODE_KOKORO_DEFAULT_VOICE=af_sky
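To inspect one of these settings from a script, something like the following sketch may help. It assumes the file contains plain KEY=VALUE lines as shown above; `voicemode_setting` is a hypothetical helper, and VoiceMode's own parser may treat quoting or comments differently.

```shell
# Hypothetical helper: print the value of KEY from an env file, or DEFAULT
# if the file or the key is missing. Assumes simple KEY=VALUE lines.
voicemode_setting() {
    file=$1 key=$2 default=$3
    value=""
    if [ -f "$file" ]; then
        # Keep the last matching assignment and drop everything up to the first "=".
        value=$(grep "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
    fi
    printf '%s\n' "${value:-$default}"
}

voicemode_setting "$HOME/.voicemode/voicemode.env" VOICEMODE_KOKORO_PORT 8880
```

Values are returned verbatim, so a leading `~` in a path is not expanded by this helper.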
Service Management
macOS (LaunchAgent)
# Start/stop service
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro.plist
# Enable/disable at startup
launchctl load -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist
# Check status
launchctl list | grep kokoro
Linux (Systemd)
# Start/stop service
systemctl --user start kokoro
systemctl --user stop kokoro
# Enable/disable at startup
systemctl --user enable kokoro
systemctl --user disable kokoro
# Check status and logs
systemctl --user status kokoro
journalctl --user -u kokoro -f
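For scripts that need to run on both platforms, the start and stop commands above can be selected from the output of `uname -s`. This is a minimal sketch; `kokoro_service_cmd` is a hypothetical name, and it only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print the command that starts or stops Kokoro on a
# given platform (as reported by `uname -s`), using the commands listed above.
kokoro_service_cmd() {
    action=$1 os=${2:-$(uname -s)}
    plist="$HOME/Library/LaunchAgents/com.voicemode.kokoro.plist"
    case "$os:$action" in
        Darwin:start) echo "launchctl load $plist" ;;
        Darwin:stop)  echo "launchctl unload $plist" ;;
        Linux:start)  echo "systemctl --user start kokoro" ;;
        Linux:stop)   echo "systemctl --user stop kokoro" ;;
        *) echo "unsupported: $os $action" >&2; return 1 ;;
    esac
}

kokoro_service_cmd start   # prints the command; it does not run it
```

Printing instead of executing keeps the helper safe to call on machines where the service files have not been created yet.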
Integration with VoiceMode
VoiceMode automatically detects Kokoro when available:
- First: Checks for Kokoro on http://127.0.0.1:8880/v1
- Fallback: Uses OpenAI API (requires OPENAI_API_KEY)
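The detection order above can be sketched as a small shell function. This is only an illustration of the described behavior, not VoiceMode's actual implementation: `probe_http` and `pick_tts_base_url` are hypothetical names, the probe hits the `/models` path used elsewhere in this guide, and the OpenAI base URL is the standard public one.

```shell
# Hypothetical probe: succeed if the URL answers with a successful HTTP status.
probe_http() {
    curl --silent --fail --max-time 2 --output /dev/null "$1"
}

# Hypothetical sketch of the detection order; VoiceMode's real logic may differ.
# Usage: pick_tts_base_url [probe-command]
pick_tts_base_url() {
    probe=${1:-probe_http}
    kokoro="http://127.0.0.1:8880/v1"
    if "$probe" "$kokoro/models"; then
        echo "$kokoro"                       # Kokoro answered: use it
    elif [ -n "${OPENAI_API_KEY:-}" ]; then
        echo "https://api.openai.com/v1"     # fallback: OpenAI API
    else
        echo "no TTS endpoint available" >&2
        return 1
    fi
}
```

Calling `pick_tts_base_url` with no arguments uses the curl probe. Passing another command name, such as `true` or `false`, replaces the probe, which is convenient for trying out the fallback path without a running service.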
Custom Configuration
To use a different endpoint or specify a voice:
Or in MCP configuration:
Fully Local Setup
For completely offline voice processing, combine Kokoro with Whisper:
export TTS_BASE_URL=http://127.0.0.1:8880/v1 # Kokoro for TTS
export STT_BASE_URL=http://127.0.0.1:2022/v1 # Whisper for STT
export TTS_VOICE=af_sky # Kokoro voice
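With those variables set, a quick way to exercise the TTS endpoint directly is an OpenAI-style speech request. The sketch below assumes the server exposes the OpenAI-compatible `/audio/speech` route and accepts `kokoro` as the model name; both are assumptions to verify against your installation. `speech_payload` and `speak_to_file` are hypothetical helpers.

```shell
# Hypothetical helper: build the JSON body for an OpenAI-style speech request.
# Assumes the text contains no double quotes or backslashes (no JSON escaping here).
speech_payload() {
    text=$1 voice=${2:-${TTS_VOICE:-af_sky}}
    printf '{"model":"kokoro","input":"%s","voice":"%s"}\n' "$text" "$voice"
}

# Hypothetical helper: send the request to TTS_BASE_URL and save the audio.
speak_to_file() {
    text=$1 out=$2
    curl --silent --fail \
        --header "Content-Type: application/json" \
        --data "$(speech_payload "$text")" \
        --output "$out" \
        "${TTS_BASE_URL:-http://127.0.0.1:8880/v1}/audio/speech"
}
```

For example, `speak_to_file "Hello from Kokoro" hello.audio` writes whatever audio format the server returns by default. The output file name is arbitrary.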
Service Files
macOS LaunchAgent
Create ~/Library/LaunchAgents/com.voicemode.kokoro.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.voicemode.kokoro</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/uvx</string>
<string>kokoro-fastapi[cpu]</string>
<string>serve</string>
<string>--host</string>
<string>127.0.0.1</string>
<string>--port</string>
<string>8880</string>
<string>--models-dir</string>
<string>/Users/YOUR_USERNAME/Models/kokoro</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/usr/local/bin:/usr/bin:/bin</string>
</dict>
</dict>
</plist>
Linux Systemd Service
Create ~/.config/systemd/user/kokoro.service:
[Unit]
Description=Kokoro Text-to-Speech Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/uvx kokoro-fastapi[cpu] serve \
--host 127.0.0.1 \
--port 8880 \
--models-dir %h/Models/kokoro
Restart=always
RestartSec=10
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
[Install]
WantedBy=default.target
Performance
Kokoro runs locally on your machine:
- Generation time: 1-3 seconds for short phrases
- CPU usage: Moderate, depends on text length
- Memory: ~500MB-1GB depending on loaded models
- Disk space: ~300MB per language model
For better performance:
- Use the CPU version on most systems: kokoro-fastapi[cpu]
- A GPU version is available for CUDA-enabled systems
- Place the cache directory on an SSD for faster access
Troubleshooting
Service Won't Start
- Check if port 8880 is already in use: lsof -i :8880
- Verify uvx is installed: which uvx
- Check Python version: python3 --version (requires 3.8+)
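The three checks above can be combined into one preflight script. This is a minimal sketch under the assumptions stated in the list (port 8880, Python 3.8 or newer); `python_version_ok` and `kokoro_preflight` are hypothetical names.

```shell
# Hypothetical helper: succeed if a "MAJOR.MINOR[.PATCH]" version is at least 3.8.
python_version_ok() {
    major=${1%%.*}               # "3.11.4" -> "3"
    rest=${1#*.}                 # "3.11.4" -> "11.4"
    minor=${rest%%.*}            # "11.4"   -> "11"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Run the three checks from the list above and report each result.
# Note: if lsof itself is not installed, the port is reported as free.
kokoro_preflight() {
    if lsof -i :8880 >/dev/null 2>&1; then
        echo "port 8880: in use"
    else
        echo "port 8880: free"
    fi
    if command -v uvx >/dev/null 2>&1; then
        echo "uvx: found"
    else
        echo "uvx: missing"
    fi
    version=$(python3 --version 2>/dev/null | cut -d' ' -f2)
    if [ -n "$version" ] && python_version_ok "$version"; then
        echo "python3 $version: ok"
    else
        echo "python3: missing or older than 3.8"
    fi
}
```

Run `kokoro_preflight` before starting the service. A port reported as in use may simply mean Kokoro is already running.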
Models Not Found
- Ensure models directory exists and has correct permissions
- Models download automatically on first request
- Manual download: https://huggingface.co/hexgrad/Kokoro-82M
Voice Not Working
- Verify service is running: curl http://127.0.0.1:8880/v1/models
- Check logs for errors (see service management commands)
- Try a different voice to rule out model issues
Performance Issues
- Ensure adequate CPU resources are available
- Consider using a smaller text chunk size
- Check disk I/O if models are on slow storage
File Locations
- Models: ~/Models/kokoro/ or ~/.voicemode/services/kokoro/models/
- Cache: ~/.voicemode/cache/kokoro/
- Service Files:
  - macOS: ~/Library/LaunchAgents/com.voicemode.kokoro.plist
  - Linux: ~/.config/systemd/user/kokoro.service
- Installation: ~/.voicemode/kokoro-fastapi/