Audio Format Migration Guide¶
Overview¶
Voice Mode now uses PCM audio format by default for TTS streaming. This change provides:
- Zero encoding latency - No compression overhead for real-time streaming
- Best streaming performance - Direct audio data without conversion
- Maximum compatibility - Works with all audio systems
- Instant playback - No decoding required
For STT uploads and audio saving, compressed formats like Opus are still available.
Important Note: While Opus was originally intended for streaming due to its low-latency design, in practice it requires full buffering before playback. PCM is the only format that truly supports progressive streaming for TTS.
Quick Start¶
For most users, no action is required. Voice Mode will automatically use PCM format for TTS streaming, providing the best real-time performance.
To Use Compressed Formats¶
If you prefer compressed formats (trading latency for smaller file sizes):
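From a shell, a minimal sketch that assumes the same `VOICEMODE_TTS_AUDIO_FORMAT` variable used in the MCP configuration below, and assumes it must be set before the MCP client starts:

```shell
# Use Opus instead of PCM for TTS output.
# Assumption: the variable is read when Voice Mode starts, so export it
# in the environment that launches your MCP client.
export VOICEMODE_TTS_AUDIO_FORMAT="opus"
```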
Or add to your MCP configuration:
{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "VOICEMODE_TTS_AUDIO_FORMAT": "opus"
      }
    }
  }
}
Configuration Options¶
Basic Configuration¶
# Set default format for all operations
export VOICEMODE_AUDIO_FORMAT="pcm" # Options: pcm, opus, mp3, wav, flac, aac
# PCM is default for TTS streaming (best performance)
export VOICEMODE_TTS_AUDIO_FORMAT="pcm"
Advanced Configuration¶
# Different formats for TTS and STT
export VOICEMODE_TTS_AUDIO_FORMAT="pcm" # For text-to-speech (default)
export VOICEMODE_STT_AUDIO_FORMAT="opus" # For speech-to-text upload
# Quality settings (for compressed formats)
export VOICEMODE_OPUS_BITRATE="32000" # Opus bitrate (default: 32kbps)
export VOICEMODE_MP3_BITRATE="64k" # MP3 bitrate (default: 64k)
export VOICEMODE_AAC_BITRATE="64k" # AAC bitrate (default: 64k)
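A typo in a format name is easy to miss, so it can help to check a value before exporting it. The sketch below uses a hypothetical helper, `is_known_format`, which is not part of Voice Mode. It only compares a value against the option list shown above and says nothing about whether a given provider accepts the format.

```shell
# Hypothetical helper: succeeds if "$1" is one of the documented format names.
is_known_format() {
  case "$1" in
    pcm|opus|mp3|wav|flac|aac) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: only export the value when it is a documented option.
format="opus"
if is_known_format "$format"; then
  export VOICEMODE_STT_AUDIO_FORMAT="$format"
else
  echo "unknown audio format: $format" >&2
fi
```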
Provider Compatibility¶
Voice Mode automatically validates format compatibility with your providers:
| Provider | TTS Formats | STT Formats |
|---|---|---|
| OpenAI | opus, mp3, aac, flac, wav, pcm | mp3, opus, wav, flac, m4a, webm |
| Kokoro (local) | mp3, wav | N/A |
| Whisper.cpp (local) | N/A | wav, mp3, opus, flac, m4a |
If you select an unsupported format, Voice Mode will automatically fall back to a compatible format.
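To make the fallback idea concrete, here is a rough sketch built only from the TTS column of the table above. The function names are hypothetical, and the rule "use the first format the provider lists" is an assumption made for illustration; this guide does not say which format Voice Mode actually picks.

```shell
# TTS formats per provider, copied from the table above.
tts_formats_for() {
  case "$1" in
    openai) echo "opus mp3 aac flac wav pcm" ;;
    kokoro) echo "mp3 wav" ;;
    *)      echo "" ;;
  esac
}

# Print the requested format if the provider lists it; otherwise print the
# first listed format. The "first listed" rule is an assumption for this
# sketch, not Voice Mode's documented behavior.
resolve_tts_format() {
  provider="$1"
  requested="$2"
  supported=$(tts_formats_for "$provider")
  for fmt in $supported; do
    if [ "$fmt" = "$requested" ]; then
      echo "$requested"
      return 0
    fi
  done
  # Fall back to the first word of the list (empty for an unknown provider).
  echo "${supported%% *}"
}
```

With these definitions, `resolve_tts_format kokoro opus` prints `mp3`, while `resolve_tts_format openai pcm` prints `pcm` unchanged.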
Migration from Existing Setup¶
Checking Your Current Setup¶
If you have existing audio files saved with VOICEMODE_SAVE_AUDIO=true, they are likely in MP3 or Opus format. You can check:
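One way to check is to count the saved files by extension. The sketch below assumes the files live in `~/voicemode_audio`, the directory used in the conversion example later in this guide; `count_audio_by_extension` is a hypothetical helper, so pass a different directory if your setup saves audio elsewhere.

```shell
# Count saved audio files by extension in a directory.
# Assumption: the default directory is ~/voicemode_audio; pass another
# path as the first argument if yours differs.
count_audio_by_extension() {
  dir="${1:-$HOME/voicemode_audio}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  for ext in pcm opus mp3 wav flac aac; do
    # tr strips the padding some wc implementations add around the count.
    count=$(find "$dir" -maxdepth 1 -type f -name "*.$ext" | wc -l | tr -d ' ')
    echo "$ext: $count"
  done
}
```

Calling `count_audio_by_extension` with no arguments prints one line per format, for example `mp3: 12`, which shows at a glance which formats you already have on disk.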
Gradual Migration¶
You can run multiple formats side-by-side:
- Keep existing compressed audio files
- TTS streaming uses PCM for best performance
- STT uploads can use compressed formats
- All formats work seamlessly together
Converting Existing Files¶
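To convert existing MP3 files to Opus (optional). Re-encoding one lossy format into another loses some quality, so keep the originals if fidelity matters:
# Using ffmpeg (needs a build with libopus support)
for file in ~/voicemode_audio/*.mp3; do
  [ -e "$file" ] || continue  # skip the literal pattern when no MP3 files exist
  ffmpeg -n -i "$file" -c:a libopus -b:a 32k "${file%.mp3}.opus"  # -n: never overwrite existing output
done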
Troubleshooting¶
Issue: "Provider doesn't support format"¶
Voice Mode will automatically fall back to a supported format and log a message saying that it has done so.
Note: PCM is universally supported for streaming.
Issue: "Audio playback issues"¶
Some older systems might have issues with Opus playback. Try one of the following:
- Update your audio libraries.
- Or switch to a different format, for example by setting VOICEMODE_TTS_AUDIO_FORMAT to "pcm" or "mp3".
Issue: "Larger file sizes than expected"¶
Opus files can be slightly larger than the bitrate alone suggests when saved in an Ogg container, because the container adds its own framing overhead. The actual audio data is still compressed efficiently.
Format Comparison¶
| Format | File Size* | Quality | Latency | Best For |
|---|---|---|---|---|
| PCM | N/A (streaming) | Uncompressed | Zero | TTS streaming (default) |
| Opus | Smallest (100KB) | Excellent for voice | High (buffering required) | STT uploads, saving |
| MP3 | Medium (500KB) | Good | Low | Wide compatibility |
| AAC | Medium (450KB) | Good | Low | Apple ecosystem |
| FLAC | Large (2MB) | Lossless | Low | Archival |
| WAV | Largest (5MB) | Uncompressed | Zero | Local processing |
*Approximate sizes for 1 minute of speech
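For the compressed formats, file size follows mostly from bitrate and duration: size in bytes is roughly bitrate × seconds ÷ 8. The sketch below applies that arithmetic to the default bitrates from the configuration section. `estimate_kb` is a hypothetical helper, and the result ignores container overhead and variable-bitrate savings, so real files (and the approximate figures in the table) can differ.

```shell
# Approximate size in KB of a constant-bitrate stream.
# Arguments: bitrate in bits per second, duration in seconds.
estimate_kb() {
  echo $(( $1 * $2 / 8 / 1000 ))
}

estimate_kb 32000 60   # Opus at the default 32 kbps
estimate_kb 64000 60   # MP3 or AAC at the default 64 kbps
```

For one minute of audio this prints `240` and `480`, so halving the bitrate halves the upload size.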
Benefits of PCM for Streaming¶
- Zero Latency: No encoding/decoding overhead
- Best Performance: Direct audio playback
- Universal Support: Works on all systems
- Streaming Optimized: No buffering for format conversion
- Real-time Ready: Perfect for live conversations
Benefits of Opus for Uploads¶
- Bandwidth Efficiency: Crucial for cloud API calls
- Small File Size: 50-80% smaller than MP3
- Voice Optimized: Designed for speech
- Wide Platform Support: Works on modern systems
- Future-proof: Active development
Changing Default Formats¶
To change from PCM streaming to compressed formats:
- Set the environment variables, for example VOICEMODE_TTS_AUDIO_FORMAT="opus"
- Or update your MCP configuration as shown above
- Restart your MCP client
PCM provides the best streaming performance, but compressed formats are useful for:
- Reducing bandwidth usage
- Saving audio files
- STT uploads to cloud services