Fast CPU-Only TTS: Sopro Clones Voices at 0.25 RTF
Sopro is a CPU-only text-to-speech model with zero-shot voice cloning that reaches a 0.25 real-time factor without requiring a GPU.
Someone built a surprisingly fast text-to-speech model called Sopro that runs on regular CPUs without needing a GPU.
The interesting bit is the speed: it hits 0.25 RTF (real-time factor, i.e. generation time divided by the duration of the audio produced), so generating 30 seconds of audio takes only about 7.5 seconds on a CPU. Most TTS models either need serious hardware or run painfully slowly without it.
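The RTF arithmetic is easy to check or reproduce on your own machine. Below is a minimal sketch: `real_time_factor` and `expected_processing_time` just encode the definition, and `measure_rtf` times any synthesis call. The `synthesize` callable is a hypothetical stand-in, not Sopro's actual API; it is assumed to return the duration in seconds of the audio it generated.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def expected_processing_time(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: how long generation should take at a given RTF."""
    return audio_seconds * rtf


def measure_rtf(synthesize: Callable[[str], float], text: str) -> float:
    """Time a (hypothetical) TTS call that returns the audio duration in seconds."""
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # Figures from the post: 30 s of audio at 0.25 RTF.
    print(expected_processing_time(30.0, 0.25))  # 7.5
    print(real_time_factor(7.5, 30.0))           # 0.25
```

Anything below 1.0 is faster than real time. Note that a single timed call includes warm-up costs, so averaging several runs gives a fairer number.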
Key specs:
- 169M parameters (pretty small)
- Zero-shot voice cloning with just 3-12 seconds of reference audio
- Streaming support for real-time applications
- Apache 2.0 license (completely open)
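Why a low RTF matters for streaming: if audio is produced chunk by chunk and each chunk generates faster than it plays, playback never has to wait after the first chunk. The sketch below simulates that under simplifying assumptions (sequential generation at a constant RTF, fixed chunk length, no model-loading time). It illustrates the general idea and is not a description of how Sopro's streaming is actually implemented; `simulate_streaming` and its parameters are hypothetical.

```python
import math


def simulate_streaming(total_audio_s: float, chunk_s: float, rtf: float):
    """Return (time_to_first_audio, total_stall_time) for chunked generation.

    Assumes chunks are generated one after another, each taking
    chunk_duration * rtf seconds, and played back as soon as they are ready.
    """
    n_chunks = math.ceil(total_audio_s / chunk_s)
    gen_clock = 0.0        # when the current chunk finishes generating
    play_clock = None      # when playback of everything so far will end
    first_audio = None
    stall = 0.0
    remaining = total_audio_s
    for _ in range(n_chunks):
        this_chunk = min(chunk_s, remaining)
        remaining -= this_chunk
        gen_clock += this_chunk * rtf
        if play_clock is None:
            # Playback starts as soon as the first chunk is ready.
            first_audio = gen_clock
            play_clock = gen_clock
        elif gen_clock > play_clock:
            # Listener hears silence until this chunk is ready.
            stall += gen_clock - play_clock
            play_clock = gen_clock
        play_clock += this_chunk
    return first_audio, stall


if __name__ == "__main__":
    # 30 s of audio in 1 s chunks at the post's 0.25 RTF: no stalls.
    print(simulate_streaming(30.0, 1.0, 0.25))  # (0.25, 0.0)
```

In this simplified model, any RTF below 1 keeps generation ahead of playback, so the main latency is the time to produce the first chunk.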
The creator admits it's not perfect: voice cloning can be hit-or-miss, and the output sometimes becomes unstable. It also only supports English, since it was trained on a single L40S GPU.
Still, for a side project that runs locally without GPU requirements, it’s a solid option for quick prototypes.