Fast CPU-Only TTS: Sopro Clones Voices at 0.25 RTF
Sopro delivers fast CPU-only text-to-speech with voice cloning, achieving a 0.25 real-time factor (RTF) without requiring a GPU.
Someone built a surprisingly fast text-to-speech model called Sopro that runs on regular CPUs without needing a GPU.
The interesting bit is the speed: it hits 0.25 RTF (real-time factor), which means generating 30 seconds of audio takes only 7.5 seconds on a CPU. Most TTS models either need serious hardware or take forever to process.
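RTF is conventionally defined as synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time, and the 7.5-second figure follows directly. A minimal Python sketch of that arithmetic (the function names are illustrative, not part of Sopro):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: how long synthesis takes at a given RTF."""
    return rtf * audio_seconds


# The figures quoted above: 30 s of audio at RTF 0.25
print(processing_time(0.25, 30.0))   # 7.5
print(real_time_factor(7.5, 30.0))   # 0.25
```

Put another way, an RTF of 0.25 means audio is produced about 4x faster than it plays back.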
Key specs:
- 169M parameters (pretty small)
- Zero-shot voice cloning with just 3-12 seconds of reference audio
- Streaming support for real-time applications
- Apache 2.0 license (completely open)
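Streaming matters mostly for latency. One rough way to see it, using the 0.25 RTF figure above, is to compare how long a listener waits before hearing anything. This is an idealized Python sketch, not Sopro's API: the 0.5 s chunk size is a hypothetical value, and the model assumes a constant RTF per chunk while ignoring model loading and other overhead.

```python
from typing import Optional


def time_to_first_audio(total_audio_s: float, rtf: float,
                        chunk_s: Optional[float] = None) -> float:
    """Idealized seconds before playback can begin.

    Without streaming (chunk_s=None) the whole clip is synthesized first;
    with streaming only the first chunk is. Assumes a constant RTF per
    chunk and ignores model load and text-processing overhead.
    """
    first_audio_s = total_audio_s if chunk_s is None else min(chunk_s, total_audio_s)
    return rtf * first_audio_s


def keeps_up_with_playback(rtf: float) -> bool:
    """Under the same idealization, each chunk is ready before the previous
    one finishes playing whenever synthesis runs faster than real time."""
    return rtf < 1.0


# 30 s of speech at RTF 0.25, with a hypothetical 0.5 s chunk size
print(time_to_first_audio(30.0, 0.25))        # 7.5   (whole clip first)
print(time_to_first_audio(30.0, 0.25, 0.5))   # 0.125 (streaming)
print(keeps_up_with_playback(0.25))           # True
```

Under these assumptions the wait drops from the full synthesis time to the time for one chunk, and an RTF below 1.0 lets playback continue without gaps.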
The creator admits it’s not perfect: voice cloning can be hit-or-miss, and the output sometimes becomes unstable. It also only supports English, since it was trained on a single L40S GPU.
Still, for a side project that runs locally without GPU requirements, it’s a solid option for quick prototypes.