Supertonic: 66M Parameter TTS Runs 166x Real-Time Locally
Supertonic is a 66-million-parameter text-to-speech model that runs about 166 times faster than real time on local hardware, enabling efficient voice synthesis.
Supertonic is a surprisingly fast text-to-speech model that runs locally with zero network calls.
The speed is wild: a real-time factor (RTF) of 0.006 on an M4 Pro, meaning one second of audio takes about 6 ms to generate, roughly 166x faster than real time. At only 66M parameters, it’s small enough to run on phones and in browsers without melting the CPU.
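To make the RTF figure concrete, here is a minimal sketch of the arithmetic, plus a way to measure RTF yourself. RTF is generation time divided by the duration of the audio produced, so the speed-up is its reciprocal. The `synthesize` callable is a hypothetical stand-in for whatever TTS function you use, not Supertonic's actual API, and the sample rate is whatever your model outputs.

```python
import time
from typing import Callable, Sequence


def speedup_from_rtf(rtf: float) -> float:
    """Speed-up over real time is the reciprocal of the real-time factor."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time needed to synthesize `audio_seconds` of audio."""
    return audio_seconds * rtf


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS function
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and divide by the duration of its output."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds


# The figure quoted in the post: RTF 0.006 on an M4 Pro.
print(int(speedup_from_rtf(0.006)))  # 166
```

By the same arithmetic, a full minute of speech at RTF 0.006 would take roughly 0.36 seconds to generate. Actual numbers depend on hardware, text length, and runtime, so treat the quoted RTF as one data point, not a guarantee.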
What makes it interesting:
- Works in 5 languages (Korean, Spanish, French, Portuguese, English)
- 10 preset voices to choose from
- Completely offline - good for privacy-sensitive projects
- Commercial use allowed under OpenRAIL-M license
Try it:
- Demo: https://huggingface.co/spaces/Supertone/supertonic-2
- Model: https://huggingface.co/Supertone/supertonic-2
- Code: https://github.com/supertone-inc/supertonic
Pretty solid option if someone needs TTS that doesn’t ping servers or eat bandwidth. The lightweight footprint means it could work for real-time apps where cloud APIs would add too much lag.