Supertonic: 66M Parameter TTS Runs 166x Real-Time Locally

Supertonic is a surprisingly fast text-to-speech model that runs locally with zero network calls.

The speed is wild - a real-time factor (RTF) of 0.006 on an M4 Pro, which means it generates audio roughly 166x faster than real time. It’s only 66M parameters, so it’s small enough to run on phones and in browsers without melting the CPU.
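For anyone who wants to check the arithmetic: RTF is synthesis time divided by the duration of the audio produced, so the speed-up is just its inverse. Here is a minimal Python sketch of that calculation. The function names are made up for illustration and are not part of Supertonic's code; only the 0.006 figure comes from the post.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Speed-up over real time; RTF = synthesis time / audio duration."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated time needed to generate `audio_seconds` of speech."""
    return audio_seconds * rtf


rtf = 0.006  # figure quoted for an M4 Pro
print(f"speed-up: {speedup_from_rtf(rtf):.1f}x")
print(f"60 s of audio takes about {synthesis_seconds(60.0, rtf):.2f} s to generate")
```

At that RTF, a minute of speech takes well under a second to generate. Expect different numbers on other hardware, since the quoted figure is specific to the M4 Pro.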

What makes it interesting:

Works in 5 languages (Korean, Spanish, French, Portuguese, English)
10 preset voices to choose from
Completely offline - good for privacy-sensitive projects
Commercial use allowed under the OpenRAIL-M license

Try it:

Demo: https://huggingface.co/spaces/Supertone/supertonic-2
Model: https://huggingface.co/Supertone/supertonic-2
Code: https://github.com/supertone-inc/supertonic

Pretty solid option if you need TTS that doesn’t ping servers or eat bandwidth. The lightweight footprint means it could work for real-time apps where a round trip to a cloud API would add too much lag.
