Supertonic: 66M Parameter TTS Runs 166x Real-Time Locally

Supertonic is a 66-million-parameter text-to-speech model that runs 166 times faster than real time on local hardware, enabling efficient voice synthesis.

Someone found Supertonic, a surprisingly fast text-to-speech model that runs locally with zero network calls.

The speed is wild: a real-time factor (RTF) of 0.006 on an M4 Pro, which means it generates audio roughly 166x faster than real time (1 / 0.006 ≈ 166.7). It’s only 66M parameters, so it can run on phones and in browsers without melting the CPU.
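To make the RTF figure concrete, here is a small sketch of the arithmetic, assuming the usual definition of real-time factor as synthesis time divided by audio duration. The function names are made up for illustration and are not part of Supertonic.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Speedup over real time, assuming RTF = synthesis time / audio duration."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time to generate `audio_seconds` of speech."""
    return audio_seconds * rtf


if __name__ == "__main__":
    rtf = 0.006  # figure quoted for an M4 Pro
    print(f"{speedup_from_rtf(rtf):.1f}x real time")
    print(f"{synthesis_seconds(60, rtf):.2f} s to synthesize 60 s of audio")
```

At that rate a full minute of speech would take well under half a second to generate, though the number will vary with hardware.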

What makes it interesting:

  • Works in 5 languages (Korean, Spanish, French, Portuguese, English)
  • 10 preset voices to choose from
  • Completely offline - good for privacy-sensitive projects
  • Commercial use allowed under OpenRAIL-M license

Pretty solid option if someone needs TTS that doesn’t ping servers or eat bandwidth. The lightweight footprint means it could work for real-time apps where cloud APIs would add too much lag.