Kyutai's Hibiki Zero: 3B Speech-to-Speech Model

Kyutai introduces Hibiki Zero, a compact 3-billion-parameter speech-to-speech model that processes and generates audio directly, without an intermediate text step.

Someone spotted Kyutai’s new Hibiki Zero model and it’s pretty interesting for voice work. It’s a 3B-parameter model that does speech-to-speech translation without converting to text first.

The model handles real conversations better than older approaches: think pauses, overlapping speech, and natural rhythm. It works across multiple languages, including English, French, Spanish, and Japanese.

At only 3B parameters, it is small enough to run on consumer hardware without a massive GPU setup. It’s a good option if you’re building voice apps and want something that sounds more natural than the usual cascaded pipeline, where speech is transcribed to text, translated, and then synthesized again with TTS.
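
To put the consumer-hardware point in numbers, here is a rough back-of-the-envelope sketch of how much memory the weights alone would take at a few common precisions. The parameter count is just the "3B" from above, and the list of precisions is an assumption for illustration; nothing here says which format Hibiki Zero actually ships in. Activations, any attention cache, and the audio codec add more on top.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights only, in GiB."""
    return num_params * bytes_per_param / 2**30


# "3B" as quoted in the post; the exact parameter count is an assumption.
NUM_PARAMS = 3e9

# Illustrative precisions only, not a statement about the released checkpoint.
PRECISIONS = [("fp32", 4.0), ("bf16/fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]

for name, nbytes in PRECISIONS:
    print(f"{name:>9}: {weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision that works out to roughly 5.6 GiB for the weights, which is within reach of a single mid-range consumer GPU, with the caveat that real usage will be higher.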