Kyutai's Hibiki Zero: 3B Speech-to-Speech Model

Someone spotted Kyutai’s new Hibiki Zero model and it’s pretty interesting for voice work. It’s a 3B parameter model that does speech-to-speech translation without converting to text first.

The model handles real conversations better than older approaches: think pauses, overlapping speech, and natural rhythm. It works across multiple languages, including English, French, Spanish, and Japanese.

Check it out:

Model weights: https://huggingface.co/kyutai/hibiki-zero-3b-pytorch-bf16
Audio samples: https://huggingface.co/spaces/kyutai/hibiki-zero-samples
Full blog: https://kyutai.org/blog/2026-02-12-hibiki-zero

At only 3B parameters, it is small enough to run on consumer hardware rather than a massive GPU setup. It’s a good option if you’re building voice apps and want something that sounds more natural than the usual cascade of speech recognition, text translation, and TTS.
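To put the hardware claim in rough numbers, here is a back-of-the-envelope sketch of how much memory the weights alone would take. It assumes a nominal 3 billion parameters and 2 bytes per parameter for bf16 (the format named in the weights repo). The `weight_memory_gib` helper and the fp32 and int8 rows are illustrative only, not something Kyutai publishes. Activations, caches, and the audio codec are not counted.

```python
# Rough estimate of weight memory only. The parameter count is the nominal
# "3B" from the model name; the exact count may differ.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 3e9  # assumption: exactly 3 billion parameters
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Under these assumptions the bf16 weights come to about 5.6 GiB, so treat the figure as a lower bound on what inference actually needs.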
