Kyutai's Hibiki Zero: 3B Speech-to-Speech Model
Kyutai introduces Hibiki Zero, a compact 3-billion-parameter speech-to-speech model that processes and generates audio directly, without intermediate text.
Someone spotted Kyutai’s new Hibiki Zero model and it’s pretty interesting for voice work. It’s a 3B parameter model that does speech-to-speech translation without converting to text first.
The model handles real conversations better than older approaches: think pauses, overlapping speech, and natural rhythm. It works across multiple languages, including English, French, Spanish, and Japanese.
Check it out:
- Model weights: https://huggingface.co/kyutai/hibiki-zero-3b-pytorch-bf16
- Audio samples: https://huggingface.co/spaces/kyutai/hibiki-zero-samples
- Full blog: https://kyutai.org/blog/2026-02-12-hibiki-zero
At only 3B parameters, it should run on consumer hardware without needing massive GPU setups. Good option if you’re building voice apps and want something that sounds more natural than the usual cascade of transcribing speech to text, translating the text, and synthesizing speech from the result.
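As a rough sanity check on the hardware claim, the weights-only footprint can be estimated from parameter count and numeric precision. The sketch below assumes a nominal 3 billion parameters and 2 bytes per value, as implied by the bf16 in the weights repo name; the function name is illustrative, and the exact parameter count may differ.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate storage for model weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: the "3B" in the post is taken as exactly 3e9 parameters.
NOMINAL_PARAMS = 3e9

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=2)  # bfloat16: 2 bytes per value
fp32_gb = weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=4)  # float32: 4 bytes per value

print(f"bf16 weights: ~{bf16_gb:.1f} GB")
print(f"fp32 weights: ~{fp32_gb:.1f} GB")
```

This counts weights only. Activations, any caches kept during streaming, and the audio codec add to the total, so treat the bf16 figure as a lower bound when sizing a GPU.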