KaniTTS2: Fast Local Text-to-Speech with Cloning
KaniTTS2 provides a fast, locally run text-to-speech system with voice cloning capabilities, enabling users to generate natural-sounding speech from text.
Someone just open-sourced KaniTTS2, a pretty fast text-to-speech model that runs locally and includes voice cloning.
The interesting bits:
- Hits ~0.2 RTF on an RTX 5090 (roughly 5x faster than real time)
- Only needs 3GB VRAM
- Supports English and Spanish, with accents
- They released the full training code, not just inference
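A quick note on what "0.2 RTF" means: real-time factor is conventionally the time spent generating audio divided by the duration of the audio produced, so anything below 1.0 is faster than real time. The helpers below are hypothetical illustrations of that arithmetic, not part of KaniTTS2, and the 2 s / 10 s figures are made-up example timings consistent with the reported ~0.2 RTF.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = generation time / duration of generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Seconds of audio produced per second of compute."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical timing: a 10 s clip generated in 2 s.
rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
print(rtf)                          # 0.2
print(speedup_over_real_time(rtf))  # 5.0
```

In other words, at ~0.2 RTF a one-minute clip would take on the order of 12 seconds to synthesize, assuming the rate holds for longer inputs.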
Links to grab:
- Pretrained model: https://huggingface.co/nineninesix/kani-tts-2-pt
- English version: https://huggingface.co/nineninesix/kani-tts-2-en
- Training code: https://github.com/nineninesix-ai/kani-tts-2-pretrain
The training code is the cool part - you can actually train your own TTS from scratch for specific languages or accents. They trained theirs in 6 hours on 8x H100s using 10k hours of speech data. It's Apache 2.0 licensed, so modification and commercial use are allowed as long as you keep the license and attribution notices.
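If you want to budget a similar run, the reported numbers work out to 8 GPUs x 6 hours = 48 H100 GPU-hours for 10k hours of speech. The sketch below just does that arithmetic; the function names are hypothetical, and the ratio ignores how many epochs were run, sequence packing, and other details the post doesn't state, so treat it as a rough order-of-magnitude guide only.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total accelerator time consumed by a run."""
    return num_gpus * wall_clock_hours


def speech_hours_per_gpu_hour(dataset_hours: float, num_gpus: int,
                              wall_clock_hours: float) -> float:
    """Hours of training audio per GPU-hour (ignores epochs and other details)."""
    return dataset_hours / gpu_hours(num_gpus, wall_clock_hours)


# Reported figures: 8x H100, 6 hours, 10k hours of speech.
print(gpu_hours(8, 6))                                          # 48
print(round(speech_hours_per_gpu_hour(10_000, 8, 6), 1))        # 208.3
```

So a dataset half the size would, very roughly, need about half the GPU-hours on the same hardware, assuming training time scales linearly with data.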