Qwen3-TTS: Fast Local ElevenLabs Alternative
Qwen3-TTS is basically a local alternative to ElevenLabs: it actually sounds human, and it runs stupid fast.
The cool part: it’s OpenAI-compatible, so existing code works with just a URL swap. Plus it does voice cloning from 3-second clips and follows natural instructions like “make this sound nervous and shaky”.
Quick Docker setup:
docker run --gpus all -p 8880:8880 qwen3-tts-api
Drop-in Python usage:
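from openai import OpenAI

# Point the standard OpenAI client at the local server; the API key is a placeholder.
client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
response = client.audio.speech.create(
    model="qwen3-tts",
    voice="Vivian",
    input="Your text here",
)
# write_to_file saves the whole response; stream_to_file on this object is deprecated in the SDK.
response.write_to_file("output.mp3")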
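If your code doesn't use the OpenAI SDK, the same call can be sketched as a plain HTTP request. This assumes the server mirrors OpenAI's POST /v1/audio/speech route and JSON fields, which is what "OpenAI-compatible" usually means. The helper names build_speech_request and synthesize are made up for illustration, and the instructions field (OpenAI's way to steer tone) is only an assumption here, so check whether this server actually honors it.

```python
import json
import urllib.request


def build_speech_request(text, voice="Vivian", model="qwen3-tts",
                         base_url="http://localhost:8880/v1",
                         instructions=None):
    """Build (but do not send) a POST to the OpenAI-style speech route."""
    payload = {"model": model, "voice": voice, "input": text,
               "response_format": "mp3"}
    if instructions:
        # Assumption: the server accepts OpenAI's optional `instructions` field.
        payload["instructions"] = instructions
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer not-needed"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and save the returned audio bytes to out_path."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

With the container running, synthesize("Your text here", "output.mp3", instructions="make this sound nervous and shaky") should write the audio file, provided the assumptions above hold.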
Hits ~97ms latency when streaming, which means it starts talking almost instantly. Works with Open WebUI right out of the box.
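A latency number like that is worth checking on your own hardware. Here is a minimal sketch that times the first non-empty chunk of any streamed byte iterator. The names time_to_first_chunk and fake_stream are hypothetical, and fake_stream only stands in for a real streamed response so the snippet runs without a server.

```python
import time
from typing import Iterable, Optional, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    first = None
    total = 0
    for chunk in chunks:
        if chunk and first is None:
            # First audio bytes arrived: this is the "starts talking" moment.
            first = time.perf_counter() - start
        total += len(chunk)
    return first, total


def fake_stream(delay=0.05, n=3):
    """Stand-in for a streamed TTS response: waits, then yields dummy bytes."""
    time.sleep(delay)
    for _ in range(n):
        yield b"\x00" * 1024


first, total = time_to_first_chunk(fake_stream())
print(f"first chunk after {first * 1000:.0f} ms, {total} bytes total")
```

To measure the real server, pass in a generator that opens the request lazily. With the OpenAI SDK, for example, that generator can wrap client.audio.speech.with_streaming_response.create(...) and yield from response.iter_bytes(), so the request is only sent once timing has started. Expect the result to vary with GPU, text length, and voice.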