Qwen3-TTS: Fast Local ElevenLabs Alternative

Qwen3-TTS is a local text-to-speech alternative to ElevenLabs that sounds convincingly human and runs very fast.

The cool part: its API is OpenAI-compatible, so existing code that calls the OpenAI speech endpoint works after swapping the base URL. It also clones voices from 3-second clips and follows natural-language instructions like “make this sound nervous and shaky”.

Quick Docker setup:

docker run --gpus all -p 8880:8880 qwen3-tts-api

Drop-in Python usage:


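from openai import OpenAI

# Point the standard OpenAI client at the local server; the key is a placeholder.
client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")

# Stream the synthesized audio straight to disk.
with client.audio.speech.with_streaming_response.create(
    model="qwen3-tts",
    voice="Vivian",
    input="Your text here",
) as response:
    response.stream_to_file("output.mp3")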
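To see what "OpenAI-compatible" means on the wire, here is a rough standard-library sketch of the HTTP request an OpenAI-style client sends. It assumes the server follows the usual OpenAI speech route (POST to /audio/speech under the base URL, with a JSON body) and returns raw audio bytes. The helper names build_speech_request and synthesize are made up for illustration and are not part of any library.

```python
import json
import urllib.request


def build_speech_request(text, voice="Vivian", model="qwen3-tts",
                         base_url="http://localhost:8880/v1"):
    """Build the POST request for an OpenAI-style /audio/speech endpoint."""
    # Same three fields as the client call above.
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key; assumed to be ignored by the local server.
            "Authorization": "Bearer not-needed",
        },
        method="POST",
    )


def synthesize(text, out_path="output.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With the container running, calling synthesize("Your text here") should produce the same kind of file as the client snippet. Only base_url differs between a hosted and a local endpoint, which is why the swap is a one-line change.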

Streaming latency is around 97 ms, so it starts talking almost instantly. It also works with Open WebUI right out of the box.
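If you want to check the latency on your own hardware, one rough way is to time how long the first audio bytes take to arrive. The sketch below uses only the standard library and assumes the same OpenAI-style /audio/speech route as above. stream_speech and time_to_first_chunk are hypothetical helpers, and the number you get includes HTTP overhead, so it may not match the quoted figure.

```python
import json
import time
import urllib.request


def stream_speech(text, voice="Vivian", model="qwen3-tts",
                  base_url="http://localhost:8880/v1"):
    """Lazily yield audio bytes from an OpenAI-style /audio/speech endpoint."""
    req = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps({"model": model, "voice": voice, "input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generator body runs on first iteration, so the request is sent only then.
    with urllib.request.urlopen(req) as resp:
        # read1 returns as soon as some data is available instead of filling the buffer.
        while chunk := resp.read1(4096):
            yield chunk


def time_to_first_chunk(chunks):
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    first = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue
        if first is None:
            first = time.perf_counter() - start
        total += len(chunk)
    return first, total
```

With the container running, time_to_first_chunk(stream_speech("Your text here")) returns the delay before the first audio bytes and the total size. Because stream_speech is a generator, the timer starts before the request is sent, so connection setup is included in the measurement.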

