Qwen3-TTS: Fast Local ElevenLabs Alternative

Qwen3-TTS is a fast, locally run text-to-speech model that serves as an alternative to ElevenLabs, providing high-quality voice synthesis without relying on the cloud.

Someone found that Qwen3-TTS is basically a local alternative to ElevenLabs that actually sounds human and runs stupid fast.

The cool part: it’s OpenAI-compatible, so existing OpenAI client code works with just a base URL swap. Plus it does voice cloning from 3-second clips and follows natural-language instructions like “make this sound nervous and shaky.”

Quick Docker setup:

docker run --gpus all -p 8880:8880 qwen3-tts-api
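
The container can take a while before it starts accepting requests. A minimal sketch for waiting until the mapped port is reachable, using only the standard library (wait_for_port is a hypothetical helper name, and an open port only shows the server is listening, not that the model has finished loading):

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8880, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not up yet (connection refused or timed out); retry shortly
            time.sleep(0.5)
    return False
```

Calling wait_for_port() with the defaults checks localhost:8880, the port published by the docker command above.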

Drop-in Python usage:


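from openai import OpenAI

# Point the standard OpenAI client at the local server; the API key is a placeholder
client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
response = client.audio.speech.create(
    model="qwen3-tts",
    voice="Vivian",
    input="Your text here"
)
# Save the returned audio to disk
response.write_to_file("output.mp3")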
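
For the instruction-following side, here is a rough sketch of how a style prompt might be attached to the same request. It assumes the server reads the OpenAI-style instructions field, which the post does not confirm, and that the installed openai SDK is recent enough to accept that parameter. speech_kwargs is a hypothetical helper, not part of any library:

```python
from typing import Any, Dict, Optional


def speech_kwargs(
    text: str,
    voice: str = "Vivian",
    instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for client.audio.speech.create()."""
    kwargs: Dict[str, Any] = {"model": "qwen3-tts", "voice": voice, "input": text}
    if instructions:
        # Assumption: the local server honors the OpenAI-style `instructions` field
        kwargs["instructions"] = instructions
    return kwargs


# With the client from the snippet above and the container running:
# response = client.audio.speech.create(
#     **speech_kwargs("Is someone there?", instructions="make this sound nervous and shaky")
# )
```

If the server ignores the field, the request should still return plain speech, so this is cheap to try.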

Hits ~97 ms latency to the first audio when streaming, which means it starts talking almost instantly. Works with Open WebUI right out of the box.
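
To check the latency figure on your own hardware, one option is to time the first audio chunk of a streamed response. A minimal sketch using the OpenAI SDK's with_streaming_response interface; first_chunk_latency and measure_local_tts are hypothetical helper names, and the number you get includes HTTP and client overhead, so it will not necessarily match the figure above:

```python
import time
from typing import Iterable, Tuple


def first_chunk_latency(chunks: Iterable[bytes], started_at: float) -> Tuple[float, bytes]:
    """Drain a stream of audio chunks.

    Returns (seconds from started_at to the first non-empty chunk, all audio bytes).
    """
    latency = None
    audio = bytearray()
    for chunk in chunks:
        if chunk and latency is None:
            latency = time.perf_counter() - started_at
        audio.extend(chunk)
    if latency is None:
        raise ValueError("stream ended without any audio data")
    return latency, bytes(audio)


def measure_local_tts(text: str = "Your text here") -> float:
    """Time the first audio chunk from the local server (needs the container running)."""
    from openai import OpenAI  # imported here so the helper above works without the SDK

    client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
    started_at = time.perf_counter()  # start the clock before the request is sent
    with client.audio.speech.with_streaming_response.create(
        model="qwen3-tts", voice="Vivian", input=text
    ) as response:
        latency, audio = first_chunk_latency(response.iter_bytes(), started_at)
    with open("output.mp3", "wb") as f:
        f.write(audio)
    return latency
```

Running print(measure_local_tts()) a few times and ignoring the first call gives a fairer picture, since the first request may include warm-up.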