Tencent's WeDLM-8B: 3-6x Faster via Diffusion
Tencent introduces WeDLM-8B, a diffusion-based language model that achieves three to six times faster inference speeds compared to traditional autoregressive models.
Tencent's WeDLM-8B-Instruct turns out to be surprisingly fast compared to regular language models, around 3-6x faster than vLLM-optimized Qwen3-8B on math problems.
It's a diffusion language model rather than the usual autoregressive setup. The speed boost comes from generating tokens in parallel instead of one at a time, which really pays off on reasoning-heavy tasks.
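To see why parallel generation helps, here's a toy sketch of masked-diffusion decoding: start from a fully masked sequence and fill in several positions per denoising step, instead of one forward pass per token. This is an illustration of the general idea only, not WeDLM's actual sampler (a real model would pick the highest-confidence positions each step):

import random

def toy_diffusion_decode(length, tokens_per_step=4):
    """Toy masked-diffusion decoding loop: counts denoising steps.
    Not WeDLM's real algorithm -- just illustrates the step count."""
    seq = ["<mask>"] * length
    steps = 0
    while "<mask>" in seq:
        masked = [i for i, t in enumerate(seq) if t == "<mask>"]
        # a real model would unmask its most confident predictions;
        # here we fill an arbitrary batch just to count the steps
        for i in random.sample(masked, min(tokens_per_step, len(masked))):
            seq[i] = f"tok{i}"
        steps += 1
    return steps

print(toy_diffusion_decode(64, 4))  # 16 steps vs 64 autoregressive passes

Filling 4 tokens per step cuts a 64-token generation from 64 sequential model calls to 16, which is where the 3-6x wall-clock speedups come from in practice.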
Quick start:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tencent/WeDLM-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B-Instruct")
Worth checking out at https://huggingface.co/tencent/WeDLM-8B-Instruct if math/reasoning workloads are eating up inference time. The trade-off is it’s a newer architecture, so tooling support isn’t as mature as standard models yet.