
Running Qwen's 397B Model Locally with Quantization

The article explains how to run Qwen's massive 397 billion parameter language model on local hardware, using quantization to reduce memory requirements.

Someone figured out you can actually run Qwen’s massive 397B parameter model locally now, which is pretty wild.

The trick is quantization: the 3-bit build fits on a Mac with 192GB of unified memory, while the 4-bit (MXFP4) build needs an M3 Ultra with 256GB. Performance supposedly matches GPT-5.2 and Claude Opus 4.5.
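Rough arithmetic shows why those RAM figures line up: the weights alone take roughly params × bits / 8 bytes, before counting the KV cache, activations, and the per-group scale factors that formats like MXFP4 add on top. A quick sketch (the 397B parameter count comes from the article; the rest is plain arithmetic):

```python
def weight_memory_gb(params: float, bits: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and quantization-format overhead
    such as MXFP4's per-block scale factors.
    """
    return params * bits / 8 / 1e9

PARAMS = 397e9  # parameter count cited in the article

for bits in (16, 4, 3):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):7.1f} GB")
```

So full-precision 16-bit weights would need about 794 GB, while 4-bit (~199 GB) just fits the 256GB M3 Ultra and 3-bit (~149 GB) leaves headroom on a 192GB Mac, which matches the configurations described above.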

Quick start:

This is the first open release in the Qwen3.5 family. The fact that something competitive with top-tier proprietary models can run on consumer hardware (granted, high-end consumer hardware) is a big shift from needing cloud GPUs for everything.