Unsloth Kernels: Fine-Tune 30B MoE on Consumer GPUs

Unsloth Kernels enables efficient fine-tuning of 30-billion-parameter Mixture of Experts (MoE) models on consumer-grade GPUs through optimized memory management and custom Triton kernels.

Unsloth's new kernels let you fine-tune massive MoE models on surprisingly cheap hardware.

The specs are pretty wild:

  • gpt-oss-20b fits in 12.8GB VRAM (runs on a single RTX 3090)
  • Qwen3-30B needs just 63GB for 16-bit LoRA
  • 12x faster training with 35% less memory than before
  • Works on consumer GPUs, not just data-center stuff
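As a sanity check on that 63GB Qwen3-30B figure, here's a rough back-of-envelope calculation (my own arithmetic, not Unsloth's published accounting): in 16-bit LoRA the frozen base weights dominate memory, with adapters, optimizer state, and activations riding on top.

```python
# Rough estimate only: assumes weights dominate and ignores KV cache,
# activation checkpointing details, and framework overhead.

PARAMS = 30e9          # ~30B parameters
BYTES_PER_PARAM = 2    # 16-bit (bf16/fp16) weights

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
headroom_gb = 63 - weights_gb

print(f"frozen 16-bit weights: {weights_gb:.0f} GB")      # 60 GB
print(f"headroom in the reported 63 GB: {headroom_gb:.0f} GB")  # ~3 GB
```

So the reported number is plausible: ~60GB of frozen weights plus a few gigabytes for the LoRA adapters and their optimizer state.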

They built custom Triton kernels that optimize the grouped matrix multiplications in MoE architectures. The memory savings also grow with scale: the bigger the model and the longer the context, the more you save.
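To see why MoE layers reduce to grouped matrix multiplications, here's a conceptual sketch in plain Python (this is not Unsloth's Triton code; the function names and the single-expert routing are my own simplification). Each token is routed to one expert, so the layer is a batch of small per-expert matmuls rather than one big dense one, and a grouped-GEMM kernel fuses those per-expert matmuls into a single launch.

```python
def matmul(a, b):
    """Multiply a (list of row vectors) by matrix b (list of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def moe_forward_naive(tokens, assignments, experts):
    """One vector-matrix product per token: simple but kernel-launch-heavy."""
    return [matmul([tok], experts[e])[0] for tok, e in zip(tokens, assignments)]

def moe_forward_grouped(tokens, assignments, experts):
    """Group tokens by expert, run one matmul per group, scatter back.
    This is the memory-access pattern a grouped-GEMM kernel exploits."""
    groups = {}
    for i, e in enumerate(assignments):
        groups.setdefault(e, []).append(i)
    out = [None] * len(tokens)
    for e, idxs in groups.items():
        rows = matmul([tokens[i] for i in idxs], experts[e])
        for i, row in zip(idxs, rows):
            out[i] = row
    return out

# Tiny demo: 3 tokens, 2 experts (identity and 2x-scale weight matrices).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
assignments = [0, 1, 0]
experts = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[2.0, 0.0], [0.0, 2.0]]}
assert moe_forward_naive(tokens, assignments, experts) == \
       moe_forward_grouped(tokens, assignments, experts)
```

On a GPU, the grouped version replaces many tiny kernel launches with one launch over all expert groups, which is where the speed and memory wins come from.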

Free Colab notebooks are available to try it.

Works with Qwen3, DeepSeek R1/V3, and GLM models too. The whole thing is open source, so anyone can run billion-parameter MoE fine-tuning without renting expensive cloud instances.