Unsloth Kernels: Fine-Tune 30B MoE on Consumer GPUs
Unsloth Kernels enable efficient fine-tuning of 30-billion-parameter Mixture of Experts models on consumer-grade GPUs through optimized memory management and custom Triton kernels.
Unsloth's new kernels make it possible to fine-tune massive MoE models on surprisingly cheap hardware.
The specs are pretty wild:
- gpt-oss-20b fits in 12.8GB VRAM (runs on a single RTX 3090)
- Qwen3-30B needs just 63GB for 16-bit LoRA
- 12x faster training with 35% less memory than before
- Works on consumer GPUs, not just data-center stuff
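The VRAM figures above are roughly what you'd expect from a back-of-envelope weight count. A minimal sketch, assuming gpt-oss-20b is loaded with 4-bit quantized weights and Qwen3-30B with 16-bit weights (the source doesn't state the quantization; these assumptions make the numbers line up, with LoRA adapters, optimizer state, and activations accounting for the remaining few GB):

```python
# Rough back-of-envelope for the reported VRAM figures.
# Assumptions (not stated in the source): gpt-oss-20b at 4-bit weights,
# Qwen3-30B at 16-bit weights; LoRA/optimizer/activation overhead is extra.
gpt_oss_gb = 20e9 * 0.5 / 1e9  # 4-bit = 0.5 bytes/param -> 10.0 GB base weights
qwen3_gb = 30e9 * 2.0 / 1e9    # 16-bit = 2 bytes/param  -> 60.0 GB base weights
print(gpt_oss_gb, qwen3_gb)    # compare with the reported 12.8 GB and 63 GB
```

The gap between the base-weight estimate and the reported total is the training overhead the custom kernels help keep small.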
They built custom Triton kernels that optimize the grouped matrix multiplications in MoE architectures. The memory savings grow with model size and context length: the bigger your model and the longer your sequences, the more you save.
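To see what "grouped matrix multiplications" means in an MoE layer, here is a minimal NumPy sketch (toy shapes, not Unsloth's actual implementation): each token is routed to one expert, and instead of launching a tiny matmul per token, tokens are sorted by expert so each expert runs one matmul over a contiguous block. This is the access pattern a grouped-GEMM kernel fuses into a single launch.

```python
import numpy as np

# Toy MoE routing: E experts, T tokens, hidden dim D, expert output dim H.
# Shapes and routing are illustrative only.
rng = np.random.default_rng(0)
E, T, D, H = 4, 16, 8, 32
x = rng.standard_normal((T, D))      # token activations
w = rng.standard_normal((E, D, H))   # one weight matrix per expert
assign = rng.integers(0, E, size=T)  # router's expert choice per token

# Naive dispatch: one small matmul per token (many tiny kernel launches).
naive = np.stack([x[t] @ w[assign[t]] for t in range(T)])

# Grouped approach: sort tokens by expert so each expert's tokens are
# contiguous, then do one matmul per expert over its whole block.
order = np.argsort(assign, kind="stable")
sorted_assign = assign[order]
grouped = np.empty((T, H))
for e in range(E):
    idx = order[sorted_assign == e]      # original indices of expert e's tokens
    grouped[idx] = x[idx] @ w[e]         # one batched matmul per expert

assert np.allclose(naive, grouped)
```

A real grouped-GEMM kernel takes this one step further and issues all per-expert blocks in a single fused launch, which is where the speed and memory wins come from.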
Free Colab notebooks to try it:
- gpt-oss (20B): https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-Fine-tuning.ipynb
- Main repo: https://github.com/unslothai/unsloth
It works with Qwen3, DeepSeek R1/V3, and GLM models too. The whole thing is open source, so anyone can run billion-parameter MoE fine-tuning without renting expensive cloud instances.