Kimi-Linear Q2_K Quantization Fixed in llama.cpp
The Kimi-Linear Q2_K quantization issue in llama.cpp has been resolved, fixing model loading and inference problems for users running Kimi models with 2-bit quantization.
Someone got Kimi-Linear working properly in llama.cpp after fixing broken Q2_K quantization. With the fix in place, the Q2_K version handles logic puzzles and long-context tasks that were completely broken before.
Quick start:
Pull the fixed branch:
```shell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/18381/head:kimi-linear
git checkout kimi-linear
```
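After checking out the branch, building follows the standard llama.cpp CMake flow (a sketch; the CUDA flag is optional and only needed for NVIDIA GPU offload):

```shell
# Configure and build llama.cpp from the checked-out branch.
# Add -DGGML_CUDA=ON to the first command to enable NVIDIA GPU support.
cmake -B build
cmake --build build --config Release -j
```

The tools (llama-cli, llama-quantize, llama-perplexity) end up under build/bin.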
Grab the model: https://huggingface.co/AaryanK/Kimi-Linear-48B-A3B-Instruct-GGUF
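Once a GGUF file is downloaded, a minimal run looks like this (the exact file name below is an assumption; check the actual file listing in the Hugging Face repo):

```shell
# Run the Q2_K quant with llama-cli built from the branch above.
# The .gguf file name is a guess -- use the real name from the HF repo.
./build/bin/llama-cli \
  -m Kimi-Linear-48B-A3B-Instruct-Q2_K.gguf \
  -p "If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?" \
  -n 128
```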
Or use this Colab notebook to skip the setup: https://colab.research.google.com/drive/1NMHMmmht-jxyfZqJr5xMlOE3O2O4-WDq
The coherence improvements at Q2_K are apparently significant: basic math and essay generation that failed before now work. Worth testing if you’ve been waiting for better quantization support on this model.
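One way to put a number on the coherence claim is llama.cpp's bundled perplexity tool, run against a standard text corpus (lower perplexity is better; the wikitext path below is an assumption about your local files, and any plain-text file works):

```shell
# Rough quality check of the fixed Q2_K quant: measure perplexity
# over a reference corpus. wikitext-2-raw is a common choice.
./build/bin/llama-perplexity \
  -m Kimi-Linear-48B-A3B-Instruct-Q2_K.gguf \
  -f wikitext-2-raw/wiki.test.raw
```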
Related Tips
Nvidia's DMS Cuts LLM Memory Usage by 8x
Unsloth Kernels: 12x Faster MoE Training, 12GB VRAM
Unsloth Kernels: Fine-Tune 30B MoE on Consumer GPUs