New llama.cpp Models Need GGUF Quantizations
Users must convert newly supported llama.cpp models to the GGUF format, typically via quantization, before they can be used with the llama.cpp inference engine for local inference.
Someone noticed that llama.cpp just added support for two new models, but there’s a gap before the usual quantized versions show up.
The releases:
- Step3.5-Flash: https://github.com/ggml-org/llama.cpp/releases/tag/b7964
- Kimi-Linear-48B-A3B: https://github.com/ggml-org/llama.cpp/releases/tag/b7957
Checking the usual Hugging Face spots (Kimi GGUFs & Step-3.5 GGUFs) turns up nothing from the popular quantizers yet - quants will probably land today or tomorrow.
Quick workaround: The ik_llama community already has a Step-3.5-Flash GGUF up at https://huggingface.co/ubergarm/Step-3
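If you don't want to wait for the usual quantizers, you can roll your own GGUF with the tools that ship in the llama.cpp repo. A minimal sketch - the model path, output filenames, and Q4_K_M quant type here are illustrative, not prescribed:

```shell
# Convert the original Hugging Face checkpoint to a full-precision GGUF.
# convert_hf_to_gguf.py lives in the llama.cpp repo root; it needs the
# full HF model directory (weights + config + tokenizer).
python convert_hf_to_gguf.py /path/to/hf-model \
    --outfile model-f16.gguf --outtype f16

# Quantize the f16 GGUF down to a smaller type, e.g. Q4_K_M.
# llama-quantize is built alongside the other llama.cpp binaries.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

One caveat: for brand-new architectures like these, the conversion script and runtime only understand the model from the llama.cpp build that added support, so make sure you're on b7957/b7964 or later.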