
New llama.cpp Models Need GGUF Quantizations

Users must convert new llama.cpp-supported models to GGUF format, typically followed by quantization, before they can be used with the llama.cpp inference engine for local inference.
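The convert-then-quantize flow described above can be sketched with llama.cpp's own tooling. This is a minimal sketch, assuming a llama.cpp checkout with `convert_hf_to_gguf.py` at the repo root and a built `llama-quantize` binary; the model directory and output filenames are placeholders:

```shell
# 1. Convert the original Hugging Face weights (safetensors) to a
#    full-precision GGUF. ./My-Model/ is a hypothetical local path.
python convert_hf_to_gguf.py ./My-Model/ \
    --outfile my-model-f16.gguf --outtype f16

# 2. Quantize the f16 GGUF down to a smaller type, e.g. Q4_K_M,
#    which is a common quality/size trade-off.
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```

This second step is what the popular quantizers run (across many quant types) to produce the repos people usually download.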

Someone noticed that llama.cpp just added support for two new models, but there’s a gap before the usual quantized versions show up.

The releases:

- Kimi
- Step-3.5-Flash

Checking the usual Hugging Face spots (Kimi GGUFs & Step-3.5 GGUFs) turns up nothing from the popular quantizers yet; uploads will probably land today or tomorrow.

Quick workaround: The ik_llama community already has a Step-3.5-Flash GGUF up at https://huggingface.co/ubergarm/Step-3
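Once a GGUF is downloaded, running it locally is a one-liner. The filename below is a placeholder for whichever quant you grab; note that GGUFs published by the ik_llama community sometimes use quant types specific to the ik_llama.cpp fork, so a quant may need that fork rather than mainline llama.cpp:

```shell
# Hypothetical filename; substitute the GGUF you actually downloaded.
# Serves an OpenAI-compatible endpoint on localhost:8080.
./llama-server -m Step-3.5-Flash-Q4_K_M.gguf -c 8192 --port 8080
```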