
GLM-4 9B GGUF Quantization In Progress

GLM-4 9B GGUF quantization is currently underway, converting the model into optimized GGUF format for efficient local deployment and reduced memory usage.

Someone is quantizing GLM-4 9B (Zhipu AI's bilingual Chinese-English model) and has shared the GGUF files before finishing the full set.

The repo is live at https://huggingface.co/AaryanK/GLM-4.7-GGUF but still being updated, since it's a big model. GLM-4 is pretty interesting: it handles both English and Chinese, has a vision-capable variant (GLM-4V-9B), and supports a 128K context window.

For anyone wanting to run it locally with llama.cpp or Ollama once the quants finish:

```shell
# Download the GGUF files with huggingface-cli
huggingface-cli download AaryanK/GLM-4.7-GGUF
```
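Once a quant lands, running it with llama.cpp is a one-liner. A sketch, assuming a filename like `glm-4-q4_k_m.gguf` (hypothetical; substitute whichever `.gguf` file the repo actually publishes):

```shell
# Run the downloaded quant interactively with llama.cpp's CLI.
# -m: path to the GGUF file (placeholder name, check the repo)
# -c: context size in tokens; -n: max tokens to generate
llama-cli -m glm-4-q4_k_m.gguf \
  -p "Translate to Chinese: Hello, world." \
  -c 8192 -n 256

# For Ollama, point a Modelfile at the local GGUF and create a model:
#   Modelfile contents:  FROM ./glm-4-q4_k_m.gguf
ollama create glm4-local -f Modelfile
ollama run glm4-local "你好"
```

Both tools read GGUF directly, so no further conversion is needed after the download.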

Worth bookmarking if you need a solid bilingual model that runs locally. The original is 9B parameters (roughly 18 GB in FP16), so the quantized versions, likely in the 5-6 GB range at Q4, should be way more practical for consumer hardware. Check back in a day or two for the complete quant collection.