GLM-4 9B GGUF Quantization In Progress
GLM-4 9B GGUF quantization is currently underway, converting the model into optimized GGUF format for efficient local deployment and reduced memory usage.
Someone is quantizing GLM-4 (Zhipu AI's 9B-parameter model) and has shared the GGUF files before finishing the full set.
The repo is live at https://huggingface.co/AaryanK/GLM-4.7-GGUF but is still being updated, since it's a big model. GLM-4 is pretty interesting: it handles both English and Chinese, supports vision tasks, and has a 128K context window.
For anyone wanting to run it locally with llama.cpp or Ollama once the quants finish:
```shell
# Download with huggingface-cli
huggingface-cli download AaryanK/GLM-4.7-GGUF
```
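Once the files land, a typical local run might look like the following. The quant filename here is a guess, so check the repo listing for the actual names before downloading:

```shell
# Hypothetical filename -- substitute the real one from the repo
./llama-cli -m glm-4-9b-q4_k_m.gguf -c 8192 -n 128 \
  -p "Translate to Chinese: Hello, world."

# Or register the same file with Ollama via a minimal Modelfile
echo 'FROM ./glm-4-9b-q4_k_m.gguf' > Modelfile
ollama create glm4-local -f Modelfile
ollama run glm4-local
```

The `-c 8192` flag caps the context window well below the model's 128K maximum; raise it only if you have the RAM to spare, since KV-cache memory grows with context length.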
Worth bookmarking if you need a solid bilingual model that runs locally. The original is 9B parameters, so the quantized versions should be way more practical for consumer hardware. Check back in a day or two for the complete quant collection.
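To see why the quants matter for consumer hardware, here is a back-of-the-envelope size estimate. The bits-per-weight figures for each quant type are rough averages, not exact values from this repo:

```python
def est_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size in decimal gigabytes: params x bits / 8 bits-per-byte."""
    return n_params * bits_per_weight / 8 / 1e9

# Approximate bits-per-weight for common formats (assumed, not measured)
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
    print(f"{name:>7}: ~{est_size_gb(9e9, bpw):.1f} GB")
# → FP16 ~18.0 GB, Q8_0 ~9.6 GB, Q4_K_M ~5.4 GB
```

So a 4-bit quant of the 9B model should fit comfortably in 8GB of VRAM, where the FP16 original would not.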