GLM 4.7 Flash Uncensored: Fast Local AI Model
GLM 4.7 Flash Uncensored is a fast, lightweight AI model designed for local deployment, offering unrestricted conversational capabilities and quick response times.
Someone fine-tuned the new GLM 4.7 Flash model to remove content restrictions while keeping performance intact. Pretty interesting for local AI setups.
The model runs on only ~3B active params (from a 30B MoE architecture), so inference is surprisingly fast. Two versions are available: Balanced for coding tasks and Aggressive for everything else.
Recommended settings for llama.cpp:
--temp 1.0 --top-p 0.95 --min-p 0.01 --jinja
For tool use, switch to --temp 0.7 --top-p 1.0 and keep repeat penalty at 1.0.
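As a rough sketch, the settings above map onto a llama.cpp server launch like this (the GGUF filename and port are placeholders, not from the release):

```shell
# Hypothetical llama-server invocation with the recommended chat settings.
# Model filename and port are assumptions; point -m at your downloaded quant.
llama-server \
  -m GLM-4.7-Flash-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf \
  --temp 1.0 --top-p 0.95 --min-p 0.01 --jinja \
  --port 8080

# For tool use, swap the sampling flags as noted above:
#   --temp 0.7 --top-p 1.0 --repeat-penalty 1.0
```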
Works with llama.cpp, LM Studio, Jan, and koboldcpp. Currently has chat template issues with Ollama though.
Download links with Q8_0, Q6_K, and Q4_K_M quants:
- https://huggingface.co/HauhauCS/GLM-4.7-Flash-Uncensored-HauhauCS-Balanced
- https://huggingface.co/HauhauCS/GLM-4.7-Flash-Uncensored-HauhauCS-Aggressive
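If you only want one quant rather than the whole repo, something like this should work (the `--include` pattern is an assumption about the file naming; adjust it to match the repo listing):

```shell
# Fetch only the Q4_K_M quant of the Balanced variant with huggingface-cli.
huggingface-cli download \
  HauhauCS/GLM-4.7-Flash-Uncensored-HauhauCS-Balanced \
  --include "*Q4_K_M*" \
  --local-dir ./models
```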
The creator claims it’s effectively lossless compared to the base model.