GPT-OSS 120B Uncensored Model Released Without Filters
What It Is
A community developer has released an uncensored variant of GPT-OSS 120B, a large language model that reportedly processes queries without content filtering or safety refusals. The model, available at https://huggingface.co/HauhauCS/GPTOSS-120B-Uncensored-HauhauCS-Aggressive, uses a mixture-of-experts (MoE) architecture with 117 billion total parameters but only 5.1 billion active during inference. This design lets the model deliver performance comparable to much larger dense models while requiring significantly fewer computational resources at inference time.
The model supports a 128K token context window and runs on standard inference frameworks including llama.cpp, LM Studio, and Ollama. Despite its size, the entire model fits in a single 61GB file, making it deployable on high-end consumer hardware or a single H100 GPU. The “uncensored” designation indicates the model has been fine-tuned to remove content restrictions typically found in commercial language models.
Why It Matters
This release represents a growing trend in open-source AI development where researchers and developers create alternatives to commercially restricted models. Organizations conducting security research, content moderation system testing, or adversarial AI studies often need models that don’t refuse certain query types. Academic institutions studying model behavior, bias, or safety mechanisms benefit from having unrestricted baselines for comparison.
The MoE architecture makes this particularly significant - by activating only 5.1B parameters per forward pass, the model achieves efficiency that makes large-scale language modeling accessible to smaller research teams and independent developers. This democratization of AI capabilities shifts power away from organizations that can afford to run 100B+ parameter dense models continuously.
However, the zero-refusal claim raises questions about responsible AI deployment. Models without safety guardrails can generate harmful content, misinformation, or assist in malicious activities. The open-source community continues debating whether unrestricted model access advances research or creates unnecessary risks.
Getting Started
Developers can download the model from https://huggingface.co/HauhauCS/GPTOSS-120B-Uncensored-HauhauCS-Aggressive and run it using llama.cpp with specific configuration requirements. The model uses a Harmony response format that requires llama.cpp's Jinja templating flag:
--jinja
Inference parameters must be carefully configured to avoid broken outputs. The recommended settings are:
--temp 1.0 --top-k 40
Critically, users should disable top_p, min_p, and repetition penalties, as many inference clients enable these by default. Incorrect sampling parameters will produce garbled or nonsensical responses.
For systems with limited VRAM, llama.cpp supports offloading MoE layers to CPU using the --n-cpu-moe N flag, where N specifies the number of expert layers to run on CPU rather than GPU. This allows deployment on hardware configurations that couldn’t otherwise handle a 61GB model.
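Putting the pieces together, a full llama.cpp invocation might look like the following sketch (the GGUF filename and the --n-cpu-moe value are assumptions; tune the latter to your available VRAM):

```shell
# Hypothetical invocation; the model filename and offload count are examples.
# --jinja enables the Jinja chat template that the Harmony format requires;
# --top-p 1.0, --min-p 0.0, and --repeat-penalty 1.0 are the neutral values,
# effectively disabling samplers that many clients switch on by default.
./llama-cli -m GPTOSS-120B-Uncensored.gguf \
    --jinja \
    --temp 1.0 --top-k 40 \
    --top-p 1.0 --min-p 0.0 --repeat-penalty 1.0 \
    --n-cpu-moe 24
```

Raising the --n-cpu-moe value moves more expert layers to system RAM, trading generation throughput for a smaller GPU memory footprint.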
Context
The uncensored model landscape includes several alternatives at different scales. The same creator maintains smaller variants at 20B, 8B, and 4.7B parameters for developers with tighter resource constraints. These smaller models trade capability for accessibility while maintaining the same unrestricted approach.
Commercial alternatives like GPT-4 and Claude prioritize safety through extensive RLHF training and content filtering, making them unsuitable for certain research applications. Open models like Llama 2 and Mistral occupy a middle ground, with some safety training but more permissive licensing.
The MoE architecture itself presents tradeoffs - while only 5.1B parameters activate per token, the full 117B parameter set must still be loaded into memory. This differs from dense models where parameter count directly correlates with memory requirements during inference. Developers should verify their hardware can accommodate the full model size regardless of active parameter efficiency.
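As a rough sanity check on that point, using the article's 61GB file size and an assumed 24GB consumer GPU (a hypothetical example, not a recommendation), the shortfall that must be offloaded can be sketched as:

```shell
# Back-of-envelope VRAM budgeting. The 61GB figure comes from the article;
# the 24GB GPU is an assumed example. KV cache and activations need extra
# headroom on top of the weights, so treat this as a lower bound.
model_gb=61
vram_gb=24
overflow=$(( model_gb - vram_gb ))
echo "At least ${overflow}GB of expert weights must stay in system RAM"
```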
Testing claims of “zero refusals” requires careful methodology. Models can technically respond to any query while still producing evasive, unhelpful, or incorrect outputs. The practical utility depends heavily on the specific use case and whether the model’s training data supports the requested task domain.
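One way to begin probing such claims is a crude phrase-based refusal check over model responses; the phrase list below is an assumption for illustration, not a validated benchmark:

```shell
# Sketch of a refusal heuristic. The phrase list is an assumption; a real
# evaluation needs curated prompts and human or model-based grading.
is_refusal() {
  printf '%s' "$1" | grep -qiE "i (cannot|can't) help|i'm sorry|as an ai"
}

# Classify a couple of example responses.
for reply in "I cannot help with that request." "Here is an overview:"; do
  if is_refusal "$reply"; then
    echo "refusal: $reply"
  else
    echo "answered: $reply"
  fi
done
```

A heuristic like this catches only surface-level refusals; as noted above, a model can avoid refusal phrases while still giving evasive or incorrect answers.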