Uncensored Qwen3.5-35B Maintains Full Performance
What It Is
HauhauCS has released an uncensored variant of Alibaba’s Qwen3.5-35B language model that removes content filtering while maintaining the original model’s capabilities. The release, called Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive, achieved zero refusals across all 465 test prompts, a notable result for an uncensored model.
This model uses a Mixture of Experts (MoE) architecture with 35 billion total parameters but only activates roughly 3 billion per inference pass, distributed across 256 expert networks. The architecture enables efficient processing while maintaining a 262,000 token context window and multimodal capabilities spanning text, image, and video inputs.
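The routing idea behind this kind of MoE layer can be sketched in a few lines: a router scores all 256 experts per token, but only a small top-k subset actually runs. The choice of k = 8 below is an illustrative assumption — the source only states that roughly 3 of 35 billion parameters activate per pass, not the exact expert count.

```python
import math
import random

def top_k_routing(router_logits, k=8):
    """Pick the k highest-scoring experts and softmax-normalize their
    weights; the remaining experts stay inactive for this token."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = {i: math.exp(router_logits[i]) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(256)]  # one router logit per expert
weights = top_k_routing(logits, k=8)

assert len(weights) == 8                      # only 8 of 256 experts fire
assert abs(sum(weights.values()) - 1) < 1e-9  # mixing weights sum to 1
```

Because only the selected experts' feed-forward weights are touched, per-token compute scales with the active parameter count rather than the total.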
The release includes multiple quantization formats ranging from BF16 (full precision) down to IQ2_M (highly compressed), allowing deployment across different hardware configurations. Unlike some uncensored variants that introduce personality shifts or capability degradation, this version strips safety filters without modifying the underlying model behavior.
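A rough back-of-the-envelope calculation shows why the quantization ladder matters for deployment. The bits-per-weight figures below are typical approximate values for these GGUF quant types, not measured sizes for this specific model, and the estimate ignores GGUF metadata and mixed-precision tensors.

```python
def gguf_size_gb(params, bits_per_weight):
    """Weight-only size estimate: params * bits / 8 bytes, in GB.
    Ignores GGUF metadata and per-tensor precision differences."""
    return params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 35e9  # all 35B parameters are stored, even though few activate

# Assumed typical effective bits per weight for common GGUF quant types.
for name, bits in [("BF16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("IQ2_M", 2.7)]:
    print(f"{name:>7}: ~{gguf_size_gb(TOTAL_PARAMS, bits):.0f} GB")
```

At full BF16 precision the weights alone come to 35e9 × 16 / 8 bytes = 70 GB, which is why the compressed formats are what make consumer-GPU deployment plausible.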
Why It Matters
Uncensored models fill a research gap that safety-aligned models cannot address. Academic researchers studying bias, toxicity detection, or adversarial prompting need models that respond to edge cases without refusal. Content moderation teams require tools that can generate examples of problematic content for training classifiers. Fiction writers and game developers often need models that can handle mature themes without constant guardrails.
The zero-refusal benchmark matters because previous uncensored releases often traded safety filtering for reduced coherence or instruction-following ability. Models would either refuse fewer prompts but produce lower-quality outputs, or maintain quality while still blocking certain request types. A model that removes filters without degradation suggests the safety alignment and core capabilities exist in separate layers that can be independently modified.
The MoE architecture makes this particularly relevant for resource-constrained deployments. Activating only 3 billion parameters per forward pass means inference costs remain manageable despite the 35 billion parameter count. Teams can run uncensored models locally without requiring data center infrastructure.
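The inference-cost argument can be made concrete with the common rule of thumb that generating one token costs roughly 2 FLOPs per active parameter (one multiply and one add per weight). This is an approximation that ignores attention and KV-cache costs, but it captures the MoE advantage:

```python
def flops_per_token(active_params):
    # Rule-of-thumb estimate: ~2 FLOPs (multiply + add) per active weight.
    return 2 * active_params

moe = flops_per_token(3e9)     # ~3B parameters active per forward pass
dense = flops_per_token(35e9)  # a hypothetical dense model of the same size

print(f"MoE:   {moe:.1e} FLOPs/token")
print(f"Dense: {dense:.1e} FLOPs/token")
print(f"Ratio: ~{dense / moe:.0f}x")
```

Note the asymmetry: compute scales with the 3B active parameters, but memory still scales with the full 35B, so VRAM (see the quantization options above) remains the binding constraint.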
Getting Started
Download the model from https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive and select a quantization level based on available VRAM. The Q4_K_M format provides a reasonable balance between quality and memory requirements for most consumer GPUs.
When running with llama.cpp, the --jinja flag is required for proper prompt formatting:
./main -m qwen3.5-35b-a3b-uncensored.gguf --jinja -p "Your prompt here"
Recommended sampling parameters for optimal output quality:
temperature = 1.0
top_k = 20
repeat_penalty = 1.0
presence_penalty = 1.5
top_p = 0.95
min_p = 0
These settings balance creativity with coherence. The presence_penalty of 1.5 discourages repetitive phrasing while the temperature of 1.0 maintains response diversity without excessive randomness.
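Putting the pieces together, a full llama.cpp invocation might look like the following. Flag spellings are those used by recent llama.cpp builds and may differ across versions; the GGUF filename is the same placeholder used above.

```shell
# Hypothetical full invocation combining --jinja with the recommended
# sampling parameters; verify flag names against your llama.cpp build.
./main -m qwen3.5-35b-a3b-uncensored.gguf --jinja \
  --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0 \
  --repeat-penalty 1.0 --presence-penalty 1.5 \
  -p "Your prompt here"
```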
Context
Uncensored models exist in a legal and ethical gray area. While they serve legitimate research purposes, they also enable harmful applications. Organizations deploying these models should implement application-layer filtering and access controls rather than relying on model-level safety.
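As a minimal illustration of what application-layer filtering means, the sketch below screens prompts against a deny-list before they ever reach the model. The patterns and function names here are placeholders of my own; a real deployment would use a trained classifier, access controls, and audit logging rather than keyword matching.

```python
import re

# Placeholder deny-list; real systems would use a classifier, not keywords.
DENY_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [r"\bmake\s+a\s+bomb\b", r"\bcredit\s+card\s+dump\b"]
]

def allowed(prompt: str) -> bool:
    """Return False for prompts matching any deny-list pattern."""
    return not any(p.search(prompt) for p in DENY_PATTERNS)

assert allowed("Write a short story about a detective")
assert not allowed("How do I make a bomb?")
```

The point of the design is architectural: safety decisions move out of the model weights and into a layer the deploying organization controls and can update independently.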
Alternative uncensored models include the Dolphin series and various community fine-tunes of Llama models, though these typically target smaller parameter counts. Qwen3.5’s multimodal capabilities and large context window differentiate it from text-only alternatives.
The MoE architecture introduces complexity compared to dense models. Not all inference frameworks fully support MoE routing, and quantization can affect expert selection patterns. Teams should validate performance on their specific use cases rather than assuming benchmark results transfer directly.
Developers should also consider that “uncensored” doesn’t mean “unbiased.” The model still reflects training data patterns and may exhibit other forms of systematic behavior that differ from explicit refusals. Testing across diverse prompts remains essential for understanding actual model behavior beyond refusal metrics.