Uncensored Gemma 3 Models with o1-Style Reasoning

DavidAU released 20 uncensored Gemma 3 models ranging from 1B to 27B parameters that display o1-style reasoning chains, showing step-by-step thinking processes

What It Is

A collection of 20 fine-tuned Gemma 3 models now offers visible reasoning chains similar to OpenAI’s o1, but without content filters and across multiple parameter sizes. Developer DavidAU released these models spanning 1B, 4B, 12B, and 27B parameters, each displaying its step-by-step thinking process before arriving at a final answer.

The models underwent a two-stage process: first removing content restrictions through “Heretic’ing,” then fine-tuning on distilled datasets from GPT, Claude, Gemini, and GLM 4.7 Flash using Unsloth. This approach produces models that expose their reasoning tokens while maintaining fewer guardrails than typical commercial offerings. Benchmark results show improvements over baseline Gemma 3 performance across most metrics, with some categories seeing substantial gains.

The full collection lives at https://huggingface.co/collections/DavidAU/gemma-3-reasoning-thinking-models-incl-uncensored, providing researchers and developers with downloadable weights ready for local deployment.

Why It Matters

These models address a gap in the reasoning model landscape. While o1 and similar systems demonstrate impressive problem-solving through chain-of-thought reasoning, they remain closed-source, expensive to run, and filtered. Smaller organizations and individual developers gain access to reasoning capabilities without API costs or content restrictions.

The size range proves particularly significant. A 1B parameter model can run on modest hardware - even some smartphones - while still showing its work. This democratizes access to reasoning models beyond teams with GPU clusters. Research groups studying AI safety, reasoning patterns, or model behavior can examine thinking processes without reverse-engineering black boxes.

Removing content filters creates both opportunities and responsibilities. Academic researchers studying edge cases, red-teaming efforts, and applications requiring unfiltered outputs gain useful tools. However, the lack of guardrails means developers must implement their own safety measures for production deployments.

The benchmark improvements over base Gemma 3 suggest the fine-tuning methodology works. Combining uncensoring with reasoning-focused training appears to enhance rather than degrade model capabilities, challenging assumptions that safety filters and performance exist in zero-sum tension.

Getting Started

Download models directly from the Hugging Face collection at https://huggingface.co/collections/DavidAU/gemma-3-reasoning-thinking-models-incl-uncensored. Start with the 1B or 4B versions for testing on consumer hardware.

Using the Transformers library, loading these models follows standard patterns:


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DavidAU/Gemma-3-4B-Reasoning-Uncensored"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve this step by step: What is 15% of 240?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The models emit their reasoning tokens before the final answer, so expect longer responses than from typical completion models. Raise the generation length limit accordingly to capture full reasoning chains.
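Many reasoning fine-tunes wrap the chain of thought in explicit delimiters so applications can separate thinking from the answer. As a sketch, assuming the models use `<think>`…`</think>` style tags (a common convention; check each model card for the actual markers), splitting the output is straightforward:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes <think>...</think> delimiters, which is an assumption here,
    not a documented property of this collection.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block found; treat the whole output as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>15% of 240 = 0.15 * 240 = 36</think>The answer is 36."
reasoning, answer = split_reasoning(sample)
print(answer)  # The answer is 36.
```

Hiding the reasoning block by default and exposing it on demand keeps user-facing responses concise while preserving the chain of thought for debugging.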

For local deployment, tools like Ollama or LM Studio can serve these models with minimal configuration. The 1B version requires roughly 2GB RAM, while the 27B needs approximately 54GB, making hardware planning straightforward.
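Those RAM figures follow from a simple rule of thumb: at 16-bit precision each parameter occupies two bytes, so weight memory scales linearly with parameter count. Real usage adds overhead for activations and the KV cache, and quantized GGUF builds need considerably less:

```python
def fp16_weight_gb(params_billions: float) -> float:
    """Approximate weight memory in GB: 1e9 params * 2 bytes = 2 GB per billion."""
    return params_billions * 2

# The four sizes in this collection:
for size in (1, 4, 12, 27):
    print(f"{size}B -> ~{fp16_weight_gb(size):.0f} GB")
```

A 4-bit quantization cuts these numbers roughly in quarter, which is how the 27B model becomes feasible on a single high-end consumer GPU.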

Context

These models compete with other open reasoning approaches like DeepSeek-R1 and various chain-of-thought implementations. The uncensored aspect differentiates them from most alternatives, though projects like WizardLM and Dolphin have explored similar territory.

Limitations remain clear. Smaller parameter counts mean reduced general knowledge compared to frontier models. The 1B version handles basic reasoning but struggles with complex multi-step problems requiring extensive world knowledge. Reasoning quality doesn’t match o1 or Claude 3.5 Sonnet, particularly on mathematical proofs or intricate logical puzzles.

The distillation approach raises questions about reasoning authenticity. Training on outputs from larger models may produce pattern matching rather than genuine reasoning, though visible thinking processes help developers evaluate this distinction. Benchmark improvements suggest practical utility regardless of philosophical debates about machine reasoning.

For teams building local-first applications, these models offer a compelling option. The combination of transparent reasoning, multiple size options, and absence of API dependencies creates flexibility commercial alternatives cannot match.