
Qwen 3.5 40B Models Trained on Claude Reasoning

DavidAU has released Qwen 3.5 40B models fine-tuned on synthetic data to replicate Claude's step-by-step reasoning patterns for complex problem-solving and technical tasks.

What It Is

DavidAU has released a series of Qwen 3.5 40B models fine-tuned to replicate Claude’s reasoning patterns. The approach involves training Alibaba’s open-source Qwen models on synthetic data that mimics how Anthropic’s Claude breaks down complex problems. Rather than just copying outputs, these models attempt to internalize the step-by-step analytical style that makes Claude effective at technical tasks.

Three variants exist, each with different constraint levels. The base version at https://huggingface.co/DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking maintains standard safety filters. The Heretic variant (https://huggingface.co/DavidAU/Qwen3.5-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking) removes content restrictions. The RoughHouse edition (https://huggingface.co/DavidAU/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-Polar-Deckard-Uncensored-Heretic-Thinking) pushes further into unfiltered territory.

All three models ship with GGUF quantizations from Mradermacher, making them practical for consumer GPUs. A 40B-parameter model requires roughly 80GB of VRAM at 16-bit precision, but 4-bit quantized versions can run on hardware with 24-32GB.
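These sizing figures follow from a simple weight-only calculation. The sketch below is a back-of-the-envelope estimate; the ~4.85 bits-per-weight figure for Q4_K_M is approximate, and real usage adds KV cache and runtime overhead on top:

```python
def vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight-only VRAM estimate in GB; ignores KV cache and runtime overhead."""
    return params_billion * bits_per_weight / 8

print(vram_gb(40, 16))    # FP16: 80.0 GB
print(vram_gb(40, 4.85))  # Q4_K_M at ~4.85 bits/weight: ~24 GB
```

The estimate explains why a Q4 quant of a 40B model lands in the 24-32GB range once cache and overhead are added.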

Why It Matters

This release demonstrates how reasoning capabilities can transfer between model families through fine-tuning. Claude’s analytical approach has become a benchmark for problem-solving quality, but Anthropic’s models remain closed-source. Training open models to exhibit similar reasoning patterns creates alternatives for developers who need local deployment or custom modifications.

The availability of multiple constraint levels addresses different use cases. Research teams might prefer uncensored variants for studying model behavior without artificial limitations. Production applications typically need the filtered base version to avoid liability issues. Having both options in the same model family simplifies architecture decisions.

GGUF quantization support matters because it determines whether these models remain theoretical or become practical tools. A model that requires $10,000 in GPU hardware serves a different audience than one running on a gaming PC. Mradermacher’s quantizations bridge that gap, expanding the potential user base from well-funded labs to individual developers.

The broader collection of 38+ models at https://huggingface.co/collections/DavidAU suggests systematic experimentation across model sizes. Teams can test whether reasoning patterns scale down to smaller models or require the full 40B parameter count to function effectively.

Getting Started

Download models through the Hugging Face CLI or web interface. For local inference with quantized versions:


from llama_cpp import Llama  # pip install llama-cpp-python

model = Llama(
    model_path="Qwen3.5-40B-Claude-4.5-Opus-Q4_K_M.gguf",
    n_ctx=8192,         # context window in tokens
    n_gpu_layers=35     # layers offloaded to GPU; tune for available VRAM
)

response = model.create_chat_completion(
    messages=[{"role": "user", "content": "Explain gradient descent"}]
)
print(response["choices"][0]["message"]["content"])

Adjust n_gpu_layers based on available VRAM. Higher values offload more computation to the GPU, improving speed at the cost of memory usage.
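A rough starting point for that adjustment can be computed from the quantized file size and free VRAM. This is a heuristic sketch, not llama.cpp's own logic: the default layer count of 64 is a hypothetical figure (actual counts vary by architecture), and the 2 GB reserve for cache and overhead is an assumption:

```python
def suggest_gpu_layers(free_vram_gb: float, model_size_gb: float,
                       total_layers: int = 64) -> int:
    """Heuristic: offload layers in proportion to how much of the model
    fits in free VRAM, reserving ~2 GB for KV cache and overhead."""
    usable = max(free_vram_gb - 2.0, 0.0)
    fraction = min(usable / model_size_gb, 1.0)
    return int(total_layers * fraction)

print(suggest_gpu_layers(24, 24))   # 24 GB free, ~24 GB Q4 file -> 58
```

If generation crashes with out-of-memory errors, step the value down; if VRAM sits idle, step it up.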

The model repositories include example prompts showing the reasoning process in action. Testing with technical questions reveals how well the Claude-style thinking transfers to the Qwen architecture.
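Downloads can also be scripted with the huggingface_hub client instead of the CLI. The repo ID below comes from the links above, and the filename mirrors the inference snippet; verify the exact quant filenames in the Mradermacher GGUF repositories before fetching:

```python
REPO_ID = "DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking"
FILENAME = "Qwen3.5-40B-Claude-4.5-Opus-Q4_K_M.gguf"  # check the GGUF repo for actual names

def fetch(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    """Download a single file from the Hub and return its local cache path."""
    from huggingface_hub import hf_hub_download  # pip install huggingface-hub
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Passing the returned path as model_path to Llama completes the setup.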

Context

This approach differs from distillation, where a smaller model learns to mimic a larger one’s outputs. Fine-tuning on reasoning patterns attempts to teach the process rather than just the results. Whether this distinction produces meaningfully different behavior remains an empirical question.
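To make the distinction concrete, a process-supervision training record pairs the prompt with the intermediate reasoning rather than only the final answer. The field names and tag format below are illustrative, not the actual schema of DavidAU's training data:

```python
# Hypothetical record shapes contrasting outcome-only and process supervision.
outcome_only = {
    "prompt": "Is 97 prime?",
    "completion": "Yes.",
}

process_trace = {
    "prompt": "Is 97 prime?",
    "completion": (
        "<think>97 is odd; digit sum 16, so not divisible by 3; "
        "not by 5 or 7 (7*13=91, 7*14=98); no divisor up to sqrt(97)~9.8, "
        "so 97 is prime.</think> Yes."
    ),
}
```

Fine-tuning on records like the second shape rewards the model for producing the derivation, not just the verdict.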

Qwen 3.5 already performs well on technical benchmarks without Claude-specific training. The value proposition here centers on reasoning transparency and problem decomposition rather than raw accuracy improvements. Teams should compare outputs between base Qwen and these fine-tuned versions to assess whether the reasoning style justifies the additional training overhead.

Alternative approaches include using Claude’s API directly, training on other reasoning datasets like OpenAI’s o1 traces, or developing custom reasoning frameworks. Each option involves different tradeoffs between control, cost, and capability.

The uncensored variants raise familiar questions about model safety and responsible release practices. These models serve legitimate research purposes but require careful deployment decisions.