Someone Has Made an Uncensored Version of QwQ-32B-Preview, And It Is Awesome

Sebastian Petrus
4 min read · Dec 3, 2024


The AI community has recently seen the emergence of an uncensored version of QwQ-32B-Preview, created through a technique called abliteration. Let’s explore what this means for users and developers.

Hey, if you are working with APIs, Apidog is here to make your life easier. It’s an all-in-one API development tool that streamlines the entire process — from design and documentation to testing and debugging.


QwQ-32B-Preview demonstrates impressive capabilities across various benchmarks, particularly in reasoning tasks. On math-focused benchmarks, the model has notably outperformed several major competitors, including GPT-4o, GPT-4o mini, and OpenAI o1-preview.

Let’s Compare the Performance of QwQ-32B-Preview vs the Competition

QwQ-32B demonstrates remarkable performance across key benchmarks, particularly in reasoning and mathematical tasks:

Mathematical Reasoning

  • Achieved 90.6% pass@1 accuracy on MATH-500, surpassing OpenAI o1-preview (85.5%)
  • Scored 50.0% on AIME, significantly higher than o1-preview (44.6%) and GPT-4o (9.3%)
  • Shows exceptional strength in complex mathematical computations

General Question Answering

  • Scored 65.2% on GPQA, nearly matching Claude 3.5 Sonnet (65%)
  • Slightly behind o1-preview (72.3%) but maintains competitive performance
  • Demonstrates strong analytical capabilities

Technical Performance

  • LiveCodeBench score of 50.0%, competitive but below o1-mini (58.0%)
  • Handles up to 32,768 tokens in context window
  • Uses advanced architecture features like RoPE and SwiGLU
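To make the architecture features above more concrete, here is a minimal, dependency-free sketch of rotary position embeddings (RoPE), which encode token position by rotating pairs of query/key dimensions. The vector length and base are illustrative defaults, not QwQ’s actual configuration:

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply rotary position embedding to a vector of even length.

    Pairs of dimensions (2i, 2i+1) are rotated by an angle that depends
    on the token position and the pair index, encoding position directly
    into the query/key representations instead of adding a separate
    positional vector.
    """
    out = list(vec)
    for i in range(len(vec) // 2):
        theta = position / (base ** (2 * i / len(vec)))
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * cos_t - y * sin_t
        out[2 * i + 1] = x * sin_t + y * cos_t
    return out

# Position 0 corresponds to a zero rotation, leaving the vector unchanged.
print(rope_rotate([1.0, 0.0, 1.0, 0.0], position=0))
```

Because rotation preserves vector norms, RoPE injects position information without distorting the magnitudes of attention queries and keys.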

QwQ-32B-Preview-abliterated: QwQ, But Uncensored

QwQ uncensored comes in two primary variants. The base version lives on HuggingFace under huihui-ai’s repository. The quantized version, created by mradermacher, provides multiple compression options:

| Format | Size (GB) | Performance |
| --- | --- | --- |
| Q2_K | 12.4 | Fastest, lowest quality |
| Q4_K_M | 20.0 | Balanced speed and quality |
| Q6_K | 27.0 | Higher quality, slower |
| Q8_0 | 34.9 | Best quality, most demanding |
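As a quick helper, you can pick a quantization programmatically from your available memory. The sizes come from the table above; the headroom factor is a rough assumption for KV cache and runtime overhead, not an official figure:

```python
# File sizes in GB for each GGUF quantization, taken from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 12.4,
    "Q4_K_M": 20.0,
    "Q6_K": 27.0,
    "Q8_0": 34.9,
}

def best_quant(available_gb, headroom=1.2):
    """Return the highest-quality quantization whose file (plus rough
    overhead for KV cache and activations) fits in the given memory."""
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size * headroom <= available_gb
    ]
    return max(fitting)[1] if fitting else None

print(best_quant(25))  # -> Q4_K_M
print(best_quant(48))  # -> Q8_0
```

With around 25 GB of memory the balanced Q4_K_M fits comfortably, while Q8_0 needs roughly 40 GB or more.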

The model maintains its core functionality but exhibits unique quirks. It frequently switches between English and Chinese during conversations. This language-switching behavior stems from QwQ’s bilingual training foundation.
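If the language switching is a problem in your pipeline, one pragmatic workaround is to detect Chinese characters in a response and retry or post-process. A minimal check, covering only the main CJK Unified Ideographs block, might look like:

```python
import re

# Matches any character in the main CJK Unified Ideographs block.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")

def contains_chinese(text: str) -> bool:
    """Return True if the text contains any common Chinese character."""
    return bool(CJK_PATTERN.search(text))

print(contains_chinese("The answer is 42."))  # False
print(contains_chinese("答案是 42。"))          # True
```

Extended CJK blocks exist beyond this range, so treat this as a heuristic rather than a complete language detector.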

Users have discovered several methods to work around the model’s limitations:

Name Change Technique

The model responds differently when its identifier changes from ‘assistant’ to another name. This simple modification often bypasses built-in restrictions.
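If you are building prompts yourself in a raw ChatML-style template (the format Qwen-family models use), the technique amounts to swapping the role label. This sketch uses an illustrative replacement name, “expert”, which is just one arbitrary choice:

```python
def build_chatml_prompt(messages, role_map=None):
    """Render messages into a ChatML-style prompt string, optionally
    renaming roles (e.g. 'assistant' -> some other identifier)."""
    role_map = role_map or {}
    parts = []
    for msg in messages:
        role = role_map.get(msg["role"], msg["role"])
        parts.append(f"<|im_start|>{role}\n{msg['content']}<|im_end|>")
    # Leave the (possibly renamed) assistant turn open for the model.
    parts.append(f"<|im_start|>{role_map.get('assistant', 'assistant')}\n")
    return "\n".join(parts)

prompt = build_chatml_prompt(
    [{"role": "user", "content": "Hello"}],
    role_map={"assistant": "expert"},
)
print(prompt)
```

The string “assistant” never appears in the rendered prompt, which is the property this workaround relies on.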

JSON Schema Approach

Fine-tuning on specific JSON output formats can effectively remove alignment constraints. The model learns to prioritize schema compliance over content restrictions.
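A sketch of what such fine-tuning data might look like: each example pairs a prompt with a completion constrained to a fixed JSON schema, so the model learns that producing valid, schema-conformant output is its first priority. The schema and field names here are invented for illustration:

```python
import json

# Hypothetical output schema the model is fine-tuned to always follow.
SCHEMA_INSTRUCTION = (
    'Respond only with JSON matching: {"answer": string, "confidence": number}'
)

def make_training_example(question, answer, confidence):
    """Build one fine-tuning record in a simple prompt/completion format."""
    return {
        "prompt": f"{SCHEMA_INSTRUCTION}\n\nQuestion: {question}",
        "completion": json.dumps({"answer": answer, "confidence": confidence}),
    }

example = make_training_example("What is 2 + 2?", "4", 0.99)
print(json.dumps(example, indent=2))
```

Training on many such records nudges the model to treat schema compliance as the dominant objective when generating output.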

Hardware Requirements to Run QwQ Locally

Running QwQ uncensored demands specific hardware configurations:

  • Minimum 8GB VRAM for basic versions
  • 12–16GB VRAM recommended for optimal performance
  • Storage requirements vary by quantization level
  • CPU requirements depend on chosen format
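Before downloading, a quick sanity check that you have room on disk for your chosen quantization helps avoid a failed multi-gigabyte download. The sizes come from the quantization table above:

```python
import shutil

# Approximate GGUF file sizes in GB, from the quantization table above.
QUANT_SIZES_GB = {"Q2_K": 12.4, "Q4_K_M": 20.0, "Q6_K": 27.0, "Q8_0": 34.9}

def has_space_for(quant, path="."):
    """Check whether the filesystem at `path` has enough free space
    for the given quantization's model file."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= QUANT_SIZES_GB[quant]

for quant in QUANT_SIZES_GB:
    status = "OK" if has_space_for(quant) else "insufficient disk space"
    print(f"{quant}: {status}")
```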

After making sure your device is capable of running QwQ locally, you can download it from the HuggingFace repositories mentioned above (huihui-ai for the full model, mradermacher for the quantized GGUF files).

Alternatively, if you prefer Ollama, you can pull and run the model with this command:

ollama run huihui_ai/qwq-abliterated:32b-preview-Q5_K_M
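Once the model is pulled, you can also call it from code through Ollama’s local REST API, which listens on localhost:11434 by default. A stdlib-only sketch (the network call at the bottom requires a running Ollama server):

```python
import json
import urllib.request

def build_request(prompt, model="huihui_ai/qwq-abliterated:32b-preview-Q5_K_M"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    req = build_request("Why is the sky blue?")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

Setting `"stream": False` returns the whole completion in a single JSON object instead of line-delimited chunks.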

You can read more about abliteration, the method used to remove refusal behavior from transformer models, here.
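Conceptually, abliteration identifies a “refusal direction” in the model’s activation space (for example, by averaging activation differences between prompts that trigger refusals and ones that don’t) and projects that direction out of the weights, so the model can no longer represent the refusal signal. A toy, dependency-free sketch of the projection step:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def ablate(vec, refusal_dir):
    """Remove the component of `vec` along the refusal direction:
    vec' = vec - (vec . r) r, where r is the unit refusal direction."""
    r = normalize(refusal_dir)
    coef = dot(vec, r)
    return [x - coef * ri for x, ri in zip(vec, r)]

# After ablation, the vector has zero component along the direction.
v = ablate([3.0, 4.0], refusal_dir=[1.0, 0.0])
print(v)  # -> [0.0, 4.0]
```

Real abliteration applies this projection to the model’s weight matrices across layers rather than to a single vector, but the linear-algebra core is the same.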

Use QwQ Online with Anakin AI

If you are seeking an all-in-one AI platform that manages all your AI subscriptions in one place, Anakin AI offers:

  • Virtually any LLM, such as Claude 3.5 Sonnet, Google Gemini, GPT-4o, OpenAI o1, Qwen models, and other open-source models
  • Even uncensored Dolphin Mistral and Llama models
  • The best AI image generation models, such as FLUX, Stable Diffusion 3.5, and Recraft
  • AI video generation models such as Minimax, Runway Gen-3, and Luma AI

Conclusion

QwQ uncensored represents an interesting experiment in model modification. While it successfully removes many artificial restrictions, it introduces new challenges. The ongoing work to refine these models demonstrates the community’s commitment to advancing AI capabilities while maintaining useful functionality.

The success of this implementation highlights both the possibilities and limitations of current abliteration techniques. Users must carefully weigh their specific needs against the trade-offs involved in using an abliterated model versus finding alternative solutions through prompt engineering or other fine-tuning approaches.

Written by Sebastian Petrus

Assistant Professor @ U of Waterloo, AI/ML, e/acc
