
Qwen3-Max Thinking Outperforms Gemini 3 Pro and GPT-5.2 in Reasoning Exams
TL;DR
The new reasoning model Qwen3-Max Thinking, developed by Alibaba Cloud, promises to match and even surpass the capabilities of competing artificial intelligence models Gemini 3 Pro and GPT-5.2.
Qwen3-Max Thinking Stands Out in the AI Market
The new reasoning model Qwen3-Max Thinking, developed by Alibaba Cloud, promises to match and even surpass the capabilities of competing artificial intelligence models Gemini 3 Pro and GPT-5.2. The presentation occurred at a strategic moment, as the company seeks to innovate in the field of language models with an accessible and efficient proposal.
This model was introduced by the Qwen Team, known for delivering robust open-source models. Alibaba Cloud received praise, even from Airbnb CEO Brian Chesky, who commended their solutions as cost-effective alternatives to American models.
The innovation of Qwen3-Max Thinking lies in its architecture, which combines efficiency with autonomy, rewriting the rules of traditional logical reasoning.
Architecture: Redefining the Test Scale
The main innovation of Qwen3-Max Thinking is a technique called Test-time scaling. Unlike models that generate responses linearly, this approach allows the model to trade computational power for intelligence, adopting a strategy of multiple iterations.
Through a unique "take-experience" mechanism, the model refines its knowledge based on previous experiences, allowing:
- Identify Dead Ends: Recognize flaws in reasoning without fully traversing the path.
- Focus Compute: Direct processing power toward unresolved uncertainties.
These improvements have resulted in significant performance leaps, as demonstrated in PhD-level science benchmarks.
Integration with Adaptive Tools
Qwen3-Max Thinking distinguishes itself through the integration of adaptive tools that allow the model to autonomously choose the correct tool for each task, combining logical thinking and practical functions.
The capabilities include:
- Web Search and Extraction: For real-time factual inquiries.
- Memory: Store and recall specific user contexts.
- Code Interpreter: Write and execute snippets of Python.
Benchmark Analysis: Facts and Results
The performance of Qwen3-Max Thinking in rigorous benchmarks, such as HMMT, showed a score of 98.0, surpassing Gemini 3 Pro and other competitors.
Additionally, in the assessment "Humanity's Last Exam", which encompasses complex issues from different disciplines, the model achieved 49.8 points, outperforming Gemini 3 Pro and GPT-5.2.
The Cost of Reasoning: Price Analysis
Alibaba Cloud positioned qwen3-max-2026-01-23 as a premium yet affordable option, with a price of $1.20 per one million input tokens.
Compared to traditional models, this cost is competitive, offering top-tier performance at a reduced price.
Developer Ecosystem
Qwen3-Max Thinking is designed for easy integration, compatible with formats from OpenAI and Anthropic, allowing developers to seamlessly incorporate this new model into their applications.
Final Considerations
The launch of Qwen3-Max Thinking marks an evolution in the AI market, focusing more on reasoning skills and autonomous tool use than merely on intelligent chatbots. With a competitive pricing model, Alibaba Cloud establishes itself as a serious contender.
The provision of free tools for a limited time encourages developers to explore the new capabilities, further intensifying the competition in the AI space.
Content selected and edited with AI assistance. Original sources referenced above.


