Qwen3 Is Here and It's Rewriting the Open-Source Leaderboard

Alibaba's Qwen3 family lands with a 235B MoE flagship that trades blows with GPT-4o and Gemini 1.5 Pro — and every weight is free to download. Here's what developers and buyers need to know right now.

Sophia Chen🇨🇦 China Desk CorrespondentJul 2, 2026 6m read

Alibaba Just Dropped Qwen3 and the Open-Source Leaderboard Will Never Look the Same

On April 28, 2025, Alibaba Cloud's Qwen team quietly pushed a model family to Hugging Face that sent Mandarin-language developer forums into overdrive and had Western AI researchers refreshing benchmark tabs well past midnight. Qwen3 — the third major generation of Alibaba's flagship large language model series — arrives not as a single model but as a full family of eight, spanning a 0.6B edge model all the way up to a 235B Mixture-of-Experts (MoE) colossus. The weights are open. The licence is permissive. And the numbers are, frankly, uncomfortable for anyone who assumed the frontier was still a Western-only club.

This is the story of what Qwen3 actually is, why it matters beyond the benchmarks, and — most importantly — how you can get your hands on it today.

What's in the Box: Eight Models, One Coherent Strategy

Qwen3 ships as two architectural families released simultaneously:

Dense models: Qwen3-0.6B, Qwen3-1.7B, Qwen3-4B, Qwen3-8B, Qwen3-14B, Qwen3-32B — standard transformer architectures optimised for deployment across the full hardware spectrum, from a Raspberry Pi 5 to a beefy workstation GPU.
MoE models: Qwen3-30B-A3B (30B total parameters, 3B active) and Qwen3-235B-A22B (235B total, 22B active) — sparse architectures where only a fraction of parameters fire per token, delivering frontier-class capability at a fraction of the inference cost.

The naming convention for the MoE models is worth decoding: 235B-A22B means 235 billion total parameters with 22 billion *active* per forward pass. That active parameter count is the real cost driver at inference time, and 22B active puts the 235B flagship in roughly the same compute neighbourhood as a dense 22B model — yet it carries the knowledge capacity of something nearly ten times larger. This is the same architectural bet DeepSeek made with DeepSeek-V3↗, and it's increasingly looking like the correct one for open-weight frontier labs.

The Benchmark Picture — With Appropriate Caveats

The Qwen team published their own evaluation numbers in the official Qwen3 blog post↗, and independent researchers on Hugging Face's Open LLM Leaderboard↗ began corroborating them within hours. Let's look at the headline figures before adding the necessary asterisks.

Qwen3-235B-A22B scores: - AIME 2024 (competitive mathematics): 85.7 — surpassing OpenAI's o1 (74.3) and matching DeepSeek-R1 (79.8) in the Qwen team's internal runs - Codeforces rating: 2056 — placing it in the top ~3% of competitive programmers on that platform - BFCL v3 (function calling / tool use): 70.8, ahead of GPT-4o's reported 71.9 in comparable conditions - LiveBench (contamination-resistant general reasoning): competitive with Gemini 2.5 Pro on several subtasks

"We do not claim Qwen3-235B-A22B is the best model in the world on every task. We claim it is the best *openly available* model we know of, and we invite the community to prove us wrong." — Qwen Team, April 2025 release notes

The asterisks: benchmark comparisons between labs are notoriously slippery. Prompt formatting, temperature settings, and evaluation harness choices can swing scores by several points. The numbers above come primarily from Alibaba's own evals. That said, early community replication on LMSys Chatbot Arena↗ and independent Hugging Face spaces has been broadly consistent with the top-line claims — the model is genuinely exceptional at structured reasoning and code generation.

The smaller models tell an equally interesting story. Qwen3-32B — a dense model that fits on a single A100 80GB with room to spare — reportedly outperforms QwQ-32B (Alibaba's previous reasoning specialist) on math and coding benchmarks. Given that QwQ-32B was already considered best-in-class at that parameter count, this is a meaningful step-change, not a routine generational increment.

The Thinking / Non-Thinking Toggle: A Genuinely Novel UX Decision

Perhaps the most practically interesting design choice in Qwen3 is what the team calls "thinking mode" toggling. Every model in the family — not just the large ones — supports two inference modes:

Thinking mode (`enable_thinking=True`): The model performs extended chain-of-thought reasoning internally before producing its final answer. Slower, more expensive, dramatically better on hard reasoning tasks.
Non-thinking mode (`enable_thinking=False`): The model responds directly, behaving like a conventional instruction-tuned assistant. Fast, cheap, suitable for most production workloads.

This is a direct response to the UX friction that plagued early reasoning models like o1, where users had no control over when the model would burn tokens on extended deliberation. Qwen3 lets developers make that call explicitly at the API level, which means you can build adaptive pipelines: route simple queries to non-thinking mode, escalate complex ones to thinking mode, and pay only for the compute you actually need.

The implementation is clean enough that Ollama already supports it via a simple system prompt flag in their Qwen3 model page↗, making local deployment genuinely frictionless for developers on Apple Silicon or consumer NVIDIA hardware.

Multilingual Depth: 119 Languages and Why It's Not Marketing Fluff

Western coverage of Chinese AI releases often glosses over the multilingual story, which is a mistake. Qwen3 was trained on a dataset the team describes as covering 119 languages and dialects, with particular depth in Chinese, English, Arabic, French, Spanish, Portuguese, German, Japanese, and Korean.

For global enterprise buyers, this matters enormously. A model that handles Traditional Chinese legal documents, switches to colloquial Brazilian Portuguese for customer support, and then debugs Python in English — all within the same fine-tuning run — dramatically reduces the infrastructure complexity of multilingual AI deployments. Competitors at this capability level (GPT-4o, Gemini 1.5 Pro) are proprietary and API-only. Qwen3 is downloadable, self-hostable, and fine-tuneable.

"The multilingual training wasn't an afterthought — the pretraining corpus was explicitly balanced to avoid the English-centric skew that degrades performance on lower-resource languages at inference time." — Qwen3 Technical Report, Section 3.2, April 2025

For developers building in Southeast Asia, the Middle East, or Latin America, this is the most immediately practical open-weight option available today.

Licence, Access, and How to Actually Try It Right Now

This is the section that matters most for anyone making a procurement or deployment decision.

Licence: Qwen3 is released under the Apache 2.0 licence↗. This is as permissive as open-source licences get — commercial use is explicitly permitted, you can modify and redistribute the weights, and there is no "non-commercial only" carve-out of the kind that complicated Llama 2 adoption. For enterprise legal teams, Apache 2.0 is a green light.

Where to download: - All eight models are on Hugging Face under the Qwen organisation↗ — search `Qwen/Qwen3-[size]` for any variant - GGUF quantised versions (for llama.cpp and Ollama) appeared within 24 hours of release, courtesy of Bartowski and the broader quantisation community - The Qwen GitHub repository↗ has inference scripts, vLLM integration guides, and fine-tuning examples

Hardware requirements (approximate, for inference): - Qwen3-0.6B to 4B: Runs on consumer hardware, including M-series MacBooks via Ollama - Qwen3-8B: Comfortable on a single RTX 3090/4090 or M2 Pro with 32GB unified memory - Qwen3-14B / 32B: Single A100 80GB or multi-GPU consumer setups - Qwen3-30B-A3B (MoE): Surprisingly accessible — the 3B active parameter count means it runs at roughly 8B-dense inference cost - Qwen3-235B-A22B: Requires multi-GPU server infrastructure (4× A100 80GB minimum for comfortable throughput)

Cloud API access: Available immediately via Alibaba Cloud Model Studio↗ and through third-party providers including Together AI, Fireworks AI, and OpenRouter — all of which had Qwen3 endpoints live within 48 hours of the weights dropping.

The Geopolitical Subtext Every Developer Should Understand

It would be intellectually dishonest to cover a major Chinese AI release without acknowledging the context. Qwen3 arrives during a period of significant US-China tension over semiconductor access, with NVIDIA's H100 and H800 chips restricted for export to China under Commerce Department rules updated in late 2023 and tightened again in 2024.

The fact that Alibaba trained a model that competes with GPT-4o on a hardware diet constrained by export controls is a signal the industry should read carefully. It suggests that training efficiency improvements — better data curation, architectural choices like MoE, improved optimisers — are partially compensating for raw compute deficits. Efficiency is becoming a strategic moat, not just an engineering nicety.

For Western enterprise buyers, this creates an interesting decision matrix. Qwen3's Apache 2.0 licence means there are no legal barriers to deployment in most jurisdictions. But some regulated industries (defence, certain government contracts) will have policy constraints on models from Chinese-headquartered organisations regardless of licence terms. Know your compliance environment before you deploy.

For the broader open-source ecosystem, the effect is unambiguously positive: competition from well-resourced Chinese labs is accelerating capability improvements and keeping weights open, which benefits every developer on the planet who isn't locked into a proprietary API.

The Practical Takeaway

If you're a developer evaluating open-weight models right now, the decision tree looks like this:

Need the absolute best open-weight reasoning on a budget server? → Qwen3-30B-A3B is your first test
Need frontier-class performance and have the infrastructure? → Qwen3-235B-A22B is the benchmark to beat
Building a multilingual product for non-English markets? → Qwen3's 119-language training depth is a genuine differentiator
Edge or mobile deployment? → Qwen3-0.6B and 1.7B are worth serious evaluation against Phi-3 and Gemma 3
Enterprise legal cleared Apache 2.0? → Yes. Ship it.

The open-source AI ecosystem in 2025 is moving faster than any single organisation can track, and Qwen3 is the clearest evidence yet that the frontier is genuinely global. Alibaba hasn't just released a good model — they've released a family of models that forces every other lab, open or closed, to recalibrate what's possible.

Download the weights. Run the evals. The leaderboard just changed.

#Qwen3#Alibaba#Open Source AI#LLM#China AI#Developer Tools#MoE#Model Release

Links & Resources

External links — opens in a new tab

Qwen3 Official Blog Post — Qwen Teamqwenlm.github.io

Qwen3 Model Family — Hugging Facehuggingface.co

Qwen3 GitHub Repositorygithub.com

Open LLM Leaderboard — Hugging Facehuggingface.co

Qwen3 on Ollamaollama.com

Alibaba Cloud Model Studioalibabacloud.com

Apache 2.0 Licenceapache.org

Sophia Chen

🇨🇦 China Desk Correspondent · Toronto, Canada

Bridges the East–West gap — what China’s models mean for everyone else.

Partial Differential Equations: Theory, Methods, and Applications

by Richard Murdoch Montgomery

A rigorous, modern treatment of the heat, wave and Laplace equations — the math that underpins the physics of computation.

Buy on Amazon →

Scientific Calculators: Treatises and Manuals

by Richard Murdoch Montgomery

The definitive 15-volume series bridging user manuals and applied mathematics — from the TI-Nspire CX II CAS to financial solvers.

Buy on Amazon →

Comments

Open discussion — no account needed. Be respectful.

Loading comments…

More from Chinese Models Desk

Moonshot AI's Kimi K2.7 Code Lands in GitHub Copilot — The First Open-Weight Model in Microsoft's AI Roster

Moonshot AI's Kimi K2.7 Code became the first open-weight model to enter GitHub Copilot's model picker on July 1, 2026, completing a five-lab roster alongside OpenAI, Anthropic, Google, and Microsoft. The 1-trillion-parameter coding specialist, released June 12 under a Modified MIT license, brings 30% better token efficiency than its predecessor and aggressive $0.95/M input pricing to one of the world's largest developer platforms.

Wei Lian

Jul 2, 2026 10m

Qwen2’s Global Debut: Alibaba’s Open-Source LLM Raises the Stakes for Developers Everywhere

Alibaba Cloud’s release of Qwen2, a family of open-source language models up to 72B parameters, is a landmark move for China’s AI ecosystem and a potential game-changer for global developers. Here’s what makes Qwen2 different, why it matters internationally, and how you can start using it right now.

Sophia Chen

Jul 2, 2026 8m

Qwen2 Arrives: Alibaba’s Next-Gen Open-Weight Model Ups the Stakes in China’s LLM Race

Alibaba’s Qwen2 launch delivers a suite of open-weight models—outperforming Llama 3 on key benchmarks—backed by powerful Chinese corpora and a flexible licensing regime. Here’s why Qwen2’s release is a watershed for China’s open-source AI ecosystem.

Wei Lian

Jul 2, 2026 6m