
Qwen 3: Alibaba’s Open-Source LLM Leap

Alibaba’s Qwen 3 is a powerful new open-source LLM series featuring both dense and Mixture-of-Experts models that excel in coding, reasoning, and multilingual tasks. With impressive performance benchmarks, extended context lengths, and broad accessibility, Qwen 3 sets a new standard for open-source AI development.


World of AI | Edition # 37


The world of open-source large language models (LLMs) continues to evolve rapidly—and the latest entrant, Qwen 3 from Alibaba, makes a compelling case for the top spot. Unveiled in late April 2025, this new series of models packs serious power across a range of tasks, with a particular emphasis on coding, reasoning, and multilingual support. Here's an in-depth look at what makes Qwen 3 a standout.

A Diverse Model Lineup

Alibaba has released a total of eight Qwen 3 models, including six dense models ranging from 600 million to 32 billion parameters, and two Mixture-of-Experts (MoE) models. The flagship model boasts a massive 235 billion parameters, but activates only 22 billion during inference—striking a balance between scale and efficiency. There's also a 30 billion parameter MoE model with 3 billion active parameters for those needing something lighter but still powerful.

All models are released under the Apache 2.0 license, making them highly accessible for both research and commercial use.
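As a minimal sketch of what that accessibility looks like in practice, here is how one might load a small Qwen 3 checkpoint with Hugging Face transformers. The "Qwen/Qwen3-0.6B" model ID is an assumption based on the Hub naming at release; check the model cards for exact identifiers and hardware requirements.

```python
# Minimal sketch: loading a small Qwen 3 dense model with Hugging Face transformers.
# The model ID "Qwen/Qwen3-0.6B" is an assumption based on Hub naming at release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```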

Exceptional Coding Performance

Where Qwen 3 truly shines is in code generation and problem-solving benchmarks. It outperforms many well-known models like GPT-4, DeepSeek, and Gemma on competitive coding and math benchmarks such as Codeforces, LiveCodeBench, and AIME.

However, the benchmark comparisons notably exclude Anthropic’s Claude Sonnet models (e.g., Claude 3.5 and 3.7 Sonnet), which are known to be strong on similar tasks, an omission worth keeping in mind when interpreting performance claims.

Efficient and Scalable with MoE

Thanks to the Mixture-of-Experts architecture, Qwen 3 models can reduce inference cost without sacrificing performance. Only a small fraction of the total parameters are activated per query, making even the larger models viable for a broader range of hardware.
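To make the efficiency argument concrete, here is a toy sketch of top-k expert routing in the spirit of MoE layers generally. This is an illustration, not Qwen 3’s actual implementation: each token is routed to only k of the available experts, so most of the layer’s parameters sit idle on any given forward pass.

```python
import torch
import torch.nn as nn

# Toy top-k Mixture-of-Experts layer (illustrative only, not Qwen 3's code).
class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # scores each expert per token
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x):                           # x: (tokens, dim)
        scores = self.router(x)                     # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the k best experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens that routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(4, 64)).shape)  # only 2 of 8 experts run per token
```

The same arithmetic is what lets the flagship model carry 235 billion parameters while activating only 22 billion per query.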

Extended Context and Easy Access

Qwen 3 supports context lengths ranging from 32,000 to 128,000 tokens, with the 128K context available on models of 8 billion parameters and larger—ideal for long documents or complex multi-step reasoning tasks.

These models are widely accessible via platforms like Hugging Face, ModelScope, and Kaggle, and can be run locally using tools such as Ollama, LM Studio, MLX, llama.cpp, and KTransformers.

Hybrid Thinking Mode

Qwen 3 introduces Alibaba’s first hybrid thinking mode: the model reasons step-by-step for tasks that require logical depth, yet returns instant answers for simpler queries. Users can also set a "thinking budget" that caps how much internal reasoning the model performs, trading speed against accuracy.
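The Qwen 3 model cards describe this switch as an enable_thinking flag on the chat template (with /think and /no_think soft switches inside prompts). A rough sketch, assuming the release-time API; confirm the exact argument names against the official documentation:

```python
from transformers import AutoTokenizer

# Sketch of Qwen 3's hybrid thinking switch via the chat template.
# enable_thinking follows the release-time model cards; verify against current docs.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "Is 9.11 greater than 9.9?"}]

# Step-by-step reasoning for tasks that need logical depth...
prompt_think = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# ...and instant answers for simpler queries.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```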

Language Support and Agentic Capabilities

The models support 119 languages and dialects, significantly broadening their usability across the globe. Additionally, Qwen 3 is optimized for agentic tasks like tool use and multi-step planning, with support for the Model Context Protocol (MCP). Demonstrations show how the model can decide when and how to call external tools to solve complex tasks.
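As a sketch of what tool use looks like at the API level, recent transformers versions let you pass Python functions to the chat template, which renders them as a tool schema the model can choose to call. The get_weather function here is purely hypothetical:

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 22 degrees"  # hypothetical stub for illustration

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "What's the weather in Hangzhou?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
# The model can then emit a structured tool call; your code executes it and
# feeds the result back as a "tool" role message for the next turn.
```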

Scaled-Up Training and Synthetic Data

Training for Qwen 3 was conducted on an impressive 36 trillion tokens, roughly double the volume used for Qwen 2.5. The data was sourced from the web and PDF-style documents, with Qwen2.5-VL used to extract text from documents and Qwen 2.5 used to filter for quality. For math and code, synthetic training data was also generated using Qwen’s previous models.

Focused on Text—Not Multimodal (Yet)

Currently, Qwen 3 is a text-in, text-out model, with no multimodal capabilities such as image or audio input. However, given the rapid development of this series, future multimodal extensions may not be far off.

Beats Llama 4 in Benchmarks

In one of the most talked-about comparisons, Qwen 3’s flagship model outperforms Llama 4 Maverick in nearly every category: general reasoning, mathematics, multilingual tasks, and coding. The only exception was a slight underperformance on one multilingual benchmark.

This has sparked lively discussion in online communities—with some users jokingly declaring “RIP LLaMA 4, April 2025 to April 2025.”

First Impressions and How to Try It

Initial hands-on tests show impressive results, especially for real-world use cases like web development tasks. Users can try the models directly through chat.qwen.ai or download them via Hugging Face. Local deployment is also straightforward, especially with tools like Ollama, which simplifies installation and resource management.
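For a quick local test, here is a minimal sketch using the ollama Python client. The "qwen3:8b" tag is an assumption about the Ollama library’s naming; verify the available tags before pulling:

```python
import ollama  # pip install ollama; requires a running Ollama server

# Sketch of a local chat with a Qwen 3 model via Ollama.
# The "qwen3:8b" tag assumes the Ollama library naming; check the library page.
response = ollama.chat(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Build a minimal HTML landing page."}],
)
print(response["message"]["content"])
```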

Final Thoughts

With Qwen 3, Alibaba has set a new bar for open-source LLMs. Between the advanced architecture, code-first training, and broad accessibility, it’s clear that this release isn’t just a step forward—it’s a leap.

As the pace of innovation accelerates, Qwen 3 stands as a reminder: in the world of AI, today’s state-of-the-art can become yesterday’s news overnight.
