Qwen3 Officially Released: A New Benchmark for Open-Source Large Models, with Dual Thinking Modes Leading a New Wave of AI
Qwen3 is the latest open-source large language model series, featuring dual thinking modes (reasoning vs. fast response), support for 119 languages, and enhanced agent capabilities. It includes both dense and MoE architectures, with models ranging from 0.6B to 235B parameters, all released under the Apache 2.0 license.
Today marks a significant milestone for the Qwen series with the official release of Qwen3, our latest generation of large language models. This new family of models pushes the boundaries of performance, efficiency, and accessibility, offering a compelling suite of options for researchers, developers, and organizations worldwide.
Key Highlights and Competitive Performance
Our flagship model, Qwen3-235B-A22B, demonstrates highly competitive results across a range of benchmarks, including code generation, mathematical reasoning, and general-purpose tasks. It stands shoulder-to-shoulder with other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro.
Efficiency is a core theme of Qwen3. The smaller Mixture-of-Experts (MoE) model, Qwen3-30B-A3B, outperforms its predecessor QwQ-32B while activating only 10% of the parameters. Remarkably, even the compact Qwen3-4B model can match the performance of the much larger Qwen2.5-72B-Instruct.
Open-Source Commitment and Model Family
In line with our commitment to open research, we are releasing the weights for two MoE models under the permissive Apache 2.0 license:
- Qwen3-235B-A22B: A large-scale model with over 235 billion total parameters and 22 billion activated parameters.
- Qwen3-30B-A3B: A compact MoE model with approximately 30 billion total parameters and 3 billion activated parameters.
Additionally, we are open-sourcing six dense models, providing a full spectrum of options for different computational needs: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.
Model Specifications
Dense Models
| Model | Layers | Heads (Q / KV) | Tie Embedding | Context Length |
|---|---|---|---|---|
| Qwen3-0.6B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-4B | 36 | 32 / 8 | Yes | 32K |
| Qwen3-8B | 36 | 32 / 8 | No | 128K |
| Qwen3-14B | 40 | 40 / 8 | No | 128K |
| Qwen3-32B | 64 | 64 / 8 | No | 128K |
MoE Models
| Model | Layers | Heads (Q / KV) | # Experts (Total / Activated) | Context Length |
|---|---|---|---|---|
| Qwen3-30B-A3B | 48 | 32 / 4 | 128 / 8 | 128K |
| Qwen3-235B-A22B | 94 | 64 / 4 | 128 / 8 | 128K |
Core Innovations of Qwen3
1. Dual-Thinking Modes for Flexible Reasoning
Qwen3 introduces a novel, user-controllable dual-mode reasoning architecture.
- Thinking Mode: In this mode, the model engages in step-by-step reasoning, "thinking out loud" before delivering a final answer. This is ideal for complex problems requiring deep analysis and logical deduction.
- Non-Thinking Mode: This mode provides fast, near-instantaneous responses, optimized for tasks where speed is prioritized over depth of reasoning, such as simple Q&A or information retrieval.
This flexibility allows users to dynamically allocate "thinking budget" based on the task at hand. Complex queries can be solved with extended reasoning chains, while simple ones receive direct answers without unnecessary latency, enabling a better balance between cost-effectiveness and output quality.
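To make the toggle concrete, here is a minimal sketch of per-request budget allocation using the `enable_thinking` flag exposed by Qwen3's chat template (shown in full in the inference example later in this post); the routing heuristic itself is a hypothetical placeholder:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

def build_prompt(question: str, think: bool) -> str:
    """Render the chat template, enabling step-by-step reasoning only when needed."""
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=think,  # per-request "thinking budget" switch
    )

# Hypothetical routing: spend reasoning tokens on hard queries only.
fast_prompt = build_prompt("What is the capital of France?", think=False)
deep_prompt = build_prompt("Prove that the square root of 2 is irrational.", think=True)
```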
2. Extensive Multilingual Capabilities
Qwen3 supports an impressive 119 languages and dialects, spanning major language families including Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, Tai-Kadai, Uralic, and Austroasiatic, as well as languages such as Japanese and Korean. This broad multilingual proficiency unlocks new possibilities for international applications and makes advanced AI accessible to a global user base.
3. Enhanced Agent and Tool-Use Abilities
We have significantly optimized Qwen3's capabilities as an autonomous agent. The models demonstrate improved tool-calling proficiency, better environmental interaction, and strengthened support for the Model Context Protocol (MCP). This makes Qwen3 particularly adept at tasks that require planning, executing multi-step actions, and utilizing external tools and APIs.
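As a concrete illustration of tool use, below is a minimal sketch of OpenAI-style function calling against a Qwen3 model served behind an OpenAI-compatible endpoint (serving commands appear in the deployment section later in this post); the endpoint URL and the `get_weather` tool are hypothetical, and exact tool-calling behavior depends on the serving stack:

```python
import json
from openai import OpenAI

# Assumption: a Qwen3 model served locally via an OpenAI-compatible API
# (e.g., vLLM or SGLang, as described in the deployment section below).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool described in the standard function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
)

# If the model chose to call the tool, it returns a structured tool call
# rather than free text; parse and dispatch it here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```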
Technical Deep Dive: Training Methodology
Pre-training at Scale
The foundation of Qwen3's performance is its massive and meticulously curated pre-training dataset. While Qwen2.5 was trained on 18 trillion tokens, Qwen3's dataset has been nearly doubled to approximately 36 trillion tokens, covering all 119 supported languages.
To construct this corpus, we employed a multi-source strategy:
- Web-scale data collection.
- High-quality text extraction from PDF documents using Qwen2.5-VL, with quality refinement by Qwen2.5.
- Synthetic data generation using domain-expert models (Qwen2.5-Math and Qwen2.5-Coder) to augment mathematical and coding content, including textbooks, Q&A pairs, and code snippets.
The pre-training was conducted in three strategic phases:
- Phase S1: Basic training on over 30T tokens with a 4K context length to establish fundamental language skills.
- Phase S2: Training on an additional 5T tokens with an improved dataset enriched in STEM, programming, and reasoning tasks.
- Phase S3: Context length extension to 32K using high-quality long-context data.
The results are striking: Qwen3 dense base models achieve performance comparable to their larger Qwen2.5 counterparts (e.g., Qwen3-1.7B vs. Qwen2.5-3B), with even superior results in STEM, coding, and reasoning. The MoE base models match the performance of Qwen2.5 dense models while activating only ~10% of the parameters, leading to substantial savings in training and inference costs.
Post-Training for Instruction Following and Reasoning
To develop the versatile, instruction-tuned models, we implemented a sophisticated four-stage post-training pipeline designed to seamlessly integrate thinking and non-thinking capabilities:
- Long Chain-of-Thought (CoT) Cold Start: Supervised fine-tuning on diverse long CoT data across mathematics, code, logic, and STEM to instill basic reasoning skills.
- Long CoT Reinforcement Learning (RL): Large-scale RL with rule-based rewards to enhance the model's exploration and in-depth reasoning capabilities (a toy sketch of such a reward follows this list).
- Thinking Mode Fusion: Fine-tuning on a blended dataset of CoT data and standard instruction-tuning data to integrate the non-thinking mode into the thinking-capable model.
- General RL: Final RL fine-tuning across 20+ general domains (instruction following, formatting, agent skills) to polish general capabilities and correct undesirable behaviors.
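The stages above leave the reward design abstract; the "rule-based rewards" in stage two can be pictured as simple verifiable checks on the model's final answer. A toy, entirely illustrative sketch (Qwen's actual reward functions are not disclosed, and the `\boxed{}` answer convention is an assumption):

```python
import re

def rule_based_reward(model_output: str, gold_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the boxed final answer matches the
    reference exactly, else 0.0. The real reward rules are not disclosed."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # unparseable rollouts earn no reward
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0

# Example: score a math rollout against its reference answer.
print(rule_based_reward(r"... therefore the result is \boxed{42}", "42"))  # 1.0
```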
Getting Started with Qwen3
The post-trained models (e.g., Qwen3-30B-A3B) and their base counterparts are now available on platforms like Hugging Face, ModelScope, and Kaggle.
Quick Inference Example
Here is a standard example of using Qwen3-30B-A3B with Hugging Face Transformers, demonstrating the thinking mode:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switch between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text generation
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse thinking content and final answer
try:
    # Find the index just past the closing think tag (token id 151668 for `</think>`)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No `</think>` tag found; treat the whole output as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("Thinking content:", thinking_content)
print("Final answer:", content)
```
To disable thinking mode for a faster response, simply set `enable_thinking=False` in the `apply_chat_template` call.
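In multi-turn use, the Qwen3 model cards additionally describe a soft switch: with `enable_thinking=True`, appending `/think` or `/no_think` to a user message toggles reasoning for that turn. A brief sketch following that convention, reusing the tokenizer from the example above:

```python
# Soft switch (per the Qwen3 model cards): with enable_thinking=True,
# a trailing /no_think requests a fast, non-thinking reply for this turn.
messages = [
    {"role": "user", "content": "How many states does the USA have? /no_think"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # soft switches only take effect in this mode
)
```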
Deployment Recommendations
For scalable API serving, we recommend:
- SGLang (`>=0.4.6.post1`): `python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --reasoning-parser qwen3`
- vLLM (`>=0.8.4`): `vllm serve Qwen/Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1`
Omit the `--reasoning-parser` argument to serve the model in non-thinking mode.
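Both servers expose an OpenAI-compatible API, so a standard client can query the deployment directly. A minimal sketch, assuming the server listens on a default local port (`8000` here is illustrative) and that the enabled reasoning parser splits the thinking trace into a separate `reasoning_content` field (field naming may vary across server versions):

```python
from openai import OpenAI

# Assumption: the SGLang or vLLM server from above, on a local port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in two sentences."}],
)

message = response.choices[0].message
# With a reasoning parser enabled, the trace is separated from the answer.
print("Thinking:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)
```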
For local experimentation and development, excellent options include Ollama (`ollama run qwen3:30b-a3b`), LM Studio, llama.cpp, and KTransformers.
The Road Ahead
Qwen3 represents a significant milestone on our journey toward Artificial General Intelligence (AGI). By scaling pre-training and reinforcement learning, integrating flexible reasoning modes, and expanding multilingual support, we have created a more capable and accessible model family.
Looking forward, we plan to enhance our models across multiple dimensions: optimizing architectures and training methods, scaling data and model size, extending context length, broadening modalities, and advancing reinforcement learning with environmental feedback for long-horizon reasoning. We believe the field is transitioning from an era of training models to an era of training agents, and our next iterations will aim to bring meaningful advancements to both work and life.
We invite the global community to explore Qwen3. Try the models on our Qwen Chat web interface (chat.qwen.ai) or mobile app, download the weights, and build the next generation of AI applications. We are excited to see what you create.