
Alibaba Releases Qwen3: A New-Generation Open-Source Large Model with a Hybrid Reasoning Architecture and 119-Language Support

2026/1/24
AI Summary (BLUF)

Alibaba's Qwen3 is a new-generation large language model featuring a hybrid reasoning mode (thinking vs. non-thinking), support for 119 languages, and optimized agent capabilities. It includes specialized models like Qwen3-Embedding and Qwen3-Reranker for text representation and retrieval tasks, all open-sourced under Apache 2.0 for commercial use.

Introduction

Alibaba has unveiled Qwen3, the latest iteration in its Qwen series of large language models (LLMs). This new generation marks a significant leap forward, introducing a hybrid reasoning architecture, dramatically expanded multilingual capabilities, and a suite of specialized models designed for tasks like text embedding and reranking. Notably, the entire Qwen3 series is released under the permissive Apache 2.0 license, making it freely available for commercial and research use worldwide. This article provides a comprehensive technical overview of Qwen3's key features, architectural innovations, and performance benchmarks.


Key Features and Capabilities

1. Hybrid Reasoning Modes

Qwen3 introduces a novel dual-mode reasoning system, allowing users to tailor the model's approach based on task complexity.

  • Thinking Mode: In this mode, the model engages in step-by-step, chain-of-thought reasoning before delivering a final answer. This is ideal for complex problem-solving, mathematical tasks, and intricate logical reasoning, where accuracy and depth are paramount.
  • Non-Thinking Mode: This mode provides fast, near-instantaneous responses. It is optimized for straightforward queries, simple Q&A, and tasks where low latency is more critical than deep deliberation.
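The mode switch is a per-request choice rather than a different model. The routing logic an application might wrap around it can be sketched as follows; note that `call_model` and the `<think>…</think>` trace format are illustrative stand-ins, not the official Qwen3 API:

```python
# Illustrative sketch only: route a request to thinking or non-thinking
# mode via a per-call flag. `call_model` is a hypothetical stand-in for
# a real Qwen3 inference backend.

def call_model(prompt: str, enable_thinking: bool) -> str:
    # Stand-in backend: a real deployment would run Qwen3 inference here.
    if enable_thinking:
        return f"<think>step-by-step reasoning…</think>final answer to: {prompt}"
    return f"final answer to: {prompt}"

def answer(prompt: str, complex_task: bool = False) -> str:
    """Use thinking mode for complex tasks, fast mode otherwise."""
    raw = call_model(prompt, enable_thinking=complex_task)
    # Strip the reasoning trace so callers only see the final answer.
    if "</think>" in raw:
        raw = raw.split("</think>", 1)[1]
    return raw
```

The key design point is that the caller, not the deployment, decides per request how much deliberation to pay for.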

2. Extensive Multilingual Support

Qwen3 supports an impressive 119 languages and dialects, a substantial increase from the 29 languages supported by its predecessor, Qwen2.5. This includes major languages like English, French, and Chinese (Simplified and Traditional), as well as regional dialects like Cantonese, significantly broadening its applicability for global and cross-cultural applications.

3. Enhanced Agent and Tool Integration

Qwen3 features optimized coding and agent capabilities. It natively supports the Model Context Protocol (MCP), enabling seamless and efficient integration with external tools and data sources. When combined with the Qwen-Agent framework, it significantly reduces coding complexity for building sophisticated agents capable of performing tasks on devices like phones and computers.
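At its core, the agent loop such frameworks automate is: the model emits a structured tool call, the runtime executes the named tool, and the result is fed back. A toy sketch of that dispatch step (the `CALL:` wire format and the `get_time` tool are invented for illustration; MCP defines the real protocol):

```python
# Minimal tool-dispatch sketch. Tool names and the "CALL:<tool>:<arg>"
# format are illustrative; a real agent would speak MCP or a framework's
# function-calling schema.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Register a function so the runtime can dispatch to it by name."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("get_time")
def get_time(arg: str) -> str:
    # Hypothetical tool returning a fixed time for demonstration.
    return "09:00"

def dispatch(model_output: str) -> str:
    # Anything that is not a tool call is treated as a final answer.
    if model_output.startswith("CALL:"):
        _, name, arg = model_output.split(":", 2)
        return TOOLS[name](arg)
    return model_output
```

A framework like Qwen-Agent supplies the registration, schema generation, and feedback loop so application code only writes the tools themselves.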

4. Flexible Model Portfolio

The Qwen3 series offers a diverse range of model configurations to suit various computational constraints and deployment scenarios:

  • Mixture-of-Experts (MoE) Models: Qwen3-235B-A22B and Qwen3-30B-A3B.
  • Dense Models: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.

This portfolio ensures coverage from resource-constrained edge devices to large-scale enterprise server deployments.
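The MoE names encode both sizes: Qwen3-235B-A22B has 235B total parameters but activates only about 22B per token, while dense models activate all of theirs. A small helper to decode the naming convention (an illustrative utility, not part of any official SDK):

```python
import re

def parse_qwen3_name(name: str):
    """Decode a Qwen3 model name into (total, active) parameter counts
    in billions. The '-A<n>B' suffix marks MoE models' active parameters;
    dense models activate all of their parameters."""
    m = re.match(r"Qwen3-(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?$", name)
    if not m:
        raise ValueError(f"unrecognized name: {name}")
    total = float(m.group(1))
    active = float(m.group(2)) if m.group(2) else total
    return total, active
```

For capacity planning, the active count drives per-token compute while the total count drives memory footprint.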

Specialized Models: Embedding and Reranker

Alongside the main LLM, Alibaba has open-sourced two specialized models crucial for retrieval-augmented generation (RAG) and search systems.

Qwen3-Embedding

This model is designed for generating high-quality semantic representations (embeddings) of text.

  • Function: It takes a single text segment as input and uses the hidden state vector corresponding to the final layer's [EOS] token as the semantic representation of the input text.
  • Performance: The 8B parameter version achieved a top score of 70.58 on the MTEB multilingual leaderboard, surpassing commercial API services like Google's Gemini-Embedding.
  • Use Cases: Ideal for tasks requiring semantic text representation, such as text classification, clustering, and similarity calculation.
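The [EOS]-pooling step described above is simple to express. A dependency-free sketch, assuming the model has already produced last-layer hidden states (the shapes and values here are toy data, not real model outputs):

```python
# Sketch of EOS pooling: take the last-layer hidden state at each
# sequence's final real token as its embedding, then L2-normalize.
import math
from typing import List

Vector = List[float]

def eos_embedding(hidden_states: List[List[Vector]],
                  attention_mask: List[List[int]]) -> List[Vector]:
    """hidden_states: per sequence, a list of per-token vectors.
    attention_mask: 1 for real tokens, 0 for padding."""
    out = []
    for states, mask in zip(hidden_states, attention_mask):
        eos_idx = sum(mask) - 1          # last non-padding position
        vec = states[eos_idx]
        norm = math.sqrt(sum(x * x for x in vec))
        out.append([x / norm for x in vec])
    return out

def cosine_sim(a: Vector, b: Vector) -> float:
    """Dot product of two already-normalized embeddings."""
    return sum(x * y for x, y in zip(a, b))
```

Because the embeddings are normalized, similarity search reduces to a dot product, which is what vector databases index.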

Qwen3-Reranker

This model is specialized for scoring the relevance between text pairs, such as a user query and a candidate document.

  • Function: It employs a single-tower architecture to compute and output a relevance score for an input text pair.
  • Performance: The 8B model excels in multilingual retrieval, scoring 69.02. It achieves 77.45 on Chinese-specific tasks and 69.76 on English tasks, significantly outperforming other baseline models.
  • Use Cases: Critical for improving the relevance and accuracy of search results in search engines or answer ranking in QA systems.
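Whatever model produces the scores, the surrounding rerank step looks the same: score every (query, document) pair jointly, then sort. A sketch with a trivial stand-in scorer (`toy_score` is illustrative; a real system would call the Qwen3-Reranker model at that point):

```python
# Reranking skeleton: a single-tower model scores each (query, doc)
# pair; we keep the top_k highest-scoring documents.
from typing import Callable, List, Tuple

def rerank(query: str,
           docs: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    scored = [(d, score_fn(query, d)) for d in docs]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:top_k]

def toy_score(query: str, doc: str) -> float:
    # Stand-in for the model: fraction of query words present in the doc.
    words = query.lower().split()
    return sum(w in doc.lower() for w in words) / len(words)
```

In a RAG pipeline, this step typically sits between a fast embedding-based retriever and the generator, trading a little latency for noticeably better top results.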

Technical Architecture and Training

1. Large-Scale Pre-training

Qwen3 was pre-trained on a massive dataset of approximately 36 trillion tokens, doubling the scale of Qwen2.5. This data covers all 119 supported languages. The pre-training was conducted in three strategic stages:

  1. Stage 1 (S1): Basic training on over 30T tokens with a 4K context length to establish foundational language skills.
  2. Stage 2 (S2): Training on an additional 5T tokens with an enriched mix of knowledge-intensive data (STEM, coding, reasoning).
  3. Stage 3: High-quality long-context training to extend the effective context length to 32K tokens.

2. Optimized Post-Training Pipeline

To develop the hybrid reasoning model, a sophisticated four-stage post-training pipeline was implemented:

  1. Long Chain-of-Thought Cold Start: Supervised fine-tuning on diverse long-chain reasoning data across mathematics, code, and logic.
  2. Reasoning-based Reinforcement Learning (RL): Using rule-based rewards to enhance the model's exploration and in-depth reasoning capabilities.
  3. Thinking Mode Fusion: Fine-tuning on a combined dataset of long-chain data and standard instruction data to integrate the non-thinking mode into the thinking model.
  4. General Reinforcement Learning: Applying RL across 20+ general domains (instruction following, formatting, agent skills) to enhance overall capability and correct undesirable behaviors.

Performance and Efficiency

Qwen3 demonstrates state-of-the-art performance across numerous benchmarks while maintaining high deployment efficiency.

  • Benchmark Excellence:
    • AIME25: Scored 81.5, setting a new open-source record.
    • LiveCodeBench: Surpassed 70 points, outperforming models like Grok3.
    • ArenaHard: Achieved a score of 95.6, surpassing OpenAI-o1 and DeepSeek-R1.
  • Deployment Efficiency: Performance is achieved with remarkable efficiency. For instance, the full-capacity model can be deployed using only 4 H20 GPUs, with GPU memory consumption reportedly one-third of that required by models with comparable performance.

Project Resources

Developers and researchers can access Qwen3 through Alibaba's official channels, including the Qwen GitHub repositories and model hubs such as Hugging Face and ModelScope.

Conclusion

Alibaba's Qwen3 represents a formidable advancement in the open-source LLM landscape. Its hybrid reasoning design offers unprecedented flexibility, its massive multilingual training corpus breaks down language barriers, and its specialized embedding/reranker models provide a complete toolkit for advanced NLP applications. Coupled with its top-tier benchmark performance, efficient deployment characteristics, and fully open-source Apache 2.0 license, Qwen3 is poised to become a cornerstone model for both academic research and industrial innovation worldwide.
