How Does the Upgraded Gemini 3 Deep Think Help With Research Challenges? A Hands-On Look at the 2026 Features

Introduction

Today, we’re releasing a major upgrade to Gemini 3 Deep Think, our specialized reasoning mode, built to push the frontier of intelligence and solve modern challenges across science, research, and engineering.

We updated Gemini 3 Deep Think in close partnership with scientists and researchers to tackle tough research challenges, where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete. By blending deep scientific knowledge with everyday engineering utility, Deep Think moves beyond abstract theory to drive practical applications.

The new Deep Think is now available in the Gemini app for Google AI Ultra subscribers and, for the first time, we’re also making Deep Think available via the Gemini API to select researchers, engineers and enterprises. Express interest in early access here.

Here is how our early testers are already using the latest Deep Think:

Lisa Carbone, a mathematician at Rutgers University, used Deep Think to review a highly technical mathematics paper. Deep Think successfully identified a subtle logical flaw that had previously passed through human peer review unnoticed.

Wang Lab at Duke University utilized Deep Think to optimize fabrication methods for complex crystal growth. Deep Think designed a recipe for growing thin films larger than 100 μm, meeting a precise target that previous methods had struggled to hit.

Anupam Pathak, R&D lead in Google’s Platforms and Devices division, tested Deep Think to accelerate the design of physical components.

Key Capabilities and Benchmarks

Elevating Reasoning with Mathematical and Algorithmic Rigor

Last year, we showed that specialized versions of Deep Think could successfully navigate some of the toughest challenges in reasoning, achieving gold-medal standards at math and programming world championships. More recently, Deep Think has enabled specialized agents to conduct research-level mathematics exploration.

The updated Deep Think mode continues to push the frontiers of intelligence, reaching new heights across the most rigorous academic benchmarks, including:


Benchmark	Performance	Description
Humanity’s Last Exam	48.4% (without tools)	A benchmark designed to test the limits of modern frontier models
ARC-AGI-2	84.6% (verified by ARC Prize Foundation)	Measures abstraction and reasoning capability
Codeforces (Elo rating)	Elo 3455	Competitive programming challenges benchmark
International Math Olympiad 2025	Gold medal level	World’s most prestigious mathematics competition for pre-university students

Table: academic benchmark results
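
To give a feel for what a rating like 3455 means, the sketch below evaluates the classic Elo expected-score formula for a few rating gaps. This is an illustration under assumptions: the source does not say how the Codeforces figure was computed, Codeforces uses its own variant of the rating system, and the opponent ratings are hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model.

    Assumption: plain Elo with a 400-point scale, not Codeforces' own formula.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


DEEP_THINK_RATING = 3455  # value reported in the table above

if __name__ == "__main__":
    # Hypothetical opponent ratings, chosen only to show the effect of the gap.
    for opponent in (3455, 3055, 2655):
        expected = elo_expected_score(DEEP_THINK_RATING, opponent)
        print(f"vs {opponent}: expected score {expected:.3f}")
```

Under this model an equal rating gives an expected score of 0.5, and each additional 400 points multiplies the odds in favor of the higher-rated side by ten.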

Navigating Complex Scientific Domains

Beyond mathematics and competitive coding, Gemini 3 Deep Think now also excels across broad scientific domains such as chemistry and physics. Our updated Deep Think mode demonstrates gold medal-level results on the written sections of the 2025 International Physics Olympiad and Chemistry Olympiad. It also demonstrates proficiency in advanced theoretical physics, achieving a score of 50.5% on CMT-Benchmark.


Benchmark	Performance	Description
2025 International Physics Olympiad (written)	Gold medal level	World’s top physics competition for secondary school students
2025 International Chemistry Olympiad (written)	Gold medal level	World’s top chemistry competition for secondary school students
CMT-Benchmark (advanced theoretical physics)	50.5%	Evaluation of proficiency in condensed matter theory and related physics

Table: scientific benchmark results

Accelerating Real-World Engineering

In addition to its state-of-the-art performance, Deep Think is built to drive practical applications, enabling researchers to interpret complex data and engineers to model physical systems through code. Most importantly, we are working to bring Deep Think to researchers and practitioners where they need it most, beginning with surfaces such as the Gemini API.

With the updated Deep Think, you can turn a sketch into a 3D-printable reality. Deep Think analyzes the drawing, models the complex shape and generates a file to create the physical object with 3D printing.
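
As a rough picture of the final "generate a file" step, the sketch below writes a small solid as an ASCII STL mesh, a format commonly accepted by 3D-printing slicers. Everything here is an assumption made for illustration: the source does not say which file format Deep Think emits, and the tetrahedron, the function names, and the output file name are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def unit_normal(a: Vec, b: Vec, c: Vec) -> Vec:
    """Unit normal of triangle (a, b, c), using the right-hand rule."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def tetrahedron_stl(size: float, name: str = "sketch_part") -> str:
    """Return ASCII STL text for a right-angle tetrahedron with edge `size`."""
    v0, v1 = (0.0, 0.0, 0.0), (size, 0.0, 0.0)
    v2, v3 = (0.0, size, 0.0), (0.0, 0.0, size)
    # Vertex order is counter-clockwise seen from outside, so normals point outward.
    faces = [(v0, v2, v1), (v0, v1, v3), (v0, v3, v2), (v1, v2, v3)]
    lines = [f"solid {name}"]
    for a, b, c in faces:
        nx, ny, nz = unit_normal(a, b, c)
        lines.append(f"  facet normal {nx:.6f} {ny:.6f} {nz:.6f}")
        lines.append("    outer loop")
        for x, y, z in (a, b, c):
            lines.append(f"      vertex {x:.6f} {y:.6f} {z:.6f}")
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a 10-unit part to disk; slicers usually interpret STL units as millimetres.
    with open("sketch_part.stl", "w", encoding="ascii") as fh:
        fh.write(tetrahedron_stl(10.0))
```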

Availability and Early Access

Google AI Ultra subscribers will be able to access the updated Deep Think mode starting today in the Gemini app. Scientists, engineers and enterprises can also now express interest in our early access program to test Deep Think via the Gemini API.

We can’t wait to see what you discover.

Frequently Asked Questions (FAQ)

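What has been upgraded in Gemini 3 Deep Think?

The upgrade strengthens its reasoning and targets hard problems in science, research, and engineering. It was refined together with scientists to handle problems that lack clear boundaries and come with messy data. The new version is available to Google AI Ultra subscribers and, for the first time, is offered through the API under early access.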

How can I get access to Gemini 3 Deep Think?

Google AI Ultra subscribers can use it in the Gemini app. Researchers, engineers, and enterprises can apply for early access via the Gemini API; the application link is in the original post.

How does Gemini 3 Deep Think perform on academic benchmarks?

It scores 48.4% on Humanity's Last Exam (without tools) and 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation), and reaches a Codeforces Elo of 3455, placing it among the leading results.

AI Summary (BLUF)