Claude Opus 4和Sonnet 4发布:2026年AI编码与推理新标杆
Anthropic introduces Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents with extended thinking, parallel tool use, and memory capabilities.
原文翻译: Anthropic发布Claude Opus 4和Claude Sonnet 4,通过扩展思考、并行工具使用和记忆能力,为编码、高级推理和AI智能体设定了新标准。
Today, Anthropic unveils the next generation of its flagship AI models: Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 and Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites.. These models establish new benchmarks for coding proficiency, advanced reasoning, and the capabilities of AI agents.
今天,Anthropic 发布了其旗舰 AI 模型的新一代产品:Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 和 Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites.。这些模型为编码能力、高级推理和 AI 智能体能力树立了新的标杆。
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 is positioned as the world's premier coding model, demonstrating sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites. represents a significant upgrade over its predecessor, Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to user instructions.
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 被定位为全球顶级的编码模型,在复杂、长期运行的任务和智能体工作流中展现出持续的高性能。Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites. 是其前代 Claude Sonnet 3.7 的重大升级,在提供卓越编码和推理能力的同时,对用户指令的响应也更加精确。
Key Announcements and Capabilities
Alongside the new models, Anthropic announced several pivotal enhancements and features.
除了新模型,Anthropic 还宣布了几项关键的增强功能和特性。
Extended Thinking with Tool Use (Beta)
Both new models can now utilize external tools—such as web search—during extended thinking phases. This allows Claude to alternate between deep reasoning and practical tool use, significantly improving the quality and accuracy of its responses.
两款新模型现在都可以在扩展思考阶段使用外部工具——例如网络搜索。这使得 Claude 能够在深度推理和实际工具使用之间交替进行,显著提高了其响应的质量和准确性。
New Model Capabilities
The Claude 4 models introduce several advanced capabilities:
- Parallel Tool Execution: Both models can use multiple tools simultaneously.
- Enhanced Instruction Following: Improved precision in adhering to complex user instructions.
- Advanced Memory with Local File Access: When developers grant access to local files, the models demonstrate significantly improved memory. They can extract and save key facts to maintain context and build tacit knowledge over time, enabling better long-term task awareness and coherence.
Claude 4 模型引入了多项高级能力:
- 并行工具执行:两款模型可以同时使用多个工具。
- 增强的指令遵循:在遵循复杂用户指令方面精度更高。
- 具备本地文件访问的高级记忆:当开发者授予本地文件访问权限时,模型展现出显著改进的记忆能力。它们可以提取并保存关键信息以维持上下文,并随时间积累隐性知识,从而实现更好的长期任务感知和连贯性。
Claude CodeAn AI coding assistant developed by Anthropic that provides code completion and explanations. is Now Generally Available
Following a successful research preview, Claude CodeAn AI coding assistant developed by Anthropic that provides code completion and explanations. is now generally available. It expands how developers can collaborate with Claude by supporting:
- Background tasks via GitHub Actions.
- Native integrations with VS Code and JetBrains IDEs, displaying proposed edits directly within the code files for seamless pair programming.
经过成功的研究预览,Claude CodeAn AI coding assistant developed by Anthropic that provides code completion and explanations. 现已全面上市。它通过以下方式扩展了开发者与 Claude 协作的方式:
- 通过 GitHub Actions 支持后台任务。
- 与 VS Code 和 JetBrains IDE 的原生集成,将建议的编辑直接显示在代码文件中,实现无缝结对编程。
New API Capabilities for Agent Development
Anthropic is releasing four new capabilities on its API to empower developers to build more powerful AI agents:
- Code Execution Tool
- MCP (Model Context Protocol) Connector
- Files API
- Prompt Caching (up to one hour)
Anthropic 在其 API 上发布了四项新功能,以赋能开发者构建更强大的 AI 智能体:
- 代码执行工具
- MCP(模型上下文协议)连接器
- 文件 API
- 提示词缓存(最长一小时)
Model Availability and Pricing
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 and Sonnet 4 are hybrid models offering two operational modes: near-instant responses and extended thinking for deeper reasoning. They are available through:
- Claude.ai: Included in Pro, Max, Team, and Enterprise plans. Sonnet 4 is also available to free users.
- API: Direct access via Anthropic's API.
- Cloud Platforms: Amazon Bedrock and Google Cloud's Vertex AI.
Pricing remains consistent with previous Opus and Sonnet models:
- Opus 4: $15 per million tokens (input) / $75 per million tokens (output)
- Sonnet 4: $3 per million tokens (input) / $15 per million tokens (output)
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 和 Sonnet 4 是混合模型,提供两种操作模式:近即时响应和用于深度推理的扩展思考。它们可通过以下方式获取:
- Claude.ai:包含在 Pro、Max、Team 和 Enterprise 计划中。Sonnet 4 也对免费用户开放。
- API:通过 Anthropic 的 API 直接访问。
- 云平台:Amazon Bedrock 和 Google Cloud 的 Vertex AI。
定价与之前的 Opus 和 Sonnet 模型保持一致:
- Opus 4:每百万令牌 15 美元(输入)/ 每百万令牌 75 美元(输出)
- Sonnet 4:每百万令牌 3 美元(输入)/ 每百万令牌 15 美元(输出)
Deep Dive: Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 is Anthropic's most powerful model to date and claims the title of the world's best coding model. It leads on key benchmarks:
- SWE-bench软件工程基准测试,用于评估模型在真实软件工程任务中的性能,Claude Opus 4和Sonnet 4在该测试中分别达到72.5%和72.7%的通过率。 (Verified): 72.5%
- Terminal-bench: 43.2%
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 是 Anthropic 迄今为止最强大的模型,并宣称是全球最佳编码模型。它在关键基准测试中领先:
- SWE-bench软件工程基准测试,用于评估模型在真实软件工程任务中的性能,Claude Opus 4和Sonnet 4在该测试中分别达到72.5%和72.7%的通过率。 (已验证):72.5%
- Terminal-bench:43.2%
The model is engineered for sustained performance on long-running tasks that require focused effort over thousands of steps, capable of working continuously for several hours. This dramatically outperforms all previous Sonnet models and significantly expands the potential of AI agents.
该模型专为需要集中精力、经过数千个步骤的长期运行任务而设计,能够持续工作数小时。这显著超越了所有之前的 Sonnet 模型,并极大地扩展了 AI 智能体的潜力。
Industry Validation
Leading companies have validated Opus 4's capabilities:
- Cursor: Calls it state-of-the-art for coding and a leap forward in understanding complex codebases.
- Replit: Reports improved precision and dramatic advancements for making complex changes across multiple files.
- Block: Notes it's the first model to boost code quality during editing and debugging in their agent (
codename goose) while maintaining full performance and reliability. - Rakuten: Validated its capabilities with a demanding open-source refactor that ran independently for 7 hours with sustained performance.
- Cognition: Highlights that Opus 4 excels at solving complex challenges other models cannot, successfully handling critical actions that previous models missed.
领先的公司已验证了 Opus 4 的能力:
- Cursor:称其为编码领域的尖端技术,在理解复杂代码库方面实现了飞跃。
- Replit:报告称其在跨多个文件进行复杂更改时精度更高,进步显著。
- Block:指出这是首个在其智能体(
代号 goose)的编辑和调试过程中能提升代码质量,同时保持完全性能和可靠性的模型。- Rakuten:通过一项要求苛刻的开源重构任务验证了其能力,该任务独立运行了 7 小时且性能稳定。
- Cognition:强调 Opus 4 擅长解决其他模型无法应对的复杂挑战,成功处理了先前模型遗漏的关键操作。
Deep Dive: Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites.
Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites. represents a significant improvement over the industry-leading Sonnet 3.7. It excels in coding, achieving a state-of-the-art 72.7% on SWE-bench软件工程基准测试,用于评估模型在真实软件工程任务中的性能,Claude Opus 4和Sonnet 4在该测试中分别达到72.5%和72.7%的通过率。. The model is designed to balance high performance with efficiency for a wide range of internal and external use cases, featuring enhanced steerability for greater control over implementations. While it does not match Opus 4 in most domains, it offers an optimal mix of capability and practicality.
Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites. 相比行业领先的 Sonnet 3.7 有显著改进。它在编码方面表现出色,在 SWE-bench软件工程基准测试,用于评估模型在真实软件工程任务中的性能,Claude Opus 4和Sonnet 4在该测试中分别达到72.5%和72.7%的通过率。 上达到了顶尖的 72.7%。该模型旨在为广泛的内部和外部用例平衡高性能与效率,并具有增强的可操控性,以便更好地控制实现。虽然它在大多数领域不及 Opus 4,但它提供了能力与实用性的最佳组合。
Industry Validation
Key partners have reported strong results with Sonnet 4:
- GitHub: States that Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites. soars in agentic scenarios and will introduce it as the model powering the new coding agent in GitHub Copilot.
- Manus: Highlights improvements in following complex instructions, clear reasoning, and aesthetic outputs.
- iGent: Reports that Sonnet 4 excels at autonomous multi-feature app development, with substantially improved problem-solving and codebase navigation—reducing navigation errors from 20% to near zero.
- Sourcegraph: Says the model shows promise as a substantial leap in software development—staying on track longer, understanding problems more deeply, and providing more elegant code quality.
- Augment Code: Reports higher success rates, more surgical code edits, and more careful work through complex tasks, making it their top choice for a primary model.
主要合作伙伴报告了 Sonnet 4 的出色成果:
- GitHub:表示 Claude Sonnet 4An AI model mentioned as being supported alongside Gemini models on certain mirror sites. 在智能体场景中表现出色,将把它作为为 GitHub Copilot 中新编码智能体提供动力的模型引入。
- Manus:强调了其在遵循复杂指令、清晰推理和美观输出方面的改进。
- iGent:报告称 Sonnet 4 擅长自主的多功能应用开发,问题解决和代码库导航能力大幅提升——将导航错误率从 20% 降低到接近零。
- Sourcegraph:称该模型有望实现软件开发的重大飞跃——能更长时间地保持正轨、更深入地理解问题并提供更优雅的代码质量。
- Augment Code:报告了更高的成功率、更精准的代码编辑以及在复杂任务中更细致的工作,使其成为他们主要模型的首选。
Core Model Improvements
Beyond the headline features, the Claude 4 family includes several fundamental enhancements.
除了主要特性,Claude 4 系列还包括几项根本性的改进。
Reduced Shortcutting Behavior
Both models are 65% less likely to engage in shortcutting or using loopholes to complete tasks compared to Sonnet 3.7, particularly on agentic tasks susceptible to such behaviors.
与 Sonnet 3.7 相比,两款模型使用捷径或漏洞完成任务的可能性降低了 65%,尤其是在容易发生此类行为的智能体任务上。
Advanced Memory Capabilities
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 shows a dramatic improvement in memory capabilities. When applications provide local file access, Opus 4 becomes skilled at creating and maintaining 'memory files' to store key information. This unlocks better long-term task awareness, coherence, and performance. An example cited is Opus 4 creating a 'Navigation Guide' while playing Pokémon to improve its gameplay over time.
Claude Opus 4Anthropic推出的下一代Claude模型,是全球最佳编码模型,在复杂长期任务和智能体工作流中表现卓越,支持扩展思考、并行工具使用和记忆能力。 在记忆能力方面显示出巨大改进。当应用程序提供本地文件访问权限时,Opus 4 能够熟练地创建和维护“记忆文件”来存储关键信息。这带来了更好的长期任务感知、连贯性和性能。引用的一个例子是 Opus 4 在玩《宝可梦》时创建了一个“导航指南”,以随着时间的推移改进其游戏玩法。
Thinking Summaries
For Claude 4 models using extended thinking, Anthropic has introduced thinking summaries. A smaller model condenses lengthy thought processes, though this summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about the new Developer Mode to retain full access.
对于使用扩展思考的 Claude 4 模型,Anthropic 引入了思考摘要功能。一个较小的模型会压缩冗长的思维过程,不过这种摘要仅在大约 5% 的情况下需要——大多数思维过程都足够短,可以完整显示。需要原始思维链进行高级提示词工程的用户可以联系销售部门,了解新的开发者模式以保留完全访问权限。
Getting Started with Claude 4
These models represent a significant step toward the vision of AI as a virtual collaborator—capable of maintaining full context, sustaining focus on longer projects, and driving transformational impact. They have undergone extensive testing and evaluation to minimize risk and maximize safety, including implementing measures for higher AI Safety Levels like ASL-3.
这些模型朝着将 AI 视为虚拟协作者的愿景迈出了重要一步——能够保持完整的上下文、持续专注于长期项目并推动变革性影响。它们经过了广泛的测试和评估,以最小化风险并最大化安全性,包括为更高级别的 AI 安全等级(如 ASL-3)实施措施。
Developers and users can get started today on Claude.ai, Claude Code, or through the platform of their choice.
开发者和用户今天就可以通过 Claude.ai、Claude Code 或他们选择的平台开始使用。
版权与免责声明:本文仅用于信息分享与交流,不构成任何形式的法律、投资、医疗或其他专业建议,也不构成对任何结果的承诺或保证。
文中提及的商标、品牌、Logo、产品名称及相关图片/素材,其权利归各自合法权利人所有。本站内容可能基于公开资料整理,亦可能使用 AI 辅助生成或润色;我们尽力确保准确与合规,但不保证完整性、时效性与适用性,请读者自行甄别并以官方信息为准。
若本文内容或素材涉嫌侵权、隐私不当或存在错误,请相关权利人/当事人联系本站,我们将及时核实并采取删除、修正或下架等处理措施。 也请勿在评论或联系信息中提交身份证号、手机号、住址等个人敏感信息。