构建高效LLM智能体:实用模式与最佳实践指南
English Summary: This comprehensive guide from Anthropic shares practical insights on building effective LLM agents, emphasizing simplicity over complexity. It distinguishes between workflows (predefined code paths) and agents (dynamic, self-directed systems), provides concrete patterns like prompt chaining, routing, and parallelization, and offers guidance on when to use frameworks versus direct API calls. The article stresses starting with simple solutions and adding complexity only when necessary, with real-world examples from customer implementations.
中文摘要翻译:本文是Anthropic分享的关于构建高效LLM智能体的实用指南,强调简单性优于复杂性。文章区分了工作流(预定义代码路径)和智能体(动态、自导向系统),提供了提示链、路由、并行化等具体模式,并就何时使用框架与直接API调用提供了指导。文章强调从简单解决方案开始,仅在必要时增加复杂性,并提供了客户实施的真实案例。
Over the past year, we have collaborated with numerous teams across various industries to build Large Language Model (LLM) agents. A consistent pattern emerged: the most successful implementations were not built with the most complex frameworks or specialized libraries. Instead, they leveraged simple, composable patterns. This post distills the lessons learned from working with our customers and from our own development efforts, offering practical guidance for developers aiming to build effective agentic systems.
在过去的一年中,我们与跨行业的数十个团队合作,共同构建大型语言模型(LLM)智能体。一个一致的模式浮现出来:最成功的实现并非使用最复杂的框架或专用库,而是利用了简单、可组合的模式。本文总结了从客户合作和自身开发经验中获得的教训,为旨在构建有效智能体系统的开发者提供实用指导。
What Are Agents?
The term "agent" can be defined in several ways. Some define agents as fully autonomous systems that operate independently over extended periods, utilizing various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but we draw an important architectural distinction between two core types: workflows and agents.
“智能体”这个术语可以有多种定义方式。一些人将智能体定义为完全自主的系统,能够长时间独立运行,利用各种工具完成复杂任务。另一些人则用这个术语来描述遵循预定义工作流的、更具规定性的实现。在 Anthropic,我们将所有这些变体归类为智能体系统,但在架构上对两种核心类型做了重要区分:工作流和智能体。
Workflows are systems where LLMs and tools are orchestrated through predefined, deterministic code paths. The sequence of actions is largely fixed or follows a predictable decision tree.
工作流是指通过预定义的、确定性的代码路径来编排LLM和工具的系统。其动作序列在很大程度上是固定的,或遵循可预测的决策树。
Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage. They maintain control over how to accomplish a task, making decisions based on real-time reasoning and environmental feedback.
智能体则是指LLM动态指导自身流程和工具使用的系统。它们控制着如何完成任务,基于实时推理和环境反馈做出决策。
Below, we will explore both types of agentic systems in detail.
下文我们将详细探讨这两种类型的智能体系统。
When (and When Not) to Use Agents
When building applications with LLMs, we recommend finding the simplest viable solution and only increasing complexity when it is demonstrably necessary. Often, this means not building agentic systems at all. Agentic systems frequently trade increased latency and cost for improved task performance. It is crucial to evaluate when this trade-off is justified.
在使用LLM构建应用程序时,我们建议寻找最简单的可行解决方案,并且仅在明确必要时才增加复杂性。通常,这意味着根本不构建智能体系统。智能体系统常常以提高延迟和成本为代价来换取任务性能的提升。评估这种权衡何时是合理的至关重要。
When greater complexity is warranted, workflows offer predictability and consistency for well-defined, repetitive tasks. Agents become the better option when the problem requires flexibility, adaptive reasoning, and model-driven decision-making at scale. For a vast number of applications, however, optimizing single LLM calls with techniques like retrieval-augmented generation (RAG) and well-crafted in-context examples is sufficient.
当确实需要更高复杂度时,工作流为定义明确、重复性的任务提供了可预测性和一致性。智能体则在问题需要灵活性、自适应推理和大规模模型驱动决策时成为更好的选择。然而,对于绝大多数应用而言,通过检索增强生成(RAG)和精心设计的上下文示例等技术来优化单次LLM调用通常就足够了。
When and How to Use Frameworks
Numerous frameworks exist to simplify the implementation of agentic systems, such as the Claude Agent SDK, AWS's Strands Agents SDK, Rivet (a drag-and-drop GUI LLM workflow builder), and Vellum (another GUI tool for building and testing complex workflows).
现有许多框架可以简化智能体系统的实现,例如 Claude Agent SDK、AWS 的 Strands Agents SDK、Rivet(一个拖放式 GUI LLM 工作流构建器)以及 Vellum(另一个用于构建和测试复杂工作流的 GUI 工具)。
These frameworks lower the barrier to entry by abstracting standard low-level tasks like calling LLM APIs, defining and parsing tools, and chaining calls together. However, they often introduce layers of abstraction that can obscure the underlying prompts and model responses, making debugging more challenging. They can also inadvertently encourage over-engineering when a simpler setup would suffice.
这些框架通过抽象化调用LLM API、定义和解析工具、链接调用等标准底层任务,降低了入门门槛。然而,它们通常会引入抽象层,这可能掩盖底层的提示词和模型响应,使调试更具挑战性。此外,当更简单的设置就足够时,它们也可能无意中鼓励过度设计。
We suggest developers begin by using LLM APIs directly; many foundational patterns can be implemented in just a few lines of code. If you choose to use a framework, ensure you understand the underlying mechanics it abstracts. Incorrect assumptions about the framework's internals are a common source of errors.
我们建议开发者从直接使用LLM API开始;许多基础模式只需几行代码即可实现。如果你选择使用框架,请确保理解它所抽象的底层机制。对框架内部工作原理的错误假设是常见的错误来源。
Building Blocks, Workflows, and Agents
In this section, we explore common patterns for agentic systems observed in production. We start with the foundational building block—the augmented LLM—and progressively increase complexity, moving from simple compositional workflows to autonomous agents.
在本节中,我们将探讨在生产环境中观察到的智能体系统的常见模式。我们从基础构建模块——增强型LLM——开始,逐步增加复杂性,从简单的组合式工作流过渡到自主智能体。
Building Block: The Augmented LLM
The fundamental building block of any agentic system is an LLM enhanced with augmentations such as retrieval (access to external knowledge), tools (ability to perform actions), and memory (context of past interactions). Modern models can actively utilize these capabilities—generating their own search queries, selecting appropriate tools, and determining what information to retain for future steps.
任何智能体系统的基本构建模块都是一个增强了检索(访问外部知识)、工具(执行操作的能力)和记忆(过去交互的上下文)等功能的LLM。现代模型可以主动利用这些能力——生成自己的搜索查询、选择合适的工具,并决定保留哪些信息以供后续步骤使用。
We recommend focusing on two key implementation aspects: 1) tailoring these capabilities to your specific use case, and 2) ensuring they provide a clear, well-documented interface for the LLM. One approach to standardizing this interface is through protocols like the recently released Model Context Protocol (MCP), which allows developers to integrate with a growing ecosystem of third-party tools via a simple client implementation.
我们建议重点关注两个关键的实现方面:1) 根据你的具体用例定制这些功能;2) 确保它们为LLM提供清晰、文档完善的接口。标准化此接口的一种方法是通过像最近发布的模型上下文协议(MCP)这样的协议,它允许开发者通过简单的客户端实现与不断增长的第三方工具生态系统集成。
For the remainder of this discussion, we will assume each LLM call has access to these augmented capabilities.
在接下来的讨论中,我们将假设每次LLM调用都能访问这些增强功能。
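As a concrete illustration, an augmented LLM can be sketched as a thin wrapper that exposes tools and memory to the model. This is a minimal sketch, not a real API: the `AugmentedLLM` class, the `TOOL:name:arg` reply convention, and the injected `llm` callable are all illustrative assumptions standing in for a real model call and a structured tool-use protocol.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AugmentedLLM:
    """Hypothetical wrapper: an LLM augmented with tools and memory."""
    llm: Callable[[str], str]                      # stand-in for a real model API call
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)

    def run(self, user_input: str) -> str:
        # Expose available tools and past context to the model in the prompt.
        prompt = (f"Tools: {', '.join(self.tools)}\n"
                  f"Memory:\n{chr(10).join(self.memory)}\n"
                  f"User: {user_input}")
        reply = self.llm(prompt)
        # A real implementation would parse structured tool calls; here we
        # dispatch on a toy "TOOL:name:arg" convention for illustration only.
        if reply.startswith("TOOL:"):
            _, name, arg = reply.split(":", 2)
            reply = self.tools[name](arg)
        self.memory.append(f"User: {user_input} -> Assistant: {reply}")
        return reply
```

Because the model call is injected as a plain callable, the same wrapper works with any provider SDK or a fake function in tests.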
Workflow: Prompt Chaining
Prompt chaining decomposes a task into a linear sequence of steps, where the output of one LLM call becomes the input for the next. Programmatic checks or "gates" can be inserted at intermediate steps to validate that the process remains on track before proceeding.
提示链将任务分解为线性步骤序列,其中一个LLM调用的输出成为下一个调用的输入。可以在中间步骤插入程序化检查或“关卡”,以验证流程在继续之前是否仍在正轨上。
When to use: This pattern is ideal for tasks that can be cleanly decomposed into fixed, sequential subtasks. The primary trade-off is increased latency for higher accuracy, as each LLM call handles a simpler, more focused sub-problem.
何时使用: 此模式非常适合可以清晰地分解为固定、顺序子任务的任务。主要的权衡是以增加延迟来换取更高的准确性,因为每次LLM调用处理的是一个更简单、更聚焦的子问题。
Examples:
- Generating marketing copy and then translating it into another language.
- 生成营销文案,然后将其翻译成另一种语言。
- Writing a document outline, verifying it meets criteria, then expanding it into a full document.
- 编写文档大纲,验证其是否符合标准,然后将其扩展为完整文档。
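The examples above reduce to a short loop. A minimal sketch, assuming `llm` is any text-in/text-out model call; the empty-output "gate" is just one illustrative validation, and real gates would check task-specific criteria.

```python
from typing import Callable

def chain(llm: Callable[[str], str], steps: list[str], task: str) -> str:
    """Prompt chaining: feed each step's output into the next step's prompt."""
    output = task
    for step in steps:
        output = llm(f"{step}\n\nInput:\n{output}")
        # Programmatic "gate": abort if a step produced nothing usable.
        if not output.strip():
            raise ValueError(f"Gate failed at step: {step!r}")
    return output
```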
Workflow: Routing
Routing classifies an input and directs it to a specialized downstream process, prompt, or tool. This allows for separation of concerns and the use of more specialized, optimized prompts for different input types.
路由对输入进行分类,并将其引导至专门的下游流程、提示词或工具。这实现了关注点分离,并允许针对不同的输入类型使用更专业、更优化的提示词。
When to use: Effective for complex tasks with distinct input categories that benefit from separate handling, and where classification can be performed accurately (by an LLM or a traditional classifier).
何时使用: 适用于具有不同输入类别的复杂任务,这些类别受益于单独处理,并且分类可以准确执行(通过LLM或传统分类器)。
Examples:
- Directing customer service queries (general questions, refunds, tech support) to different processes.
- 将客户服务查询(一般问题、退款、技术支持)引导至不同的处理流程。
- Routing simple queries to cost-efficient models (e.g., Claude Haiku) and complex ones to more capable models (e.g., Claude Sonnet).
- 将简单查询路由到高性价比模型(如 Claude Haiku),复杂查询路由到能力更强的模型(如 Claude Sonnet)。
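Routing is a classify-then-dispatch step. A hedged sketch: `classify` stands in for either an LLM call or a traditional classifier, and the handler table could map to different prompts or different models entirely; the `"general"` fallback key is an assumption of this example.

```python
from typing import Callable

def route(classify: Callable[[str], str],
          handlers: dict[str, Callable[[str], str]],
          query: str) -> str:
    """Routing: classify the input, then dispatch to a specialized handler."""
    label = classify(query)
    # Fall back to a default handler when the classifier emits an unknown label.
    handler = handlers.get(label, handlers["general"])
    return handler(query)
```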
Workflow: Parallelization
This workflow involves running multiple LLM calls simultaneously and aggregating their outputs programmatically. It has two key variations:
此工作流涉及同时运行多个LLM调用,并以编程方式聚合它们的输出。它有两个关键变体:
- Sectioning: Breaking a task into independent subtasks run in parallel.
- 分段: 将任务分解为独立并行运行的子任务。
- Voting: Running the same task multiple times to gather diverse outputs or achieve consensus.
- 投票: 多次运行相同的任务以收集不同的输出或达成共识。
When to use: Beneficial when subtasks are independent and can be parallelized for speed, or when multiple perspectives/attempts are needed for higher confidence or robustness.
何时使用: 当子任务独立且可以并行化以提高速度时,或者当需要多个视角/尝试以获得更高置信度或鲁棒性时,此模式非常有益。
Examples:
- Sectioning: Implementing guardrails where one model handles the user query and another screens for safety.
- 分段: 实施护栏,一个模型处理用户查询,另一个进行安全检查。
- Voting: Reviewing code for security vulnerabilities with multiple independent prompts.
- 投票: 使用多个独立的提示词审查代码中的安全漏洞。
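Both variations can be sketched with a thread pool, since LLM calls are I/O-bound. This is illustrative only: `llm` is a stand-in callable, and majority voting over exact string matches is a simplification — real voting often compares structured verdicts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def parallel(llm: Callable[[str], str], prompts: list[str]) -> list[str]:
    """Sectioning: run independent prompts concurrently, keep input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(llm, prompts))

def vote(llm: Callable[[str], str], prompt: str, n: int = 3) -> str:
    """Voting: run the same prompt n times and return the majority answer."""
    answers = parallel(llm, [prompt] * n)
    return Counter(answers).most_common(1)[0][0]
```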
Workflow: Orchestrator-Workers
In this pattern, a central "orchestrator" LLM dynamically analyzes a task, breaks it down into subtasks, delegates them to specialized "worker" LLMs, and synthesizes the workers' results into a final output.
在此模式中,一个中央“协调器”LLM动态分析任务,将其分解为子任务,委托给专门的“工作器”LLM,并将工作器的结果综合成最终输出。
When to use: Suited for complex, unpredictable tasks where the required subtasks cannot be predefined. The key difference from simple parallelization is this dynamic, input-dependent decomposition performed by the orchestrator.
何时使用: 适用于复杂、不可预测的任务,其中所需的子任务无法预先定义。与简单并行化的关键区别在于,这是由协调器执行的动态的、依赖于输入的分解。
Example: A coding agent that determines which files need to be modified and what changes to make based on a natural language request.
示例: 一个编码智能体,根据自然语言请求确定需要修改哪些文件以及进行何种更改。
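The shape of the pattern is a plan/execute/synthesize pipeline. A minimal sketch under stated assumptions: `planner`, `worker`, and `synthesizer` are all stand-ins for LLM calls (the planner would return subtasks it chose at runtime, e.g. a list of file edits), and sequential execution is used here for clarity where workers could also run in parallel.

```python
from typing import Callable

def orchestrate(planner: Callable[[str], list[str]],
                worker: Callable[[str], str],
                synthesizer: Callable[[list[str]], str],
                task: str) -> str:
    """Orchestrator-workers: dynamic, input-dependent decomposition."""
    subtasks = planner(task)                  # decomposition chosen at runtime
    results = [worker(s) for s in subtasks]   # delegate each subtask
    return synthesizer(results)               # merge into one final output
```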
Workflow: Evaluator-Optimizer
This workflow sets up an iterative refinement loop. One LLM (the "generator") produces a response, and another LLM (the "evaluator") critiques it, providing feedback that is used to generate an improved version in the next iteration.
此工作流建立一个迭代优化循环。一个LLM(“生成器”)产生响应,另一个LLM(“评估器”)对其进行评判,提供反馈,用于在下一轮迭代中生成改进版本。
When to use: Particularly effective when clear evaluation criteria exist and iterative refinement adds significant value. A good indicator is if human feedback demonstrably improves LLM output, and an LLM can simulate that feedback role.
何时使用: 当存在明确的评估标准且迭代优化能显著增加价值时,此模式特别有效。一个好的判断标准是:如果人类反馈能明显改进LLM输出,并且LLM可以模拟该反馈角色。
Example: Literary translation, where an evaluator LLM provides critiques on nuance, style, and accuracy to refine the translator LLM's output over several rounds.
示例: 文学翻译,其中评估器LLM对细微差别、风格和准确性提供批评,以在多轮中优化翻译器LLM的输出。
Agents
Agents represent the most autonomous class of agentic systems. They emerge as LLMs mature in capabilities like complex reasoning, planning, reliable tool use, and error recovery. An agent begins with a high-level goal from a user, formulates its own plan, and operates independently, using tools and gathering environmental feedback ("ground truth") at each step to assess progress. They may pause for human input at defined checkpoints or when stuck.
智能体代表了最自主的一类智能体系统。随着LLM在复杂推理、规划、可靠工具使用和错误恢复等能力上的成熟,智能体应运而生。智能体从用户的高级目标开始,制定自己的计划,并独立运行,在每一步使用工具并收集环境反馈(“真实情况”)以评估进展。它们可能在预定义的检查点或遇到阻碍时暂停以获取人工输入。
When to use: Ideal for open-ended problems where the number and nature of steps are unpredictable and cannot be hardcoded. Their autonomy makes them powerful for scaling tasks in trusted environments but comes with higher costs and the risk of compounding errors. Extensive testing in sandboxed environments and robust guardrails are essential.
何时使用: 非常适合步骤数量和性质不可预测且无法硬编码的开放式问题。它们的自主性使其在可信环境中扩展任务时非常强大,但也伴随着更高的成本和错误累积的风险。在沙盒环境中的广泛测试和强大的护栏至关重要。
Examples:
- A coding agent that resolves software engineering tasks (e.g., SWE-bench) requiring edits across multiple files.
- 解决需要跨多个文件编辑的软件工程任务(例如 SWE-bench)的编码智能体。
- A "computer use" agent that controls a desktop interface to accomplish tasks.
- 控制桌面界面以完成任务的“计算机使用”智能体。
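Stripped to its skeleton, an agent is a loop in which the model picks its own next action and reads the environment's response. Everything protocol-shaped here is an assumption: the `"name: arg"` / `"DONE: answer"` action strings are a toy convention, `llm` is an injected stand-in for a real model call, and `max_steps` illustrates the kind of guardrail the text recommends.

```python
from typing import Callable

def run_agent(llm: Callable[[list[str]], str],
              tools: dict[str, Callable[[str], str]],
              goal: str, max_steps: int = 10) -> str:
    """Minimal agent loop: plan, act, observe, repeat until done."""
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):               # guardrail: bound the loop
        action = llm(history)                # model decides the next step
        if action.startswith("DONE:"):
            return action.split(":", 1)[1].strip()
        name, arg = action.split(":", 1)
        observation = tools[name.strip()](arg.strip())  # environmental feedback
        history.append(f"ACTION: {action}")
        history.append(f"OBSERVATION: {observation}")
    return "stopped: step limit reached"
```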
Summary and Core Principles
Success with LLMs is not about building the most sophisticated system, but the right system for the need. Start simple—optimize single prompts with evaluation, retrieval, and examples. Introduce multi-step agentic systems only when simpler solutions are insufficient.
在LLM领域取得成功,不在于构建最复杂的系统,而在于构建满足需求的正确系统。从简单开始——通过评估、检索和示例优化单个提示词。仅当更简单的解决方案不足时,才引入多步骤智能体系统。
When implementing agents, we advocate for three core principles:
在实现智能体时,我们倡导三个核心原则:
- Maintain Simplicity: Favor straightforward, interpretable designs over unnecessary complexity.
- 保持简单: 倾向于直接、可解释的设计,避免不必要的复杂性。
- Prioritize Transparency: Make the agent's thought process, planning steps, and tool calls visible and understandable.
- 优先考虑透明度: 使智能体的思维过程、规划步骤和工具调用可见且可理解。
- Craft the Agent-Computer Interface (ACI) Meticulously: Invest in clear, well-documented, and robustly tested tools, akin to designing a good human-computer interface.
- 精心设计智能体-计算机接口(ACI): 投入精力创建清晰、文档完善且经过严格测试的工具,类似于设计良好的人机界面。
Frameworks can accelerate prototyping, but don't hesitate to peel back abstraction layers and build with foundational components as you move toward production. By adhering to these principles, you can create agents that are not only powerful but also reliable, maintainable, and trustworthy.
框架可以加速原型设计,但在迈向生产环境时,不要犹豫去剥离抽象层,使用基础组件进行构建。遵循这些原则,你可以创建出不仅功能强大,而且可靠、可维护且值得信赖的智能体。
Acknowledgements: Written by Erik Schluntz and Barry Zhang, drawing upon experiences at Anthropic and insights from our customers.
致谢:本文由 Erik Schluntz 和 Barry Zhang 撰写,借鉴了 Anthropic 的经验和来自客户的见解。