How Do Agentic Computation Graphs Optimize LLM Workflows? The Latest Frameworks and Evaluation Methods for 2026

2026/4/17

AI Summary (BLUF)

This survey provides a comprehensive framework for optimizing LLM agent workflows through agentic computation graphs (ACGs), distinguishing between static and dynamic methods based on when workflow structure is determined, and proposing structure-aware evaluation metrics beyond traditional task performance.

Agentic Computation Graphs: A Survey on Designing and Optimizing LLM Workflows

Abstract

Large language model (LLM)-based systems are becoming increasingly popular for solving tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification. This survey reviews recent methods for designing and optimizing such workflows, which we treat as agentic computation graphs (ACGs). We organize the literature based on when workflow structure is determined, where structure refers to which components or agents are present, how they depend on each other, and how information flows between them. This lens distinguishes static methods, which fix a reusable workflow scaffold before deployment, from dynamic methods, which select, generate, or revise the workflow for a particular run before or during execution. We further organize prior work along three dimensions: when structure is determined, what part of the workflow is optimized, and which evaluation signals guide optimization (e.g., task metrics, verifier signals, preferences, or trace-derived feedback). We also distinguish reusable workflow templates, run-specific realized graphs, and execution traces, separating reusable design choices from the structures actually deployed in a given run and from realized runtime behavior. Finally, we outline a structure-aware evaluation perspective that complements downstream task metrics with graph-level properties, execution cost, robustness, and structural variation across inputs. Our goal is to provide a clear vocabulary, a unified framework for positioning new methods, a more comparable view of the existing body of literature, and a more reproducible evaluation standard for future work on workflow optimization for LLM agents.

Introduction: From Linear Chains to Agentic Computation Graphs

Traditionally, the application of large language models has often been viewed as a simple "input-output" process. However, as task complexity increases, a single model call is no longer sufficient. Modern LLM agent systems work by decomposing tasks into a series of steps and coordinating different capability modules (such as code interpreters, search engines, verifiers). We refer to such a system, composed of multiple components (nodes) and their data and control flow dependencies (edges), as an Agentic Computation Graph (ACG). It goes beyond simple linear chain-of-thought (CoT) to form a dynamic, executable computational network.

Understanding, designing, and optimizing these computation graphs is central to building efficient and reliable LLM agents. This article aims to systematically review related research, providing a clear classification framework and analytical perspective.

Core Concepts and Classification Framework

What is an Agentic Computation Graph?

An Agentic Computation Graph is a directed graph G = (V, E), where:

  • Nodes (V): Represent computational units or "agents". A node can be an LLM call, a tool function (e.g., Python interpreter, API call), a memory store/retrieval operation, or a decision logic (e.g., conditional judgment).
  • Edges (E): Represent dependencies and data flow between nodes. An edge indicates that the output of an upstream node is the input to a downstream node, or that the execution of a downstream node depends on the completion status of an upstream node.

Its core feature is explicitly modeling the problem-solving process as an executable, analyzable, structured computational flow.
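To make the G = (V, E) definition concrete, here is a minimal sketch of an ACG as a Python data structure. This is an illustration only, not an implementation from the survey: the `Node`/`ACG` names, the topological-execution helper, and the stub "agents" are our own, and real nodes would wrap LLM or tool calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

@dataclass
class Node:
    """A computational unit: an LLM call, tool function, memory op, or decision logic."""
    name: str
    fn: Callable[[Dict[str, Any]], Any]  # maps the task plus upstream outputs to an output

@dataclass
class ACG:
    """Agentic computation graph G = (V, E); edges feed upstream outputs downstream."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)  # node -> upstream deps

    def add(self, node: Node, depends_on=()) -> None:
        self.nodes[node.name] = node
        self.edges[node.name] = list(depends_on)

    def run(self, task: str) -> Dict[str, Any]:
        """Execute nodes in dependency order, passing each its upstream outputs."""
        outputs: Dict[str, Any] = {}
        for name in TopologicalSorter(self.edges).static_order():
            deps = {d: outputs[d] for d in self.edges[name]}
            outputs[name] = self.nodes[name].fn({"task": task, **deps})
        return outputs

# Toy usage with stub nodes (a real system would call an LLM or tool here):
g = ACG()
g.add(Node("plan", lambda ctx: f"plan for: {ctx['task']}"))
g.add(Node("execute", lambda ctx: ctx["plan"].upper()), depends_on=["plan"])
result = g.run("summarize a paper")
```

The `edges` mapping plays the role of E: `run` only hands a node the outputs of the nodes it declares as dependencies, so information flow is exactly the graph structure.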

Key Classification Dimensions

The primary classification basis proposed in this paper is the timing of when the workflow structure is determined. This leads to two major paradigms:

Static Workflow
  • Structure determined: before deployment/execution.
  • Core characteristics: fixed, reusable structure; optimized at design time.
  • Typical examples/methods: ReAct, Plan-and-Execute, predefined multi-agent collaboration frameworks.
  • Advantages: high determinism; easy to debug and verify; typically higher execution efficiency.
  • Disadvantages: poor flexibility; difficult to adapt to unseen task patterns or exceptions.

Dynamic Workflow
  • Structure determined: before or during runtime.
  • Core characteristics: structure is generated or adjusted on the fly based on the specific task or intermediate results.
  • Typical examples/methods: LLM-based planners (e.g., Voyager), reflection- or verification-based graph rewriting.
  • Advantages: high adaptability; capable of handling complex, open-domain tasks.
  • Disadvantages: high uncertainty; difficult to optimize and debug; may introduce additional overhead.

Building upon this, we can analyze workflow optimization methods along three more detailed dimensions:

  1. When to Optimize: Is it at design time (one-time), compile time (for a task type), or runtime (for this specific execution)?
  2. What to Optimize: Is it the internal logic of nodes (e.g., prompts), the connection structure between nodes (e.g., graph topology), or the execution parameters of nodes (e.g., model selection)?
  3. Which Signal to Use for Optimization: What signal guides the optimization? For example:
    • Final task metrics (e.g., accuracy, success rate)
    • Internal verifier signals (e.g., code syntax check, factual consistency verification)
    • Human preference feedback
    • Rewards derived from historical execution traces
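The three axes above can be encoded as a small tagging scheme for positioning methods, in the spirit of the survey's framework. This is a sketch of our own; the enum members simply restate the categories listed above, and the `MethodProfile` example at the end is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class When(Enum):      # when the structure/parameters are optimized
    DESIGN_TIME = "design time (one-off)"
    COMPILE_TIME = "compile time (per task type)"
    RUNTIME = "runtime (per execution)"

class What(Enum):      # what part of the workflow is optimized
    NODE_LOGIC = "node-internal logic (e.g., prompts)"
    TOPOLOGY = "inter-node structure (graph topology)"
    EXEC_PARAMS = "execution parameters (e.g., model selection)"

class Signal(Enum):    # which evaluation signal guides optimization
    TASK_METRIC = "final task metric (accuracy, success rate)"
    VERIFIER = "internal verifier (syntax check, consistency)"
    PREFERENCE = "human preference feedback"
    TRACE_REWARD = "reward derived from execution traces"

@dataclass(frozen=True)
class MethodProfile:
    """Position of one workflow-optimization method along the three axes."""
    name: str
    when: When
    what: What
    signal: Signal

# Hypothetical example: a prompt optimizer tuned offline against task accuracy.
prompt_opt = MethodProfile("prompt-tuner", When.DESIGN_TIME,
                           What.NODE_LOGIC, Signal.TASK_METRIC)
```

Tagging each surveyed method with one value per axis makes otherwise dissimilar papers directly comparable.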

Static Workflows: Design and Optimization

The core idea of static workflows is to predefine a reliable task-solving template. This method encodes human domain knowledge and understanding of task decomposition into the graph structure.

Common Static Patterns

ReAct (Reasoning + Acting)
  • Graph structure: loop: Think -> Act -> Observe -> ...
  • Key node types: LLM (think), tool call (act), environment feedback (observe).
  • Applicable scenarios: QA and decision-making tasks requiring interaction with an environment.

Plan-and-Execute
  • Graph structure: two-stage pipeline: Planner -> Executor.
  • Key node types: LLM planner (generates steps), multiple tool-execution nodes.
  • Applicable scenarios: complex tasks that can be clearly decomposed into sub-steps (e.g., data analysis, report generation).

Multi-Agent Collaboration
  • Graph structure: directed acyclic graph (DAG); agent nodes with different roles collaborate via message passing.
  • Key node types: specialized LLM agents (e.g., programmer, critic, manager), coordinator.
  • Applicable scenarios: tasks requiring multi-perspective expertise or debate (e.g., code review, solution design).
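The ReAct pattern above is essentially a bounded Think -> Act -> Observe loop. The sketch below illustrates that control flow with a stub policy in place of a real LLM; the `'ACT'`/`'FINISH'` message convention and all function names are our own assumptions, not an API from the surveyed systems.

```python
from typing import Callable, Dict

def react_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               task: str, max_steps: int = 5) -> str:
    """Minimal Think -> Act -> Observe loop. `llm` reads the running transcript
    and returns either 'ACT <tool> <input>' or 'FINISH <answer>'."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        thought = llm(transcript)                         # Think
        if thought.startswith("FINISH"):
            return thought[len("FINISH "):]
        _, tool_name, tool_input = thought.split(" ", 2)  # Act
        observation = tools[tool_name](tool_input)        # Observe
        transcript += f"\n{thought}\nObservation: {observation}"
    return "no answer within budget"

# Stub policy: look the task up once, then answer with the observation.
def stub_llm(transcript: str) -> str:
    if "Observation:" in transcript:
        return "FINISH " + transcript.rsplit("Observation: ", 1)[1]
    return "ACT search capital of France"

answer = react_loop(stub_llm, {"search": lambda q: "Paris"}, "capital of France")
```

As a graph, each loop iteration realizes three nodes (LLM, tool, feedback) with edges through the growing transcript, which is why a bounded loop count is a natural hyperparameter for the global optimization discussed below.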

Optimization Strategies for Static Workflows

Even with a fixed structure, there is significant room for optimization internally:

  • Node-Level Optimization: Carefully design prompts for each LLM node or adjust parameters for tool nodes. This can be done through prompt engineering, gradient-based prompt tuning, or reinforcement learning.
  • Edge-Level Optimization: Optimize the format and content of information passed between nodes, for example, by using Memory nodes to distill and cache key historical information, avoiding redundant transmission.
  • Global Optimization: Treat the entire workflow as a black box and use evolutionary algorithms or Bayesian optimization to tune its hyperparameters (e.g., number of loops, decision thresholds) to maximize the final task reward.
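Global optimization can be illustrated with the simplest possible black-box search: sample hyperparameter configurations and keep the one with the highest task reward. This is a hedged stand-in for the evolutionary or Bayesian methods mentioned above (random search shares their interface but not their sample efficiency), and the toy reward function is invented for the example.

```python
import random

def tune_workflow(run_workflow, search_space, trials=100, seed=0):
    """Black-box hyperparameter search: sample configurations from the
    search space and keep the one with the highest task reward."""
    rng = random.Random(seed)
    best_cfg, best_reward = None, float("-inf")
    for _ in range(trials):
        cfg = {k: rng.choice(vals) for k, vals in search_space.items()}
        reward = run_workflow(cfg)   # e.g., success rate on a dev set
        if reward > best_reward:
            best_cfg, best_reward = cfg, reward
    return best_cfg, best_reward

# Toy black box: reward peaks at 3 loops and a 0.7 decision threshold.
def toy_workflow(cfg):
    return -abs(cfg["max_loops"] - 3) - abs(cfg["threshold"] - 0.7)

space = {"max_loops": [1, 2, 3, 4, 5],
         "threshold": [0.5, 0.6, 0.7, 0.8, 0.9]}
cfg, reward = tune_workflow(toy_workflow, space)
```

Because the workflow is treated as a black box, swapping random search for an evolutionary or Bayesian optimizer changes only `tune_workflow`, not the workflow itself.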

The advantage of static optimization lies in its reproducibility and stability. Once optimized, the workflow can handle similar tasks efficiently and reliably.

Frequently Asked Questions (FAQ)

What is an agentic computation graph (ACG), and how does it optimize LLM workflows?

An agentic computation graph models an LLM workflow as a directed graph in which nodes represent computational units such as LLM calls and tool use, and edges represent data flow and dependencies. By making task decomposition and module coordination explicit, this structured design provides a systematic framework for optimization.

What is the main difference between static and dynamic workflows?

The main difference is when the workflow structure is determined: static methods fix a reusable scaffold before deployment, while dynamic methods generate or adjust the structure for a specific task before or during execution. Dynamic workflows are more flexible but harder to design.

How does structure-aware evaluation differ from traditional task evaluation?

Structure-aware evaluation looks beyond task completion, supplementing it with graph-level properties (such as complexity), execution cost, robustness, and analysis of structural variation across inputs, giving more comprehensive guidance for workflow optimization.
