
How to Implement LLM Structured Outputs with JSON and Pydantic: A 2026 Practical Guide

2026/4/23
AI Summary (BLUF)

This article explains the critical importance of structured outputs in LLM workflows, detailing how to implement them from scratch using JSON and Pydantic, and through the Gemini SDK, to build reliable AI applications.

AI Agents Foundations #3: Structured Outputs — The Cornerstone of Reliable AI Applications

Introduction: A Lesson from a Production Incident

In a recent project, our production AI system crashed right before an important demo. The reason? We were not consistently using structured outputs across our Large Language Model (LLM) workflows.

Our staging environment had been working fine with simple regex parsing of LLM responses. However, when we deployed to production, everything fell apart. Our regex patterns failed to match slightly different response formats, data types were inconsistent, and downstream processes couldn't handle the unpredictable data, causing cascading failures. When demo day arrived, our system was completely unusable.

The problem was clear: we had been relying on fragile string parsing, hoping the LLM would always respond in the exact same format. But in production, especially with AI systems, users will always enter inputs you never expect. Without structured outputs, we had no data validation, no type checking, and no real control over how the output should look. Just as lock files ensure consistent dependencies, structured outputs ensure consistent AI data contracts by defining the expected structure for LLM responses.
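The fragility described above is easy to reproduce. The following is a minimal sketch (the regex pattern and sample responses are illustrative, not taken from the actual incident) showing how a pattern that works in staging silently fails when the model phrases its answer slightly differently:

```python
import re

# A pattern written against one observed response format...
pattern = re.compile(r"Score: (\d+)/10")

# ...works on the response seen in staging:
staging_response = "Score: 8/10"
match = pattern.search(staging_response)
print(match.group(1))  # -> "8"

# ...but fails when the model phrases the same answer differently.
# search() returns None, and any downstream code that calls
# match.group(1) then raises AttributeError: a cascading failure.
production_response = "The final score is 8 out of 10."
match = pattern.search(production_response)
print(match)  # -> None
```

The failure mode is the worst kind: nothing errors at parse time; the `None` simply propagates until some downstream step blows up.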

In the previous article of this series, we explored the difference between workflows and agents. Now, we will tackle a fundamental challenge: getting reliable information out of an LLM.

To understand exactly what happens, we will first write everything from scratch and then move to using popular LLM APIs such as Gemini's GenAI SDK:

  1. From scratch using JSON
  2. From scratch using Pydantic (we love Pydantic!)
  3. Using the Gemini SDK and Pydantic

Why Do We Need Structured Outputs?

Before we start coding, it is crucial to understand why structured outputs are foundational to building reliable AI applications. When an LLM returns a free-form string, you face the messy task of parsing it. This often involves fragile regular expressions or string-splitting logic that can easily break if the model outputs change slightly. Structured outputs solve this by forcing the model's response into a predictable format like JSON or Pydantic.

This approach offers several key benefits. First, structured outputs are easy to parse and manipulate programmatically. Instead of wrestling with raw text, you work with clean Python objects, making your code more predictable and easier to debug. Using libraries like Pydantic adds a layer of data and type validation. If the LLM returns a string where an integer is expected, your application raises a clear validation error immediately, preventing bad data from propagating.
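The validation layer described above can be sketched in a few lines. This is a minimal illustration, not the article's final implementation; the `EvaluationScore` model and its field names are placeholders:

```python
from pydantic import BaseModel, ValidationError

class EvaluationScore(BaseModel):
    score: int        # e.g., a 1-10 rating from the LLM judge
    reasoning: str    # short justification for the score

# Well-formed LLM output parses into a clean Python object:
good = EvaluationScore.model_validate_json(
    '{"score": 7, "reasoning": "Minor factual drift."}'
)
print(good.score + 1)  # score is a real int, not a string

# Pydantic coerces a numeric string like "7" to 7 by default, but a
# non-numeric string where an int is expected fails loudly and
# immediately, instead of propagating bad data downstream:
try:
    EvaluationScore.model_validate_json(
        '{"score": "high", "reasoning": "..."}'
    )
except ValidationError as e:
    print(e)  # clear error pointing at the `score` field
```

That immediate, field-level error message is exactly the "clear validation error" the paragraph above refers to.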

Furthermore, structured outputs are easier to orchestrate between steps in a workflow. When you know what information you have available, it is much simpler to pass it to the next LLM call or a downstream system like a database or API. This control also reduces costs. By ensuring the LLM generates only the necessary data without useless artifacts (e.g., "Here is the output you requested..."), you reduce the number of output tokens.

💡 Quick Tip: You can easily compute the costs of running your workflows or agents by plugging in an open-source LLMOps tool such as Opik.

Ultimately, structured outputs create a formal contract between the LLM (Software 3.0) and your application code (Software 1.0). They are the standard method for modeling domain objects in AI engineering, connecting the probabilistic nature of LLMs with deterministic code.

Method Comparison: From Scratch to SDK

To understand how modern LLM APIs such as OpenAI and Gemini work under the hood, we will first implement structured outputs from scratch. Our goal is to prompt a model to return a JSON object and then parse it into a Python dictionary. We will use an "LLM-as-judge" evaluation as our example, where we ask an LLM to compare a generated text against a ground-truth document and score it based on predefined criteria. This is a great use case, as it requires extracting specific, structured information from a large context.

The following table summarizes the three main methods we will explore and their core characteristics:

Method 1 — From scratch (JSON)
  How it works: Manually craft a prompt asking the LLM to return a JSON string, then parse it with json.loads().
  Pros: No extra dependencies; teaches the underlying principles.
  Cons: Fragile, prone to formatting errors, no automatic validation.
  Best for: Rapid prototyping and educational demos.

Method 2 — From scratch (Pydantic)
  How it works: Define a Pydantic model, embed its JSON Schema in the prompt, and validate the parsed result with the model.
  Pros: Strong type validation and data cleaning; more robust, maintainable code.
  Cons: More complex prompts; schema embedding must be handled manually.
  Best for: Production applications that need strong type guarantees.

Method 3 — Official SDK (e.g., Gemini)
  How it works: Use built-in SDK features such as generate_content(..., response_mime_type="application/json", response_schema=...).
  Pros: Best developer experience; native structured-output support; highest reliability and performance.
  Cons: Vendor lock-in; depends on a specific SDK's updates and maintenance.
  Best for: Production-grade projects prioritizing development efficiency, stability, and performance.

Method 1: From Scratch Using JSON

First, we define our sample documents for the evaluation. These will serve as the input for our LLM judge:

GENERATED_DOCUMENT = """
# Q3 2023 Financial Performance Analysis

The Q3 earnings report shows a 20% increase in revenue and a 15% growth in user engagement,
beating market expectations. These impressive results reflect our successful product strategy
and strong market positioning.

Our core business segments demonstrated remarkable resilience, with digital services leading
the growth at 25% year-over-year. The expansion into new markets has proven particularly
successful, contributing to 30% of the total revenue increase.

Customer acquisition costs decreased by 10% while retention rates improved to 92%,
marking our best performance to date. These metrics, combined with our healthy cash flow
position, provide a strong foundation for continued growth into Q4 and beyond.
"""

GROUND_TRUTH_DOCUMENT = """
# Q3 2023 Financial Performance Analysis

The Q3 earnings report shows a 18% increase in revenue and a 15% growth in user engagement,
slightly below market expectations. These results reflect our product strategy adjustments
and competitive market positioning challenges.

Our core business segments showed mixed performance, with digital services growing at
22% year-over-year. The expansion into new markets has been challenging, contributing
to only 15% of the total revenue increase.

Customer acquisition costs increased by 5% while retention rates remained at 88%,
indicating areas for improvement. These metrics, combined with our cash flow position,
suggest we need strategic adjustments for Q4 growth.
"""

Next, we craft a prompt that instructs the LLM to evaluate the generated document against the ground truth and format the output as JSON. We provide a clear example of the desired structure and use XML tags like <document> to separate inputs from instructions. This is an effective prompt engineering technique for improving clarity and guiding the model's output.
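Following those guidelines, the prompt construction and parsing step might look like the sketch below. The exact prompt wording and score fields are illustrative, and the LLM call itself is represented by a sample response string:

```python
import json

EVALUATION_PROMPT = """
You are an expert evaluator. Compare the generated document against
the ground-truth document and identify factual discrepancies.

<generated_document>
{generated}
</generated_document>

<ground_truth_document>
{ground_truth}
</ground_truth_document>

Respond ONLY with a JSON object in exactly this format, no extra text:
{{"accuracy_score": <int between 1 and 10>, "discrepancies": ["<string>", ...]}}
"""

# The XML tags above cleanly separate the two inputs from the instructions.
prompt = EVALUATION_PROMPT.format(
    generated="...generated document here...",
    ground_truth="...ground-truth document here...",
)

# In the real workflow this string would come back from the LLM call:
raw_response = (
    '{"accuracy_score": 4, '
    '"discrepancies": ["Revenue growth overstated (20% vs 18%)"]}'
)

# json.loads raises json.JSONDecodeError if the model's output is malformed,
# which is why this approach is fragile without an additional retry strategy.
result = json.loads(raw_response)
print(result["accuracy_score"])  # -> 4
```

Note the double braces `{{...}}` in the template: they escape to literal braces under `str.format()`, so the JSON example survives formatting intact.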

(The subsequent parts of the article will elaborate on the code implementation, detailed explanation of the Pydantic method, and best practices using the Gemini SDK.)
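As a preview of the Pydantic approach, one common pattern is to embed the model's generated JSON Schema into the prompt and then validate the raw response against the model. The sketch below uses illustrative model and field names, with a simulated LLM response in place of a real API call:

```python
import json
from pydantic import BaseModel, Field

class EvaluationResult(BaseModel):
    accuracy_score: int = Field(
        ge=1, le=10, description="How accurate is the generated document?"
    )
    discrepancies: list[str] = Field(
        description="Factual differences versus the ground truth"
    )

# Embed the schema in the prompt so the model knows the exact contract:
schema_block = json.dumps(EvaluationResult.model_json_schema(), indent=2)
prompt = (
    "Respond ONLY with a JSON object matching this schema:\n" + schema_block
)

# Validate the (simulated) LLM response against the model. Parsing and
# type/range validation happen in a single step:
raw = (
    '{"accuracy_score": 4, '
    '"discrepancies": ["Retention rate overstated (92% vs 88%)"]}'
)
result = EvaluationResult.model_validate_json(raw)
print(result.accuracy_score, len(result.discrepancies))
```

Compared with plain `json.loads()`, this buys type coercion, range checks (`ge`/`le`), and a typed object to pass to downstream steps.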

Key Takeaways and Next Steps

Structured outputs are not optional; they are essential for building robust, maintainable, and efficient AI systems. They are the critical bridge between your code and the probabilistic world of LLMs.

In the next article of this series, we will dive into "The 5 AI Workflow Patterns", learning how to compose foundational building blocks like structured outputs into powerful AI applications capable of solving complex real-world tasks.

Let's continue building.

Frequently Asked Questions (FAQ)

What exactly are the benefits of LLM structured outputs?

Structured outputs turn LLM responses into a predictable format (such as JSON) that programs can parse and validate, preventing downstream failures caused by inconsistent output formats. They are foundational to building reliable AI applications.

How do you implement structured outputs for an LLM?

There are three main approaches: define the structure from scratch using JSON; use Pydantic for data validation and type checking; or rely on tooling such as the Gemini SDK, which has built-in support for structured outputs.

Why are structured outputs a must in production?

In production, user input is unpredictable, and string parsing with regular expressions is fragile and can break on slight changes in response format. Structured outputs enforce a data contract, keeping the system stable and reliable.
