
What Is RAG (Retrieval-Augmented Generation)? A 2026 Deep Dive into LLM Optimization Techniques

2026/3/13
AI Summary (BLUF)

Retrieval-Augmented Generation (RAG) enhances large language models by allowing them to access external knowledge bases before generating responses, improving accuracy, relevance, and cost-effectiveness without retraining.

What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is a technique for optimizing the output of a large language model (LLM) by enabling it to reference an authoritative knowledge base outside of its training data sources before generating a response. Large language models are trained on vast datasets, using billions of parameters to generate original outputs for tasks like answering questions, translating languages, and completing sentences. Building on the already powerful capabilities of LLMs, RAG extends them to access domain-specific or organizational internal knowledge bases, all without the need to retrain the model. It is a cost-effective method for improving LLM output, ensuring it remains relevant, accurate, and useful across various contexts.

Why is Retrieval-Augmented Generation Important?

LLMs are a key artificial intelligence (AI) technology that powers intelligent chatbots and other natural language processing (NLP) applications. The goal is to create bots capable of answering user questions in various contexts by cross-referencing authoritative knowledge sources. Unfortunately, the nature of LLM technology introduces unpredictability into their responses. Furthermore, LLM training data is static, imposing a cutoff date on the knowledge they possess.

Known challenges faced by LLMs include:

  • Providing false information when an answer is not known.
  • Offering outdated or overly broad information when users need specific, up-to-date responses.
  • Creating responses based on non-authoritative sources.
  • Generating inaccurate responses due to terminological confusion, where different training sources use the same term to discuss different things.

You can think of a large language model as an over-eager new employee who refuses to stay current with world events but answers every question with absolute confidence. Unfortunately, this attitude can negatively impact user trust—something you wouldn't want your chatbot to emulate!

RAG is an approach to solving some of these challenges. It redirects the LLM to retrieve relevant information from authoritative, pre-determined knowledge sources. Organizations gain better control over the generated text output, and users gain insight into how the LLM arrived at its response.

What are the Benefits of Retrieval-Augmented Generation?

RAG technology offers several benefits for an organization's generative AI initiatives.

Cost-Effective Implementation

Chatbot development typically begins with a foundation model. A foundation model is an API-accessible LLM trained on a broad spectrum of generalized, unlabeled data. The computational and financial cost of retraining a foundation model for organization- or domain-specific information is high. RAG provides a more cost-effective method for introducing new data to an LLM. It makes generative AI technology more accessible and usable.

Current Information

Even if the original training data sources for an LLM are suitable for your needs, maintaining relevance is challenging. RAG allows developers to provide the generative model with the latest research, statistics, or news. They can use RAG to connect the LLM directly to real-time social media feeds, news websites, or other frequently updated information sources. The LLM can then provide users with up-to-date information.

Enhanced User Trust

RAG allows the LLM to present accurate information with source attribution. Outputs can include citations or references to sources. Users can also look up the source documents themselves if they need further clarification or more detailed information. This can increase trust and confidence in your generative AI solution.

Greater Developer Control

With RAG, developers can more efficiently test and improve their chat applications. They can control and change the LLM's information sources to adapt to evolving needs or cross-functional use. Developers can also restrict the retrieval of sensitive information to different authorization levels and ensure the LLM generates appropriate responses. Furthermore, they can troubleshoot and fix issues if the LLM cites an incorrect information source for a specific question. Organizations can implement generative AI technology more confidently for a wider range of applications.

How Does Retrieval-Augmented Generation Work?

Without RAG, an LLM takes a user input and creates a response based on its training information—what it already knows. With RAG, an information retrieval component is introduced that uses the user input to first extract information from new data sources. Both the user query and the relevant information are provided to the LLM. The LLM uses this new knowledge along with its training data to create a better response. The following sections outline the process.

Creating External Data

New data outside the LLM's original training dataset is called external data. It can come from multiple data sources, such as APIs, databases, or document repositories. The data may exist in various formats, like files, database records, or long-form text. Another AI technique, called an embedding language model, converts the data into a numerical representation and stores it in a vector database. This process creates a knowledge library that the generative AI model can understand.

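As an illustrative sketch of this step, the toy example below uses a bag-of-words vector over a fixed vocabulary in place of a real embedding model, and a plain Python list in place of a vector database (the vocabulary, documents, and `embed` helper are all assumptions for illustration):

```python
import re
from collections import Counter

# Toy stand-in for an embedding language model: a bag-of-words vector
# over a fixed vocabulary. Real RAG systems use a learned model that
# maps text to dense semantic vectors.
VOCAB = ["annual", "leave", "policy", "payroll", "schedule", "expense"]

def embed(text: str) -> list[float]:
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in VOCAB]

# The "vector database": each document stored alongside its embedding.
documents = [
    "Annual leave policy: employees accrue 20 leave days per year.",
    "Payroll schedule: salaries are paid on the last business day.",
    "Expense rules: submit receipts within 30 days.",
]
vector_db = [(doc, embed(doc)) for doc in documents]

for doc, vec in vector_db:
    print(vec, "<-", doc[:35])
```

The essential idea survives the simplification: every document is converted once into a numerical representation, and that representation is what gets stored and searched.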

Retrieving Relevant Information

The next step is to perform a relevance search. The user query is converted into a vector representation and matched against the vector database. For example, consider an intelligent chatbot that can answer an organization's human resources questions. If an employee searches for "How much annual leave do I have?", the system will retrieve the annual leave policy document along with the employee's personal past leave records. These specific documents, highly relevant to the employee's input, are returned. Relevance is established mathematically, through vector similarity measures such as cosine similarity.

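A minimal sketch of the relevance search, using a toy bag-of-words "embedding" and cosine similarity as the matching measure (the documents, vocabulary, and helper names are illustrative assumptions, not a production retriever):

```python
import math
import re

# Toy relevance search: a bag-of-words "embedding" plus cosine
# similarity. Real systems use a learned embedding model, but the
# matching logic has the same shape.
VOCAB = ["annual", "leave", "policy", "payroll", "schedule", "salary"]

def embed(text: str) -> list[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vector_db = [(doc, embed(doc)) for doc in [
    "Annual leave policy: employees accrue 20 leave days per year.",
    "Payroll schedule: salaries are paid on the last business day.",
]]

query = "How much annual leave do I have?"
qvec = embed(query)

# Return the document whose vector is closest to the query vector.
best_doc, _ = max(vector_db, key=lambda item: cosine(qvec, item[1]))
print(best_doc)
```

Running this, the annual leave policy document scores highest because it shares the "annual"/"leave" dimensions with the query, while the payroll document shares none.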

Augmenting the LLM Prompt

Next, the RAG model augments the user input (or prompt) by adding the retrieved relevant data in context. This step uses prompt engineering techniques to communicate effectively with the LLM. The augmented prompt allows the large language model to generate an accurate answer to the user query.

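A minimal sketch of this step, assuming a plain-text template (the wording and the `build_augmented_prompt` helper are illustrative choices, not a fixed standard):

```python
# Sketch of prompt augmentation: retrieved passages are stitched into
# the prompt ahead of the user's question, and the instructions ask
# the model to ground its answer in that context.
def build_augmented_prompt(question: str, passages: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_augmented_prompt(
    "How much annual leave do I have?",
    ["Employees accrue 20 leave days per year.",
     "This employee has taken 5 leave days so far this year."],
)
print(prompt)
```

The augmented prompt, not the bare user question, is what gets sent to the LLM, which is why the model can answer from knowledge that was never in its training data.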

Updating External Data

The next question might be—what if the external data becomes outdated? To maintain current information for retrieval, update the documents asynchronously and update their embedded representations. You can do this through automated real-time processes or regular batch processing. This is a common challenge in data management—different data science approaches can be used for change management.

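One common way to implement such a batch refresh is to re-embed only documents whose content actually changed; the sketch below uses a content hash for change detection (the `refresh` helper and the stand-in `embed` function are illustrative assumptions):

```python
import hashlib

# Sketch of an incremental batch refresh: a content hash detects which
# documents changed, so only those get re-embedded.
def embed(text: str) -> list[float]:
    return [float(len(text))]  # placeholder for a real embedding model

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

index: dict[str, dict] = {}  # doc_id -> {"hash": ..., "vector": ...}

def refresh(doc_id: str, text: str) -> bool:
    """Re-embed the document only if its content changed."""
    h = content_hash(text)
    if index.get(doc_id, {}).get("hash") == h:
        return False  # unchanged: skip the costly embedding step
    index[doc_id] = {"hash": h, "vector": embed(text)}
    return True

print(refresh("leave-policy", "20 days per year"))  # first load
print(refresh("leave-policy", "20 days per year"))  # unchanged, skipped
print(refresh("leave-policy", "25 days per year"))  # changed, re-embedded
```

The same `refresh` routine works whether it is triggered by a real-time event stream or run over all documents in a scheduled batch job.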

The diagram below shows the conceptual flow of using RAG with an LLM.

Diagram showing the RAG process: User Query -> Retrieval (from External Data/Vector DB) -> Augmented Prompt -> LLM -> Final Response

What is the Difference Between Retrieval-Augmented Generation and Semantic Search?

Semantic search can refine RAG results and is suitable for organizations wanting to add extensive external knowledge sources to their LLM applications. Modern enterprises store vast amounts of information across various systems, such as manuals, FAQs, research reports, customer service guides, and HR document repositories. Contextual retrieval at scale is challenging and can degrade the quality of generated output.

Semantic search techniques can scan large databases containing disparate information and retrieve data more accurately. For example, they can answer questions like "How much was spent on mechanical repairs last year?" by mapping the question to relevant documents and returning specific text rather than search results. Developers can then use that answer to provide more context to the LLM.

Traditional or keyword-search solutions within RAG yield limited results for knowledge-intensive tasks. Developers must also handle complexities like word embeddings and document chunking when preparing data manually. In contrast, semantic search techniques handle all the knowledge base preparation work, so developers don't have to. They also generate semantically relevant passages and tokenized words ranked by relevance to maximize the quality of the RAG payload.

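To make the keyword-versus-semantic contrast concrete, the toy sketch below uses a hand-written synonym table as a stand-in for meaning-level matching; real semantic search relies on learned embeddings, and the documents, table, and function names here are all illustrative assumptions:

```python
import re

# Toy contrast between keyword and semantic-style matching. A real
# semantic retriever uses embeddings; the synonym table is only a
# stand-in to show why meaning-level matching retrieves more.
DOCS = [
    "Mechanical maintenance costs totalled $42,000 last year.",
    "The cafeteria menu was updated in March.",
]

SYNONYMS = {"repairs": {"maintenance"}, "spent": {"costs", "totalled"}}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_match(query: str, doc: str) -> bool:
    # Exact word overlap only.
    return bool(tokens(query) & tokens(doc))

def semantic_match(query: str, doc: str) -> bool:
    # Expand each query word to its related terms before matching.
    expanded: set[str] = set()
    for t in tokens(query):
        expanded |= {t} | SYNONYMS.get(t, set())
    return bool(expanded & tokens(doc))

query = "spent on repairs"
print([keyword_match(query, d) for d in DOCS])   # exact words miss the relevant doc
print([semantic_match(query, d) for d in DOCS])  # meaning-level match finds it
```

The query "spent on repairs" shares no exact words with the maintenance-cost document, so keyword matching misses it, while matching on related terms retrieves it; embedding-based semantic search achieves the same effect without any hand-maintained synonym list.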

How Does AWS Support Your Retrieval-Augmented Generation Needs?

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models and a broad set of capabilities for building generative AI applications, simplifying development while maintaining privacy and security. With Amazon Bedrock Knowledge Bases, you can connect foundation models to your RAG data sources in just a few clicks. Vector transformation, retrieval, and augmented output generation are all handled automatically.

For organizations managing their own RAG, Amazon Kendra is a highly accurate, machine learning-powered enterprise search service. It provides an optimized Kendra Retrieve API that you can use with Amazon Kendra's high-accuracy semantic ranker as an enterprise-grade retriever for your RAG workflow. For example, using the Retrieve API, you can:

  • Retrieve up to 100 semantically relevant passages, each containing up to 200 tokens, ranked by relevance.
  • Connect to common data technologies using pre-built connectors, such as Amazon Simple Storage Service (Amazon S3), SharePoint, Confluence, and other websites.
  • Support multiple document formats, such as HTML, Word, PowerPoint, PDF, Excel, and text files.
  • Filter responses based on documents permitted by end-user permissions.

Amazon also provides options for organizations wanting to build more custom generative AI solutions. Amazon SageMaker JumpStart is a machine learning hub that includes foundation models, built-in algorithms, and pre-built machine learning solutions that you can deploy with just a few clicks. You can accelerate RAG implementation by referencing existing SageMaker notebooks and code examples.

Create a free AWS account to get started with Retrieval-Augmented Generation on AWS.

Frequently Asked Questions (FAQ)

How does RAG help large language models access up-to-date information?

RAG lets the model retrieve the latest data from an external knowledge base before generating an answer. This addresses the fact that LLM training data is static and may be outdated, keeping responses current.

What are the cost advantages of RAG compared with retraining a model?

RAG requires no retraining of the foundation model; it simply directs the model to a specific knowledge base. This dramatically reduces computational and financial costs, making it a cost-effective way to integrate domain knowledge.

How does RAG improve the accuracy and trustworthiness of AI answers?

RAG requires the model to retrieve from authoritative knowledge sources before generating, avoiding fabricated answers and reliance on non-authoritative data, which improves accuracy and strengthens user trust.

