How Does RAG Improve the Accuracy of Large Language Models? (With Key Concepts Explained)

Introduction

Without RAG, an LLM takes the user input and generates a response based only on the information it was trained on, that is, what it already knows. With RAG, an information retrieval component first uses the user input to pull information from a new data source. The user query and the retrieved information are then both passed to the LLM, which combines this new knowledge with its training data to produce a better response. The following sections provide an overview of the process.

Key Concepts and Process Flow

Create External Data

New data outside the LLM's original training data set is called external data. It can come from multiple sources, such as APIs, databases, or document repositories, and may exist in various formats, such as files, database records, or long-form text. A separate AI technique, an embedding language model, converts this data into numerical representations (vectors), which are then stored in a vector database. This process creates a knowledge library that generative AI models can understand.
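The indexing step can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: `toy_embed` is a hypothetical stand-in that produces sparse term-count vectors instead of calling a real embedding model, the "vector database" is a plain in-memory dictionary, and the sample documents are invented.

```python
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-count vector.

    A real system would call an embedding model and get a dense float vector.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: dict[str, str]) -> dict[str, dict]:
    """Store each document's text next to its vector (an in-memory 'vector database')."""
    return {
        doc_id: {"text": text, "vector": toy_embed(text)}
        for doc_id, text in documents.items()
    }


# Invented sample documents standing in for external data.
documents = {
    "leave-policy": "Employees accrue annual leave each month of service.",
    "expense-policy": "Travel expenses must be submitted to finance.",
}
index = build_index(documents)
```

Keeping the original text alongside each vector matters later: the vector is only used for matching, while the text is what gets handed to the LLM.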

Retrieve Relevant Information

The next step is to perform a relevancy search. The user query is converted to a vector representation and matched against the vectors stored in the vector database. For example, consider a smart chatbot that answers human resources questions for an organization. If an employee asks, "How much annual leave do I have?", the system retrieves the annual leave policy documents along with that employee's past leave record. These documents are returned because they are highly relevant to the employee's input. Relevancy is calculated mathematically from the vector representations, for example with a similarity measure between the query vector and each document vector.
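As a rough sketch of the relevancy search, the snippet below ranks documents by cosine similarity to the query. Cosine similarity is one common choice and is an assumption here, since the text does not name a specific measure. `toy_embed`, `cosine`, and `retrieve` are hypothetical helper names, the term-count vectors stand in for real embeddings, and the document contents (including the day counts) are made-up sample data.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: dict[str, Counter], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Made-up HR documents for illustration only.
docs = {
    "annual-leave-policy": "Annual leave policy: full-time employees receive 20 days of annual leave per year.",
    "leave-record-e123": "Leave record for employee E123: 8 days of annual leave taken this year.",
    "expense-policy": "Expense policy: submit travel receipts within 30 days.",
}
index = {doc_id: toy_embed(text) for doc_id, text in docs.items()}

results = retrieve("How much annual leave do I have?", index)
```

With this sample data, the leave policy and the employee's leave record rank above the expense policy, which shares no terms with the query. A real embedding model would also match on meaning rather than only on shared words.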

Augment the LLM Prompt

Next, the RAG model augments the user input (the prompt) by adding the relevant retrieved data as context. This step uses prompt engineering techniques to communicate effectively with the LLM. The augmented prompt allows the LLM to generate an accurate answer to the user's query.
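One way to picture the augmentation step is as simple string assembly. The template below is an assumption for illustration; the wording of the instructions, the numbering of passages, and the function name `build_augmented_prompt` are not prescribed by any particular RAG implementation.

```python
def build_augmented_prompt(user_query: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user query into a single prompt."""
    # Number each passage so the model (and a reader) can tell them apart.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\n"
        "Answer:"
    )


# Hypothetical retrieved passages for the HR chatbot example.
prompt = build_augmented_prompt(
    "How much annual leave do I have?",
    ["Annual leave policy text goes here.", "Employee leave record text goes here."],
)
```

Telling the model to rely on the supplied context, and to admit when that context is insufficient, is a common prompt engineering choice for keeping answers grounded in the retrieved data.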

Update External Data

The next question may be: what if the external data becomes stale? To keep the information available for retrieval current, asynchronously update the documents and the embedding representations of those documents. You can do this through automated real-time processes or periodic batch processing. This is a common challenge in data analytics, and different data-science approaches to change management can be used.
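A periodic batch refresh might look like the sketch below. It assumes one simple change-detection strategy, comparing a content hash per document, so that only new or modified documents are re-embedded and deleted ones are dropped. `refresh_index` and `toy_embed` are hypothetical names, and the term-count vector again stands in for a real embedding model.

```python
import hashlib
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_index(index: dict[str, dict], current_docs: dict[str, str]) -> list[str]:
    """Bring the index in line with current_docs; return ids that were re-embedded."""
    updated = []
    for doc_id, text in current_docs.items():
        digest = content_hash(text)
        entry = index.get(doc_id)
        # Re-embed only documents that are new or whose content changed.
        if entry is None or entry["hash"] != digest:
            index[doc_id] = {"hash": digest, "vector": toy_embed(text)}
            updated.append(doc_id)
    # Remove documents that no longer exist in the source.
    for doc_id in list(index):
        if doc_id not in current_docs:
            del index[doc_id]
    return updated


index: dict[str, dict] = {}
first = refresh_index(index, {"leave-policy": "Leave accrues monthly.",
                              "expense-policy": "Submit receipts."})
second = refresh_index(index, {"leave-policy": "Leave accrues weekly."})
```

Here the first run embeds both documents, while the second re-embeds only the changed leave policy and removes the expense policy. The same function could be called from a scheduled job for batch processing or from a change event for near-real-time updates.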

Conceptual Architecture

The following diagram shows the conceptual flow of using RAG with LLMs.

RAG with LLMs Conceptual Flow
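The same flow can also be read as code. The sketch below chains retrieval, prompt augmentation, and generation in one function. Everything here is illustrative: `answer_with_rag` and `toy_embed` are hypothetical names, the term-count vectors stand in for a real embedding model, the documents are invented, and `echo_llm` is a placeholder that simply returns the prompt so the example runs without any model.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_with_rag(query: str, documents: list[str],
                    llm: Callable[[str], str], top_k: int = 1) -> str:
    """Retrieve the most relevant documents, augment the prompt, then call the LLM."""
    query_vec = toy_embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, toy_embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns the prompt unchanged."""
    return prompt


docs = [
    "Expense policy: submit travel receipts to finance.",
    "Annual leave policy: annual leave accrues monthly for all employees.",
]
result = answer_with_rag("How much annual leave do I have?", docs, echo_llm)
```

Passing the LLM in as a callable keeps the sketch independent of any specific model provider; swapping `echo_llm` for a real client is the only change needed to generate an actual answer.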

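Frequently Asked Questions (FAQ)

How exactly does RAG work?

RAG uses a retrieval component to fetch relevant information from external data sources (such as databases and document repositories), then passes it to the LLM together with the user query, so the model can draw on both the new knowledge and its training data to generate a more accurate answer.

How does RAG keep the retrieved information up to date?

By asynchronously updating the external documents and recomputing their vector representations, through either real-time or batch processing. This keeps the knowledge library current and prevents stale data from degrading answer accuracy.

What advantages does RAG have over a traditional LLM?

RAG reduces LLM "hallucinations" and improves the accuracy and specificity of answers by bringing in external data. It is particularly well suited to applications that need real-time or domain-specific knowledge.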
