How Does Retrieval-Augmented Generation (RAG) Improve the Accuracy of Large Language Models?

2026/4/7
AI Summary (BLUF)

Retrieval-Augmented Generation (RAG) combines retrieval and generation techniques to enhance large language models by providing external knowledge sources, reducing hallucinations, and improving accuracy for domain-specific applications.

Retrieval-Augmented Generation (RAG) represents a significant innovation in the fields of machine learning and natural language processing. It not only signifies technological advancement but also demonstrates remarkable potential in practical applications.

RAG combines two core technologies: retrieval and generation. Through this unique hybrid mechanism, it can provide more accurate and richer information when handling complex queries and generation tasks. Whether answering intricate questions or creating compelling narratives, RAG showcases its formidable capabilities.

What is RAG?

Retrieval-Augmented Generation (RAG) has become one of the most popular application frameworks for Large Language Models (LLMs). It is a concept that provides external knowledge sources to LLMs, enabling them to generate accurate and contextually relevant answers while mitigating model "hallucinations."

However, when applying foundational LLMs to real-world business scenarios, we often find that generic models cannot meet practical needs. The primary reasons include:

  • Knowledge Limitations: A model's knowledge is entirely derived from its training data. Mainstream LLMs (e.g., ChatGPT, Wenxin Yiyan, Tongyi Qianwen) are primarily trained on publicly available web data. They cannot access real-time, non-public, or offline data, leaving them without knowledge in these areas.

  • Hallucination Problem: The underlying principle of all AI models is based on mathematical probability; their outputs are essentially a series of numerical operations, and LLMs are no exception. They can sometimes generate plausible but incorrect information, especially in areas where they lack knowledge or expertise. Distinguishing these hallucinations is challenging as it requires users to possess domain-specific knowledge.

  • Data Security: For enterprises, data security is paramount. No company is willing to risk data leakage by uploading its private data to third-party platforms for training. This forces application solutions relying solely on generic LLMs to make trade-offs between data security and effectiveness.

RAG provides an effective framework to address the challenges mentioned above.

RAG Architecture

In simple terms, RAG retrieves relevant knowledge and incorporates it into a prompt, allowing the LLM to reference this knowledge to generate reasonable answers. Therefore, the core of RAG can be understood as "Retrieval + Generation." The former primarily leverages the efficient storage and retrieval capabilities of vector databases to recall target knowledge, while the latter utilizes LLMs and prompt engineering to effectively employ the recalled knowledge to generate the final answer.

A complete RAG application workflow consists of two main phases:

  • Data Preparation Phase: Data Extraction >> Text Splitting >> Vectorization (Embedding) >> Data Ingestion

  • Application Phase: User Query >> Data Retrieval (Recall) >> Prompt Injection >> LLM Answer Generation

Data Preparation Phase

Data preparation is typically an offline process involving the vectorization of private domain data, index construction, and database storage. Key steps include data extraction, text splitting, vectorization, and data ingestion.

Data Extraction
This involves:

  • Data Loading: Handling multi-format data and different data sources, processing data into a unified schema.

  • Data Processing: Includes data filtering, compression, and formatting.

  • Metadata Acquisition: Extracting key information from the data, such as filename, title, timestamp, etc.

Text Splitting
Text segmentation primarily considers two factors:

  1. The token limit of the embedding model.
  2. The impact of semantic integrity on overall retrieval effectiveness.

Common text splitting methods include:

  • Sentence Splitting: Segmenting at the "sentence" granularity to preserve complete semantic units. Common delimiters include periods, exclamation marks, question marks, and line breaks.

  • Fixed-Length Splitting: Dividing text into fixed-length chunks (e.g., 256/512 tokens) based on the embedding model's limit. This method may lose semantic information and is often mitigated by adding some overlap (redundancy) between chunks.

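
As a concrete illustration, the fixed-length strategy with overlap can be sketched in a few lines of Python (the `chunk_size` and `overlap` defaults below are illustrative, not recommendations):

```python
def split_fixed_length(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-length chunks, repeating `overlap` characters
    between neighbors to soften the semantic loss at chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Character counts stand in for tokens here; a production splitter would measure length with the embedding model's own tokenizer.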

Vectorization (Embedding)
Vectorization is the process of converting text data into a vector matrix, which directly impacts subsequent retrieval performance. Common embedding models are listed in the table below. While these models suffice for most needs, fine-tuning open-source models or training custom models is recommended for specialized scenarios (e.g., involving rare proprietary terms).

| Model Name | Provider / Type | Key Characteristics |
| --- | --- | --- |
| text-embedding-ada-002 | OpenAI | General-purpose, widely adopted |
| BGE (BAAI) | Open source | Strong performance on MTEB benchmarks |
| Sentence Transformers | Open source | Rich model library, easy fine-tuning |
| Cohere Embed | Cohere | Multilingual support, enterprise-focused |
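
To make the vectorization step concrete without depending on any of the models above, here is a deliberately toy "hashing trick" embedding; `toy_embed` is a hypothetical stand-in, not any real model's API:

```python
import hashlib

def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Toy hashed bag-of-words embedding: each token increments one of `dim`
    buckets, then the vector is L2-normalized. A stand-in for a real
    embedding model such as text-embedding-ada-002 or BGE."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec
```

Real embeddings capture semantics (so "sofa" and "couch" land near each other), which a hash-based sketch cannot do; it only demonstrates the text-to-normalized-vector contract that the rest of the pipeline relies on.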

Data Ingestion
This process involves building indexes from vectorized data and writing them to a database. Databases suitable for RAG scenarios include FAISS, ChromaDB, Elasticsearch, and Milvus. The choice depends on business requirements, hardware, performance needs, and other factors.

Application Phase

In the application phase, based on the user's query, relevant knowledge is recalled through efficient retrieval methods and integrated into the prompt. The LLM then references both the query and the provided knowledge to generate the answer. Key steps include data retrieval and prompt injection.

Data Retrieval
Common data retrieval methods include similarity search and full-text search. Combining multiple methods can improve recall rates.

  • Similarity Search: Calculates the similarity score between the query vector and all stored vectors, returning records with high scores. Common similarity metrics include cosine similarity, Euclidean distance, and Manhattan distance.

  • Full-Text Search: A classic retrieval method. During ingestion, an inverted index is built using keywords. During retrieval, keywords are used to perform a full-text search to find corresponding records.

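
A minimal sketch of similarity retrieval with cosine scoring, assuming vectors are plain Python lists; a vector database such as FAISS or Milvus performs the same computation at scale with approximate-nearest-neighbor indexes:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], store: dict, k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(store,
                    key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
                    reverse=True)
    return ranked[:k]
```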

Prompt Injection
The prompt, as the direct input to the LLM, is a key factor influencing output accuracy. In RAG scenarios, a prompt typically includes a task description, background knowledge (retrieved), and task instructions (the user query). Additional instructions can be added based on the task and model performance. A simple example for a Q&A scenario is shown below:

[Task Description] Assume you are a professional customer-service bot. Please answer the user's question with reference to the [Background Knowledge] below.
[Background Knowledge] {retrieved_content} // relevant text returned by data retrieval
[Question] What is the battery life of the Roborock P10 robot vacuum?
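
A template like the one above can be assembled programmatically; the names `PROMPT_TEMPLATE` and `build_prompt` below are illustrative:

```python
PROMPT_TEMPLATE = """[Task] You are a professional customer-service assistant.
Answer the user's question using the [Background Knowledge] below; if the
knowledge does not cover the question, say you do not know.
[Background Knowledge]
{context}
[Question]
{question}"""

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Inject retrieved chunks and the user query into the prompt template."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```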

Prompt design is more art than science, relying heavily on experience. In practice, iterative prompt tuning is often required based on the model's actual outputs.

RAG Classification and Evolution

The iterative evolution of RAG can be divided into three main stages: Naive RAG, Advanced RAG, and Modular RAG. Their distinctions are summarized below.

| Stage | Core Characteristics | Focus |
| --- | --- | --- |
| Naive RAG | Basic components: indexing, retrieval, generation | Establishing foundational RAG functionality |
| Advanced RAG | Optimizations within the three core stages (pre-retrieval, retrieval, post-retrieval/generation) | Improving retrieval quality and overall system performance |
| Modular RAG | Extends simple components into complex, independent modules (e.g., search, memory, alignment) | Handling complex requirements through flexible, modular design |

Naive RAG

Naive RAG typically consists of three simple components: Indexing, Retrieval, and Generation.

  • Indexing: An offline process involving preprocessing and converting raw documents (parsing, chunking, embedding) for efficient retrieval.
  • Retrieval: The online process of querying the built index, often using vector similarity.
  • Generation: Assembling the retrieval results with the user query and passing them to the LLM for a controlled response.

Due to its simplicity, Naive RAG often faces challenges in achieving optimal results:

  • Retrieval Quality: Retrieved content may not be suitable for the query due to issues like poor chunking, low knowledge coverage, or suboptimal embeddings.
  • Response Quality: Covers hallucinations and more. RAG constrains arbitrary generation but does not eliminate it: the model may still fabricate, refuse to answer, or ignore the retrieved content.
  • Augmentation Process Issues: This refers to the synergy between retrieval and generation. Inappropriate documents or a weak model can lead to failure—the LLM might be misled by retrieval results or influenced by repetitive information, generating incoherent, redundant, or irrelevant answers. Conversely, over-reliance on retrieved content can overly restrict responses.

Advanced RAG

Advanced RAG introduces finer optimizations across indexing, retrieval, and generation to address the limitations of Naive RAG.

  • Indexing/Pre-Retrieval Optimizations: Focus on improving the quality of indexed content.

    • Enhanced Data Granularity: Clean and standardize data to prevent information dilution during vectorization. Control factors like accuracy, contextual coherence, and timeliness.
    • Optimized Index Structure: Adjust chunk sizes, modify storage paths, or incorporate graph structures.
    • Metadata Augmentation: Add metadata (e.g., modification time, purpose) to documents/chunks for more flexible and timely retrieval.
    • Alignment Optimization: Use LLMs to generate potential questions from documents, aligning document semantics with likely queries to improve retrieval similarity.
    • Hybrid Retrieval: Combine multiple retrieval methods (e.g., vector, keyword, semantic) to improve recall.
  • Retrieval Optimization: Focus on the embedding model.

    • Fine-tuning: Fine-tune embedding models on domain-specific or task-specific data for better performance.
    • Dynamic Embeddings: Utilize context-aware embeddings (standard in modern models like BERT) where word representations change based on context.
  • Generation/Post-Retrieval Optimization: Process retrieved results before generation.

    • Re-ranking: Use models (e.g., BGE Reranker) to reorder retrieved passages based on relevance, diversity, or optimal positioning within the context window.
    • Prompt Compression: Reduce noise in the prompt by compressing or highlighting key retrieved passages before feeding to the LLM.
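
A sketch of the re-ranking step above. A real implementation would score each (query, passage) pair with a cross-encoder such as BGE Reranker; a simple token-overlap count stands in here so the control flow is runnable:

```python
def rerank(query: str, passages: list[str]) -> list[str]:
    """Reorder retrieved passages by relevance to the query. The scoring
    function is a toy stand-in for a cross-encoder reranker model."""
    q_tokens = set(query.lower().split())

    def score(passage: str) -> int:
        # Count shared tokens; a cross-encoder would emit a learned score here.
        return len(q_tokens & set(passage.lower().split()))

    return sorted(passages, key=score, reverse=True)
```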

Further optimizations for the entire RAG pipeline include:

  • Recursive Retrieval: Retrieve based on similarity to small chunks but return larger surrounding chunks for context.
  • Hypothetical Document Embeddings (HyDE): Instruct the LLM to generate a hypothetical answer to the query, then use the embedding of that answer for retrieval, bringing the query closer to the document's semantic space.
  • Sub-queries/Query Decomposition: Break down complex queries into simpler sub-queries.
  • Step-Back Prompting: Use prompt engineering to have the LLM first answer a more general "step-back" question, then apply that higher-level reasoning to the specific query.
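
Recursive ("small-to-big") retrieval from the list above can be sketched as a lookup from matched child chunks to their larger parent chunks (all identifiers below are illustrative):

```python
def small_to_big(hit_child_ids: list[str],
                 child_to_parent: dict[str, str],
                 parent_chunks: dict[str, str]) -> list[str]:
    """Match on small chunks for retrieval precision, but hand the LLM the
    larger parent chunk for context. Parents are deduplicated while
    preserving the order of the hits."""
    seen, results = set(), []
    for child_id in hit_child_ids:
        parent_id = child_to_parent[child_id]
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parent_chunks[parent_id])
    return results
```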

Modular RAG

Modular RAG represents an evolution of Advanced RAG, integrating numerous optimization strategies into independent, well-defined modules. Key modules include:

  • Search Module: Incorporates components from classic search systems like query rewriting, intent recognition, entity retrieval, and multi-stage recall. Handles diverse document formats (tables, formulas).
  • Memory Module: Leverages the LLM's inherent memory or external memory mechanisms to reference past interactions.
  • Additional Generation Module: Provides generation capabilities beyond the final answer, such as summarizing or denoising retrieved documents.
  • Task Adaptation Module: Adapts the RAG system for different downstream tasks (e.g., classification).
  • Alignment Module: Explicitly aligns the semantic spaces of queries and documents to improve similarity calculation.
  • Validation Module: Acts as post-retrieval processing to verify the relevance of retrieved documents to the query.

New modules can be integrated in two primary ways: direct addition/replacement, or by adjusting the collaborative workflow between modules, especially the interaction between the LLM and the retrieval system. Concepts like Self-RAG, which introduces an active judgment module, exemplify this modular, interactive approach.
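
One way to picture Modular RAG's plug-in design is to treat each module as a function over a shared state and compose them freely; this is a hypothetical sketch, not the API of any particular framework:

```python
def pipeline(*modules):
    """Compose RAG modules into one callable. Each module maps a state dict
    to a state dict, so modules can be added, replaced, or reordered."""
    def run(state: dict) -> dict:
        for module in modules:
            state = module(state)
        return state
    return run

# Toy modules (illustrative names and a one-entry knowledge base):
def rewrite(state):
    state["query"] = state["query"].strip().lower()
    return state

def retrieve(state):
    kb = {"battery": "Battery life is 180 minutes."}
    state["context"] = [v for k, v in kb.items() if k in state["query"]]
    return state

def generate(state):
    state["answer"] = state["context"][0] if state["context"] else "I do not know."
    return state

rag = pipeline(rewrite, retrieve, generate)
```

Swapping in a validation or memory module is then just another function in the `pipeline(...)` call, which mirrors how Modular RAG adjusts workflows by adding or replacing modules.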

Core Value and Challenges of RAG

Key Benefits

RAG offers several compelling advantages for LLM applications:

  1. Maintaining Knowledge Freshness: Equips LLMs with the ability to access and reference the latest information, acting as "eyes" on the real world.

  2. Providing Domain Expertise: Functions like a skilled librarian, rapidly retrieving precise information from specialized knowledge bases.

  3. Securing Private Knowledge: Enables the use of proprietary, sensitive corporate data (knowledge bases, contracts) without uploading it to external LLMs, addressing critical data security concerns.

  4. Increasing Trustworthiness: Allows the system to cite the sources of its information, enhancing transparency and credibility.

  5. Reducing Hallucinations: Effectively mitigates LLM hallucinations by grounding responses in retrieved, external knowledge sources.

Primary Challenges

Implementing RAG effectively presents several challenges:

  1. Embedding Quality: The quality of vector representations is paramount. Embeddings must capture deep semantic features and contextual relationships. Using balanced, diverse datasets is crucial to avoid bias and ensure vectors accurately represent original text intent.

  2. Precise Knowledge Retrieval: Accurately finding the most relevant knowledge in vast external sources is complex. It requires a deep understanding of query intent, efficient retrieval algorithms, and robust handling of ambiguous queries, all while keeping the knowledge base fresh and accurate.

Frequently Asked Questions (FAQ)

How does RAG address the hallucination problem of large language models?

By retrieving external knowledge sources (such as vector databases) and supplying them as accurate references to the generation stage, RAG grounds the model's answers in factual data rather than training-time memory alone, effectively reducing fabricated or incorrect output.

What are the key phases of a RAG application workflow?

The workflow splits into a data preparation phase (data extraction, text splitting, vectorization, ingestion) and an application phase (user query, data retrieval, prompt injection, LLM answer generation), forming a closed loop from knowledge storage to intelligent Q&A.

Why should enterprises adopt RAG?

RAG lets enterprises connect private data to large models securely, addressing both the knowledge limitations of generic models and data-security concerns while improving accuracy on domain-specific tasks, all without uploading data to third-party platforms.
