How Does RAG Optimize Large Model Performance? Evolution Frameworks and Evaluation Methods Explained (2026)

2026/4/14

AI Summary (BLUF)

This article provides a comprehensive overview of Retrieval-Augmented Generation (RAG), detailing its evolution from Naive to Advanced and Modular RAG frameworks, key challenges, optimization techniques, and evaluation methods, based on the 2023 survey paper.

This article is based on reading notes from the survey paper “Retrieval-Augmented Generation for Large Language Models: A Survey”, aiming to systematically introduce the definition, frameworks, key optimization points, and evaluation methods of RAG technology. Most of the charts and images in this article are sourced from that paper.

Overview

While Large Language Models (LLMs) demonstrate impressive capabilities, they also face challenges such as hallucination, outdated knowledge, opacity, and non-retraceable reasoning processes. Since 2023, Retrieval-Augmented Generation (RAG) has become a popular solution to address these challenges.

The survey paper “Retrieval-Augmented Generation for Large Language Models: A Survey” provides a comprehensive summary of the development and related technologies of RAG. The authors summarize the evolution of the RAG paradigm into three stages: Naive RAG, Advanced RAG, and Modular RAG. Concurrently, the evolution of RAG-related research is divided into the four stages shown in the figure below. These stages have developed alongside the evolving capabilities of large models, with the common goal of enabling models to better utilize external knowledge.

RAG Evolution Stages

The RAG technology ecosystem panorama summarized by the authors is as follows:

RAG Ecosystem

Definition of RAG

The following figure illustrates a typical RAG workflow:

RAG Workflow

This process primarily consists of three core steps:

  1. Indexing: The raw corpus is split into chunks, vector-encoded, and an index is built.
  2. Retrieval: Based on the user query, relevant document chunks with high semantic similarity are retrieved from the index.
  3. Generation: The LLM generates the final answer to the question based on the retrieved contextual information.
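The three steps above can be sketched end to end in a few dozen lines. This is a minimal illustration, not a production pipeline: the hashed bag-of-words `embed` below is a toy stand-in for a trained embedding model (e.g., a sentence-transformer), and `generate` only assembles the augmented prompt rather than actually calling an LLM.

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy hashed bag-of-words embedding; a real system would use a trained
    # model such as a sentence-transformer or text-embedding-ada-002.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def build_index(corpus: list[str], chunk_size: int = 20) -> list[tuple[str, np.ndarray]]:
    # 1. Indexing: split each document into chunks and vector-encode them.
    chunks = []
    for doc in corpus:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            chunks.append((chunk, embed(chunk)))
    return chunks

def retrieve(query: str, index: list[tuple[str, np.ndarray]], k: int = 2) -> list[str]:
    # 2. Retrieval: rank chunks by cosine similarity to the query vector
    # (vectors are unit-normalized, so the dot product is cosine similarity).
    q = embed(query)
    ranked = sorted(index, key=lambda item: float(q @ item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def generate(query: str, context: list[str]) -> str:
    # 3. Generation: assemble the augmented prompt; in practice this string
    # is sent to an LLM, which writes the final answer.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"
```

Swapping the toy `embed` for a real embedding model and sending the assembled prompt to an LLM turns this skeleton into the Naive RAG pipeline discussed below.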

Key issues to be addressed in the evolution of RAG technology include:

  • What to retrieve: The retrieval granularity has evolved from simple tokens to entity retrieval, document chunks, and knowledge graphs.
  • When to retrieve: The retrieval frequency has evolved from single retrieval to adaptive retrieval and multiple retrievals.
  • How to use the retrieved information: The integration method has evolved from combining at the model input layer to combining at intermediate or output layers.

RAG Frameworks

Naive RAG

Naive RAG refers to the most basic RAG process, encompassing the aforementioned three steps of indexing, retrieval, and generation. However, in practical applications, it faces numerous challenges in the three areas of retrieval, generation, and augmentation:

  • Retrieval Quality: Low precision may lead to hallucinations, while low recall may prevent the LLM from comprehensively answering questions.
  • Generation Quality: May produce hallucinations inconsistent with the retrieved content.
  • Augmentation Process: Faces the challenge of effectively integrating retrieved context into the generation task, potentially leading to messy, inconsistent outputs with information redundancy and repetition. Additionally, it requires distinguishing the importance and relevance of multiple retrieved documents, harmonizing different writing styles to ensure output consistency, and avoiding the model mechanically repeating retrieved content without providing valuable information.

Advanced RAG

To overcome the shortcomings of Naive RAG, Advanced RAG employs optimization strategies before retrieval (pre-retrieval) and after retrieval (post-retrieval), and improves the indexing process, such as using sliding windows, fine-grained segmentation, and metadata.

Pre-retrieval Process

This stage primarily optimizes data indexing, aiming to improve the quality of the indexed content. Main strategies include:

  • Enhancing data granularity: Removing irrelevant information, disambiguating entities and terms, verifying factual accuracy, updating outdated documents.
  • Optimizing index structures: Adjusting chunk size, querying multiple indices, utilizing graph structures to capture relevant information.
  • Adding metadata information: Adding metadata such as dates, purposes to filter chunks, using chapter and section information to improve retrieval efficiency.
  • Alignment optimization: Introducing “hypothetical questions” that each document could answer, to bridge the gap between documents and queries (HyDE, conversely, generates a hypothetical document from the query).
  • Hybrid Search: Combining multiple retrieval methods.
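Results from different retrievers in a hybrid setup are commonly merged with Reciprocal Rank Fusion (RRF). A minimal sketch, assuming each retriever returns a ranked list of document IDs; `k = 60` is the constant commonly used in the RRF literature:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Merge several ranked lists (e.g., one from BM25 keyword search and one
    # from vector search): score(d) = sum over lists of 1 / (k + rank(d)).
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

Because RRF works only on ranks, it needs no score calibration between retrievers whose raw scores (BM25 vs. cosine similarity) are otherwise incomparable.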

Retrieval

The core of the retrieval stage is calculating the similarity between the query and document chunks, where the vector model is crucial. Methods to optimize the vector model include:

  • Fine-tuning Embedding Models: LLMs (e.g., GPT-3.5-turbo) can be used to generate questions based on document chunks, constructing corpus pairs for fine-tuning.
  • Dynamic Embedding: Enables the same word to have different vector representations in different contexts. (Note: Current mainstream vector models generally support dynamic embedding.)
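The pair-construction idea for fine-tuning can be sketched as follows. `ask_llm` is a hypothetical stand-in for a chat-model call (e.g., to GPT-3.5-turbo), and the `{"query", "pos"}` output shape is one common format for contrastive fine-tuning data, not any specific library's schema:

```python
from typing import Callable

def build_finetune_pairs(chunks: list[str], ask_llm: Callable[[str], str]) -> list[dict]:
    # For each document chunk, have an LLM write a question the chunk answers;
    # each (query, positive chunk) pair becomes training data for the
    # embedding model (e.g., with a multiple-negatives contrastive loss).
    pairs = []
    for chunk in chunks:
        question = ask_llm(f"Write one question that this passage answers:\n{chunk}")
        pairs.append({"query": question, "pos": chunk})
    return pairs
```

In-batch negatives usually suffice during training, so only positives need to be generated here.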

Post-retrieval Process

If all retrieved information is directly fed into the LLM, it may exceed its context window limit and introduce noise. Current post-retrieval processing methods include:

  • Re-Ranking: Reordering the retrieval results. Specialized re-ranking models (e.g., bge-reranker, Cohere Rerank) can be used, or specific strategies can be employed, such as LostInTheMiddleRanker (placing the least important content in the middle of the prompt) and Diversity Ranker (sorting by fragment diversity).
  • Prompt Compression: Compressing irrelevant information in the prompt. Related research includes Selective Context, LLMLingua, RECOMP, and Walking Down the Memory Maze (MemWalker).
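The lost-in-the-middle idea reduces to a few lines: given chunks already sorted best-first, alternate them toward the two ends of the context so the weakest chunks land in the middle, where LLMs attend least. This is a sketch of the strategy, not any particular library's implementation:

```python
def lost_in_the_middle_reorder(docs_best_first: list[str]) -> list[str]:
    # Even-ranked docs fill the front, odd-ranked docs fill the back
    # (reversed), so the two most relevant docs sit at the two ends of the
    # prompt and the least relevant ones end up in the middle.
    front = docs_best_first[0::2]
    back = docs_best_first[1::2]
    return front + back[::-1]
```

For five docs ranked d1 (best) to d5 (worst), the prompt order becomes d1, d3, d5, d4, d2: the top two at the edges, the worst in the center.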

Modular RAG

An overview of the three RAG paradigms defined by the authors is shown in the figure below. Although conceptually distinct, Modular RAG is not isolated; Advanced RAG is a special form of Modular RAG, and Naive RAG is a special form of Advanced RAG.

RAG Paradigms Overview

Modular RAG introduces new functional modules, making its architecture more flexible and powerful:

  • Search Module: In addition to similarity retrieval, it includes search engines, databases, knowledge graphs, etc.
  • Memory Module: Utilizes the LLM’s memory capability to assist retrieval, representative work such as Selfmem.
  • Fusion Module: Expands the query into multiple queries (multi-query), representative work such as RAG-Fusion.
  • Routing Module: Determines subsequent actions based on user requests, e.g., whether to search a specific database, whether to summarize, etc.
  • Predict Module: Uses the LLM to directly generate context, rather than retrieving first.
  • Task Adapter Module: Adapts RAG to different downstream tasks, related work such as UPRISE and PROMPTAGATOR.
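A routing module, for instance, is just a classifier over the incoming request. The keyword rules and backend names below are illustrative placeholders; real routers typically ask an LLM (or a small trained classifier) to pick the branch:

```python
def route(query: str) -> str:
    # Toy routing module: fresh-fact questions go to web search, relationship
    # questions to a knowledge graph, everything else to the default vector
    # store. The backend names are hypothetical.
    q = query.lower()
    if any(w in q for w in ("latest", "today", "this week", "news")):
        return "web_search"
    if any(w in q for w in ("related to", "relationship between", "connected to")):
        return "knowledge_graph"
    return "vector_store"
```

The caller then dispatches to the matching Search-module backend, which is what makes the modular architecture composable.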

Unlike Naive/Advanced RAG, which consists of fixed modules, Modular RAG patterns are more diverse and flexible. Current research mainly focuses on two directions: Adding or Replacing Modules and Adjusting the Flow between Modules.

Optimization work involved in the RAG pipeline includes:

  • Hybrid Search Exploration: Applying different retrieval techniques, such as keyword-based search, semantic search, vector search.
  • Recursive Retrieval and Query Engine: Retrieving both small document chunks and larger related chunks.
  • StepBack-prompt: Encouraging the LLM to engage in higher-level abstract thinking.
  • Sub-Queries: Employing different query strategies, such as tree queries, vector queries, sequential queries of document chunks.
  • Hypothetical Document Embeddings (HyDE): Using the LLM to generate hypothetical documents based on the query and using them for similarity retrieval.
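HyDE in particular reduces to a few lines once the LLM, the embedder, and the vector search are abstracted away; all three arguments below are caller-supplied stand-ins, not a specific library's API:

```python
from typing import Callable, Sequence

def hyde_retrieve(
    query: str,
    generate_llm: Callable[[str], str],       # stand-in for an LLM call
    embed: Callable[[str], Sequence[float]],  # stand-in for an embedding model
    search: Callable[[Sequence[float]], list[str]],  # stand-in for vector search
) -> list[str]:
    # HyDE: rather than embedding the short, underspecified query, ask the
    # LLM for a hypothetical answer passage and retrieve with *its* vector,
    # which tends to lie closer to real answer documents in embedding space.
    hypothetical_doc = generate_llm(f"Write a short passage answering: {query}")
    return search(embed(hypothetical_doc))
```

The generated passage may be factually wrong; that is acceptable, because only its embedding is used to locate real documents, which are what the final answer is grounded in.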

Deep Optimization of the Retrieval Module

Building an efficient retriever involves three fundamental questions: 1. How to obtain effective semantic representations? 2. What methods can align the semantic spaces of queries and documents? 3. How to align the retriever’s output with the preferences of the large model?

Enhancing Semantic Representations

Chunk Optimization

Both excessively large and small document chunks can lead to suboptimal results, making the choice of appropriate chunk size crucial. The following factors should be considered when selecting:

  • Characteristics of the indexed content.
  • The vector model and its optimal encoding length (e.g., sentence-transformer is more suitable for sentence encoding, while text-embedding-ada-002 is more suitable for text chunks of 256 or 512 tokens).
  • The length and complexity of user queries.
  • The application scenario of the retrieval results (e.g., semantic search or Q&A).
  • The context window size of the LLM being used.
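Several of these trade-offs surface directly in the chunker. A minimal sliding-window chunker follows; it counts words for simplicity, whereas production systems usually count tokens with the embedding model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    # Consecutive chunks share `overlap` words, so a sentence cut at one
    # chunk boundary still appears whole in the neighboring chunk.
    assert 0 <= overlap < chunk_size
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Larger `chunk_size` preserves context but dilutes the embedding; larger `overlap` reduces boundary losses at the cost of index size. Both are tuning knobs governed by the factors listed above.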

Current RAG-related chunk optimization methods are summarized in the following table:

| Optimization Method | Core Idea | Key Advantage |
| --- | --- | --- |
| Sliding Window | Allows merging relevant results from multiple retrieval passes. | Improves contextual coherence. |
| Small2Big | Retrieves small text chunks initially, then feeds the corresponding larger chunks to the LLM. | Balances retrieval precision with contextual completeness. |
| Summary Embedding | Ranks retrieval results based on document summaries. | Provides a more holistic understanding of documents. |
| Metadata Filtering | Filters document chunks using metadata such as date and type. | Improves retrieval relevance and reduces noise. |
| Graph Indexing | Converts entities and relations into nodes and edges. | Significantly improves relevance for multi-hop questions. |
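Of these, Small2Big can be sketched directly: retrieve over the small, precise chunks, then hand the LLM each chunk's larger parent passage. The mappings `small_index` and `parent_of` and the plain-Python vectors are illustrative assumptions, not a specific framework's data model:

```python
def small_to_big(
    query_vec: list[float],
    small_index: dict[str, list[float]],  # chunk id -> small-chunk vector
    parent_of: dict[str, str],            # chunk id -> larger parent passage
    k: int = 3,
) -> list[str]:
    # Rank the small chunks by dot-product similarity to the query, then
    # return the de-duplicated parent passages as the LLM's context.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(small_index, key=lambda cid: dot(query_vec, small_index[cid]), reverse=True)
    seen, contexts = set(), []
    for cid in ranked[:k]:
        parent = parent_of[cid]
        if parent not in seen:  # sibling chunks may share one parent
            seen.add(parent)
            contexts.append(parent)
    return contexts
```

The small chunks give precise matching; the parents restore the surrounding context the LLM needs, which is exactly the precision/completeness balance the table describes.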

Fine-tuning Embedding Models

Although current vector models are powerful, they may still be inadequate in professional domain applications. Fine-tuning can enable them to better understand domain-specific user requests. Main methods include:

  • Domain Knowledge Fine-tuning: The key lies in constructing a domain-specific dataset covering queries, the corpus, and relevant documents.
  • Fine-tuning for Downstream Tasks: Representative work includes PROMPTAGATOR and LLM-Embedder.

FAQ

What problems of large language models does RAG mainly address?

RAG mainly addresses LLM challenges such as hallucination, outdated knowledge, opaque processes, and non-retraceable reasoning, by retrieving external knowledge to improve the accuracy and timeliness of generated answers.

Which stages has the RAG framework evolved through?

According to the 2023 survey, the RAG paradigm has evolved through three stages: Naive RAG (the basic pipeline), Advanced RAG (optimized retrieval), and Modular RAG (modular design), all sharing the goal of using external knowledge more effectively.

What core steps does the RAG workflow include?

A typical RAG pipeline consists of three core steps: indexing (chunking the corpus and encoding it into vectors), retrieval (matching relevant chunks against the query), and generation (the LLM producing the final answer from the retrieved context).
