RAG System Architecture and Optimization: A 2026 Survey of Frontier Techniques in Retrieval-Augmented Generation
This survey provides a comprehensive synthesis of recent advances in Retrieval-Augmented Generation (RAG) systems, offering a taxonomy of architectures, analyzing enhancements across retrieval and generation, and identifying open challenges for future research.
Retrieval-Augmented Generation: A Frontier Survey of Architectures, Enhancement Techniques, and Robustness
This is a preprint under review at ACM TOIS. Do not redistribute the final version without permission.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm to enhance large language models (LLMs) by conditioning generation on external evidence retrieved at inference time. While RAG addresses critical limitations of parametric knowledge storage—such as factual inconsistency and domain inflexibility—it introduces new challenges in retrieval quality, grounding fidelity, pipeline efficiency, and robustness against noisy or adversarial inputs. This survey provides a comprehensive synthesis of recent advances in RAG systems, offering a taxonomy that categorizes architectures into retriever-centric, generator-centric, hybrid, and robustness-oriented designs. We systematically analyze enhancements across retrieval optimization, context filtering, decoding control, and efficiency improvements, supported by comparative performance analyses on short-form and multi-hop question answering tasks. Furthermore, we review state-of-the-art evaluation frameworks and benchmarks, highlighting trends in retrieval-aware evaluation, robustness testing, and federated retrieval settings. Our analysis reveals recurring trade-offs between retrieval precision and generation flexibility, efficiency and faithfulness, and modularity and coordination. We conclude by identifying open challenges and future research directions, including adaptive retrieval architectures, real-time retrieval integration, structured reasoning over multi-hop evidence, and privacy-preserving retrieval mechanisms. This survey aims to consolidate current knowledge in RAG research and serve as a foundation for the next generation of retrieval-augmented language modeling systems.
Keywords: Retrieval-Augmented Generation, Query Reformulation, Context Filtering, Reranking, Multi-hop Reasoning, Hallucination Mitigation, Robustness, Dynamic Retrieval, Evaluation Benchmarks, Federated Retrieval, Faithfulness, Efficiency Optimization, Document Ranking, LLM Alignment, Open-Domain QA
1. Introduction
Large Language Models (LLMs) have demonstrated impressive generalization across natural language tasks, but their reliance on static, parametric knowledge remains a fundamental limitation. This restricts their ability to handle queries requiring up-to-date, verifiable, or domain-specific information, often resulting in hallucinations or factual inconsistencies.
Retrieval-Augmented Generation (RAG) addresses this issue by coupling pretrained language models with non-parametric retrieval modules that fetch external evidence during inference. By conditioning generation on retrieved documents, RAG systems offer greater transparency, factual grounding, and adaptability to evolving knowledge bases. These properties have made RAG central to tasks such as open-domain QA, biomedical reasoning, knowledge-grounded dialogue, and long-context summarization.
However, integrating retrieval with generation introduces unique challenges: retrieval noise and redundancy can degrade output quality; misalignment between retrieved evidence and generated text can lead to hallucinations; and pipeline inefficiencies and latency make deployment costly at scale. Moreover, balancing modularity with tight retrieval–generation interaction remains an open architectural trade-off.
In this survey, we first present a high-level taxonomy of RAG architectures based on where core innovations occur—within the retriever, the generator, or through their joint coordination. We begin with a background on RAG’s mathematical formulation and components, and then explore advances in retrieval strategies, filtering, and control mechanisms. We further analyze how RAG systems are benchmarked, compare prominent frameworks, and conclude with open research challenges and future directions.
2. Background and Foundations of Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is a framework that augments large language models (LLMs) with external knowledge access via document retrieval. It builds on the intuition that generating grounded and verifiable responses requires not only parametric knowledge stored in model weights, but also non-parametric access to a dynamic evidence corpus. This section outlines the core components of RAG systems and presents the mathematical formulation that underpins their design.
2.1. Components of a RAG System
At a high level, a RAG system consists of three modules:
- Query Encoder: Encodes the input x into a query representation q, which is used to retrieve relevant documents. This can be either a neural encoder or a rule-based template.
- Retriever: Given the query q, the retriever fetches a ranked list of documents d1, d2, ..., dk from a corpus C. Retrievers may be sparse (e.g., BM25), dense (e.g., DPR), hybrid, or generative.
- Generator: The generator conditions on the input x and the retrieved documents di to produce the final output y. This is typically a pretrained transformer model (e.g., T5, BART, GPT).
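A minimal sketch of this three-module decomposition, using a toy bag-of-words retriever and a stub generator. All function names, the scoring rule, and the tiny corpus are illustrative assumptions, not drawn from any system covered in this survey:

```python
from collections import Counter

# Toy corpus C: each document is a short passage (illustrative).
CORPUS = [
    "RAG couples a retriever with a generator.",
    "BM25 is a sparse lexical retrieval function.",
    "DPR encodes queries and passages into dense vectors.",
]

def encode_query(x: str) -> Counter:
    """Query encoder: map input x to a query representation q (bag of words)."""
    return Counter(x.lower().split())

def retrieve(q: Counter, corpus: list[str], k: int = 2) -> list[str]:
    """Retriever: rank documents by term overlap with q and return the top-k."""
    return sorted(corpus,
                  key=lambda d: sum(q[w] for w in d.lower().split()),
                  reverse=True)[:k]

def generate(x: str, docs: list[str]) -> str:
    """Generator stub: a real system would condition an LLM on x and docs."""
    return f"Answer to {x!r} grounded in {len(docs)} retrieved document(s)."

query = "DPR dense vectors retrieval"
docs = retrieve(encode_query(query), CORPUS)
print(docs[0])   # the DPR passage ranks first for this query
print(generate(query, docs))
```

Swapping the overlap score for BM25 or a dense dot product changes only `retrieve`, which is exactly the modularity the component view above emphasizes.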
2.2. Mathematical Formulation
Formally, the generation process in Retrieval-Augmented Generation (RAG) can be expressed as modeling the conditional distribution:
P(y | x) = Σ_{d∈C} P(y | x, d) · P(d | x)
where:
- x is the input (e.g., a question or prompt),
- d is a retrieved document from corpus C,
- y is the generated response.
In practice, the summation is approximated by retrieving the top-k documents d1, ..., dk, yielding:
P(y | x) ≈ Σ_{i=1}^{k} P(y | x, di) · P(di | x)
This decomposition reflects two key probabilities:
- P(di | x): the relevance score of document di given the input x, often derived from a retriever or reranker.
- P(y | x, di): the probability of generating output y conditioned on x and document di, modeled by the language model.
Variants of RAG differ in how they estimate and combine these components. Some use a fixed retriever and let the generator handle noisy inputs, while others jointly optimize retrieval and generation to maximize downstream utility.
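The top-k approximation can be traced with a small numeric sketch. Here retriever scores are normalized into P(di | x) via a softmax (a common choice, though not the only one) and combined with assumed per-document generation probabilities; all values are illustrative:

```python
import math

# Retriever scores for k = 3 retrieved documents (illustrative values).
retriever_scores = [2.0, 1.0, 0.5]

# Assumed generator probabilities P(y | x, di) for a candidate answer y.
p_y_given_x_d = [0.9, 0.4, 0.1]

# Normalize retriever scores into P(di | x) with a softmax.
exps = [math.exp(s) for s in retriever_scores]
p_d_given_x = [e / sum(exps) for e in exps]

# Marginalize over the top-k documents:
# P(y | x) ≈ sum_i P(y | x, di) * P(di | x)
p_y_given_x = sum(g * r for g, r in zip(p_y_given_x_d, p_d_given_x))
print(round(p_y_given_x, 3))  # → 0.672
```

Raising a document's retriever score shifts P(y | x) toward that document's generator estimate, which is why reranking alone can change the final answer distribution.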
3. Taxonomy of RAG Architectures
To contextualize recent advances in Retrieval-Augmented Generation (RAG), we propose a taxonomy that categorizes existing systems based on their architectural focus—retriever-centric, generator-centric, hybrid, and robustness-oriented designs. This classification highlights key design patterns and illustrates how different frameworks tackle the core challenges of retrieval, grounding, and reliability.
3.1. Retriever-Based RAG Systems
Retriever-based Retrieval-Augmented Generation (RAG) systems delegate architectural responsibility primarily to the retriever, treating the generator as a passive decoder. These systems operate under the premise that the fidelity and relevance of the retrieved context are the most critical factors for generating accurate and grounded outputs. Innovations in this space typically fall into one of three design patterns: input-side query enhancement, retriever-side adaptation, and retrieval granularity optimization.
- Query-Driven Retrieval: A prominent strategy focuses on refining and structuring user intent before retrieval to maximize alignment with relevant corpus segments. This includes decomposition, rewriting, generative reformulation, and the incorporation of structured priors to guide retrieval.
- Retriever-Centric Adaptation: Another line of work modifies the retriever itself through architectural enhancements or task-specific learning.
- Granularity-Aware Retrieval: This pattern addresses retrieval precision by optimizing the unit of retrieval—from full documents to fine-grained, semantically aligned segments.
Each of these patterns anchors its innovation in the retriever, preserving modularity and interpretability. However, they also introduce trade-offs in latency, redundancy, and sensitivity to ambiguous or underspecified queries.
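As one illustration of the query-driven pattern, multi-hop decomposition can be sketched with a single rule-based rewrite. The pattern below is hypothetical; production systems typically use learned or LLM-based reformulators rather than regular expressions:

```python
import re

def decompose(query: str) -> list[str]:
    """Split a comparative multi-hop question into single-hop sub-queries.

    Rule-based sketch: handles only 'What is the X of A and B?' questions;
    anything else falls through unchanged.
    """
    m = re.match(r"(what is the )(.+) of (.+) and (.+)\?", query, re.IGNORECASE)
    if not m:
        return [query]
    prefix, attr, a, b = m.groups()
    return [f"{prefix}{attr} of {a}?", f"{prefix}{attr} of {b}?"]

subqueries = decompose("What is the population of France and Germany?")
print(subqueries)
# → ['What is the population of France?', 'What is the population of Germany?']
```

Each sub-query is then retrieved independently, which tends to raise recall on comparative questions at the cost of extra retrieval calls, one instance of the latency trade-off noted above.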
3.2. Generator-Based RAG Systems
Generator-based RAG systems concentrate architectural innovation on the decoding process, assuming the retrieved content is sufficiently relevant and shifting the burden of factual grounding and integration to the language model. These systems enhance output quality through mechanisms for self-verification, compression, and controlled generation.
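A minimal extractive sketch of generator-side context compression, scoring sentences by lexical overlap with the question. This is an assumption-laden toy, not the mechanism of any specific published system; real generator-centric designs use learned scorers or the LLM itself to judge sentence utility:

```python
def compress_context(question: str, passages: list[str], budget: int = 2) -> list[str]:
    """Keep only the sentences most lexically related to the question."""
    q_terms = set(question.lower().rstrip("?").split())
    # Naive sentence split on periods; adequate for this illustration only.
    sentences = [s.strip() for p in passages for s in p.split(".") if s.strip()]
    return sorted(sentences,
                  key=lambda s: len(q_terms & set(s.lower().split())),
                  reverse=True)[:budget]

passages = [
    "Paris is the capital of France. It hosted the 2024 Olympics.",
    "Berlin is the capital of Germany. Germany borders France.",
]
kept = compress_context("What is the capital of France?", passages)
print(kept)  # → ['Paris is the capital of France', 'Berlin is the capital of Germany']
```

The generator then decodes from the compressed context, trading some recall for a shorter, less noisy prompt.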