
How Does OpenRAG Optimize RAG Systems? An Analysis of a 2026 End-to-End Retrieval-Augmented Generation Framework

2026/3/13
AI Summary (BLUF)

OpenRAG is a novel RAG framework that optimizes retrieval-augmented generation systems end-to-end by tuning retrievers to capture in-context relevance, achieving consistent performance improvements of 4.0% over original retrievers and 2.1% over state-of-the-art alternatives.


Abstract

In this paper, we analyze and empirically show that the learned relevance for conventional information retrieval (IR) scenarios may be inconsistent in retrieval-augmented generation (RAG) scenarios. To bridge this gap, we introduce OpenRAG, a RAG framework that is optimized end-to-end by tuning the retriever to capture in-context relevance, enabling adaptation to the diverse and evolving needs. Extensive experiments across a wide range of tasks demonstrate that OpenRAG, by tuning a retriever end-to-end, leads to a consistent improvement of 4.0% over the original retriever, consistently outperforming existing state-of-the-art retrievers by 2.1%. Additionally, our results indicate that for some tasks, an end-to-end tuned 0.2B retriever can achieve improvements that surpass those of RAG-oriented or instruction-tuned 8B large language models (LLMs), highlighting the cost-effectiveness of our approach in enhancing RAG systems.


Introduction: The Retrieval-Generation Gap in RAG

Retrieval-Augmented Generation (RAG) has become a cornerstone for enhancing large language models (LLMs) with external, up-to-date knowledge. The typical RAG pipeline involves two main stages: a retriever fetches relevant documents from a knowledge base, and a generator (an LLM) synthesizes an answer based on the retrieved context. While this decoupling offers modularity, it often introduces a critical misalignment: the retriever is usually pre-trained or fine-tuned for standalone information retrieval (IR) tasks, where relevance is judged in isolation. However, in a RAG pipeline, the "relevance" of a document is ultimately defined by its utility to the downstream generator for producing an accurate and coherent response. This discrepancy is what we term the retrieval-generation gap.


Conventional retrievers, such as dense passage retrievers (DPR) or those based on contrastive learning, optimize for metrics like recall@k or MRR. They learn to match a query to a document based on lexical or semantic similarity. Yet, a document that is semantically similar to a query may not contain the specific information needed by the LLM to formulate a correct answer, or it may contain redundant or contradictory information that confuses the generator. Conversely, a document with lower standalone IR relevance might provide the crucial "golden nugget" of information that enables perfect generation. This misalignment can lead to suboptimal RAG performance, where high retrieval scores do not translate to high-quality final outputs.


The OpenRAG framework addresses this core challenge head-on. Its central thesis is that for RAG to be truly effective, the retriever must be optimized end-to-end within the actual generation loop. It must learn in-context relevance—the relevance of a document given the specific task of conditioning an LLM's generation. By directly using the feedback signal from the generator's performance (e.g., answer accuracy), OpenRAG fine-tunes the retriever to prioritize documents that are most helpful for the final answer, not just those that are superficially similar to the query.


Core Methodology: End-to-End In-Context Retrieval Learning

Problem Definition and Motivation

The standard RAG process can be formalized as follows: Given a user query ( q ), the retriever ( R ) returns a set of top-k documents ( D = \{d_1, d_2, \ldots, d_k\} ) from a corpus ( C ). The generator ( G ) then produces an answer ( a ) conditioned on both ( q ) and ( D ): ( a = G(q, D) ). The conventional approach treats ( R ) and ( G ) as separate components, with ( R ) trained on an IR loss ( \mathcal{L}_{IR} ) (e.g., contrastive loss) using query-document pairs ( (q, d^+) ).

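The two-stage pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed` is a toy hash-seeded encoder standing in for a real dual-encoder, and `generate` is a stub standing in for the LLM.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic embedding -- a stand-in for a real dual-encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """R: rank every document in C by similarity to q, return the top-k set D."""
    q = embed(query)
    scores = np.array([q @ embed(d) for d in corpus])
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def generate(query: str, docs: list[str]) -> str:
    """G: placeholder generator producing a = G(q, D)."""
    return f"answer to {query!r} grounded in {len(docs)} retrieved docs"

corpus = ["Paris is the capital of France.",
          "The Eiffel Tower is in Paris.",
          "Berlin is the capital of Germany."]
print(generate("capital of France?", retrieve("capital of France?", corpus)))
```

Because `R` here is trained (or, in the toy, fixed) independently of `G`, nothing in the scoring function reflects whether a document actually helps the generator, which is precisely the gap the paper targets.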

The fundamental insight of OpenRAG is that the optimal training signal for ( R ) in a RAG system is not ( \mathcal{L}_{IR} ), but a loss that reflects the downstream generation quality, ( \mathcal{L}_{GEN} ). The objective becomes to train ( R ) such that the retrieved set ( D ) maximizes the probability of ( G ) generating the correct or high-quality answer ( a^* ).

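The article does not spell out the exact form of ( \mathcal{L}_{GEN} ). One common way such a generation-driven objective is instantiated (e.g., in RAG/REALM-style training) is to marginalize the gold answer's likelihood over a soft retrieval distribution; the numpy sketch below uses hypothetical scores and likelihoods:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical retriever scores s_theta(q, d_i) over four candidate docs.
scores = np.array([2.0, 1.0, 0.5, -1.0])
p_docs = softmax(scores)                     # P_theta(d | q)

# Hypothetical generator likelihoods P_phi(a* | q, d_i): doc 2 is the
# "golden nugget" even though its standalone IR score is mid-ranked.
gen_likelihood = np.array([0.05, 0.10, 0.80, 0.02])

# L_GEN = -log sum_i P_theta(d_i | q) * P_phi(a* | q, d_i)
marginal = float(p_docs @ gen_likelihood)
loss = -np.log(marginal)
print(f"L_GEN = {loss:.4f}")
```

Minimizing this loss shifts retrieval probability mass toward documents with high generator utility rather than high standalone similarity, which is the behavioral change the paper aims for.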

OpenRAG Framework Design

The OpenRAG framework implements this end-to-end optimization through a differentiable pipeline that allows gradients from the generator's loss to flow back to the retriever. The key design principles are:


  1. Differentiable Retrieval: Employing a retriever architecture (e.g., a dual-encoder with a softmax over the corpus) that allows for gradient propagation from the selected documents back to the retriever's parameters.


  2. In-Context Relevance Scoring: The retriever is not just scoring query-document similarity. It is trained to predict a document's utility for the generation task. This is the "in-context" aspect.


  3. Generator-as-Supervisor: The generator ( G ) provides the training signal. In practice, this can be implemented using a cross-entropy loss on the final answer, or a reward score from a reward model or task-specific metric (e.g., exact match for QA). The loss is backpropagated through the generator and then through the retrieved document distribution to update the retriever.

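Under these three principles, the gradient that reaches the retriever can be made concrete. Assuming the marginalized objective ( L = -\log \sum_i p_i g_i ) with ( p = \mathrm{softmax}(s) ) over retriever scores and a fixed, hypothetical generator utility ( g_i = P(a^* \mid q, d_i) ), the gradient has a simple closed form; the sketch derives it and checks it against finite differences:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def loss(s, g):
    """L(s) = -log sum_i softmax(s)_i * g_i."""
    return -np.log(softmax(s) @ g)

def grad_scores(s, g):
    """Analytic gradient: dL/ds_j = p_j * (1 - g_j / m), with m = sum_i p_i g_i."""
    p = softmax(s)
    m = p @ g
    return p * (1.0 - g / m)

s = np.array([2.0, 1.0, 0.5, -1.0])      # retriever scores (hypothetical)
g = np.array([0.05, 0.10, 0.80, 0.02])   # generator utilities (hypothetical)

# Verify the closed form against central finite differences.
eps = 1e-6
numeric = np.array([(loss(s + eps * np.eye(4)[j], g)
                     - loss(s - eps * np.eye(4)[j], g)) / (2 * eps)
                    for j in range(4)])
assert np.allclose(grad_scores(s, g), numeric, atol=1e-5)

# A descent step raises the score of doc 2 (most useful to the generator)
# and lowers the others: relevance is re-ranked by downstream utility.
s_new = s - 0.5 * grad_scores(s, g)
```

The sign pattern is the point: documents whose utility exceeds the current marginal ( m ) get negative gradient (score goes up under descent), everything else is pushed down.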

A simplified training step can be visualized as:

  1. Forward Pass: Query ( q ) → Retriever ( R_θ ) → Soft document distribution ( P_θ(D|q) ) → Sample/select top-k docs ( D ) → Generator ( G_φ ) → Predicted answer ( \hat{a} ).
  2. Loss Computation: Compute loss ( \mathcal{L}(\hat{a}, a^*) ) based on the ground truth answer ( a^* ).
  3. Backward Pass: Compute gradient ( \frac{\partial \mathcal{L}}{\partial θ} ) and update retriever parameters ( θ ). The generator parameters ( φ ) may be frozen or jointly fine-tuned.

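Written out, the backward pass of step 3 takes a familiar posterior-weighted form. This is a standard derivation for marginalized objectives (as in the original RAG formulation), not necessarily OpenRAG's exact estimator:

```latex
% Objective: negative log marginal likelihood of the gold answer a^*
\mathcal{L}(\theta) = -\log \sum_{d \in D} P_\theta(d \mid q)\, P_\phi(a^* \mid q, d)

% Retriever gradient: each document's log-probability gradient is weighted
% by the posterior P(d \mid q, a^*), i.e. how much d "explains" the answer
\frac{\partial \mathcal{L}}{\partial \theta}
  = -\sum_{d \in D}
    \frac{P_\theta(d \mid q)\, P_\phi(a^* \mid q, d)}
         {\sum_{d' \in D} P_\theta(d' \mid q)\, P_\phi(a^* \mid q, d')}\,
    \frac{\partial}{\partial \theta} \log P_\theta(d \mid q)
```

Documents that contribute most to generating ( a^* ) receive the strongest positive update, which is exactly the in-context relevance signal the retriever is meant to learn.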

This process directly aligns the retriever's objective with the ultimate goal of the RAG system: generating correct answers.


Experimental Analysis and Key Results

The authors conducted extensive experiments on diverse benchmarks, including open-domain question answering (e.g., Natural Questions, TriviaQA), fact verification, and long-form dialogue. The baselines included strong off-the-shelf retrievers (like Contriever, ANCE) and state-of-the-art RAG-specific methods.


Main Performance Gains

The results robustly support OpenRAG's effectiveness:


  • vs. Original Retriever: When taking a strong base retriever (e.g., Contriever) and fine-tuning it end-to-end with OpenRAG, the performance improved by an average of +4.0% (in terms of answer accuracy/EM). This demonstrates the significant gain achievable by closing the retrieval-generation gap.


  • vs. State-of-the-Art (SOTA) Retrievers: OpenRAG consistently outperformed other leading retrievers (including those fine-tuned on IR tasks) by an average margin of +2.1%. This indicates that in-context, end-to-end learning provides a superior relevance signal compared to even the best IR-only training.


A Breakthrough Finding on Cost-Effectiveness

Perhaps the most striking finding is related to model scale and cost:


  • The paper shows that for certain tasks, an end-to-end tuned retriever with only 0.2 billion parameters can yield larger performance gains than employing a much larger 8 billion parameter LLM that has been specifically instruction-tuned for RAG or is a RAG-variant model.


  • This highlights a crucial point: improving the retrieval quality through targeted, end-to-end learning can be a more cost-effective lever for enhancing overall RAG performance than simply scaling up the generator LLM. It shifts the focus from "bigger generators" to "smarter retrieval."


Analysis and Discussion

The success of OpenRAG can be attributed to several factors:


  1. Learning Task-Specific Relevance: The retriever learns what information "looks like" for a specific answer generation task, which may differ from general semantic similarity.


  2. Mitigating Distractor Documents: By penalizing retrievals that lead to wrong answers, the model learns to avoid documents that are superficially relevant but ultimately misleading (distractors).


  3. Promoting Complementary Information: The end-to-end signal can encourage the retriever to fetch a set of documents that together provide comprehensive coverage for the answer, even if individually they are not the top IR matches.


Conclusion and Future Outlook

OpenRAG presents a compelling paradigm shift for building efficient and effective RAG systems. By formulating and solving the end-to-end optimization problem for retrieval-augmented generation, it directly addresses the core misalignment between retrieval and generation objectives. The empirical evidence confirms that tuning the retriever for in-context relevance is a powerful and cost-effective strategy.


The framework opens up several promising directions for future work:

  • Architecture Exploration: Applying similar end-to-end principles to different retriever architectures (e.g., cross-encoders, late interaction models).
  • Training Efficiency: Developing more efficient methods for the end-to-end gradient computation, which can be computationally challenging over large corpora.
  • Dynamic Retrieval: Extending the framework to support multi-turn or iterative retrieval, where the retriever adapts based on the ongoing generation context.
  • Broader Applications: Applying the in-context retrieval learning principle to other tasks beyond QA, such as retrieval-augmented code generation, summarization, and creative writing.


In conclusion, OpenRAG moves beyond treating the retriever as a static, pre-trained component. It reimagines the retriever as an adaptive, learnable module that is intrinsically aligned with the generator's success, paving the way for more intelligent, robust, and efficient knowledge-enhanced language models.


Frequently Asked Questions (FAQ)

What problem does OpenRAG solve compared with conventional RAG frameworks?

OpenRAG addresses the retrieval-generation gap that conventional retrievers face in RAG scenarios. By fine-tuning the retriever end-to-end to learn in-context relevance, it ensures that the retrieved documents genuinely help the generator produce accurate answers.

Where do OpenRAG's performance gains show up?

It improves performance by 4.0% over the original retriever and by 2.1% over existing state-of-the-art retrievers. On some tasks, a fine-tuned 0.2B-parameter retriever even delivers gains that surpass those from RAG-oriented or instruction-tuned 8B LLMs, a significant cost advantage.

What is the core innovation of the OpenRAG framework?

The core innovation is end-to-end in-context retrieval learning: the retriever is fine-tuned directly on feedback from the generator, so it prioritizes the documents most helpful for the final generation rather than those that are merely superficially similar to the query.

