How to Optimize a RAG System? Enterprise Lessons from the Field: Query Generation and Reranking Strategies

2026/2/16
AI Summary (BLUF)

After 8 months building RAG systems for two enterprises (9M and 4M pages), we share what actually worked versus what wasted time. The highest-ROI optimizations were query generation, reranking, chunking strategy, metadata injection, and query routing.

Introduction

Over the past eight months, my team and I have been deeply immersed in building and refining Retrieval-Augmented Generation (RAG) systems for production. We implemented RAG for two distinct use cases: Usul AI, which processes 9 million pages, and a confidential enterprise legal AI application handling 4 million pages. This journey took us from initial optimism with quick prototypes to the sobering reality of subpar performance that only end-users could detect, ultimately leading to months of iterative improvements. This post distills our key learnings, ranking them by their impact on the final system's performance.

What Actually Moved the Needle: High-ROI Improvements

Our initial prototype, built swiftly with popular frameworks, failed to meet real-world user expectations. The following changes, listed in order of their return on investment (ROI), were crucial in bridging that gap.

Query Generation

We learned that a user's final query often fails to capture the full semantic context needed for retrieval. To address this, we implemented a step where an LLM reviews the conversation thread and generates multiple queries—combining both semantic and keyword-based approaches. These queries are processed in parallel, and their results are passed to a reranker. This strategy significantly expanded our retrieval coverage and reduced over-reliance on the computed scores from any single hybrid search.
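
As a rough sketch of this step (not our production code), the snippet below asks an LLM to expand the conversation into several queries and fans the searches out in parallel. The `hybrid_search` function, the prompt wording, and the model name are illustrative assumptions.

```python
# Sketch of LLM-based query generation with parallel retrieval.
# Assumes an OpenAI-compatible client; `hybrid_search` is a hypothetical
# stand-in for your vector DB's hybrid (semantic + keyword) search.
import json
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

def generate_queries(thread: list[dict], n: int = 4) -> list[str]:
    """Rewrite the conversation into n search queries, mixing
    natural-language (semantic) and short keyword-style queries."""
    prompt = (
        f"Given the conversation below, write {n} search queries that "
        "together cover the user's information need. Mix natural-language "
        "queries with short keyword queries. Return a JSON array of strings.\n\n"
        f"Conversation:\n{json.dumps(thread, ensure_ascii=False)}"
    )
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
    )
    # Sketch only: assumes the model returns a bare JSON array;
    # production code should validate or use structured outputs.
    return json.loads(resp.choices[0].message.content)

def retrieve_for_queries(queries: list[str], k: int = 50) -> list[dict]:
    """Run hybrid search for every query in parallel, then merge and
    deduplicate by chunk id before handing the pool to the reranker."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda q: hybrid_search(q, top_k=k), queries))
    seen, merged = set(), []
    for chunk in (c for results in result_lists for c in results):
        if chunk["id"] not in seen:
            seen.add(chunk["id"])
            merged.append(chunk)
    return merged
```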

Reranking

Integrating a reranker was arguably the highest-value addition to our codebase—a minimal change with an outsized impact. We observed dramatic shifts in chunk rankings, far beyond our initial expectations. A reranker can often compensate for suboptimal retrieval setups, provided you feed it a sufficient number of initial chunks. Through experimentation, we found an optimal configuration of passing 50 chunks to the reranker and receiving 15 as output.
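
A minimal sketch of that 50-in/15-out configuration, using Cohere's rerank endpoint (one of the rerankers we tried); the chunk field names and API key handling are assumptions.

```python
# Minimal reranking sketch: feed 50 candidate chunks to the reranker,
# keep the top 15. Field names are illustrative.
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumption: key supplied via env/config

def rerank_chunks(query: str, chunks: list[dict], top_n: int = 15) -> list[dict]:
    response = co.rerank(
        model="rerank-v3.5",
        query=query,
        documents=[c["text"] for c in chunks[:50]],
        top_n=top_n,
    )
    # Map the reranked indices back onto the original chunk objects.
    return [chunks[r.index] for r in response.results]
```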

Chunking Strategy

This aspect demands substantial effort and will likely consume most of your development time. We built custom chunking flows for both enterprises. The key is to deeply understand your data: manually review sample chunks, and ensure that a) chunks are not truncated mid-word or mid-sentence, and b) each chunk forms a logical, self-contained unit of information.
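
To make those two checks concrete, here is a toy sentence-aware chunker. The size limit and splitting regex are illustrative assumptions, and a real flow tailored to your corpus will be considerably more involved.

```python
# Toy chunker reflecting the two checks above: never cut mid-sentence,
# and greedily pack whole sentences into self-contained chunks.
# The 1500-character limit is an assumption, not a value from this post.
import re

def chunk_text(text: str, max_chars: int = 1500) -> list[str]:
    # Split on whitespace that follows sentence-ending punctuation
    # (Latin and fullwidth); a sketch for whitespace-delimited text.
    sentences = re.split(r"(?<=[.!?。!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```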

Injecting Metadata into the LLM Context

Initially, we passed only the raw chunk text to the LLM. An experiment revealed that injecting relevant metadata (such as title, author, source) alongside the text substantially improved the quality of the context and the final answers generated by the LLM.
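
A sketch of what this injection can look like; the metadata fields and the separator are illustrative, not our exact format.

```python
# Prepend each chunk's metadata to its text before it enters the LLM
# context window. Field names are illustrative assumptions.
def format_chunk(chunk: dict) -> str:
    header = (
        f"[Title: {chunk.get('title', 'unknown')} | "
        f"Author: {chunk.get('author', 'unknown')} | "
        f"Source: {chunk.get('source', 'unknown')}]"
    )
    return f"{header}\n{chunk['text']}"

def build_context(chunks: list[dict]) -> str:
    """Join the formatted chunks into one context block for the prompt."""
    return "\n\n---\n\n".join(format_chunk(c) for c in chunks)
```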

Query Routing

We encountered many user questions that fell outside RAG's core competency (e.g., "summarize this article," "who wrote this?"). To handle these efficiently, we built a lightweight router that detects such intents and redirects them to a simpler pipeline—typically an API call combined with an LLM—bypassing the full RAG retrieval setup entirely.
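
The router can be as small as one cheap classification call. In the sketch below, the intent labels and the two downstream handlers are hypothetical placeholders, not our actual route set.

```python
# Lightweight intent router: classify the question first, and only run
# the full RAG pipeline when retrieval is actually needed.
from openai import OpenAI

client = OpenAI()
LABELS = {"rag", "summarize", "document_metadata"}

def route(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": (
                "Classify the question as one of: rag, summarize, "
                f"document_metadata. Reply with the label only.\n\n{question}"
            ),
        }],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in LABELS else "rag"  # default to full RAG

def answer(question: str) -> str:
    if route(question) == "rag":
        return run_rag_pipeline(question)    # hypothetical: full retrieval + generation
    return run_simple_pipeline(question)     # hypothetical: direct API call + LLM
```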

Our Evolving Technical Stack

Our infrastructure choices evolved based on performance, cost, and specific feature needs; a short configuration sketch follows the list below.

  • Vector Database: Azure Cognitive Search → Pinecone → Turbopuffer (cost-effective, with native keyword search support)
  • Document Extraction: custom solution
  • Chunking: Unstructured.io by default; custom flows for both enterprises (note: we have heard positive feedback about Chonkie)
  • Embedding Model: text-embedding-3-large (we did not extensively test alternatives)
  • Reranker: none → Cohere 3.5 → Zerank (lesser known, but it performed well)
  • LLM: GPT-4.1 → GPT-5 → GPT-4.1 (leveraging Azure credits)
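
One way to pin these moving parts down is a single configuration object. The sketch below is purely illustrative and not taken from the agentset codebase; it simply restates the final stack and the reranking numbers above.

```python
# Illustrative config capturing the final stack and the 50-in / 15-out
# reranking setup described in this post; names are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class RagConfig:
    vector_db: str = "turbopuffer"
    embedding_model: str = "text-embedding-3-large"
    reranker: str = "zerank"
    llm: str = "gpt-4.1"
    rerank_in: int = 50   # chunks fed to the reranker
    rerank_out: int = 15  # chunks kept for the LLM context
```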

Conclusion & Open-Source Contribution

The path from a working RAG prototype to a production-grade system is paved with iterative refinement, focused on foundational elements like query understanding, result reranking, and intelligent data chunking. To encapsulate these learnings and give back to the community, we have consolidated our approach into an open-source project: agentset-ai/agentset, released under the permissive MIT license. We welcome questions, feedback, and contributions.
