
企业RAG挑战赛优胜方案如何设计架构?(附密集检索与LLM重排序详解)
How Did the Winning Solutions of the Enterprise RAG Challenge Design Their Architectures? (With a Deep Dive into Dense Retrieval and LLM Reranking)

2026/4/17

AI Summary (BLUF)

This article analyzes the winning solutions from the Enterprise RAG Challenge, detailing the architectures, models, and experimental approaches used by top performers like Ilya Rice and Emil Shagiev. It provides insights into effective RAG patterns, including dense retrieval, LLM reranking, router patterns, and structured outputs.

原文翻译: 本文分析了企业RAG挑战赛的获胜解决方案,详细介绍了Ilya Rice和Emil Shagiev等顶级选手使用的架构、模型和实验方法。文章深入探讨了有效的RAG模式,包括密集检索、LLM重排序、路由模式和结构化输出。

近期,一场聚焦于企业级检索增强生成(RAG)应用的竞赛吸引了众多开发者和研究者的参与。参赛者需要在复杂的金融文档(如 SEC 10-K 报告)中,准确、高效地回答多公司对比性问题。这不仅考验了 RAG 系统的检索精度,也对其推理能力、架构设计及成本控制提出了极高要求。

Recently, a competition focused on enterprise-level Retrieval-Augmented Generation (RAG) applications attracted numerous developers and researchers. Participants were required to accurately and efficiently answer multi-company comparative questions within complex financial documents (such as SEC 10-K reports). This not only tested the retrieval accuracy of RAG systems but also placed high demands on their reasoning capabilities, architectural design, and cost control.

本文旨在对本次竞赛中排名前列的解决方案进行系统性梳理与对比分析。我们将重点关注优胜者的核心架构设计、所采用的关键技术栈、详尽的实验迭代过程,并从中提炼出构建高性能企业级 RAG 系统的通用原则与最佳实践。

This article aims to systematically review and conduct a comparative analysis of the top-ranked solutions in this competition. We will focus on the core architectural designs of the winners, the key technology stacks employed, the detailed iterative process of their experiments, and extract general principles and best practices for building high-performance enterprise-level RAG systems.

优胜者概览与核心指标对比
Overview of Winners and Core Metric Comparison

下表汇总了本次竞赛中表现最佳的九支团队及其核心成果。通过对比其最佳实验的得分、用时及核心方法,我们可以初步洞察不同技术路线的效能差异。

The following table summarizes the nine top-performing teams in this competition and their core results. By comparing the scores, time consumption, and core methods of their best experiments, we can gain initial insights into the performance differences of various technical approaches.

| 排名 (Rank) | 参赛者 (Participant) | 最佳实验用时 (Best run time) | 检索分 (R) / 生成分 (G) | 总分 (Total) | 核心方法简述 (Core approach) |
| --- | --- | --- | --- | --- | --- |
| 1 | Ilya Rice | 49 分钟 | 83.8 / 81.8 | 123.7 | 密集检索 + 路由 + LLM重排 + o3-mini + 自洽性投票 |
| 2 | Emil Shagiev | 55 分钟 | 86.3 / 78.5 | 121.6 | LLM搜索(无向量嵌入),多步查询扩展与答案精炼 |
| 3 | Dmitry Buykin | 8 小时 | 81.1 / 76.4 | 117.5 | 动态结构化输出 + SEC本体查询扩展,无向量数据库 |
| 4 | Sergey Nikonov | 30 小时 | 85.1 / 73.9 | 116.4 | 全文档处理(gpt-4o),简单但全面的覆盖策略 |
| 5 | ScrapeNinja.net | 23 小时 | 82.6 / 71.2 | 112.5 | Node.js + pgvector,Gemini Flash 系列模型 |
| 6 | xsl777 | 16 小时 | 79.4 / 71.2 | 110.9 | 结构化PDF解析,混合搜索,查询扩展,CoT推理 |
| 7 | nikolay_sheyko | 25 小时 | 81.1 / 69.8 | 110.4 | 两阶段处理:gpt-4o-mini页面相关性评估 + o3-mini答案生成 |
| 8 | Felix-TAT | 7 天 | 80.2 / 69.3 | 109.4 | 多智能体架构:OpenAI委托 + Gemini专家代理 + OpenAI执行汇总 |
| 9 | A.Rasskazov/V.Kalesnikau | 30 小时 | 84.0 / 67.3 | 109.3 | 多智能体系统,基于相似性的检索,使用Llama-3-405B等模型 |

从表格中可以看出,总分最高的方案并非耗时最长的。冠军 Ilya Rice 在不到一小时内取得了最佳成绩,这凸显了高效实验流程和精准架构设计的重要性。同时,高分方案在技术选型上呈现出多样性:既有基于传统向量检索的增强方案,也有完全摒弃向量、纯靠LLM进行搜索和推理的创新路径。

From the table, it can be seen that the solution with the highest total score was not the one that took the longest time. The champion, Ilya Rice, achieved the best result in less than an hour, highlighting the importance of an efficient experimental process and precise architectural design. Meanwhile, high-scoring solutions show diversity in technology selection: there are both enhanced solutions based on traditional vector retrieval and innovative paths that completely abandon vectors, relying solely on LLMs for search and reasoning.

核心架构模式深度剖析
In-Depth Analysis of Core Architecture Patterns

冠军方案:Ilya Rice 的系统化实验与复合增强策略
Champion Solution: Ilya Rice's Systematic Experimentation and Composite Enhancement Strategy

Ilya Rice 的获胜并非偶然,而是建立在系统化的、数据驱动的实验流程之上。他构建了一个自动化评估管道,能够在竞赛开始前快速迭代和评估数十种不同的架构组合。

Ilya Rice's victory was not accidental but built upon a systematic, data-driven experimental process. He constructed an automated evaluation pipeline that could rapidly iterate and evaluate dozens of different architectural combinations before the competition even began.


其最终获胜的架构是一个集成了多种先进模式的复合系统:

His final winning architecture is a composite system integrating multiple advanced patterns:

  1. 文档预处理与路由:使用深度定制的 IBM Docling 库处理 PDF,保留页码引用。第一步通过路由模式选择最合适的处理智能体。
  2. 密集检索与上下文扩展:基于 OpenAI 嵌入和 FAISS 进行语义相似性搜索。采用“父文档检索”策略,不仅返回文本块,还加载完整页面以保留上下文。
  3. LLM 重排与推理增强:使用 LLM 对检索结果进行重新评估和排序。在单一提示词中,通过定制化的思维链(CoT)和结构化输出(SO)模式来控制 LLM 的思考过程,提升准确性。
  4. 答案生成与自洽性校验:使用 o3-mini 模型生成最终答案。采用“自洽性多数投票”机制:生成多个答案变体,进行比较,并选择最一致的一个。
  1. Document Preprocessing and Routing: Uses a deeply customized IBM Docling library to process PDFs, preserving page number references. The first step selects the most suitable processing agent via a routing pattern.
  2. Dense Retrieval and Context Expansion: Performs semantic similarity search based on OpenAI embeddings and FAISS. Employs a "Parent Document Retrieval" strategy, returning not just text chunks but loading the full page to preserve context.
  3. LLM Reranking and Reasoning Enhancement: Uses an LLM to re-evaluate and reorder retrieval results. Within a single prompt, controls the LLM's thinking process through customized Chain-of-Thought (CoT) and Structured Outputs (SO) patterns to improve accuracy.
  4. Answer Generation and Self-Consistency Verification: Uses the o3-mini model to generate the final answer. Employs a "Self-Consistency with Majority Vote" mechanism: generates multiple answer variations, compares them, and selects the most consistent one.
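The retrieval side of steps 1–2 can be sketched as follows. This is a minimal offline sketch, not Rice's actual implementation: the bag-of-words `embed` function is a stand-in for OpenAI embeddings, a brute-force cosine scan replaces the FAISS index, and the sample pages are invented for illustration. The point it demonstrates is "parent document retrieval" — rank small chunks, but return the full page each hit came from.

```python
from collections import Counter
import math

# Toy corpus: pages of a report keyed by page number (invented for illustration).
pages = {
    1: "Overview of the company and its business segments.",
    2: "Total revenue for fiscal 2023 was 4.2 billion dollars, up 8 percent.",
    3: "Risk factors include currency fluctuations and supply chain issues.",
}

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index small chunks, but remember which page each chunk came from.
chunks = []
for page_no, text in pages.items():
    for sent in text.split(". "):
        chunks.append({"page": page_no, "text": sent, "vec": embed(sent)})

def retrieve_parent_pages(query, top_k=1):
    """Rank chunks by similarity, then return the *full* parent pages
    (the 'parent document retrieval' idea: keep surrounding context)."""
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, c["vec"]), reverse=True)
    seen, result = set(), []
    for c in ranked:
        if c["page"] not in seen:
            seen.add(c["page"])
            result.append((c["page"], pages[c["page"]]))
        if len(result) == top_k:
            break
    return result

hits = retrieve_parent_pages("what was total revenue in 2023")
```

In a production pipeline the chunk vectors would live in a FAISS index and the page-number metadata would come from the PDF parsing step, but the chunk-to-parent-page expansion logic stays the same.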


该方案的成功关键在于将强大的检索基础(密集检索+父文档)与精细的LLM后处理(重排、CoT/SO、自洽性投票)相结合,并在高效的实验框架下找到了性能与速度的最佳平衡点。

The key to the success of this solution lies in combining a powerful retrieval foundation (dense retrieval + parent document) with refined LLM post-processing (reranking, CoT/SO, self-consistency voting), and finding the optimal balance between performance and speed under an efficient experimental framework.
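Step 4's self-consistency mechanism can be sketched as follows. Here `generate_answer` is a stub standing in for repeated sampled o3-mini calls, with hard-coded variants simulating sampling noise; only the voting logic is the point.

```python
from collections import Counter

def generate_answer(question, seed):
    """Stub standing in for a sampled LLM call (e.g. o3-mini with temperature > 0).
    Returns slightly varying answers to simulate sampling noise."""
    samples = ["$4.2B", "$4.2B", "$4.2B", "$4.1B", "$4.2B"]
    return samples[seed % len(samples)]

def self_consistent_answer(question, n_votes=5):
    """Self-consistency with majority vote: sample several answers,
    normalize them, and keep the most frequent one."""
    votes = [generate_answer(question, i).strip() for i in range(n_votes)]
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / n_votes

answer, agreement = self_consistent_answer("What was total revenue?")
```

The returned agreement ratio is also useful on its own: a low ratio flags questions where the model is unstable and a human review or a stronger model may be warranted.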

亚军方案:Emil Shagiev 的纯 LLM 搜索范式
Runner-Up Solution: Emil Shagiev's Pure LLM Search Paradigm

获得亚军的 Emil Shagiev 采取了一条截然不同的技术路线:完全未使用向量嵌入和向量数据库。他的方案证明了,在某些场景下,纯依靠大语言模型的推理和搜索能力也能达到极佳效果。

The runner-up, Emil Shagiev, adopted a completely different technical approach: completely avoiding the use of vector embeddings and vector databases. His solution proved that in certain scenarios, relying solely on the reasoning and search capabilities of large language models can also achieve excellent results.


其架构是一个清晰的多步管道:

His architecture is a clear multi-step pipeline:

  1. 查询扩展:对输入问题进行扩展,以增强搜索覆盖范围并实现语义搜索。
  2. 高效页面检索:使用一个成本低、速度快的 LLM(如 gpt-4o-mini)来识别和检索相关页面。
  3. 精准答案生成:将检索到的信息传递给一个更强大的 LLM(如 o3-mini)来生成答案。
  4. 答案精炼:对生成的答案进行优化和最终定稿。
  1. Query Expansion: Expands the input question to enhance search coverage and enable semantic search.
  2. Efficient Page Retrieval: Uses a cost-effective and fast LLM (e.g., gpt-4o-mini) to identify and retrieve relevant pages.
  3. Precise Answer Generation: Passes the retrieved information to a more powerful LLM (e.g., o3-mini) to generate the answer.
  4. Answer Refinement: Optimizes and finalizes the generated answer.
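The four steps above can be sketched as a single offline pipeline. This is a sketch of the pattern, not Shagiev's implementation: the `llm_score_page`, `llm_answer`, and `refine` functions are stubs standing in for real model calls (the entry used models such as gpt-4o-mini and o3-mini), and the page texts and naive synonym expansion are invented for illustration.

```python
# A minimal offline sketch of the four-step pipeline: expand -> score pages ->
# answer from top pages -> refine. No vector index is involved anywhere.

pages = {
    ("AcmeCorp", 12): "AcmeCorp total revenue in 2023 was 4.2 billion dollars.",
    ("AcmeCorp", 45): "AcmeCorp risk factors: currency and supply chain.",
    ("BetaInc", 9):  "BetaInc total revenue in 2023 was 3.1 billion dollars.",
}

def expand_query(question):
    """Step 1: query expansion (here a naive synonym list instead of an LLM)."""
    synonyms = {"revenue": ["revenue", "sales", "turnover"]}
    terms = set(question.lower().split())
    for word in list(terms):
        terms.update(synonyms.get(word, []))
    return terms

def llm_score_page(terms, text):
    """Step 2 stand-in: a cheap term-overlap score instead of a gpt-4o-mini call."""
    return sum(1 for t in terms if t in text.lower())

def llm_answer(question, context):
    """Step 3 stand-in: return the retrieved context instead of calling o3-mini."""
    return context

def refine(answer):
    """Step 4 stand-in: trivial cleanup instead of an LLM refinement pass."""
    return answer.strip().rstrip(".")

def pipeline(question, top_k=1):
    terms = expand_query(question)
    ranked = sorted(pages.items(),
                    key=lambda kv: llm_score_page(terms, kv[1]), reverse=True)
    context = " ".join(text for _, text in ranked[:top_k])
    return refine(llm_answer(question, context))

result = pipeline("What was AcmeCorp revenue in 2023?")
```

Swapping the cheap scorer for a real gpt-4o-mini relevance prompt is what makes this paradigm expensive per query but strong on semantic intent, which matches its top retrieval score in the competition.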


这种方法的优势在于架构简单,避免了构建和维护向量索引的复杂性,并且其检索步骤本身具有强大的语义理解能力。它在检索分数(R: 86.3)上取得了全场最高分,显示了 LLM 在理解查询意图和文档内容匹配方面的卓越能力。

The advantage of this method lies in its simple architecture, avoiding the complexity of building and maintaining vector indexes, and its retrieval step itself possesses powerful semantic understanding capabilities. It achieved the highest retrieval score (R: 86.3) in the entire competition, demonstrating the exceptional ability of LLMs in understanding query intent and matching document content.

其他代表性架构
Other Representative Architectures

除了冠亚军,其他方案也展示了丰富的技术多样性:

Besides the champion and runner-up, other solutions also demonstrated rich technical diversity:

  • Dmitry Buykin(第3名):专注于动态结构化输出和领域本体(SEC EDGAR)驱动的查询扩展。同样未使用向量检索,而是利用 LLM 对领域知识的编码来引导搜索和答案结构化,在生成质量(G)上表现不俗。
  • Sergey Nikonov(第4名):采用了最直接的“暴力”方法——针对每个问题,使用 gpt-4o 处理所有文档的所有页面。这种方法计算成本最高,但确保了信息的绝对完整性,取得了很高的检索分。
  • Felix-TAT(第8名):设计了经典的多智能体(Multi-Agent)架构。由一个委托管理器(OpenAI)分解问题,多个专家代理(Gemini)并行处理单个公司文档,最后由执行代理(OpenAI)汇总答案。这种模式非常适合处理涉及多个实体的复杂比较性问题。
  • Dmitry Buykin (3rd Place): Focused on dynamic structured output and domain ontology (SEC EDGAR) driven query expansion. Also did not use vector retrieval, but utilized LLM's encoding of domain knowledge to guide search and answer structuring, performing well in generation quality (G).
  • Sergey Nikonov (4th Place): Adopted the most direct "brute-force" method—for each question, using gpt-4o to process all pages of all documents. This method has the highest computational cost but ensures absolute completeness of information, achieving a very high retrieval score.
  • Felix-TAT (8th Place): Designed a classic Multi-Agent architecture. A delegation manager (OpenAI) decomposes the question, multiple expert agents (Gemini) process single company documents in parallel, and finally an execution agent (OpenAI) aggregates the answers. This pattern is very suitable for handling complex comparative questions involving multiple entities.
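The delegate/expert/executor pattern from Felix-TAT's entry can be sketched as follows. This is an offline toy, not the actual entry: the agent functions are stubs in place of the OpenAI (delegation/aggregation) and Gemini (expert) calls, and the filing texts are invented.

```python
# Offline sketch of the delegate -> experts -> executor multi-agent pattern
# for multi-company comparative questions.

filings = {
    "AcmeCorp": "AcmeCorp 2023 revenue: 4.2 billion dollars.",
    "BetaInc": "BetaInc 2023 revenue: 3.1 billion dollars.",
}

def delegate(question, companies):
    """Delegation manager: split a comparative question into per-company tasks."""
    return [(c, f"{question} for {c}") for c in companies]

def expert_agent(company, subquestion):
    """Expert agent: answer from that company's document only (stubbed here
    as a lookup; the entry used a Gemini model per company)."""
    return filings[company]

def executor(question, expert_answers):
    """Execution agent: aggregate the per-company answers (stubbed as a join)."""
    return " | ".join(f"{c}: {a}" for c, a in expert_answers)

def multi_agent_answer(question, companies):
    tasks = delegate(question, companies)
    answers = [(c, expert_agent(c, sub)) for c, sub in tasks]
    return executor(question, answers)

report = multi_agent_answer("What was 2023 revenue", ["AcmeCorp", "BetaInc"])
```

Because each expert only sees one company's document, the expert calls are independent and can run in parallel, which is the main operational appeal of this pattern for comparative questions.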


关键技术组件与模型选型分析
Analysis of Key Technical Components and Model Selection

文档解析与预处理
Document Parsing and Preprocessing

高质量的文档解析是 RAG 的基石。多位参赛者提到了处理复杂 PDF(尤其是包含表格和格式文本的财务报表)的挑战。

High-quality document parsing is the cornerstone of RAG. Multiple participants mentioned the challenges of processing complex PDFs, especially financial statements containing tables and formatted text.

| 参赛者 (Participant) | 解析工具/方法 (Parsing tool / method) | 关键改进/挑战 (Key improvement / challenge) |
| --- | --- | --- |
| Ilya Rice | 深度修改的 IBM Docling | 修改以保留页码引用,这对后续的引用和上下文扩展至关重要。 |
| Dmitry Buykin | 自定义 PDF 质量启发式评估 | 投入大量精力优化 OCR 输入,并实施合成标签以稳定页面检测和评估模型质量。 |
| xsl777 | 结构化 PDF 解析与分块 | 集成元数据提取,为后续的混合搜索和重排提供丰富信号。 |
| ScrapeNinja.net | OCR 技术 | 用于从 PDF 中提取文本信息,结合 pgvector 进行处理。 |

共识在于,通用开箱即用的解析工具往往不足以满足企业级需求,针对特定文档类型(如 SEC 10-K)进行定制化优化是必要的。

The consensus is that general-purpose, out-of-the-box parsing tools are often insufficient for enterprise-level needs, and customization and optimization for specific document types (e.g., SEC 10-K) are necessary.
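The idea shared by these parsing setups — keep a page-number reference attached to every chunk so answers can cite pages and context can be re-expanded later — can be sketched generically. The parsed pages below are invented; a real pipeline would obtain them from a PDF parser such as Docling or an OCR step.

```python
# Minimal sketch: whatever parser produces the page texts, every downstream
# chunk carries the page number it came from.

parsed_pages = [
    (1, "Item 1. Business. The company operates in two segments."),
    (2, "Item 7. MD&A. Revenue grew 8 percent year over year."),
]

def chunk_with_page_refs(pages, max_chars=40):
    """Split page text into fixed-size chunks, attaching the source page number
    to each chunk as metadata."""
    chunks = []
    for page_no, text in pages:
        for start in range(0, len(text), max_chars):
            chunks.append({"page": page_no, "text": text[start:start + max_chars]})
    return chunks

chunks = chunk_with_page_refs(parsed_pages)
```

This metadata is what makes both the "parent document retrieval" expansion and page-level answer citations possible later in the pipeline.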

检索策略对比
Comparison of Retrieval Strategies

检索是 RAG 的核心环节。本次竞赛中出现了三种主流的检索范式:

Retrieval is the core stage of a RAG pipeline. Three mainstream retrieval paradigms emerged in this competition:

  1. 密集向量检索:以冠军方案为代表,使用 OpenAI embeddings 和 FAISS。这是当前 RAG 的“标准”配置,优势在于语义匹配能力强,速度快。常需与“父文档检索”结合以解决信息丢失问题。
  2. 纯 LLM 搜索:以亚军方案为代表。利用 LLM 直接理解查询和文档内容,返回相关页面。优势是无需维护向量索引,检索意图理解更精准;劣势是可能增加 LLM 调用成本和延迟。
  3. 混合/无向量检索:以第3名方案为代表。利用结构化输出、领域本体和查询扩展来引导 LLM 进行“推理式检索”,或像第4名那样进行全文档检索。
  1. Dense Vector Retrieval: Represented by the champion's solution, using OpenAI embeddings and FAISS. This is the current "standard" configuration for RAG, with strengths in strong semantic matching and speed. It often needs to be combined with "parent document retrieval" to address information loss.
  2. Pure LLM Search: Represented by the runner-up's solution. Utilizes LLMs to directly understand queries and document content, returning relevant pages. Advantages include no need to maintain vector indexes and more precise retrieval intent understanding; disadvantages include potentially increased LLM invocation costs and latency.
  3. Hybrid/Non-Vector Retrieval: Represented by the 3rd place solution. Uses structured output, domain ontologies, and query expansion to guide LLMs in "reasoning-based retrieval," or performs full-document retrieval like the 4th place solution.
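The ontology-driven query expansion behind paradigm 3 can be sketched as follows; the `SEC_ONTOLOGY` mapping here is a tiny hand-written stand-in for a real domain ontology such as the SEC EDGAR taxonomy.

```python
# Sketch of domain-ontology query expansion: each financial concept fans out
# into the filing-specific phrasings it may appear under, so a downstream
# search (LLM- or keyword-based) covers all of them.

SEC_ONTOLOGY = {
    "revenue": ["revenue", "net sales", "total revenues"],
    "profit": ["net income", "profit", "earnings"],
    "debt": ["long-term debt", "borrowings", "notes payable"],
}

def expand_with_ontology(question):
    """Collect every ontology term whose concept appears in the question;
    fall back to the raw question if no concept matches."""
    expanded = set()
    lowered = question.lower()
    for concept, terms in SEC_ONTOLOGY.items():
        if concept in lowered:
            expanded.update(terms)
    return sorted(expanded) or [lowered]

queries = expand_with_ontology("What was the company's revenue and profit?")
```

A question about "revenue" thus also searches for "net sales" and "total revenues", which is how filings with different terminology stay retrievable without any vector index.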

选择哪种策略取决于具体需求:对延迟和成本敏感且文档结构规整的场景,向量检索仍是首选;对精度要求极高、文档复杂且可接受较高成本的场景,纯 LLM 搜索或混合策略可能更优。

The choice of strategy depends on specific requirements: vector retrieval is still the preferred choice for scenarios sensitive to latency and cost with regular document structures; for scenarios with extremely high precision requirements, complex documents, and acceptance of higher costs, pure LLM search or hybrid strategies may be superior.

大语言模型选型与作用
LLM Selection and Roles

LLM 在方案中扮演了多重角色:检索器、重排器、推理引擎和答案生成器。参赛者广泛使用了 OpenAI 和 Google 的最新模型。

LLMs played multiple roles in these solutions: retriever, reranker, reasoning engine, and answer generator. Participants made extensive use of the latest models from OpenAI and Google.

常见问题(FAQ)
Frequently Asked Questions (FAQ)

企业RAG挑战赛的冠军方案用了哪些关键技术?
What key techniques did the champion solution of the Enterprise RAG Challenge use?

冠军Ilya Rice的方案采用了密集检索、LLM重排序、路由模式、o3-mini模型以及自洽性投票等复合增强策略,在49分钟内取得了最高分。
Ilya Rice's winning solution combined dense retrieval, LLM reranking, a router pattern, the o3-mini model, and self-consistency voting into a composite enhancement strategy, achieving the top score in a 49-minute run.

亚军Emil Shagiev的RAG方案有什么独特之处?
What is unique about runner-up Emil Shagiev's RAG solution?

亚军方案完全摒弃了传统的向量嵌入和向量数据库,采用纯LLM搜索范式,通过多步查询扩展与答案精炼来实现高效检索与生成。
The runner-up solution abandoned traditional vector embeddings and vector databases entirely, adopting a pure-LLM search paradigm that achieves efficient retrieval and generation through multi-step query expansion and answer refinement.

构建高性能企业级RAG系统有哪些通用原则?
What general principles apply to building high-performance enterprise RAG systems?

高效实验流程与精准架构设计至关重要。高分方案技术选型多样,包括传统向量检索增强和纯LLM搜索路径,需根据场景平衡精度、推理能力与成本。
An efficient experimentation loop and precise architectural design are critical. High-scoring solutions varied in technology choices, from enhanced vector retrieval to pure-LLM search, and must balance precision, reasoning capability, and cost for the target scenario.
