UltraRAG: A Zero-Code RAG Framework from Tsinghua University That Transforms Knowledge-Augmented AI Application Development

Introduction

Building and optimizing a Retrieval-Augmented Generation (RAG) system is a complex engineering endeavor. It typically involves multiple critical stages, including benchmark formulation, retrieval optimization, and model fine-tuning. This intricate workflow often presents a significant barrier to entry, making it challenging for both researchers and practitioners to get started.

To address these challenges, the THUNLP team at Tsinghua University, together with NEUIR at Northeastern University, ModelBest AI, and the 9#AISoft team, has released the UltraRAG framework. UltraRAG reworks the traditional development and configuration process for RAG systems, substantially shortening both the learning curve and the development cycle. It offers the fine-grained, "DSLR camera"-level configuration that expert users demand, alongside "point-and-shoot camera"-style one-click operation for everyone else. This dual design makes building a RAG system both simple and efficient.

More importantly, unlike traditional RAG systems, UltraRAG can automatically adapt models to a user-provided knowledge base. This removes much of the trial and error usually involved in model selection. Its modular design also serves research needs, letting investigators freely combine components and iterate quickly across different scenarios. With UltraRAG, users can manage the entire pipeline from data to model. Whether the goal is in-depth academic exploration or rapid business deployment, UltraRAG aims to let users do exactly what they want, with ease.

GitHub Repository: https://github.com/OpenBMB/UltraRAG

Core Features and Capabilities

No-Code WebUI for Accessible Development

A core advantage of UltraRAG is its minimalist WebUI. It lets users, even those without programming experience, build, train, and evaluate models.

Whether for rapid experimentation or personalized customization, UltraRAG provides intuitive and efficient support. The framework ships with multiple preset workflows, so users can choose the path that best fits their requirements. The entire pipeline, from data processing to model optimization, can be completed without writing a single line of code.

One-Click Synthetic Data Generation and Model Fine-Tuning

Built around self-developed methods such as KBAlign and DDR, UltraRAG offers one-click, systematic data construction. It combines this with diverse fine-tuning strategies for both retrieval and generation models to optimize performance across the whole system.

- Data Construction: UltraRAG covers the full data construction pipeline for both retrieval and generation models. It can automatically generate training data from a user-imported knowledge base. This significantly improves both the quality and the adaptation efficiency of scenario-specific question answering.
- Model Fine-Tuning: UltraRAG provides complete training scripts for embedding-model training and for LLM fine-tuning with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT). These help users build stronger, more accurate models from the data they have constructed.
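
Embedding retrievers are commonly trained with a contrastive (InfoNCE) objective over (query, positive passage) pairs, using the other passages in the same batch as negatives. The sketch below illustrates that standard loss in plain Python. The source does not say which objective UltraRAG's training scripts use. The function names and the 0.05 temperature are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    queries: List[Sequence[float]],
    passages: List[Sequence[float]],
    temperature: float = 0.05,  # assumed value, for illustration only
) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    passages[i] is the positive for queries[i]; every other passage in
    the batch acts as a negative for that query.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # -log softmax(logits)[i], using a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each query paired with a similar passage
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each query paired with a dissimilar passage
print(info_nce_loss(queries, aligned) < info_nce_loss(queries, swapped))  # True
```

Lowering the temperature sharpens the softmax, so the model is penalized more strongly when a negative scores close to the positive.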
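
On the generation side, the DPO objective can be computed directly from per-sequence log-probabilities. Write y_w for the chosen response, y_l for the rejected one, pi_theta for the trainable policy, pi_ref for the frozen reference model and beta for the strength. The loss is then -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]). The minimal sketch below evaluates this textbook formula for one preference pair. It is not UltraRAG's training code, and the names `dpo_loss`, `softplus` and the default beta of 0.1 are assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default
) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference: log 2, about 0.693
print(dpo_loss(-8.0, -12.0, -10.0, -12.0))   # chosen response made more likely: smaller loss
```

When the policy matches the reference, the margin is zero and the loss is exactly log 2. Raising the chosen response's likelihood relative to the reference, or lowering the rejected one's, drives the loss below that.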

Research-Friendly, Integrated Exploration Toolkit

UltraRAG incorporates the THUNLP-RAG group's self-developed methods alongside other cutting-edge RAG techniques, and its modular structure supports continuous exploration and development. More than a technical framework, UltraRAG is a practical assistant for researchers and developers, helping them efficiently find the best solution for each task scenario.

As its features are refined and upgraded, UltraRAG is poised to play a key role in a broader range of fields and application scenarios. It aims to keep extending the reach of RAG technology, driving progress from academic research to commercial applications.

Because UltraRAG is simple, efficient, flexible, and easy to use, it makes RAG frameworks easier to deploy and apply. It significantly reduces the technical complexity of both research and project development, so users can focus on innovation and practical implementation.

The UltraRAG Technology Suite

The UltraRAG suite introduces several innovative techniques that optimize knowledge adaptation, task alignment, and data processing in retrieval-augmented generation, making the system more intelligent and efficient.

- UltraRAG-KBAlign: Improves large language models' ability to adapt themselves to a knowledge base, optimizing knowledge retrieval and reasoning. Using self-annotation, a 2.4B-parameter model achieves annotation performance comparable to GPT-4o, and it surpasses GPT-4o itself in several experiments.
- UltraRAG-Embedding: Delivers strong retrieval in both Chinese and English and supports long-text and sparse retrieval. Its performance exceeds that of bge-m3 by more than 10%.
- UltraRAG-Vis: Proposes a purely visual RAG pipeline that uses vision-language models (VLMs) to encode documents directly, avoiding the information loss caused by document parsing. Compared with traditional text-based RAG pipelines, it improves end-to-end performance by 25-39% on some tasks.
- UltraRAG-Adaptive-Note: Improves answer quality on complex QA tasks through dynamic memory management and information gathering. Experiments on leading models including GPT-3.5-turbo, Llama3-8B, and Qwen2-7B show gains of 3% to 13.9% for this adaptive strategy over baseline RAG models. It is particularly effective on questions with complex information-retrieval requirements.
- UltraRAG-DDR: Optimizes retrieval-augmented generation with Differentiable Data Rewards (DDR), improving system performance in task-specific scenarios. Experiments on models including MiniCPM-2.4B and Llama3-8B show that DDR improves performance by more than 7% over the original RAG model.
- UltraRAG-Eval: An efficient evaluation solution designed for RAG scenarios. From a small number of seed documents, it quickly and automatically generates domain-specific RAG evaluation data. It also provides robust model-driven evaluation metrics and methods.
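
UltraRAG-Embedding is described as supporting both dense and sparse retrieval. One common way to combine the two is a weighted sum of a dense similarity and a sparse lexical-weight score. The sketch below shows that generic technique with toy vectors. The mixing weight `alpha`, the function names and the example data are hypothetical and do not describe UltraRAG-Embedding's internals.

```python
import math
from typing import Dict, List, Sequence, Tuple

SparseVec = Dict[str, float]  # term -> lexical weight


def dense_score(q: Sequence[float], d: Sequence[float]) -> float:
    """Cosine similarity between dense embeddings."""
    dot = sum(a * b for a, b in zip(q, d))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d)))


def sparse_score(q: SparseVec, d: SparseVec) -> float:
    """Sum of weight products over terms that appear in both vectors."""
    return sum(w * d[t] for t, w in q.items() if t in d)


def hybrid_rank(
    query_dense: Sequence[float],
    query_sparse: SparseVec,
    docs: Dict[str, Tuple[Sequence[float], SparseVec]],
    alpha: float = 0.7,  # hypothetical dense/sparse mixing weight
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * dense + (1 - alpha) * sparse, best first."""
    scored = [
        (doc_id, alpha * dense_score(query_dense, d_dense)
         + (1 - alpha) * sparse_score(query_sparse, d_sparse))
        for doc_id, (d_dense, d_sparse) in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "rag-intro": ([1.0, 0.0], {"rag": 0.8, "retrieval": 0.5}),
    "recipes": ([0.0, 1.0], {"cooking": 0.9}),
}
ranking = hybrid_rank([0.9, 0.1], {"rag": 1.0}, docs)
print(ranking[0][0])  # rag-intro
```

The sparse term rewards exact term matches that a dense embedding can blur, while the dense term catches paraphrases with no shared terms. The weight between them is normally tuned on held-out queries.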
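
The idea behind Adaptive-Note, gathering information into a memory and stopping once it no longer grows, can be illustrated in heavily simplified form. This is a hypothetical sketch. The keyword retriever, the rule of appending the note to the next query, and the stop condition (a round adds no new passage) are assumptions standing in for the dynamic memory management the source describes.

```python
from typing import Callable, List


def keyword_retriever(corpus: List[str]) -> Callable[[str], List[str]]:
    """Toy retriever: return every passage sharing a lowercase word with the query."""
    def retrieve(query: str) -> List[str]:
        words = set(query.lower().split())
        return [p for p in corpus if words & set(p.lower().split())]
    return retrieve


def adaptive_note_loop(
    question: str,
    retrieve: Callable[[str], List[str]],
    max_steps: int = 5,
) -> List[str]:
    """Gather passages into a note until a round adds nothing new (hypothetical sketch)."""
    note: List[str] = []
    query = question
    for _ in range(max_steps):
        new = [p for p in retrieve(query) if p not in note]
        if not new:
            break  # the note stopped growing, so stop gathering
        note.extend(new)
        # Assumed query update: condition the next retrieval on the whole note.
        query = question + " " + " ".join(note)
    return note


corpus = [
    "UltraRAG was released by THUNLP",
    "THUNLP is a lab at Tsinghua University",
    "Bananas are yellow",
]
note = adaptive_note_loop("Who released UltraRAG", keyword_retriever(corpus))
print(note)  # ['UltraRAG was released by THUNLP', 'THUNLP is a lab at Tsinghua University']
```

The second passage is only reachable after the first one has entered the note, which is why an iterative, memory-driven loop can help on multi-hop questions where a single retrieval round falls short.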
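
The source does not specify UltraRAG-Eval's model-driven metrics. For comparison, the sketch below shows the lexical token-level F1 commonly reported for question answering, with SQuAD-style normalization (lowercasing, stripping punctuation and the articles a/an/the). It is a standard baseline metric, not UltraRAG-Eval's method, and the function names are illustrative.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("The THUNLP lab.", "THUNLP"))  # about 0.667
```

Lexical overlap penalizes correct answers that are phrased differently from the reference. That limitation is one reason evaluation suites add model-driven judgments on top of metrics like this.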

Summary

In summary, UltraRAG is a self-developed, integrated toolkit that brings knowledge base management, retrieval, generation, and evaluation together in one framework. Its key advantages are its performance, its ease of use, and its research-oriented modular design. Together, these make it a significant contribution to lowering the barrier to building and experimenting with advanced RAG systems.
