
UltraRAG: A Zero-Code RAG Framework from Tsinghua University That Rethinks Knowledge-Augmented AI Application Development

2026/1/25
AI Summary (BLUF)

UltraRAG is a RAG framework developed by Tsinghua University and partners. It offers a zero-code WebUI, automated knowledge base adaptation, and a modular design that serves both research and business applications. It integrates methods such as KBAlign and DDR to improve retrieval and generation performance across multiple models and tasks.

Introduction

Building and optimizing a Retrieval-Augmented Generation (RAG) system is a complex engineering task. It typically involves several critical stages, including benchmark construction, retrieval optimization, and model fine-tuning. This workflow raises the barrier to entry and makes it hard for both researchers and practitioners to get started.

To address these challenges, the THUNLP team at Tsinghua University, together with NEUIR at Northeastern University, ModelBest AI, and the 9#AISoft team, has released the UltraRAG framework. It streamlines the traditional development and configuration process for RAG systems, reducing both learning cost and development time. UltraRAG offers the fine-grained, "DSLR camera"-level configuration that expert users demand alongside "point-and-shoot"-style one-click operation, so building a RAG system can be both simple and efficient.

More importantly, unlike traditional RAG systems, UltraRAG can automatically adapt models to a user-provided knowledge base, which spares users the repeated deliberation over "model selection." Its modular design also serves research needs, letting researchers freely combine components and iterate quickly across scenarios. With UltraRAG, users can manage the entire pipeline from data to model. Whether the goal is in-depth academic exploration or rapid business deployment, UltraRAG aims to provide an experience in which users can "do as they wish, with ease."

GitHub Repository: https://github.com/OpenBMB/UltraRAG


Core Features and Capabilities

No-Code WebUI for Accessible Development

A core advantage of UltraRAG is its minimalist WebUI, which lets users build, train, and evaluate models even without programming experience.

Whether for rapid experimentation or personalized customization, UltraRAG provides intuitive and efficient support. The framework integrates multiple preset workflows, so users can pick the path that best fits their requirements. From data processing to model optimization, the entire pipeline can be completed without writing any code.

One-Click Synthetic Data Generation and Model Fine-Tuning

Built around the team's own methods, such as KBAlign and DDR, UltraRAG offers one-click, systematic data construction. It pairs this with a range of fine-tuning strategies for both retrieval and generation models to improve overall performance.

  • Data Construction: UltraRAG covers the full data construction pipeline, from retrieval models to generation models. It can automatically generate training data from a user-imported knowledge base, improving both the quality of scenario-specific question answering and the efficiency of adaptation.
  • Model Fine-Tuning: UltraRAG provides complete training scripts that support Embedding model training and LLM fine-tuning via DPO/SFT, helping users build stronger and more accurate models from the data they have constructed.
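To make the DPO/SFT terminology concrete, the following is a minimal sketch of the two standard objectives for a single example. It is not taken from UltraRAG's training scripts: the function names `sft_loss` and `dpo_loss`, the default `beta=0.1`, and the sample log-probabilities are assumptions chosen for illustration only.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sft_loss(token_logps: Sequence[float]) -> float:
    """SFT objective: mean negative log-likelihood of the target tokens."""
    if not token_logps:
        raise ValueError("token_logps must not be empty")
    return -sum(token_logps) / len(token_logps)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO objective for one (chosen, rejected) answer pair.

    Each argument is the summed log-probability of a full answer under
    either the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    print("SFT loss:", sft_loss([-1.0, -3.0]))
    print("DPO loss:", dpo_loss(-5.0, -9.0, -6.0, -7.0))
```

When the policy and the reference model agree exactly, the DPO margin is zero and the loss equals log 2. The loss drops below that value once the policy prefers the chosen answer more strongly than the reference does.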

Research-Friendly, Integrated Exploration Toolkit

UltraRAG builds in methods developed by the THUNLP-RAG group alongside other cutting-edge RAG techniques, and its modular design supports continued exploration and development. More than a technical framework, it is intended as an assistant for researchers and developers searching for the best-performing solution across diverse task scenarios.

As its features continue to mature, UltraRAG aims to play a key role in a broader range of fields and application scenarios, extending the reach of RAG technology from academic research to commercial applications.

Its simplicity, efficiency, flexibility, and ease of use make RAG frameworks easier to deploy and apply. This reduces the technical complexity of research and project development, letting users focus on innovation and practical implementation.

The UltraRAG Technology Suite

The UltraRAG suite introduces several technologies that improve knowledge adaptation, task alignment, and data processing within retrieval-augmented generation, making the overall system more capable and efficient.

  1. UltraRAG-KBAlign: Improves a Large Language Model's ability to adapt itself to a knowledge base, optimizing knowledge retrieval and reasoning. Using self-annotation, a 2.4B-parameter model reaches annotation performance comparable to GPT-4o and surpasses GPT-4o itself in several experiments.
  2. UltraRAG-Embedding: Delivers strong retrieval in both Chinese and English and supports long-text and sparse retrieval. Its performance exceeds that of bge-m3 by more than 10%.
  3. UltraRAG-Vis: Proposes a purely visual RAG pipeline that uses Vision-Language Models (VLMs) to encode documents, avoiding the information loss caused by document parsing. Compared with traditional text-based RAG pipelines, it improves end-to-end performance by 25-39% on some tasks.
  4. UltraRAG-Adaptive-Note: Improves answer quality on complex QA tasks through dynamic memory management and information gathering. Experiments on several models, including GPT-3.5-turbo, Llama3-8B, and Qwen2-7B, show that this adaptive strategy yields gains of 3% to 13.9% over baseline RAG models. It is particularly well suited to questions with complex information retrieval requirements.
  5. UltraRAG-DDR: Optimizes retrieval-augmented generation with Differentiable Data Reward (DDR), improving system performance in task-specific scenarios. Experiments on models such as MiniCPM-2.4B and Llama3-8B indicate that DDR optimization improves performance by more than 7% over the original RAG model.
  6. UltraRAG-Eval: An efficient evaluation solution designed for RAG scenarios. From a small number of seed documents, it quickly and automatically generates RAG evaluation data for specialized domains, and it provides robust model-driven evaluation metrics and methods.
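For readers new to the retrieval side of RAG, the following sketch shows the generic idea behind embedding-based retrieval and a simple retrieval metric. It does not reproduce UltraRAG-Embedding or UltraRAG-Eval: the three-dimensional vectors, document IDs, and function names are hypothetical, and recall@k is used here as a common stand-in metric rather than the model-driven metrics the suite provides.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Sequence[str]) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids) & relevant) / len(relevant)


# Toy embeddings standing in for the output of a real embedding model.
DOCS = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
QUERY = [1.0, 0.1, 0.0]

if __name__ == "__main__":
    hits = top_k(QUERY, DOCS, k=2)
    ids = [doc_id for doc_id, _ in hits]
    print("retrieved:", ids)
    print("recall@2:", recall_at_k(ids, ["doc_b", "doc_c"]))
```

In a real pipeline the vectors would come from an embedding model and the search would typically run against a vector index rather than a linear scan, but the ranking logic is the same.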

Summary

In summary, UltraRAG is an integrated toolkit, developed in-house by the team, that combines knowledge base management, retrieval, generation, and evaluation in a single framework. Its main strengths are its performance, its ease of use, and a modular design built with research in mind, all of which help lower the barrier to developing and experimenting with advanced RAG systems.

