A Hands-On Guide to UltraRAG UI: Building a Standardized Retrieval-Augmented Generation (RAG) Pipeline
This article provides a comprehensive guide to implementing Retrieval-Augmented Generation (RAG) with UltraRAG UI, covering the standardized pipeline structure, configuration parameters, and practical demonstration steps.
Introduction
Retrieval-Augmented Generation (RAG) has become a cornerstone technique for enhancing large language models (LLMs) with external, up-to-date knowledge. To provide a hands-on, in-depth experience with RAG capabilities, the UltraRAG UI offers a standardized, end-to-end pipeline. This integrated workflow seamlessly combines document retrieval, source citation, and augmented text generation, allowing developers and researchers to experiment with and evaluate RAG systems efficiently. This blog post will guide you through the structure, configuration, and execution of this powerful pipeline.
Pipeline Structure Overview
The core of the UltraRAG experience is defined in a YAML configuration file. This file orchestrates a series of steps, each handled by a dedicated Model Context Protocol (MCP) server or client component. The pipeline is designed to be modular, transparent, and reproducible.
The following YAML snippet outlines the primary stages of the RAG demo pipeline:
```yaml
# examples/RAG.yaml
# RAG Demo for UltraRAG UI

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
  - benchmark.get_data
  - retriever.retriever_init
  - generation.generation_init
  - retriever.retriever_search
  - custom.assign_citation_ids
  - prompt.qa_rag_boxed
  - generation.generate
```
Key Stages Explained:
- `benchmark.get_data`: Loads the benchmark dataset containing questions and ground-truth answers for evaluation.
- `retriever.retriever_init`: Initializes the retrieval system, which may involve loading an embedding model and a vector index.
- `generation.generation_init`: Initializes the text generation backend (e.g., vLLM, OpenAI API).
- `retriever.retriever_search`: For each question, retrieves the top-k most relevant document chunks from the knowledge base.
- `custom.assign_citation_ids`: Assigns unique identifiers to the retrieved chunks for proper citation in the final answer.
- `prompt.qa_rag_boxed`: Constructs the final prompt by combining the user's question with the retrieved, cited context.
- `generation.generate`: The LLM generates the final answer, conditioned on the augmented prompt.
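To make the data flow concrete, the stages above can be sketched as plain Python functions. This is an illustrative stand-in only: the function bodies, the `contents` field name, and the keyword-overlap scoring are assumptions for demonstration, not the actual MCP server implementations.

```python
import json

def get_data(path):
    """benchmark.get_data (sketch): load questions and gold answers from a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def retriever_search(corpus, question, top_k=5):
    """retriever.retriever_search (sketch): naive word-overlap scoring as a
    stand-in for embedding-based similarity search."""
    scored = sorted(
        corpus,
        key=lambda d: -len(set(d["contents"].split()) & set(question.split())),
    )
    return scored[:top_k]

def assign_citation_ids(chunks):
    """custom.assign_citation_ids (sketch): give each retrieved chunk a 1-based id."""
    return [{"id": i + 1, **c} for i, c in enumerate(chunks)]

def qa_rag_boxed(question, chunks):
    """prompt.qa_rag_boxed (sketch): build the augmented prompt with cited context."""
    context = "\n".join(f"[{c['id']}] {c['contents']}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer with citations."
```

Chaining `retriever_search → assign_citation_ids → qa_rag_boxed` on a question produces the augmented prompt that `generation.generate` would receive.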
Compiling the Pipeline File
To transform the declarative YAML configuration into an executable pipeline, you need to compile it using the UltraRAG command-line tool.
Execute the following command in your terminal:
```shell
ultrarag build examples/RAG.yaml
```
This step validates the configuration, resolves dependencies between components, and prepares the pipeline for execution within the UltraRAG UI.
Configuring Runtime Parameters
The behavior of each component in the pipeline is finely controlled through a separate parameters file. For a RAG system, two critical backends require configuration: the Embedding/Retrieval backend and the LLM Generation backend.
The main configuration file is examples/parameter/RAG_parameter.yaml. Below is a breakdown of its essential sections:
1. Benchmark Configuration (benchmark)
This section defines the dataset used for the demo or evaluation.
```yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers  # Maps the field containing correct answers
      q_ls: question         # Maps the field containing questions
    limit: -1                # Number of samples to use (-1 for all)
    name: nq                 # Benchmark name
    path: data/sample_nq_10.jsonl  # Path to the data file
    seed: 42
    shuffle: false
```
Within `key_map`, `gt_ls: golden_answers` maps the field holding the correct answers and `q_ls: question` maps the field holding the questions. `limit: -1` uses all samples, `name: nq` names the benchmark, and `path: data/sample_nq_10.jsonl` points to the data file.
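The following sketch shows how a loader might apply these settings to a JSONL benchmark file. The function name and internals are hypothetical; only the semantics of `key_map`, `limit`, `shuffle`, and `seed` are taken from the configuration above.

```python
import json
import random

def load_benchmark(lines, key_map, limit=-1, shuffle=False, seed=42):
    """Illustrative benchmark loader: parse JSONL records, optionally shuffle
    with a fixed seed, truncate to `limit`, and rename dataset-specific fields
    to the internal keys (q_ls, gt_ls) the pipeline expects."""
    samples = [json.loads(line) for line in lines]
    if shuffle:
        random.Random(seed).shuffle(samples)
    if limit != -1:
        samples = samples[:limit]
    # key_map maps internal_key -> source_field_name
    return [{k: s[v] for k, v in key_map.items()} for s in samples]
```

With the `nq`-style record format, `q_ls` ends up holding the question string and `gt_ls` the list of gold answers.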
2. Generation Backend Configuration (generation)
This is where you configure the LLM that will produce the final answer. UltraRAG supports multiple backends like vllm, openai, and direct Hugging Face (hf) models.
```yaml
generation:
  backend: vllm  # Primary backend choice. Alternatives: 'openai', 'hf'
  backend_configs:
    vllm:
      model_name_or_path: openbmb/MiniCPM4-8B
      gpu_ids: 2,3
      dtype: auto
      gpu_memory_utilization: 0.9
      trust_remote_code: true
    openai:
      model_name: qwen3-32b
      base_url: http://localhost:8000/v1
      api_key: abc
      concurrency: 8
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: '你是一个专业的UltraRAG问答助手。请一定记住使用中文回答问题。'
```
`backend: vllm` selects the primary backend ('openai' and 'hf' are alternatives). `backend_configs` holds per-backend settings; for example, the `vllm` entry specifies the model path and the GPUs to use. `sampling_params` controls the generation parameters, and `system_prompt` sets the system-role instruction (here, a Chinese prompt telling the model it is a professional UltraRAG QA assistant and must answer in Chinese).
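When the `openai` backend is selected, the settings above map naturally onto a standard OpenAI-compatible chat-completions request. The helper below is a sketch of that mapping (the field names follow the public chat-completions API; UltraRAG's internal request construction may differ):

```python
def build_chat_request(cfg, sampling_params, system_prompt, user_prompt):
    """Assemble a chat-completions request body from the openai backend
    config and sampling parameters shown in the YAML above."""
    return {
        "model": cfg["model_name"],
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": sampling_params["max_tokens"],
        "temperature": sampling_params["temperature"],
        "top_p": sampling_params["top_p"],
    }
```

This payload would be POSTed to `{base_url}/chat/completions` with the configured `api_key`, with up to `concurrency` requests in flight at once.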
3. Retriever Configuration (retriever)
This is the heart of the RAG configuration. It defines how documents are embedded and retrieved.
```yaml
retriever:
  backend: sentence_transformers  # Embedding model backend
  backend_configs:
    sentence_transformers:
      model_name_or_path: openbmb/MiniCPM-Embedding-Light
      gpu_ids: '1'
      trust_remote_code: true
  index_backend: faiss  # Vector database/index backend
  index_backend_configs:
    faiss:
      index_path: index/index.index
      index_use_gpu: true
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl  # Path to your knowledge base documents
  top_k: 5  # Number of document chunks to retrieve per query
```
`backend: sentence_transformers` selects the embedding-model backend, and `backend_configs` sets that backend's model and GPU resources. `index_backend: faiss` selects the vector database/index backend, with `index_backend_configs` covering the index path and related options. `corpus_path` points to the knowledge-base documents, and `top_k: 5` is the number of document chunks retrieved per query.
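Conceptually, the embedding-plus-index flow reduces to "vectorize the query, score it against vectorized documents, return the top-k". The toy sketch below makes that explicit with a bag-of-words vectorizer and brute-force cosine similarity; the real stack substitutes MiniCPM-Embedding-Light vectors and a FAISS index, and all names here are illustrative.

```python
import math

def embed(text, vocab):
    """Toy embedding: L2-normalized bag-of-words counts over a fixed vocabulary."""
    counts = [text.lower().split().count(w) for w in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def search(query, corpus, vocab, top_k=5):
    """Brute-force cosine-similarity search, standing in for a FAISS index lookup."""
    qv = embed(query, vocab)
    scored = [(sum(a * b for a, b in zip(qv, embed(doc, vocab))), doc)
              for doc in corpus]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

Swapping `embed` for a neural encoder and `search` for an approximate-nearest-neighbor index changes the quality and speed, but not the interface: query in, top-k chunks out.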
4. Prompt Template Configuration (prompt)
Specifies the Jinja2 template used to format the question and retrieved contexts into the final prompt sent to the LLM.
```yaml
prompt:
  template: prompt/qa_rag_citation.jinja
```
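The contents of `qa_rag_citation.jinja` are not reproduced here, but a citation-style QA template typically has roughly this shape (a hypothetical sketch; the variable names `question`, `chunks`, `id`, and `contents` are assumptions, not the shipped template):

```jinja
Answer the question using only the context below. Cite the supporting
passages with their bracketed ids, e.g. [1].

Context:
{% for chunk in chunks %}
[{{ chunk.id }}] {{ chunk.contents }}
{% endfor %}

Question: {{ question }}
Answer:
```

Rendering this template with the question and the citation-tagged chunks yields the final prompt passed to `generation.generate`.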
Demonstration and Execution
Once the RAG_parameter.yaml file is correctly configured with your chosen models, API keys, and data paths, you are ready to run the demo.
1. Start the UltraRAG UI.
2. Within the interface, select the compiled "RAG Pipeline".
3. Choose the corresponding knowledge base (as defined by `retriever.corpus_path`).
4. Submit a query. The UI will visually demonstrate the pipeline in action:
   - The retrieval component fetches relevant document chunks.
   - These chunks are injected into the prompt template.
   - The LLM generates an answer, which now includes citations (e.g., [1], [2]) pointing back to the source documents.
This end-to-end visualization allows you to not only see the final, more accurate output but also to inspect and understand the retrieval and reasoning process behind it, which is crucial for debugging and improving RAG applications.
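When inspecting the output programmatically, the bracketed citation markers can be mapped back to the retrieved chunks with a small helper like the one below. This is illustrative post-processing, not UltraRAG's own implementation:

```python
import re

def extract_citations(answer, chunks_by_id):
    """Find [n] markers in a generated answer and resolve them to the
    retrieved chunks they cite, in ascending id order."""
    cited_ids = sorted({int(m) for m in re.findall(r"\[(\d+)\]", answer)})
    return [chunks_by_id[i] for i in cited_ids if i in chunks_by_id]
```

Comparing the resolved citations against the chunks actually retrieved is a quick sanity check that the model is grounding its answer rather than citing ids that were never provided.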
Conclusion
The standardized RAG pipeline in UltraRAG UI abstracts away the complexities of wiring together retrieval, prompting, and generation components. By providing a clear, configurable YAML interface and a visual execution environment, it significantly lowers the barrier to experimenting with advanced RAG techniques. Developers can quickly swap out embedding models, LLMs, vector databases, or prompt templates to find the optimal combination for their specific use case, accelerating the development of robust, knowledge-grounded AI applications.