How to Build Your Own Knowledge Base Q&A System with RAG Web UI

RAG Web UI Demo


📖 Introduction

RAG Web UI is an intelligent dialogue system built on RAG (Retrieval-Augmented Generation). It helps users build question-answering services on top of their own knowledge bases by combining document retrieval with large language models, so that answers are grounded in the indexed documents.

The system supports multiple LLM deployment options: cloud services such as OpenAI and DeepSeek, and local models served through Ollama, to meet different privacy and cost requirements.

The system also exposes OpenAPI interfaces so that the knowledge base can be accessed through API calls.

✨ Key Features

📚 Intelligent Document Management

Multi-format support: PDF, DOCX, Markdown, and plain-text documents
Automated processing: automatic document chunking and vectorization
Asynchronous and incremental: asynchronous document processing with incremental updates

🤖 Advanced Dialogue Engine

Precise retrieval and generation: RAG-based retrieval and generation
Multi-turn context: support for multi-turn contextual dialogue
Citation tracing: answers can cite their sources

🎯 Robust System Architecture

Frontend-backend separation: the frontend and backend are decoupled
Distributed storage: distributed file storage
High-performance vector databases: support for ChromaDB and Qdrant, switchable through a factory pattern
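
The factory-pattern switching between vector databases mentioned above could look roughly like the sketch below. It is an illustration only: the `VectorStore` interface, the two stand-in classes, and `create_vector_store` are hypothetical names, and the stand-ins keep vectors in memory instead of talking to ChromaDB or Qdrant.

```python
from abc import ABC, abstractmethod


class VectorStore(ABC):
    """Hypothetical minimal interface that every backend implements."""

    @abstractmethod
    def add(self, doc_id: str, vector: list) -> None: ...

    @abstractmethod
    def count(self) -> int: ...


class _DictStore(VectorStore):
    """Shared in-memory behaviour for the stand-in backends."""

    def __init__(self) -> None:
        self._items = {}

    def add(self, doc_id: str, vector: list) -> None:
        self._items[doc_id] = vector

    def count(self) -> int:
        return len(self._items)


class ChromaStandIn(_DictStore):
    """Stand-in for a ChromaDB-backed store (assumption, not the real client)."""


class QdrantStandIn(_DictStore):
    """Stand-in for a Qdrant-backed store (assumption, not the real client)."""


# Registry mapping a configuration value to a backend class.
_REGISTRY = {"chroma": ChromaStandIn, "qdrant": QdrantStandIn}


def create_vector_store(name: str) -> VectorStore:
    """Return a store instance for the given backend name."""
    try:
        return _REGISTRY[name.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported vector store: {name!r}") from None
```

Because callers depend only on the `VectorStore` interface, switching backends reduces to changing the name passed to the factory.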

🖼️ Screenshots

Knowledge Base Management Dashboard

Document Processing Dashboard

Document List

Intelligent Chat Interface with References

API Key Management

API Reference

System Architecture and Flow Diagram

The RAG Web UI flow is divided into two core modules, document ingestion and query service, which are connected through asynchronous processing and a vector database.

The following diagram shows the complete data flow and component interactions, from document upload to question answering.

graph TB
    %% Role Definitions
    client["Caller/User"]
    open_api["Open API"]
    
    subgraph import_process["Document Ingestion Process"]
        direction TB
        %% File Storage and Document Processing Flow
        docs["Document Input<br/>(PDF/MD/TXT/DOCX)"]
        job_id["Return Job ID"]
        
        nfs["NFS"]

        subgraph async_process["Asynchronous Document Processing"]
            direction TB
            preprocess["Document Preprocessing<br/>(Text Extraction/Cleaning)"]
            split["Text Splitting<br/>(Segmentation/Overlap)"]
            
            subgraph embedding_process["Embedding Service"]
                direction LR
                embedding_api["Embedding API"] --> embedding_server["Embedding Server"]
            end
            
            store[(Vector Database)]
            
            %% Internal Flow of Asynchronous Processing
            preprocess --> split
            split --> embedding_api
            embedding_server --> store
        end
        
        subgraph job_query["Job Status Query"]
            direction TB
            job_status["Job Status<br/>(Processing/Completed/Failed)"]
        end
    end
    
    %% Query Service Flow  
    subgraph query_process["Query Service"]
        direction LR
        user_history["User History"] --> query["User Query<br/>(Based on User History)"]
        query --> query_embed["Query Embedding"]
        query_embed --> retrieve["Vector Retrieval"]
        retrieve --> rerank["Re-ranking<br/>(Cross-Encoder)"]
        rerank --> context["Context Assembly"]
        context --> llm["LLM Generation"]
        llm --> response["Final Response"]
        query -.-> rerank
    end
    
    %% Main Flow Connections
    client --> |"1.Upload Document"| docs
    docs --> |"2.Generate"| job_id
    docs --> |"3a.Trigger"| async_process
    job_id --> |"3b.Return"| client
    docs --> nfs
    nfs --> preprocess

    %% Open API Retrieval Flow
    open_api --> |"Retrieve Context"| retrieval_service["Retrieval Service"]
    retrieval_service --> |"Access"| store
    retrieval_service --> |"Return Context"| open_api

    %% Status Query Flow
    client --> |"4.Poll"| job_status
    job_status --> |"5.Return Progress"| client
    
    %% Database connects to Query Service
    store --> retrieve

    %% Style Definitions (Adjusted to match GitHub theme colors)
    classDef process fill:#d1ecf1,stroke:#0077b6,stroke-width:1px
    classDef database fill:#e2eafc,stroke:#003566,stroke-width:1px
    classDef input fill:#caf0f8,stroke:#0077b6,stroke-width:1px
    classDef output fill:#ffc8dd,stroke:#d00000,stroke-width:1px
    classDef rerank fill:#cdb4db,stroke:#5a189a,stroke-width:1px
    classDef async fill:#f8edeb,stroke:#7f5539,stroke-width:1px,stroke-dasharray: 5 5
    classDef actor fill:#fefae0,stroke:#606c38,stroke-width:1px
    classDef jobQuery fill:#ffedd8,stroke:#ca6702,stroke-width:1px
    classDef queryProcess fill:#d8f3dc,stroke:#40916c,stroke-width:1px
    classDef embeddingService fill:#ffe5d9,stroke:#9d0208,stroke-width:1px
    classDef importProcess fill:#e5e5e5,stroke:#495057,stroke-width:1px

    %% Applying classes to nodes
    class docs,query,retrieval_service input
    class preprocess,split,query_embed,retrieve,context,llm process
    class store,nfs database
    class response,job_id,job_status output
    class rerank rerank
    class async_process async
    class client,open_api actor
    class job_query jobQuery
    style query_process fill:#d8f3dc,stroke:#40916c,stroke-width:1px
    style embedding_process fill:#ffe5d9,stroke:#9d0208,stroke-width:1px
    style import_process fill:#e5e5e5,stroke:#495057,stroke-width:1px
    style job_query fill:#ffedd8,stroke:#ca6702,stroke-width:1px

Core Flows Explained

1. Document Ingestion Flow

After a user uploads a document, the system immediately returns a job ID and stores the document in NFS. An asynchronous pipeline is then triggered: it extracts the text, splits it into chunks, vectorizes the chunks, and stores the results in the vector database. Users can track progress by polling the job status interface.
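
The polling step of the ingestion flow can be sketched as follows. This is a minimal illustration, not the project's client code: `wait_for_job` and the `get_status` callback are hypothetical, and the status strings are lower-cased versions of the states shown in the diagram (Processing/Completed/Failed). In a real client, `get_status` would call the job status interface.

```python
import time
from typing import Callable

# Terminal states, taken from the diagram (Completed / Failed).
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until the job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status!r} after {timeout}s")
        sleep(interval)


# Demo with a fake status source instead of a real HTTP call.
_fake_states = iter(["processing", "processing", "completed"])
print(wait_for_job(lambda job_id: next(_fake_states), "job-123", interval=0.0))
# prints: completed
```

Passing the status lookup as a callback keeps the loop independent of any particular HTTP client and makes it easy to test.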

2. Query Service Flow

When a user submits a query, the system takes the conversation history into account, converts the query into a vector, and searches the vector database. The retrieved results are refined by a re-ranking model, assembled into a context, and passed to the large language model, which generates the final answer with citations.
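
The retrieve, re-rank, and assemble steps can be illustrated with a self-contained toy pipeline. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, the term-overlap `rerank` stands in for the cross-encoder shown in the diagram, and the prompt layout is invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 3) -> list:
    """Vector retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:k]


def rerank(query: str, candidates: list, top_n: int = 2) -> list:
    """Toy re-ranker: fraction of query terms present in each chunk."""
    terms = set(query.lower().split())

    def score(chunk: dict) -> float:
        return len(terms & set(chunk["text"].lower().split())) / max(len(terms), 1)

    return sorted(candidates, key=score, reverse=True)[:top_n]


def build_prompt(query: str, history: list, ranked: list) -> str:
    """Context assembly: number each chunk so the answer can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(ranked, 1)
    )
    past = "\n".join(history)
    return (
        f"History:\n{past}\n\nContext:\n{context}\n\n"
        f"Question: {query}\nAnswer with citations like [1]."
    )
```

The numbered context entries are what make citation tracing possible: the model is asked to refer to `[1]`, `[2]`, and so on, and each number maps back to a source document.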

3. OpenAPI Integration

The system provides a standalone retrieval service interface. External systems can query the vector database through the OpenAPI and receive the relevant knowledge fragments, which makes it straightforward to integrate the knowledge base into other applications or workflows.
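
As a rough sketch of how an external system might prepare such a retrieval call, the snippet below builds (but does not send) an HTTP request with the Python standard library. The route, the header name, and the payload fields are placeholders, not the project's documented API; the real values should be taken from the API documentation served by the backend.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"     # backend address used in the quick start
RETRIEVAL_PATH = "/example/retrieval"  # placeholder path, NOT the real route
API_KEY_HEADER = "X-API-Key"           # placeholder header name (assumption)


def build_retrieval_request(query: str, api_key: str, top_k: int = 3):
    """Build a POST request asking for the top_k context chunks for a query."""
    # Payload field names are illustrative assumptions.
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + RETRIEVAL_PATH,
        data=payload,
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


req = build_retrieval_request("What is RAG?", api_key="sk-example")
# urllib.request.urlopen(req) would send the request; it is left out here
# so that the sketch runs without a live server.
```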

🚀 Quick Start

Requirements and Dependencies

Before deploying RAG Web UI, make sure the following requirements are met. The system relies mainly on the Python and Node.js ecosystems.

Component	Minimum version	Recommended version	Notes
Python	3.9	3.10+	Backend language
Node.js	18	20 LTS	Frontend runtime
Docker	20.10	Latest	Containerized deployment (optional)
Database	-	PostgreSQL 15	Metadata storage

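Deployment Options Compared

RAG Web UI supports several deployment methods for different scenarios, ranging from quick trials to production-grade containerized deployments, each with its own focus.

Method	Complexity	Use cases	Main advantages	Caveats
Docker Compose	Low	Quick trials, development and testing	One-command startup, good isolation	Docker must be installed
Source deployment	Medium	Custom development, production debugging	Highly flexible, easy to customize in depth	Dependencies and environment are managed manually
Kubernetes	High	Cloud-native, large-scale production	Elastic scaling, high availability	Requires Kubernetes operations knowledge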

Quick Start with Docker Compose (Recommended)

For most users, Docker Compose is the simplest and quickest way to deploy.

# 1. Clone the repository
git clone https://github.com/rag-web-ui/rag-web-ui.git
cd rag-web-ui

# 2. Copy the example environment file and configure it
cp .env.example .env
# Edit .env and set your OpenAI API key and other values

# 3. Start all services
docker-compose up -d

# 4. Open the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API docs: http://localhost:8000/docs

Once the services are running, you can manage knowledge bases and chat through the frontend, or call the backend API directly.

🔧 Core Configuration Guide

The system's behavior and performance depend heavily on configuration. The following table lists the key settings and their effects.

Category	Setting	Default	Description and recommendations
LLM settings	`LLM_PROVIDER`	`openai`	Options: openai, deepseek, ollama, azure_openai
LLM settings	`OPENAI_API_KEY`	-	Required when using OpenAI
LLM settings	`OLLAMA_BASE_URL`	`http://host.docker.internal:11434`	Set when using a local Ollama instance
Vector database	`VECTOR_STORE`	`chroma`	Options: chroma, qdrant
Vector database	`CHROMA_PERSIST_DIR`	`./data/chroma`	ChromaDB persistence directory
Text processing	`CHUNK_SIZE`	`1000`	Chunk size in characters; affects retrieval precision
Text processing	`CHUNK_OVERLAP`	`200`	Overlap between adjacent chunks; preserves context across chunk boundaries
Embedding model	`EMBEDDING_MODEL`	`text-embedding-3-small`	Choose based on the accuracy/cost trade-off
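
To make the effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal sketch that reads both values from the environment (falling back to the defaults in the table) and applies a plain sliding-window split. The helper names are hypothetical, and the splitter is a simplification chosen for illustration; the project's actual splitter may work differently, for example by respecting sentence or paragraph boundaries.

```python
import os


def load_chunk_settings(env=os.environ):
    """Read CHUNK_SIZE and CHUNK_OVERLAP, using the table's defaults."""
    size = int(env.get("CHUNK_SIZE", "1000"))
    overlap = int(env.get("CHUNK_OVERLAP", "200"))
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("Need CHUNK_SIZE > 0 and 0 <= CHUNK_OVERLAP < CHUNK_SIZE")
    return size, overlap


def split_text(text: str, size: int, overlap: int) -> list:
    """Split text into windows of `size` characters that share `overlap` characters."""
    step = size - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


size, overlap = load_chunk_settings({})  # empty mapping, so defaults apply
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_text(sample, size, overlap)
print([len(c) for c in chunks])  # prints: [1000, 1000, 900]
```

With the defaults, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. Larger overlaps preserve more context at the cost of more chunks to embed and store.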

🏗️ System Architecture in Depth

RAG Web UI uses a layered architecture and modular design to support maintainability, scalability, and performance.

Architecture Layers

Presentation Layer: a modern web interface built with React that provides knowledge base management and conversational interaction.
Application Layer: a RESTful API built with FastAPI that handles request processing and serves as the bridge between the frontend and the backend.
Service Layer: the core business logic, including the document processing pipeline, the retrieval service, and the dialogue engine.
Data Layer: data persistence, comprising the relational database (PostgreSQL), the vector database (ChromaDB/Qdrant), and file storage (NFS/MinIO).

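Frequently Asked Questions (FAQ)

Which document formats and LLMs does RAG Web UI support?

It supports PDF, DOCX, Markdown, and plain-text documents. For LLMs, it supports cloud services such as OpenAI and DeepSeek as well as local deployment through Ollama, to meet different privacy and cost requirements.

How does RAG Web UI keep answers accurate and reliable?

The system is based on RAG: it combines document retrieval with a large language model, and it supports citation tracing in conversations so that answers can be checked against the knowledge base content.

What characterizes the RAG Web UI architecture?

The frontend and backend are decoupled, files are kept in distributed storage, and the vector database (ChromaDB or Qdrant) can be switched through a factory pattern, which keeps the system extensible.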