静态RAG和动态RAG哪个更适合我的项目?(附技术对比与代码实践)
Static RAG vs. Dynamic RAG: Core Principles, Technical Comparison, and Practical Guide
本文系统介绍静态 RAG 与动态 RAG 的核心原理、技术对比、主流实现方案及代码实践,适合技术选型和深入学习参考。
This article systematically introduces the core principles, technical comparisons, mainstream implementation approaches, and code practices of Static RAG and Dynamic RAG, suitable for technical selection and in-depth learning reference.
目录
- 一、RAG 技术概述
- 二、静态 RAG
- 2.1 核心原理
- 2.2 优化技术
- 2.3 主流实践方案
- 2.4 代码示例
- 三、动态 RAG
- 3.1 核心原理
- 3.2 主流实现方案
- 四、Self-RAG 详解
- 4.1 核心原理
- 4.2 反思令牌机制
- 4.3 环境配置
- 4.4 代码实现
- 五、CRAG 详解
- 5.1 核心原理
- 5.2 环境配置
- 5.3 完整代码实现(LangGraph)
- 六、RAGFlow 平台
- 6.1 平台定位
- 6.2 Agent 工作流机制
- 6.3 SDK 使用
- 七、技术对比与选型建议
- 7.1 静态 vs 动态 RAG
- 7.2 Self-RAG vs CRAG
- 7.3 选型建议
- 八、参考资源
Table of Contents
- I. Overview of RAG Technology
- II. Static RAG
- 2.1 Core Principles
- 2.2 Optimization Techniques
- 2.3 Mainstream Implementation Approaches
- 2.4 Code Examples
- III. Dynamic RAG
- 3.1 Core Principles
- 3.2 Mainstream Implementation Approaches
- IV. Self-RAG Deep Dive
- 4.1 Core Principles
- 4.2 Reflection Token Mechanism
- 4.3 Environment Setup
- 4.4 Code Implementation
- V. CRAG Deep Dive
- 5.1 Core Principles
- 5.2 Environment Setup
- 5.3 Complete Code Implementation (LangGraph)
- VI. RAGFlow Platform
- 6.1 Platform Positioning
- 6.2 Agent Workflow Mechanism
- 6.3 SDK Usage
- VII. Technical Comparison and Selection Advice
- 7.1 Static vs. Dynamic RAG
- 7.2 Self-RAG vs. CRAG
- 7.3 Selection Advice
- VIII. Reference Resources
一、RAG 技术概述
RAG(Retrieval-Augmented Generation,检索增强生成)是一种结合信息检索与文本生成的技术架构,通过从外部知识库检索相关信息来增强大语言模型的生成能力。
RAG (Retrieval-Augmented Generation) is a technical architecture that combines information retrieval with text generation, enhancing the generative capabilities of large language models by retrieving relevant information from external knowledge bases.
为什么需要 RAG?
Why Do We Need RAG?
| 挑战 | RAG 的解决方案 |
|---|---|
| LLM 知识截止日期 | 检索最新的外部知识 |
| 幻觉问题 | 基于检索到的事实生成 |
| 领域知识不足 | 接入专业知识库 |
| 私有数据访问 | 检索企业内部文档 |
| Challenge | RAG's Solution |
|---|---|
| LLM knowledge cutoff | Retrieve the latest external knowledge |
| Hallucination problem | Generate based on retrieved facts |
| Insufficient domain knowledge | Connect to professional knowledge bases |
| Private data access | Retrieve internal corporate documents |

RAG 基础流程
RAG Basic Workflow
```
用户问题 → 向量化 → 相似度检索 → 获取相关文档 → 构建 Prompt → LLM 生成 → 返回答案

User Query → Vectorization → Similarity Search → Retrieve Relevant Documents → Construct Prompt → LLM Generation → Return Answer
```
二、静态 RAG
II. Static RAG
2.1 核心原理
2.1 Core Principles
静态 RAG 是传统的检索增强生成方法,采用**「一次检索、一次生成」**的线性流程:
Static RAG is the traditional retrieval-augmented generation method, employing a linear workflow of "one-time retrieval, one-time generation":
```
静态 RAG 流程:
用户问题 → 向量化 Query → 检索 Top-K → 拼接 Prompt → 生成答案
特点:流程固定,检索一次,上下文不再更新

Static RAG Workflow:
User Query → Vectorize Query → Retrieve Top-K → Concatenate into Prompt → Generate Answer
Feature: fixed process, single retrieval, context is not updated
```

「核心特点:」
Core Characteristics:
- 预处理阶段:文档离线分块、向量化、存入向量数据库
- 检索一次性:用户查询时一次性检索相关文档
- 固定上下文:检索到的内容直接拼接到 prompt,不再更新
- 流程固定:查询 → 检索 → 生成,线性执行
- Preprocessing Phase: Document offline chunking, vectorization, storage in vector database.
- One-time Retrieval: Retrieve relevant documents once upon user query.
- Fixed Context: Retrieved content is directly concatenated into the prompt and not updated.
- Fixed Process: Query → Retrieve → Generate, executed linearly.
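The fixed linear flow described above can be sketched without any RAG framework. Everything in this sketch is a toy stand-in: `embed` uses term overlap instead of a real embedding model, and the final "generation" step simply returns the assembled prompt rather than calling an LLM. It exists only to make the one-retrieval, fixed-context shape visible.

```python
def embed(text: str) -> set[str]:
    """Toy 'embedding': a set of lowercased terms (stand-in for a vector)."""
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap as a stand-in for cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Preprocessing phase: chunk and "index" documents once, offline.
chunks = [
    "RAG combines retrieval with generation.",
    "Static RAG retrieves once and never updates the context.",
    "Vector databases store embeddings for similarity search.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def static_rag(question: str, k: int = 2) -> str:
    # One-time retrieval: top-k chunks by similarity to the query.
    q = embed(question)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:k])
    # Fixed context: concatenated into a single prompt; a real system would
    # now send this prompt to an LLM exactly once and return its answer.
    return f"Context:\n{context}\n\nQuestion: {question}"

print(static_rag("How does static RAG update context?"))
```

Note that once the prompt is built, nothing feeds back into retrieval; that absence of a loop is what distinguishes static RAG from the dynamic variants discussed later.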
2.2 优化技术
2.2 Optimization Techniques
虽然是静态流程,但可以通过多种技术优化检索和生成质量:
Although it's a static process, retrieval and generation quality can be optimized through various techniques:
| 技术 | 原理 | 适用场景 |
|---|---|---|
| 「HyDE」 | 先让 LLM 生成假设答案,用假设答案去检索 | 问题表述模糊时 |
| 「Query Expansion」 | 扩展用户查询为多个变体,提升召回率 | 提高召回率 |
| 「Reranker」 | 检索后用交叉编码器重排序 | 提高精确度 |
| 「Sentence Window」 | 检索小块,返回时扩展上下文窗口 | 需要更多上下文 |
| 「Parent Document」 | 检索小块,返回其父文档 | 保持文档完整性 |
| 「Fusion RAG」 | 多路检索结果融合(RRF 算法) | 多维度召回 |
| 「Hybrid Search」 | 向量检索 + BM25 关键词检索混合 | 兼顾语义和关键词 |
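Among these techniques, the RRF step named in the 「Fusion RAG」 row is simple enough to show in full. The sketch below is generic (the document IDs and the two hit lists are invented for illustration); each retrieval path contributes `1 / (k + rank)` per document, so documents ranked well by several paths float to the top. `k = 60` is the commonly used constant for RRF.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs via reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two retrieval paths (e.g. vector search and BM25) disagree on order;
# "B" ranks near the top in both lists, so fusion prefers it.
vector_hits = ["A", "B", "C"]
keyword_hits = ["B", "D", "A"]
print(rrf([vector_hits, keyword_hits]))  # → ['B', 'A', 'D', 'C']
```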
| Technique | Principle | Applicable Scenario |
|---|---|---|
| 「HyDE」 | Let the LLM generate a hypothetical answer first, then retrieve with it | Ambiguously phrased queries |
| 「Query Expansion」 | Expand the user query into multiple variants | Improving recall |
| 「Reranker」 | Re-rank retrieved results with a cross-encoder | Improving precision |
| 「Sentence Window」 | Retrieve small chunks, expand the context window when returning | Needing more context |
| 「Parent Document」 | Retrieve small chunks, return their parent document | Preserving document integrity |
| 「Fusion RAG」 | Fuse multi-path retrieval results (RRF algorithm) | Multi-dimensional recall |
| 「Hybrid Search」 | Combine vector search with BM25 keyword search | Balancing semantics and keywords |

2.3 主流实践方案
2.3 Mainstream Implementation Approaches
开源框架
Open Source Frameworks
- 「LangChain」:最流行的 RAG 框架,生态丰富
- 「LlamaIndex」:专注于数据索引和查询,提供多种索引类型
- 「Haystack」:模块化的 NLP 管道框架
- 「LangChain」: The most popular RAG framework with a rich ecosystem.
- 「LlamaIndex」: Focuses on data indexing and querying, offering various index types.
- 「Haystack」: A modular NLP pipeline framework.
向量数据库
Vector Databases
- 「Milvus」:高性能分布式向量数据库
- 「Chroma」:轻量级嵌入式向量数据库
- 「Pinecone」:全托管云向量数据库
- 「Weaviate」:支持多模态的向量数据库
- 「Qdrant」:Rust 编写的高性能向量数据库
- 「Milvus」: High-performance distributed vector database.
- 「Chroma」: Lightweight embedded vector database.
- 「Pinecone」: Fully-managed cloud vector database.
- 「Weaviate」: Vector database supporting multimodal data.
- 「Qdrant」: High-performance vector database written in Rust.
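All of the databases listed above optimize the same core operation: nearest-neighbor search over stored embeddings. The toy in-memory class below illustrates only that operation; the names (`VectorStore`, `add`, `query`) and the 3-dimensional vectors are invented for illustration and do not match any specific product's API.

```python
import math

class VectorStore:
    """Minimal in-memory stand-in for a vector database's add/query pattern."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items.append((doc_id, vector))

    def query(self, vector: list[float], n_results: int = 2) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Return the IDs of the n_results most similar stored vectors.
        ranked = sorted(self._items, key=lambda item: cosine(vector, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:n_results]]

store = VectorStore()
store.add("doc-a", [1.0, 0.0, 0.0])
store.add("doc-b", [0.9, 0.1, 0.0])
store.add("doc-c", [0.0, 1.0, 0.0])
print(store.query([1.0, 0.05, 0.0], n_results=2))  # → ['doc-a', 'doc-b']
```

Real systems replace the linear scan here with approximate nearest-neighbor indexes (e.g. HNSW), which is where the products above differ in performance and deployment model.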
企业级产品
Enterprise Products
- Azure AI Search + OpenAI
- Amazon Bedrock Knowledge Bases
- Google Vertex AI Search
- Azure AI Search + OpenAI
- Amazon Bedrock Knowledge Bases
- Google Vertex AI Search
2.4 代码示例
2.4 Code Examples
基础静态 RAG(LangChain)
Basic Static RAG (LangChain)
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Load documents
loader = WebBaseLoader("https://example.com/document")
documents = loader.load()

# 2. Chunking
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
)
splits = text_splitter.split_documents(documents)

# 3. Create the vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# 4. Create the retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

# 5. Define the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question based on the following context. "
               "If you don't know the answer, say so.\n\nContext: {context}"),
    ("human", "{question}")
])

# 6. Build the chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 7. Query
answer = rag_chain.invoke("Your question")
print(answer)
```

带 Reranker 的静态 RAG
Static RAG with Reranker
```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Create the reranker
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=3)

# Wrap the base retriever from the basic example above
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

# Use the reranking retriever; the rest of the chain is assembled
# exactly as in the basic example.
rag_chain = (
    {"context": compression_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```

## 常见问题(FAQ)

### 静态RAG和动态RAG的核心区别是什么?

静态RAG采用「一次检索、一次生成」的线性流程,而动态RAG(如 Self-RAG、CRAG)在生成过程中能动态检索、评估和调整,适应性更强。

Static RAG follows a linear "one retrieval, one generation" flow, while dynamic RAG (such as Self-RAG and CRAG) can retrieve, evaluate, and adjust dynamically during generation, making it more adaptive.

### 如何为我的项目选择静态RAG或动态RAG?

根据文章技术对比部分,若需求简单、文档稳定可选静态RAG;若需处理复杂查询、实时更新或高准确性,动态RAG(如 Self-RAG、CRAG)更合适。

Based on the comparison section, choose static RAG for simple requirements and stable documents; for complex queries, real-time updates, or higher accuracy, dynamic RAG (such as Self-RAG or CRAG) is the better fit.

### RAGFlow平台在RAG实现中有什么优势?

RAGFlow提供集成的 Agent 工作流机制和 SDK,简化了动态RAG的部署与管理,适合需要自动化、可扩展解决方案的企业场景。

RAGFlow provides an integrated agent workflow mechanism and SDK, simplifying the deployment and management of dynamic RAG, which suits enterprise scenarios that need automated, scalable solutions.