Piragi vs. Traditional RAG Frameworks: Which Is Easier to Use? (With a Core Feature Comparison)

Piragi: Possibly the Most Elegant RAG Interface Yet

The best RAG interface yet. With just a few lines of code, you can build a fully functional Retrieval-Augmented Generation (RAG) knowledge base from local files, cloud storage, or websites. Piragi comes with built-in vector storage, embedding models, smart citations, and auto-updates, and it is free and runs locally by default.

from piragi import Ragi

kb = Ragi(["./docs", "s3://bucket/data/**/*.pdf", "https://api.example.com/docs"])
answer = kb.ask("How do I deploy this?")

Core Features

Piragi is designed to simplify building RAG applications and offers powerful features out of the box. Its core advantage is that it wraps the complex document-processing, vectorization, retrieval, and generation workflow behind a concise API.


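Feature	Description	Benefit
Zero configuration	Uses free local models by default (e.g., via Ollama)	Works out of the box, with no API keys or complex setup
All formats	PDF, Word, Excel, Markdown, code, URLs, images, audio	Handles multimodal data sources uniformly
Remote storage	S3, GCS, Azure, HDFS, SFTP, and glob patterns	Processes cloud-hosted documents directly, with no download step
Smart citations	Every answer includes source citations	Makes answers more trustworthy and traceable
Auto-updates	Sources are refreshed in the background; queries never block	Keeps the knowledge base current
Pluggable storage	LanceDB, PostgreSQL, Pinecone, Supabase, or a custom backend	Adapts to different production environments
Advanced retrieval	HyDE, hybrid search, cross-encoder reranking	Improves retrieval accuracy and relevance
Knowledge graph	Entity and relationship extraction for stronger multi-hop reasoning	Better answers to complex, relationship-oriented questions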
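To make the wrapped workflow concrete, here is a deliberately tiny, dependency-free sketch of the chunk, embed, retrieve, and cite steps that Piragi hides behind its API. It is not Piragi's implementation. The "embedding" is a bag-of-words count vector, cosine similarity stands in for a real embedding model and vector store, and the names `embed`, `build_index`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Split each document into paragraph chunks and 'embed' each chunk."""
    index = []
    for source, text in docs.items():
        for chunk in filter(None, (p.strip() for p in text.split("\n\n"))):
            index.append((source, chunk, embed(chunk)))
    return index

def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str, float]]:
    """Return the top-k (source, chunk, score) triples; the sources serve as citations."""
    q = embed(question)
    scored = [(src, chunk, cosine(q, vec)) for src, chunk, vec in index]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]

docs = {
    "deploy.md": "Deploy with docker compose up.\n\nSet the PORT variable first.",
    "limits.md": "The API rate limit is 100 requests per minute.",
}
for source, chunk, score in retrieve(build_index(docs), "What is the API rate limit?", k=1):
    print(f"{source}: {score:.0%} -> {chunk}")
# prints: limits.md: 68% -> The API rate limit is 100 requests per minute.
```

A real system would also pass the retrieved chunks to an LLM to generate the answer text. This sketch stops at retrieval, which is where the citations come from.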

Quick Start

Installation

Install the Piragi core library with pip. To run a local large language model, you can also install Ollama.

pip install piragi

# Optional: install Ollama to run a local LLM
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

Piragi has a modular design: install optional extras as needed to extend its functionality.

# Optional extras (quoted so shells such as zsh don't treat the brackets as glob patterns)
pip install "piragi[s3]"        # S3 support
pip install "piragi[gcs]"       # Google Cloud Storage support
pip install "piragi[azure]"     # Azure Blob Storage support
pip install "piragi[crawler]"   # Recursive web crawling
pip install "piragi[graph]"     # Knowledge graph
pip install "piragi[postgres]"  # PostgreSQL/pgvector support
pip install "piragi[pinecone]"  # Pinecone support
pip install "piragi[supabase]"  # Supabase support
pip install "piragi[all]"       # Install everything
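Optional extras like these are usually paired with lazy imports. The core package then installs without heavy dependencies and only fails, with an actionable message, when a feature that needs an extra is actually used. The sketch below shows that general Python pattern. The helper name `require_extra` is hypothetical and is not taken from Piragi's source.

```python
import importlib
import importlib.util

def require_extra(extra: str, module: str):
    """Import the module behind an optional extra, or explain how to install it."""
    if importlib.util.find_spec(module) is None:
        # Hypothetical message format; the point is to name the exact extra to install.
        raise ImportError(
            f"This feature needs the '{extra}' extra: pip install \"piragi[{extra}]\""
        )
    return importlib.import_module(module)

# 'json' is in the standard library, so this lookup succeeds.
print(require_extra("demo", "json").dumps({"ok": True}))
```

Deferring the import to the point of use keeps `import piragi` fast, and users only pay for the backends they enable.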

Basic Usage

The following code shows Piragi's most basic usage: building a knowledge base from a local folder and asking it questions.

from piragi import Ragi

# Create a knowledge base from local files
kb = Ragi("./docs")

# Or create one from multiple sources (glob patterns are supported)
kb = Ragi(["./docs/*.pdf", "https://api.docs.com", "./code/**/*.py"])

# Ask a question and get an answer
answer = kb.ask("What is the API rate limit?")
print(answer.text)

# Show the sources cited for the answer, with their scores
for cite in answer.citations:
    print(f"{cite.source}: {cite.score:.0%}")

Advanced Features in Detail

Remote Filesystems and Web Crawling

Much of Piragi's power comes from how seamlessly it integrates multiple data sources.

Remote filesystems: you can read files directly from cloud storage without downloading them locally first.

from piragi import Ragi

# Read from Amazon S3 (requires the s3 extra)
kb = Ragi("s3://my-bucket/docs/**/*.pdf")

# Read from Google Cloud Storage (requires the gcs extra)
kb = Ragi("gs://my-bucket/reports/*.md")

# Mix local and remote sources
kb = Ragi([
    "./local-docs",
    "s3://bucket/remote-docs/**/*.pdf",
    "https://example.com/api-docs"
])
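Mixing local paths, object-store URIs, and URLs in one list implies that each source string is routed by its scheme. The sketch below shows one plausible way to do that with only the standard library. The `classify_source` function, its category labels, and the `az` scheme for Azure are hypothetical and are not Piragi's internal API.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table, for illustration only.
REMOTE_SCHEMES = {"s3": "Amazon S3", "gs": "Google Cloud Storage", "az": "Azure Blob Storage"}

def classify_source(source: str) -> tuple[str, str, bool]:
    """Return (kind, location, is_glob) for a single source string."""
    is_glob = any(ch in source for ch in "*?[")
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        # A trailing /** marks a recursive crawl rather than a single page.
        return ("crawl" if source.endswith("/**") else "url", source, is_glob)
    if scheme in REMOTE_SCHEMES:
        return (REMOTE_SCHEMES[scheme], source, is_glob)
    return ("local", source, is_glob)

for src in ["./local-docs", "s3://bucket/remote-docs/**/*.pdf", "https://example.com/api-docs"]:
    print(src, "->", classify_source(src))
```

Glob sources would then be expanded by whichever backend owns the scheme, and plain URLs would be fetched once.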

Web crawling: append the /** suffix to a URL to recursively crawl an entire website or a specific section of it.

from piragi import Ragi

# Crawl an entire site
kb = Ragi("https://docs.example.com/**")

# Crawl only the API docs section
kb = Ragi("https://docs.example.com/api/**")

By default, the crawler goes to a depth of 3 and fetches at most 100 pages. This feature requires piragi[crawler].
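Depth and page limits like these bound the crawl as a breadth-first traversal. This sketch walks an in-memory link graph that stands in for fetching pages and extracting their links. It stops at depth 3 or 100 pages, matching the defaults described above, and stays inside the section named before /**. The function name, the in-memory `links` input, and the convention that the start page is depth 0 are assumptions.

```python
from collections import deque

def crawl(start: str, links: dict[str, list[str]], max_depth: int = 3, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl bounded by link depth, page count, and the start URL's section."""
    root = start.removesuffix("/**")  # "https://x/api/**" -> "https://x/api"
    seen = {root}
    queue = deque([(root, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would fetch, parse, and index the page here
        if depth >= max_depth:
            continue  # do not follow links past the depth limit
        for nxt in links.get(url, []):
            in_section = nxt == root or nxt.startswith(root + "/")
            if in_section and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited

site = {
    "https://docs.example.com": ["https://docs.example.com/a", "https://other.com/x"],
    "https://docs.example.com/a": ["https://docs.example.com/a/b"],
    "https://docs.example.com/a/b": ["https://docs.example.com/a/b/c"],
    "https://docs.example.com/a/b/c": ["https://docs.example.com/a/b/c/d"],
}
print(crawl("https://docs.example.com/**", site))  # .../a/b/c is depth 3, so .../d is skipped
```

Breadth-first order means the page budget is spent on pages closest to the start URL, which are usually the most relevant.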

Pluggable Vector Store Backends

Piragi supports multiple vector databases, so you can choose one based on your performance, persistence, and deployment requirements.


Storage backend	Example configuration	Use case	Installation
LanceDB (default)	`Ragi("./docs")` or `Ragi("./docs", store="s3://bucket/indices")`	Local development, or persisting indexes to S3	Built in
PostgreSQL (pgvector)	`Ragi("./docs", store="postgres://user:pass@localhost/db")`	Production environments that already run PostgreSQL	`pip install piragi[postgres]`
Pinecone	`Ragi("./docs", store=PineconeStore(api_key="...", index_name="my-index"))`	When you need a managed vector service	`pip install piragi[pinecone]`
Supabase	`Ragi("./docs", store=SupabaseStore(url="https://xxx.supabase.co", key="..."))`	Projects built on the Supabase ecosystem	`pip install piragi[supabase]`
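A "pluggable" backend usually means the library talks to storage through a small interface, so LanceDB, pgvector, Pinecone, or your own class can be swapped in. The sketch below shows what such a contract might look like with `typing.Protocol`, plus a brute-force in-memory implementation. The method names `add` and `search` are illustrative assumptions and are not Piragi's actual custom-backend interface. Check the project's documentation for that.

```python
import math
from typing import Protocol, Sequence

class VectorStore(Protocol):
    """Hypothetical minimal contract that a custom backend might satisfy."""
    def add(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...
    def search(self, query: Sequence[float], k: int) -> list[tuple[str, float]]: ...

class InMemoryStore:
    """Brute-force cosine-similarity store: fine for tests, too slow for large corpora."""
    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def add(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None:
        for i, v in zip(ids, vectors):
            self._rows[i] = list(v)

    def search(self, query: Sequence[float], k: int) -> list[tuple[str, float]]:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        scored = [(i, cos(query, v)) for i, v in self._rows.items()]
        return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

store: VectorStore = InMemoryStore()  # any class with matching methods satisfies the Protocol
store.add(["a", "b"], [[1.0, 0.0], [0.0, 1.0]])
print(store.search([0.9, 0.1], k=1))  # best match is "a"
```

Because `Protocol` uses structural typing, a backend does not need to inherit from anything. It only has to provide methods with matching signatures.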

Smart Chunking and Advanced Retrieval Strategies

To improve retrieval quality, Piragi offers several chunking and retrieval-enhancement strategies.

Chunking strategies: different document types and query patterns call for different ways of splitting text.

from piragi import Ragi

# Semantic: split at topic boundaries
kb = Ragi("./docs", config={"chunk": {"strategy": "semantic"}})

# Hierarchical: parent-child chunks for both context and precision
kb = Ragi("./docs", config={"chunk": {"strategy": "hierarchical"}})

# Contextual: an LLM-generated summary is attached to each chunk as context
kb = Ragi("./docs", config={"chunk": {"strategy": "contextual"}})
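To show what these strategies trade off, here is a word-based sketch of two of the simplest ideas. The first is fixed-size windows with overlap, the "fixed" strategy whose size and overlap settings appear in the detailed configuration. The second is a parent-child split in the spirit of hierarchical chunking. Semantic and contextual chunking need an embedding model or an LLM, so they are not shown. Counting words instead of tokens, and the function names, are simplifying assumptions.

```python
def fixed_chunks(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Windows of `size` words; each window repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not words:
        return []
    step = size - overlap
    # Stop once a window would contain no words beyond the previous window's overlap.
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

def hierarchical_chunks(words: list[str], parent_size: int, child_size: int) -> list[dict]:
    """Small child chunks for precise matching, each carrying its larger parent for context."""
    chunks = []
    for parent_id, start in enumerate(range(0, len(words), parent_size)):
        parent = words[start:start + parent_size]
        for c in range(0, len(parent), child_size):
            chunks.append({"parent_id": parent_id, "child": parent[c:c + child_size], "parent": parent})
    return chunks

words = "one two three four five six seven eight nine ten".split()
print(fixed_chunks(words, size=4, overlap=1))
# [['one', 'two', 'three', 'four'], ['four', 'five', 'six', 'seven'], ['seven', 'eight', 'nine', 'ten']]
print(len(hierarchical_chunks(words, parent_size=5, child_size=2)))  # 6 children across 2 parents
```

In a hierarchical setup, the small child chunk is what gets embedded and matched. The parent is what gets handed to the LLM, which is how the strategy balances precision and context.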

Advanced retrieval: combine several techniques to significantly improve answer relevance.

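from piragi import Ragi

kb = Ragi("./docs", config={
    "retrieval": {
        "use_hyde": True,           # Hypothetical Document Embeddings (helps with vague queries)
        "use_hybrid_search": True,  # BM25 + vector hybrid search
        "use_cross_encoder": True,  # Neural reranking with a cross-encoder
    }
})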
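Hybrid search merges a lexical ranking (BM25) with a semantic ranking (vector similarity). The sketch below implements Okapi BM25 from scratch and fuses it with a second ranking using reciprocal rank fusion (RRF). RRF is one common fusion method and is chosen here as an assumption, since Piragi may combine scores differently. The "semantic" ranking is hard-coded and stands in for an embedding-based search.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: sum 1 / (k + rank) across rankings, best first."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

docs = [s.split() for s in [
    "rate limit is 100 requests per minute",
    "deploy the service with docker",
    "requests are throttled when the quota is exceeded",
]]
scores = bm25_scores(["rate", "limit"], docs)
lexical = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
semantic = [2, 0, 1]  # hypothetical ranking from an embedding search
print(rrf([lexical, semantic]))  # [0, 2, 1]
```

RRF uses only rank positions, so it avoids normalizing BM25 scores and cosine similarities onto a common scale. A cross-encoder reranker, as enabled by use_cross_encoder, would then rescore the fused top results.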

Knowledge Graph Integration

For documents rich in entities and relationships (such as company documentation or technical manuals), enabling the knowledge graph can greatly improve answers to relationship-oriented questions.

from piragi import Ragi

# Enable with a single flag
kb = Ragi("./docs", graph=True)

# Entities and relationships are extracted automatically during ingestion and used to augment retrieval
answer = kb.ask("Who reports to Alice?")

# Direct graph access
kb.graph.entities()           # All entities
kb.graph.neighbors("alice")   # Neighbors of "alice"
kb.graph.triples()            # All (subject, relation, object) triples

This feature requires pip install piragi[graph].
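The three graph calls above map naturally onto a small triple store. The sketch below is a hypothetical stand-in and is not Piragi's graph class. Its triples are supplied by hand, whereas Piragi extracts them automatically during ingestion. Its `neighbors` method treats edges as undirected, and its extra `subjects` helper is invented for illustration. It shows how a relationship question such as "Who reports to Alice?" becomes a lookup over (subject, relation, object) triples.

```python
from collections import defaultdict

class TinyGraph:
    """Minimal (subject, relation, object) store with neighbor lookup."""
    def __init__(self, triples: list[tuple[str, str, str]]):
        self._triples = list(triples)
        self._adj = defaultdict(set)
        for s, _, o in self._triples:
            self._adj[s].add(o)
            self._adj[o].add(s)  # undirected adjacency for neighbor queries

    def entities(self) -> list[str]:
        return sorted(self._adj)

    def neighbors(self, entity: str) -> list[str]:
        return sorted(self._adj.get(entity, ()))

    def triples(self) -> list[tuple[str, str, str]]:
        return list(self._triples)

    def subjects(self, relation: str, obj: str) -> list[str]:
        """Answer 'who <relation> <obj>?' by matching the relation and object."""
        return sorted(s for s, r, o in self._triples if r == relation and o == obj)

g = TinyGraph([
    ("bob", "reports_to", "alice"),
    ("carol", "reports_to", "alice"),
    ("alice", "reports_to", "dana"),
])
print(g.subjects("reports_to", "alice"))  # ['bob', 'carol']
print(g.neighbors("alice"))               # ['bob', 'carol', 'dana']
```

Plain vector search can miss this kind of answer when no single chunk states it. An explicit graph makes the relationship a direct lookup, and chaining lookups gives multi-hop answers.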

Configuration and Async Support

Detailed Configuration

Every aspect of Piragi's behavior can be fine-tuned through a single configuration dictionary.

from piragi import Ragi

config = {
    "llm": {  # Large language model
        "model": "llama3.2",
        "base_url": "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
    },
    "embedding": {  # Embedding model
        "model": "all-mpnet-base-v2",
        "batch_size": 32
    },
    "chunk": {  # Chunking
        "strategy": "fixed",
        "size": 512,
        "overlap": 50
    },
    "retrieval": {  # Retrieval strategy
        "use_hyde": False,
        "use_hybrid_search": False,
        "use_cross_encoder": False
    },
    "auto_update": {  # Auto-update
        "enabled": True,
        "interval": 300  # Check for updates every 300 seconds
    }
}

kb = Ragi("./docs", config=config)
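The auto_update block implies a periodic check (every 300 seconds in this configuration) for sources that have changed. One common way to decide what needs re-indexing is to compare content hashes between checks, as in this sketch. This is an assumption about the general technique, not Piragi's documented mechanism, and the `fingerprint` and `diff_sources` helpers are hypothetical. For queries to never block, a real implementation would run this check and the re-indexing on a background thread.

```python
import hashlib

def fingerprint(sources: dict[str, bytes]) -> dict[str, str]:
    """Map each source path to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in sources.items()}

def diff_sources(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify sources as added, changed, or removed between two fingerprints."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
        "removed": sorted(old.keys() - new.keys()),
    }

before = fingerprint({"a.md": b"v1", "b.md": b"same"})
after = fingerprint({"a.md": b"v2", "b.md": b"same", "c.md": b"new"})
print(diff_sources(before, after))
# {'added': ['c.md'], 'changed': ['a.md'], 'removed': []}
```

Only the added and changed sources need to be re-chunked and re-embedded, and removed ones can be deleted from the index. This keeps each refresh cheap compared with rebuilding everything.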

Async API

To support non-blocking operation in modern web frameworks such as FastAPI, Piragi provides a full-featured asynchronous interface, AsyncRagi.

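from fastapi import FastAPI
from piragi import AsyncRagi

app = FastAPI()
kb = AsyncRagi("./docs")

# Simple usage: await the methods from inside any coroutine
async def main():
    await kb.add("./more-docs")
    answer = await kb.ask("What is X?")
    print(answer.text)

# From a standalone script, run it with asyncio.run(main())

# Integration example with FastAPI
@app.post("/ingest")
async def ingest(files: list[str]):
    await kb.add(files)
    return {"status": "done"}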

All core methods (add(), ask(), retrieve(), refresh(), count(), and clear()) have asynchronous versions.
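A common way to offer async versions of blocking methods is to run the synchronous call in a worker thread with `asyncio.to_thread` (Python 3.9+), which keeps the event loop free. The sketch below illustrates that pattern with a stand-in class. Whether AsyncRagi is built this way is an assumption, and `SlowKB` and `AsyncWrapper` are hypothetical names.

```python
import asyncio
import time

class SlowKB:
    """Stand-in for a synchronous knowledge base whose calls block."""
    def ask(self, question: str) -> str:
        time.sleep(0.1)  # simulate retrieval + generation latency
        return f"answer to {question!r}"

class AsyncWrapper:
    """Expose an async method by running the blocking call in a worker thread."""
    def __init__(self, kb: SlowKB) -> None:
        self._kb = kb

    async def ask(self, question: str) -> str:
        return await asyncio.to_thread(self._kb.ask, question)

async def main() -> list[str]:
    kb = AsyncWrapper(SlowKB())
    start = time.perf_counter()
    # The three calls overlap, so this takes roughly 0.1 s instead of roughly 0.3 s.
    answers = await asyncio.gather(*(kb.ask(q) for q in ["A?", "B?", "C?"]))
    print(f"{len(answers)} answers in {time.perf_counter() - start:.2f}s")
    return answers

answers = asyncio.run(main())
```

While a worker thread sleeps or waits on I/O, the event loop keeps serving other requests. That is the property a FastAPI endpoint needs.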

Conclusion

Through its minimalist API design, strong multi-source integration, and extensible modular architecture, Piragi significantly lowers the barrier to building production-grade RAG applications. It wraps complex document-processing pipelines, vector retrieval optimization, and LLM interactions, so developers can focus on business logic rather than underlying infrastructure.

Whether you are quickly building a local document Q&A tool or developing a knowledge platform that integrates various enterprise data sources, Piragi provides an efficient and flexible starting point. Its MIT license also allows free use in commercial projects.

Project Resources:

PyPI: pip install piragi
Full Documentation: https://pypi.org/project/piragi/

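Frequently Asked Questions (FAQ)

How does Piragi make its answers trustworthy?

Piragi has built-in smart citations. Every generated answer automatically includes the sources it was drawn from, so you can see where the information came from and verify it.

Where can Piragi get data from?

Piragi supports many data sources: local files (such as PDF and Word), remote storage (such as S3 and GCS), and web content crawled directly from URLs. All of them are processed through one unified, multimodal pipeline.

Does Piragi require complex configuration?

No. Piragi is zero-config: by default it uses free local models (for example, via Ollama), so you can build a RAG application out of the box without API keys or complex setup.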
