
How to Implement RAG Without a Vector Database: A Detailed PostgreSQL + Faiss Solution

2026/3/17
AI Summary (BLUF)

This article details a custom RAG implementation using PostgreSQL for embedding storage instead of vector databases, achieving over 90% accuracy through techniques like HyDE and hierarchical retrieval.

Introduction

In the rapidly evolving landscape of Retrieval-Augmented Generation (RAG), vector databases have become the de facto standard for storing and retrieving document embeddings. However, this approach can introduce limitations in customization, cost, and architectural complexity, especially at scale. At Doclink.io, we challenged this convention by building a RAG system from the ground up, using PostgreSQL for persistent storage and Faiss for efficient similarity search. This article details our journey, the technical decisions we made, and the significant performance improvements we achieved.

Core Architecture: A Hybrid PostgreSQL-Faiss Approach

The Problem with Conventional Vector Databases

Most contemporary RAG implementations rely heavily on specialized vector databases for storing and retrieving document chunks. While effective, these systems often present challenges: they can be expensive to scale, offer limited options for fine-tuning retrieval logic, and create data silos by separating vector data from other application data (like user information and document metadata).

Our Solution: PostgreSQL for Storage, Faiss for Search

To overcome these limitations, we designed a hybrid architecture:

  • PostgreSQL as the Single Source of Truth: We store every sentence from processed documents as a dedicated embedding vector alongside its textual content, document ID, and hierarchical position (e.g., its relationship to parent headers and sections). This allows us to manage all structured data—user accounts, document metadata, and embeddings—within a single, robust SQL database.
  • Faiss for High-Performance Similarity Search: While PostgreSQL stores the embeddings, we use Facebook AI Similarity Search (Faiss) for the computationally intensive task of finding the most similar vectors to a given query. Faiss is optimized for this purpose and runs efficiently in memory.
  • SQL for Context Reconstruction: During retrieval, we don't just return isolated sentence embeddings. We perform SQL joins using the stored hierarchical data to reconstruct the full context around a retrieved sentence, ensuring the LLM receives logically structured information rather than arbitrary text chunks.
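
The three-part flow above can be sketched end to end. This is a minimal illustration under stated assumptions, not Doclink's actual code: the rows, the tiny bag-of-words `embed`, and the brute-force NumPy inner-product search (standing in for Faiss's `IndexFlatIP`) are all placeholders for a real embedding model and index, and a real deployment would load the rows from PostgreSQL.

```python
import numpy as np

# Toy corpus: each row mirrors what would be persisted in PostgreSQL -- the
# sentence text, its document id, and its hierarchical position (parent header).
ROWS = [
    {"id": 0, "doc_id": 1, "header": "Installation", "text": "Run pip install doclink."},
    {"id": 1, "doc_id": 1, "header": "Installation", "text": "Python 3.10 or newer is required."},
    {"id": 2, "doc_id": 1, "header": "Troubleshooting", "text": "Reinstall if imports fail."},
]

VOCAB = ["run", "pip", "install", "python", "required", "reinstall", "imports", "fail"]

def embed(text: str) -> np.ndarray:
    """Hypothetical embedder: a real system would call a sentence-embedding
    model; a tiny bag-of-words vector keeps this sketch self-contained."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    v = np.array([float(sum(w == t for t in words)) for w in VOCAB])
    n = np.linalg.norm(v)
    return v / n if n else v

# Brute-force inner product stands in for faiss.IndexFlatIP; with unit
# vectors, inner product equals cosine similarity.
matrix = np.stack([embed(r["text"]) for r in ROWS])

def search(query: str, k: int = 2) -> list:
    scores = matrix @ embed(query)
    return [ROWS[i] for i in np.argsort(-scores)[:k]]

hits = search("How do I install it?")
# In production, a SQL join on the stored hierarchy pulls the surrounding
# sentences; here we filter the in-memory row list the same way.
context = [r["text"] for r in ROWS if r["header"] == hits[0]["header"]]
```

The key point is the division of labor: the index only returns row ids, and everything the LLM actually reads is reassembled from the relational store.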

From 45% to 90%+ Accuracy: Key Optimization Techniques

Our initial, naive RAG implementation yielded an answer relevancy score of only about 45%. Through rigorous research and experimentation, we implemented several advanced techniques to boost accuracy to over 90%.

1. Hypothetical Document Embeddings (HyDE)

This technique involves prompting the LLM to generate a hypothetical or "ideal" answer to the user's query first. The embedding of this generated document is then used for the vector similarity search, rather than the embedding of the raw query. This helps bridge the vocabulary gap between how users phrase questions and how answers are formulated in the source documents.
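
A toy sketch of the idea, with a canned string standing in for the LLM call and plain word overlap standing in for embedding similarity; both are hypothetical simplifications, not Doclink's implementation, but they show why searching with an answer-shaped text helps:

```python
def draft_hypothetical_answer(query: str) -> str:
    """Stand-in for an LLM call. In production this would be a chat-completion
    request with a prompt like 'Write a short passage answering the question.'"""
    canned = {"How do I set it up?": "Run pip install doclink to install the package."}
    return canned.get(query, query)

DOCS = [
    "Run pip install doclink from your terminal.",
    "It shows usage statistics on the dashboard.",
]

def overlap(a: str, b: str) -> int:
    """Crude lexical similarity standing in for embedding cosine similarity."""
    tokens = lambda s: set(s.lower().replace(".", " ").replace("?", " ").split())
    return len(tokens(a) & tokens(b))

def retrieve(text: str) -> str:
    return max(DOCS, key=lambda d: overlap(text, d))

query = "How do I set it up?"
plain_hit = retrieve(query)                            # question wording misses the doc
hyde_hit = retrieve(draft_hypothetical_answer(query))  # answer-shaped text matches it
```

The raw question shares no vocabulary with the installation instructions, so plain retrieval drifts to an irrelevant document; the hypothetical answer does share that vocabulary, so the search lands on the right one.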

2. Header Boosting

We assign higher weight or priority to sentences that appear under specific, relevant headers (like "Installation," "API Reference," "Troubleshooting"). During retrieval, matches found within these boosted sections are ranked higher, ensuring that the LLM prioritizes content from the most pertinent parts of the documentation.
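
The re-ranking step might look like the sketch below. The boost weights are invented for illustration (the article does not publish its exact values), and the candidate scores are made up:

```python
# Hypothetical boost weights per header -- treat these numbers as illustrative.
BOOSTED_HEADERS = {"Installation": 1.5, "API Reference": 1.4, "Troubleshooting": 1.3}

def boosted_score(base_similarity: float, header: str) -> float:
    """Scale the raw similarity score by the weight of the sentence's header."""
    return base_similarity * BOOSTED_HEADERS.get(header, 1.0)

candidates = [
    {"text": "See the changelog for details.", "header": "Release Notes", "score": 0.62},
    {"text": "Run pip install doclink.", "header": "Installation", "score": 0.55},
]
ranked = sorted(candidates,
                key=lambda c: boosted_score(c["score"], c["header"]),
                reverse=True)
# The Installation sentence overtakes the higher raw score: 0.55 * 1.5 > 0.62.
```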

3. Hierarchical Retrieval

Instead of treating all sentences as equally independent, our retrieval logic respects the document's inherent structure. When a sentence is retrieved, the system also fetches its parent header, sub-header, and surrounding sentences via SQL joins. This provides the LLM with necessary context, dramatically reducing the likelihood of generating "hallucinated" or out-of-context information.
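
The context-reconstruction join can be sketched with an in-memory database. `sqlite3` stands in for PostgreSQL here (the join logic is the same), and the two-table schema is a hypothetical simplification of what such a system might store:

```python
import sqlite3

# sqlite3 stands in for PostgreSQL; schema and data are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sections  (id INTEGER PRIMARY KEY, doc_id INTEGER, header TEXT);
CREATE TABLE sentences (id INTEGER PRIMARY KEY, section_id INTEGER, pos INTEGER, text TEXT);
INSERT INTO sections VALUES (1, 1, 'Installation');
INSERT INTO sentences VALUES
  (10, 1, 0, 'Doclink ships as a pip package.'),
  (11, 1, 1, 'Run pip install doclink.'),
  (12, 1, 2, 'Python 3.10 or newer is required.');
""")

def reconstruct_context(sentence_id: int, window: int = 1) -> dict:
    """Given a sentence id returned by the vector search, rebuild its context:
    the parent header plus neighbouring sentences from the same section."""
    section_id, pos, header = con.execute(
        "SELECT s.section_id, s.pos, sec.header "
        "FROM sentences s JOIN sections sec ON sec.id = s.section_id "
        "WHERE s.id = ?", (sentence_id,)).fetchone()
    neighbours = con.execute(
        "SELECT text FROM sentences "
        "WHERE section_id = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (section_id, pos - window, pos + window)).fetchall()
    return {"header": header, "context": " ".join(t for (t,) in neighbours)}

ctx = reconstruct_context(11)  # id of the sentence the vector search returned
```

Because position within a section is stored explicitly, widening the context window is just a change to the `BETWEEN` bounds rather than a re-chunking of the corpus.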

Implementation and Deployment

Since we had no prior professional web development experience, we chose a stack that prioritized simplicity and effectiveness:

  • Backend: A lightweight Python application (using frameworks like FastAPI) handles document processing, embedding generation, and query orchestration between Faiss and PostgreSQL.
  • Frontend: A straightforward JavaScript interface for user interaction.
  • Deployment: The entire application is deployed on a Virtual Private Server (VPS), proving that a high-performance RAG system can be built without complex, distributed cloud infrastructure.

Doclink.io is free to use for most needs, with a one-time payment premium plan available for users requiring exceptionally high usage limits.

Conclusion and Open Source

Building a custom RAG pipeline with PostgreSQL and Faiss provided us with unparalleled control over data management, retrieval logic, and cost structure. By focusing on preserving document context and implementing research-backed optimization techniques like HyDE and hierarchical retrieval, we achieved professional-grade accuracy.

We believe in transparency and community learning. Our complete implementation is open-source and available on GitHub. You can also experience the system firsthand at doclink.io.

We are eager to hear from other developers and researchers who are experimenting with RAG architectures. What optimizations have you found most effective? How are you balancing performance, accuracy, and cost in your systems?

Frequently Asked Questions (FAQ)

Why store embeddings in PostgreSQL instead of a vector database?

As a single source of truth, PostgreSQL manages embeddings, document metadata, and user data in one place, avoiding data silos while offering greater customizability and lower scaling costs.

How does HyDE improve RAG accuracy?

HyDE has the LLM first generate a hypothetical answer to the query and uses that answer's embedding for the search, bridging the vocabulary gap between how users phrase questions and how documents phrase answers, which significantly improves retrieval relevance.

What role does hierarchical retrieval play in the architecture?

Because each sentence's hierarchical position is stored, the system can reconstruct the full context with SQL at retrieval time, giving the LLM logically structured information rather than isolated snippets and improving answer quality.
