How Does RAGstack Help Enterprises Deploy a Private ChatGPT Alternative? (Supports Llama 2/Falcon)

Introduction: Why Do Enterprises Need a Private RAG Solution?

Retrieval Augmented Generation (RAG) is a technique that augments a large language model (LLM) by retrieving information from external systems and inserting it into the model's context window through the prompt. This gives the LLM access to information beyond its training data, which is essential for almost every enterprise use case. That information can come from live web pages, from data in SaaS applications such as Confluence or Salesforce, and from documents such as sales contracts and PDFs.
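To make the retrieve-then-inject flow concrete, here is a minimal sketch in Python. It is an illustration only and not RAGstack's code: the word-overlap scoring stands in for a real embedding model, and the function names (`tokenize`, `retrieve`, `build_prompt`) and sample documents are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the context below and cite passage numbers.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical knowledge-base snippets.
docs = [
    "The 2023 sales contract with Acme renews every 12 months.",
    "Office plants are watered on Fridays.",
    "Acme contract payments are due within 30 days of invoice.",
]
question = "When are Acme contract payments due?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

The resulting prompt string is what would be sent to the LLM. Numbering the passages is one simple way to let the model point back to its sources.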

Compared with fine-tuning a model, RAG offers significant advantages: it costs less, it deploys faster, and it is more reliable because each response is accompanied by the source of its information.

The RAGstack project aims to provide an out-of-the-box solution that lets enterprises deploy a private ChatGPT alternative inside their own virtual private cloud (VPC). It connects to an organization's knowledge base, serving as a "corporate knowledge oracle," and supports open-source large language models such as Llama 2, Falcon, and GPT4All.

RAGstack Core Architecture

RAGstack deploys a complete retrieval augmented generation stack. Its core components are described below.

Open-Source Large Language Models (LLMs)

RAGstack supports several open-source LLMs to suit different deployment environments and hardware requirements.


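Model	Deployment environment	Key characteristics	Hardware requirements
GPT4All	Local	Developed by Nomic AI; optimized for consumer-grade CPUs and runs without a GPU.	Consumer-grade CPU
Falcon-7B	Cloud (GKE)	Developed by the Technology Innovation Institute (TII) in the UAE; 7B parameters with strong performance.	GPU-backed GKE cluster
Llama 2 (7B)	Cloud (GKE)	Released by Meta; the 7B-parameter version, with a broad community ecosystem and many optimizations.	GPU-backed GKE cluster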

Vector Database: Qdrant

RAGstack uses Qdrant as its vector search engine. Qdrant is a high-performance, open-source vector database written in Rust that can be self-hosted. It handles similarity search over high-dimensional vectors efficiently, which makes it a good fit for the retrieval layer of a RAG system.
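As a rough illustration of what "similarity search over vectors" means, the sketch below ranks stored vectors by cosine similarity to a query vector. This is not Qdrant's API: it is a brute-force scan in plain Python, whereas a vector database relies on an index to avoid comparing against every point. The point IDs and three-dimensional vectors are made up for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], points: dict[str, list[float]], k: int = 2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(pid, cosine(query, vec)) for pid, vec in points.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical stored embeddings, keyed by document chunk.
points = {
    "contract.pdf#1": [0.9, 0.1, 0.0],
    "handbook.pdf#4": [0.1, 0.8, 0.1],
    "contract.pdf#2": [0.7, 0.3, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], points)
for point_id, score in hits:
    print(f"{point_id}: {score:.3f}")
```

Real embeddings typically have hundreds of dimensions or more, which is why a dedicated engine is used instead of a linear scan.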

Server and User Interface

RAGstack provides a simple server and web user interface whose core function is handling PDF uploads. Through this UI, users can hold a conversational Q&A session grounded in the uploaded PDF documents, backed by the Qdrant vector database and the selected open-source LLM.

Figure: Screenshot of the RAGstack UI

Deployment Guide: From Local to Cloud

Running Locally (with GPT4All)

Running RAGstack in a local development environment is the quickest way to try it. The local setup uses GPT4All, which runs on the CPU.

Environment configuration:
- Copy ragstack-ui/local.env to ragstack-ui/.env
- Copy server/example.env to server/.env
- In server/.env and ragstack-ui/.env, replace YOUR_SUPABASE_URL and YOUR_SUPABASE_KEY / YOUR_SUPABASE_PUBLIC_KEY with your Supabase project credentials (found in the Supabase dashboard under Settings > API).
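The copy-and-replace steps above can be sketched as a small Python helper. The placeholder strings come from the steps above; the variable names inside the sample template (`SUPABASE_URL`, `SUPABASE_KEY`) and the credential values are assumptions for illustration, since the actual contents of the template files are not shown here.

```python
from pathlib import Path


def fill_env_template(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder with its value; fail if any placeholder is left."""
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    if "YOUR_SUPABASE_" in template:
        raise ValueError("unreplaced Supabase placeholder in env file")
    return template


def write_env(template_path: Path, env_path: Path, values: dict[str, str]) -> None:
    """Copy a template env file to its .env location with placeholders filled in."""
    env_path.write_text(fill_env_template(template_path.read_text(), values))


# Hypothetical template content and credentials, for demonstration only.
sample_template = "SUPABASE_URL=YOUR_SUPABASE_URL\nSUPABASE_KEY=YOUR_SUPABASE_KEY\n"
sample_values = {
    "YOUR_SUPABASE_URL": "https://abc123.supabase.co",
    "YOUR_SUPABASE_KEY": "example-key",
}
filled = fill_env_template(sample_template, sample_values)
print(filled)
```

From the repository root, a call such as `write_env(Path("server/example.env"), Path("server/.env"), sample_values)` would apply the same substitution to the real file. The check for leftover placeholders is a design choice that catches a forgotten key before the server starts.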

Database preparation: Create the ragstack_users table in Supabase with the following structure:


Column	Type	Description
id	uuid	Primary key; unique user identifier
app_id	uuid	Application ID
secret_key	uuid	User secret key
email	text	User email
avatar_url	text	Avatar URL
full_name	text	User's full name
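One way to create this table is to generate a `create table` statement from the column list and run it in the Supabase SQL editor. The sketch below builds that statement in Python. Only the column names, types, and the primary key come from the table above; nullability, defaults, and any other constraints are not specified there, so the generated SQL leaves them out.

```python
# Columns as (name, type, constraint); taken from the table above.
COLUMNS = [
    ("id", "uuid", "primary key"),
    ("app_id", "uuid", ""),
    ("secret_key", "uuid", ""),
    ("email", "text", ""),
    ("avatar_url", "text", ""),
    ("full_name", "text", ""),
]


def create_table_sql(table: str, columns: list[tuple[str, str, str]]) -> str:
    """Render a PostgreSQL CREATE TABLE statement from a column list."""
    lines = [" ".join(part for part in column if part) for column in columns]
    body = ",\n  ".join(lines)
    return f"create table if not exists {table} (\n  {body}\n);"


sql = create_table_sql("ragstack_users", COLUMNS)
print(sql)
```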

Start services: Run the scripts/local/run-dev script. It automatically downloads the GPT4All model file to the server/llm/local/ directory and starts the server, the LLM, and the Qdrant vector database locally.

When the message INFO: Application startup complete. appears in the logs, all services are ready.
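If you want to script this readiness check instead of watching the logs by hand, a minimal sketch might look like the following. The marker string is the log message quoted above; the function name `wait_for_ready` and the sample log lines are hypothetical.

```python
from typing import Iterable

READY_MARKER = "Application startup complete."


def wait_for_ready(lines: Iterable[str], marker: str = READY_MARKER) -> bool:
    """Consume log lines until the readiness marker appears.

    Returns True as soon as the marker is seen, or False if the stream
    ends without it (for example, because the process exited early).
    """
    for line in lines:
        if marker in line:
            return True
    return False


# Hypothetical log excerpt used to demonstrate the check.
sample_log = [
    "INFO:     Started server process",
    "INFO:     Waiting for application startup.",
    "INFO:     Application startup complete.",
]
print(wait_for_ready(sample_log))
```

The same function accepts any iterable of lines, so it could be fed the output stream of a `subprocess.Popen` call that launches scripts/local/run-dev. Which stream the logs are written to is an assumption you would need to check for your setup.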

Cloud Deployment Comparison

RAGstack supports production-grade deployment on the major cloud platforms, primarily using Falcon-7B or Llama 2, both of which require a GPU. All deployment scripts are built on Terraform, following infrastructure as code (IaC) practice.


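Cloud platform	Deployment script	Core service	Key configuration / notes
Google Cloud Platform (GCP)	`scripts/gcp/deploy-gcp.sh`	Google Kubernetes Engine (GKE) with GPU	Requires a GCP project ID, a service account key, a region, and so on. After deployment, set `VITE_SERVER_URL` in the UI's `.env` to point to the Cloud Run instance.
Amazon Web Services (AWS)	`scripts/aws/deploy-aws.sh`	Amazon ECS on EC2 with GPU	Requires AWS credentials. After deployment, set `VITE_SERVER_URL` in the UI's `.env` to the Application Load Balancer (ALB) address.
Microsoft Azure	`./azure/deploy-aks.sh`	Azure Kubernetes Service (AKS) with GPU	Requires Azure subscription details. Note: the default node pool uses NVIDIA Tesla T4 accelerators, which are not available to every subscription.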

General deployment steps:
- Run the deployment script for your cloud platform and enter the required parameters when prompted (for example, cloud credentials, model selection, and a HuggingFace token).
- (If applicable) If you hit GPU driver errors while deploying Falcon-7B on GCP, you may need to run the provided gcloud and kubectl commands manually to configure the GPU driver.
- After deployment completes, create a .env file in the ragstack-ui directory and set the VITE_SERVER_URL environment variable to the address of the backend service.
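The last step can be sketched in Python as follows. The variable name VITE_SERVER_URL comes from the steps above; the helper name `render_ui_env`, the URL check, and the example address are assumptions added for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def render_ui_env(server_url: str) -> str:
    """Return the .env line for the UI after a basic sanity check on the URL."""
    parsed = urlparse(server_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {server_url!r}")
    return f"VITE_SERVER_URL={server_url}\n"


def write_ui_env(ui_dir: Path, server_url: str) -> None:
    """Write the .env file into the given UI directory."""
    (ui_dir / ".env").write_text(render_ui_env(server_url))


# Hypothetical backend address; use the Cloud Run, ALB, or AKS address
# reported by your own deployment.
env_text = render_ui_env("https://ragstack-server.example.com")
print(env_text, end="")
```

Calling `write_ui_env(Path("ragstack-ui"), url)` from the repository root would create the file. Validating the URL first is a small safeguard against pasting a bare hostname.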

Roadmap and Acknowledgements

Roadmap

The RAGstack project is still evolving. The current status of feature support and development is:

✅ GPT4all support
✅ Falcon-7b support
✅ Deployment on GCP
✅ Deployment on AWS
✅ Deployment on Azure
🚧 Llama-2-40b support (in development)

Acknowledgements

The code for containerizing Falcon 7B in RAGstack comes from Het Trivedi's tutorial repository. His Medium article on dockerizing Falcon covers the details. Thanks to the open-source community for its contributions!

Conclusion

RAGstack gives enterprises a flexible, privacy-focused way to build an internal Q&A system. By combining open-source LLMs, an efficient vector database, and cloud-native deployment, it lets an organization put its own knowledge base to work for employees and customers securely and at a controllable cost. Whether you start with local development and testing or scale to production on GCP, AWS, or Azure, RAGstack lays out a clear path and is a good starting point for exploring private RAG applications.

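Frequently Asked Questions (FAQ)

Which open-source large language models does RAGstack support?

RAGstack supports open-source LLMs including Llama 2, Falcon, and GPT4All. You can choose the model that fits your local or cloud deployment environment.

How does RAGstack implement Q&A over an enterprise knowledge base?

With Qdrant as its vector database, RAGstack retrieves relevant information from the organization's knowledge base and injects it into the LLM's context to generate an answer, enabling conversational Q&A grounded in documents.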

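What hardware does RAGstack require?

Running locally with GPT4All needs only a consumer-grade CPU. Cloud deployments that run Falcon-7B or Llama 2 need GPU-backed infrastructure, such as a GKE cluster with GPU nodes.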
