How Does Open Deep Research Implement Open-Source Deep Web Research? (With a Look at Firecrawl and Reasoning Models)

Project Overview

Open Deep Research is an open-source clone of OpenAI's Deep Research experiment. Instead of using a fine-tuned version of o3, it combines Firecrawl's extract and search features with a reasoning model to perform deep research on the web.

You can view a demo via this link.


Core Features

Firecrawl Search + Extract
- Feed realtime data to the AI via search
- Extract structured data from multiple websites via extract
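The search-then-extract flow described above can be sketched as follows. This is a minimal illustration, not the project's code: the `WebClient` interface, `gatherSources`, and `fakeClient` are hypothetical names invented for this example, and the method shapes are assumptions rather than the Firecrawl SDK's actual API.

```typescript
// Hypothetical shapes for illustration only; these are NOT the Firecrawl SDK's types.
interface SearchHit {
  url: string;
  title: string;
}

interface ExtractedFact {
  url: string;
  summary: string;
}

// Assumed abstraction over a search + extract service.
interface WebClient {
  search(query: string): Promise<SearchHit[]>;
  extract(urls: string[], prompt: string): Promise<ExtractedFact[]>;
}

// Search first, then extract structured data from the top results.
async function gatherSources(
  client: WebClient,
  query: string,
  maxUrls: number = 3
): Promise<ExtractedFact[]> {
  const hits = await client.search(query);
  // De-duplicate URLs and cap how many pages are sent to extraction.
  const urls = Array.from(new Set(hits.map((hit) => hit.url))).slice(0, maxUrls);
  if (urls.length === 0) return [];
  return client.extract(urls, "Extract facts relevant to: " + query);
}

// In-memory stand-in for local experimentation; a real client would call Firecrawl.
const fakeClient: WebClient = {
  async search() {
    return [
      { url: "https://example.com/a", title: "A" },
      { url: "https://example.com/a", title: "A (duplicate)" },
      { url: "https://example.com/b", title: "B" },
    ];
  },
  async extract(urls) {
    return urls.map((url) => ({ url, summary: "summary of " + url }));
  },
};
```

Keeping the service behind a small interface like this makes the research loop testable without network access; swapping `fakeClient` for a real Firecrawl-backed implementation would be the only change needed.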

Next.js Application Framework

Next.js App Router
- Advanced routing for seamless navigation and performance
- React Server Components (RSCs) and Server Actions for server-side rendering and increased performance

AI SDK Integration

AI SDK
- Unified API for generating text, structured objects, and tool calls with LLMs
- Hooks for building dynamic chat and generative user interfaces
- Supports OpenAI (default), Anthropic, Cohere, and other model providers

User Interface and Styling

shadcn/ui
- Styling with Tailwind CSS
- Component primitives from Radix UI for accessibility and flexibility

Data Persistence
- Vercel Postgres powered by Neon for saving chat history and user data
- Vercel Blob for efficient file storage

Authentication

NextAuth.js
- Simple and secure authentication

Model Provider Support

This template ships with OpenAI gpt-4o as the default. With the AI SDK, you can switch to another LLM provider, such as Anthropic or Cohere, with just a few lines of code.

This repo is compatible with OpenRouter and OpenAI. To use OpenRouter, set the OPENROUTER_API_KEY environment variable.
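As a rough sketch of how an application might choose between the two compatible providers, the helper below treats the presence of OPENROUTER_API_KEY as the switch. That selection rule and the `resolveProvider` name are assumptions made for illustration; the article only states that the variable must be set to use OpenRouter.

```typescript
type Provider = "openrouter" | "openai";

// Assumption: a non-empty OPENROUTER_API_KEY selects OpenRouter; otherwise OpenAI is used.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  const key = (env.OPENROUTER_API_KEY || "").trim();
  return key.length > 0 ? "openrouter" : "openai";
}
```

In a Node.js process this could be called as `resolveProvider(process.env)`.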

Maximum Function Duration

By default, the function timeout is set to 300 seconds (5 minutes). If you're using Vercel's Hobby tier, you'll need to reduce this to 60 seconds. You can adjust this by changing the MAX_DURATION environment variable in your .env file:

MAX_DURATION=60

Learn more about it here.
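One way to read MAX_DURATION with the 300-second default described above is sketched here. The `resolveMaxDuration` helper and its handling of invalid values are assumptions for illustration, not the project's actual code.

```typescript
// Default taken from the text above: 300 seconds (5 minutes).
const DEFAULT_MAX_DURATION_SECONDS = 300;

// Hypothetical helper: parses MAX_DURATION and falls back to the default
// when the variable is missing, non-numeric, or not a positive number.
function resolveMaxDuration(env: Record<string, string | undefined>): number {
  const raw = env.MAX_DURATION;
  if (raw === undefined) return DEFAULT_MAX_DURATION_SECONDS;
  const parsed = parseInt(raw, 10);
  return !isNaN(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_DURATION_SECONDS;
}
```

With `MAX_DURATION=60` in the environment, `resolveMaxDuration(process.env)` would yield 60, which matches the Hobby tier limit mentioned above.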

Reasoning Model Configuration

The application uses a separate model for reasoning tasks (like research analysis and structured outputs). This can be configured using the REASONING_MODEL environment variable.

Available Options

Provider	Model	Notes
OpenAI	`gpt-4o`, `o1`, `o3-mini`	Native JSON schema support
TogetherAI	`deepseek-ai/DeepSeek-R1`	Requires `BYPASS_JSON_VALIDATION=true`

Important Notes

- Only certain OpenAI models natively support structured JSON outputs
- Other models (such as deepseek-reasoner) can be used but may require disabling JSON schema validation
- When using models that don't support JSON schema:
  - Set BYPASS_JSON_VALIDATION=true in your .env file
  - This allows non-OpenAI models to be used for reasoning tasks
  - Note: Without JSON validation, the model responses may be less structured
- The reasoning model is used for tasks that require structured thinking and analysis, such as:
  - Research analysis
  - Document suggestions
  - Data extraction
  - Structured responses
- If no REASONING_MODEL is specified, it defaults to o1-mini
- If an invalid model is specified, it will fall back to o1-mini
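The default and fallback rules in the notes above can be sketched as a small resolver. The allow-list below is an assumption assembled from the model names mentioned in this article, and `resolveReasoningModel` is a hypothetical helper; the project's own list of valid models may differ.

```typescript
const DEFAULT_REASONING_MODEL = "o1-mini";

// Assumed allow-list, built from the models named in this article.
const KNOWN_REASONING_MODELS: string[] = [
  "gpt-4o",
  "o1",
  "o1-mini",
  "o3-mini",
  "deepseek-reasoner",
  "deepseek-ai/DeepSeek-R1",
];

// Returns the configured model, or o1-mini when REASONING_MODEL is unset or not recognized.
function resolveReasoningModel(env: Record<string, string | undefined>): string {
  const requested = (env.REASONING_MODEL || "").trim();
  if (requested.length === 0) return DEFAULT_REASONING_MODEL;
  return KNOWN_REASONING_MODELS.indexOf(requested) >= 0
    ? requested
    : DEFAULT_REASONING_MODEL;
}
```

Falling back silently keeps the app running after a typo in the .env file; logging a warning in the fallback branch would make such mistakes easier to spot.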

Usage

Add to your .env file:

# Choose one of: deepseek-reasoner, deepseek-ai/DeepSeek-R1
REASONING_MODEL=deepseek-ai/DeepSeek-R1

# Required when using models that don't support JSON schema (like deepseek-reasoner)
BYPASS_JSON_VALIDATION=true

The reasoning model is automatically used when the application needs structured outputs or complex analysis, regardless of which model the user has selected for general chat.
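To illustrate what bypassing JSON validation could mean for structured outputs, the sketch below validates a model response against a small hand-written check and, when the bypass flag is on, keeps the raw text instead of failing. The `Finding` shape, `isFinding`, and `parseModelOutput` are hypothetical and stand in for whatever schema and validation the project actually uses.

```typescript
// Hypothetical shape for one structured result; not taken from the project.
interface Finding {
  title: string;
  confidence: number;
}

// Minimal hand-written check standing in for JSON schema validation.
function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.title === "string" && typeof record.confidence === "number";
}

interface ParsedOutput {
  structured: Finding | null; // null when the text could not be validated
  text: string; // the raw model output, always preserved
}

// With validation on, a non-conforming response is an error.
// With bypassValidation (as with BYPASS_JSON_VALIDATION=true), the raw text is returned as-is.
function parseModelOutput(raw: string, bypassValidation: boolean): ParsedOutput {
  let candidate: unknown = null;
  try {
    candidate = JSON.parse(raw);
  } catch (_err) {
    candidate = null; // the response was not JSON at all
  }
  if (isFinding(candidate)) return { structured: candidate, text: raw };
  if (bypassValidation) return { structured: null, text: raw };
  throw new Error("Model output did not match the expected schema");
}
```

This mirrors the trade-off noted earlier: with the bypass enabled, more models can be used, but callers must be prepared for less structured responses.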

Deployment and Running Locally

One-Click Deploy

You can deploy your own version of the Next.js AI Chatbot to Vercel with one click.

Running Locally

You will need to use the environment variables defined in .env.example to run Next.js AI Chatbot. It's recommended you use Vercel Environment Variables for this, but a .env file is all that is necessary.

Note: You should not commit your .env file, or it will expose secrets that allow others to control access to your various OpenAI and authentication provider accounts.

- Install Vercel CLI: npm i -g vercel
- Link local instance with Vercel and GitHub accounts (creates .vercel directory): vercel link
- Download your environment variables: vercel env pull

1. First, install all dependencies

pnpm install

2. Then run the database migrations

pnpm db:migrate

3. Run the app

pnpm dev

Your app template should now be running on http://localhost:3000/.

Model Dependencies

If you want to use a model other than the default, you will need to install the dependencies for that model.

TogetherAI's Deepseek:

pnpm add @ai-sdk/togetherai

Note: For the maximum rate limits, see https://docs.together.ai/docs/rate-limits.

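Summary

The Open Deep Research project shows a modern approach to building an open-source, customizable deep research tool. By combining Firecrawl's real-time data retrieval with the flexible AI SDK and a configurable reasoning model, it gives developers and researchers a foundation for exploring, analyzing, and understanding information on the web. Its modular design and explicit configuration options make it possible to adapt the stack to specific research needs.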

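Frequently Asked Questions (FAQ)

What is the difference between Open Deep Research and the original OpenAI Deep Research?

Open Deep Research is an open-source implementation. Rather than using a fine-tuned o3 model, it combines Firecrawl for web search and extraction with a reasoning model for structured analysis.

How does the tool retrieve and analyze web data?

It uses Firecrawl's search to feed real-time data to the AI and Firecrawl's extract to pull structured data from multiple websites. A configurable reasoning model (such as gpt-4o or o1) then analyzes the results.

What is the main technology stack needed to deploy Open Deep Research?

The core pieces are the Next.js application framework, the AI SDK for integrating multiple LLM providers, Vercel Postgres and Vercel Blob for data persistence, and NextAuth.js for authentication. The interface uses shadcn/ui and Tailwind CSS.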
