How Does DeepSearch API v2.0 Improve Retrieval and Reasoning for LLM Agents?
AI Summary (BLUF)
DeepSearch API v2.0 enhances LLM agent workflows with structured citations, multimodal content retrieval (academic, biomedical, financial), and smarter ranking for reliable, traceable AI systems.
We are thrilled to announce a major update to the DeepSearch API! This release is shaped by our ongoing conversations with you—your feedback, questions, and the challenges you've shared while building everything from quick internal tools to full-blown agentic and search systems. Thank you.
Across various use cases, we observed a common pain point: the issue wasn't that the system didn't work, but that the interface and returned content weren't doing enough heavy lifting. Developers often had to work around results that weren't ready to plug directly into their workflows: citation fields that weren't deep enough, references that weren't always structured cleanly, and formatting that broke across modalities. The content was there, but not always in a form that integrated seamlessly with how you were actually using it—in apps, chains, agents, or UIs. So we fixed that, while also expanding coverage into areas like biomedical research and financial content. You're not just building search boxes; you're building systems that need to reason, cite, and compose information.
This update doesn't reset anything; it builds directly on what already worked: deep search across proprietary full-text content, the web, and financial data. What's new is better coverage, support for more modalities (including images and figures), smarter reranking, and a cleaner experience optimized for agent workflows and tool-calling.
Whether you're calling us from a simple prompt or stitching us into multi-step chains, the core goal of this update is to close the gap between what you retrieve and what your system can actually use. Here’s what’s new and why it matters.
New Content You Can Search
1. Wiley Academic Content
You can now access full-text journals and textbooks from Wiley across Business, Finance, and Accounting.
Each result includes:
- Full body text, structured by section
- Lists of authors and affiliations
- Inline citation strings (for attribution in generated answers)
- Structured references (for graph building and citation following)
Example: A student builds a study assistant that retrieves textbook content and generates summaries, lesson plans, and study guides. The assistant uses citation strings to footnote its answers and links to source sections for follow-up reading.
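Below is a minimal sketch of the footnoting pattern in this example. It assumes each search result is a Python dict carrying the citation_string field described later in this post. The render_footnotes helper is hypothetical and is not part of any DeepSearch SDK.

```python
from typing import Dict, List, Tuple


def render_footnotes(answer_parts: List[Tuple[str, Dict[str, str]]]) -> str:
    """Join generated sentences and tag each with a numbered footnote.

    Each item pairs a sentence with the search result it was drawn from.
    Results are assumed (hypothetically) to carry a 'citation_string' key;
    repeated citations reuse the same footnote number.
    """
    numbers: Dict[str, int] = {}  # citation_string -> footnote number
    body: List[str] = []
    for sentence, result in answer_parts:
        cite = result["citation_string"]
        if cite not in numbers:
            numbers[cite] = len(numbers) + 1
        body.append(f"{sentence} [{numbers[cite]}]")
    # dicts keep insertion order, so footnotes come out in numbering order.
    notes = [f"[{n}] {cite}" for cite, n in numbers.items()]
    return " ".join(body) + "\n\n" + "\n".join(notes)


# Placeholder results; real ones would come from a search call.
src_a = {"citation_string": "Citation for source A"}
src_b = {"citation_string": "Citation for source B"}
print(render_footnotes([("First point.", src_a), ("Second point.", src_b), ("Third point.", src_a)]))
```

Numbers are assigned by first appearance. This keeps footnotes stable when the same source supports several sentences.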
2. PubMed (2022 to Present)
Multimodal biomedical articles: full text, images, captions, references, and metadata.
Use cases we’ve seen:
- A team building a clinical research summarizer that uses our API to retrieve only recent studies (post-2022), extract the findings sections, and link out to references for traceability.
- An LLM-powered medical search assistant that filters by author and publication date and includes figures in the results.
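The API already accepts start_date and end_date, so date filtering would normally happen server-side. The sketch below applies the same kind of narrowing client-side, for example to keep only one author's post-2022 studies that include figures. The result field names (publication_date, authors, images) and the ISO 8601 date format are assumptions for illustration.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def filter_studies(
    results: List[Dict[str, Any]],
    since: date,
    author: Optional[str] = None,
    require_figures: bool = False,
) -> List[Dict[str, Any]]:
    """Keep results published on or after `since`, optionally by one author.

    Field names ('publication_date', 'authors', 'images') are hypothetical;
    dates are assumed to be ISO 8601 strings such as '2023-05-14'.
    Author matching is exact and case-sensitive.
    """
    kept = []
    for r in results:
        published = date.fromisoformat(r["publication_date"])
        if published < since:
            continue  # too old
        if author is not None and author not in r.get("authors", []):
            continue  # not by the requested author
        if require_figures and not r.get("images"):
            continue  # caller wants figures, this result has none
        kept.append(r)
    return kept


studies = [
    {"title": "Trial A", "publication_date": "2021-11-02", "authors": ["R. Patel"], "images": ["fig1.png"]},
    {"title": "Trial B", "publication_date": "2023-03-15", "authors": ["R. Patel"], "images": ["fig2.png"]},
]
print([s["title"] for s in filter_studies(studies, since=date(2022, 1, 1), author="R. Patel")])  # ['Trial B']
```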
3. arXiv (Full Archive)
The full arXiv archive structured for retrieval.
Includes:
- Full paper text
- Equations and math blocks
- Structured citations and reference lists
- Metadata for downstream processing
Example: A developer building a research agent for ML papers can retrieve the full text, parse out formulas, and follow references without scraping, cleaning, or guessing.
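Following references can be sketched as a breadth-first walk. This sketch makes three assumptions:
- An in-memory dict stands in for the retrieval step.
- Each record exposes its structured reference list under a hypothetical 'references' key holding paper IDs.
- A depth limit keeps the agent from crawling the whole archive.

```python
from collections import deque
from typing import Dict, List


def follow_references(corpus: Dict[str, Dict], start: str, max_depth: int = 2) -> List[str]:
    """Breadth-first walk over structured reference lists.

    `corpus` maps a paper ID to a record whose hypothetical 'references'
    key lists the IDs it cites. In a real agent each lookup would be another
    retrieval call; references not present in the corpus are skipped.
    Returns IDs in the order they were reached, starting paper first.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        paper_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for ref in corpus.get(paper_id, {}).get("references", []):
            if ref in corpus and ref not in seen:
                seen.add(ref)
                order.append(ref)
                queue.append((ref, depth + 1))
    return order
```

Breadth-first order surfaces direct references before second-hand ones. This is usually what a summarizing agent wants to read first.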
4. Financial Data
We’ve integrated:
- Real-time and historical pricing across equities, FX, options, and indices
- Editorial and news content from financial sources (minimum of 180K retrievals per month)
What this enables:
- Copilots that combine pricing lookups with market commentary
- RAG systems that anchor financial recommendations in historical trends and cited sources
- Dashboards or interfaces that pull context for earnings calls, volatility spikes, or macroeconomic events
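The sketch below illustrates the event-driven pattern. It flags unusually large daily moves in a price series and attaches any commentary retrieved for the same day. The input shapes are assumptions: prices and headlines would come from the pricing and news retrieval described above. The z-score rule is a deliberately simple stand-in, not a method the API provides.

```python
import statistics
from typing import Dict, List, Tuple


def volatility_spikes(
    closes: List[Tuple[str, float]],
    news: Dict[str, List[str]],
    threshold: float = 2.0,
) -> List[Tuple[str, float, List[str]]]:
    """Flag days whose daily return is an outlier and attach that day's news.

    closes: (date, closing price) pairs in chronological order (assumed shape).
    news:   date -> headlines, e.g. gathered from editorial/news retrieval.
    A day counts as a spike when |return - mean| > threshold * stdev.
    """
    # Simple daily returns, labelled with the later day's date.
    returns = [
        (closes[i][0], closes[i][1] / closes[i - 1][1] - 1.0)
        for i in range(1, len(closes))
    ]
    values = [r for _, r in returns]
    if len(values) < 2:
        return []  # statistics.stdev needs at least two data points
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # flat series: nothing stands out
    return [
        (day, ret, news.get(day, []))
        for day, ret in returns
        if abs(ret - mean) > threshold * sd
    ]
```

The standard deviation here is computed over the whole window, including the spike itself. For long series, a rolling window is a more robust choice.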
5. Web Search (Enhanced)
We’ve upgraded web extraction:
- More complete full-text content
- Support for embedded images
- Faster and more consistent latency
Example: An LLM toolchain retrieves both proprietary content and recent web articles about a public company. The API returns images (e.g., charts), author metadata, and clean paragraphs ready for summarization.
API v2.0: Cleaner, with More Control
No breaking changes, but far more structure and control.
What's new:
- citation_string: drop-in ready citations for completions or footnotes
- references: full lists of outbound citations for traceability
- authors: structured author lists to support summarization or entity linking
- start_date / end_date: filter by publication window
- included_sources: explicitly choose which datasets to retrieve from
- is_tool_call: output format optimized for tool calls (optional)
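For agent workflows, one way to expose these parameters is a JSON-Schema-style tool definition. A small validator can then check the model's arguments before the search call. The tool name, the YYYY-MM-DD date format and the to_search_args helper are assumptions. The exact wrapper around the schema also differs between LLM providers.

```python
from datetime import date
from typing import Any, Dict, Tuple

# Hypothetical tool definition exposing the v2.0 request parameters to an LLM.
# Only the JSON Schema portion is shown; provider-specific wrappers vary.
SEARCH_TOOL = {
    "name": "deepsearch",
    "description": "Search academic, biomedical, financial, and web content.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "included_sources": {"type": "array", "items": {"type": "string"}},
            "start_date": {"type": "string", "description": "YYYY-MM-DD"},
            "end_date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["query"],
    },
}


def to_search_args(args: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Validate model-supplied tool arguments and split them into (query, kwargs)."""
    query = str(args.get("query", "")).strip()
    if not query:
        raise ValueError("query is required")
    kwargs: Dict[str, Any] = {"is_tool_call": True}  # assumed request flag
    for key in ("included_sources", "start_date", "end_date"):
        if args.get(key):
            kwargs[key] = args[key]
    # date.fromisoformat raises ValueError on malformed dates.
    dates = {k: date.fromisoformat(kwargs[k]) for k in ("start_date", "end_date") if k in kwargs}
    if len(dates) == 2 and dates["start_date"] > dates["end_date"]:
        raise ValueError("start_date must not be after end_date")
    return query, kwargs
```

The returned pair maps onto the quickstart call shape, client.search(query, **kwargs). This assumes the SDK accepts these parameters as keyword arguments.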
Example: A RAG backend retrieves papers using start_date and category filters, injects the citation_string into the prompt, and shows the references as expandable footnotes in a UI.
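Below is a rough sketch of the prompt-assembly step in this example. Passages are numbered, and each is followed by its citation_string. The outbound references are kept aside for expandable footnotes in the UI. The 'content' key and both helper functions are hypothetical.

```python
from typing import Dict, List


def build_rag_prompt(question: str, results: List[Dict], max_passages: int = 5) -> str:
    """Assemble a grounded prompt from search results.

    Each result is assumed to have hypothetical 'content' and 'citation_string'
    keys. Passages are numbered so the model can cite them as [n].
    """
    blocks = [
        f"[{i}] {r['content']}\nSource: {r['citation_string']}"
        for i, r in enumerate(results[:max_passages], start=1)
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer using only the sources below. Cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def reference_footnotes(results: List[Dict], max_passages: int = 5) -> List[Dict]:
    """Pair each numbered passage with its outbound 'references' for a UI."""
    return [
        {"label": f"[{i}] {r['citation_string']}", "references": r.get("references", [])}
        for i, r in enumerate(results[:max_passages], start=1)
    ]
```

Both helpers use the same max_passages cut-off. As a result, the [n] labels the model sees match the footnotes the UI renders.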
Get started in 4 lines of code:
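```python
from deepsearch import DeepSearch
# Replace "your_key" with your own DeepSearch API key.
client = DeepSearch(api_key="your_key")
# included_sources limits retrieval to the named datasets (here, arXiv).
results = client.search("quantum computing", included_sources=["arxiv"])
# Each result carries a ready-to-use citation string.
print(results[0]["citation_string"])
```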
Ranking That Thinks Like a Researcher
Relevance isn’t enough. We now score documents with additional signals:
- Citation count
- Publisher trust level
- Metadata completeness
This helps retrieval-based LLMs and agents pick results that are not just related but reliable.
Example: A scientific assistant ranks competing sources for a claim. With our reranker, results from heavily cited review papers rank higher than results from less-cited papers.
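The post does not say how these signals are combined. The following is therefore only an illustrative weighted score, not the DeepSearch reranker. The weights, the field names (relevance, citation_count, publisher_trust) and the completeness check are all assumptions.

```python
import math
from typing import Dict, List

# Hypothetical weights; the actual combination used by the API is not disclosed.
WEIGHTS = {"relevance": 0.6, "citations": 0.2, "trust": 0.1, "completeness": 0.1}
EXPECTED_FIELDS = ("title", "authors", "publication_date", "references")


def rerank(results: List[Dict], max_citations: int = 10_000) -> List[Dict]:
    """Sort results by a weighted mix of relevance and reliability signals.

    Assumed inputs per result: 'relevance' in [0, 1], 'citation_count' >= 0,
    'publisher_trust' in [0, 1]. Completeness is the share of EXPECTED_FIELDS
    that are present and non-empty.
    """
    def score(r: Dict) -> float:
        # Log scaling (capped) maps citation counts into [0, 1].
        capped = min(r.get("citation_count", 0), max_citations)
        cites = math.log1p(capped) / math.log1p(max_citations)
        complete = sum(bool(r.get(f)) for f in EXPECTED_FIELDS) / len(EXPECTED_FIELDS)
        return (
            WEIGHTS["relevance"] * r.get("relevance", 0.0)
            + WEIGHTS["citations"] * cites
            + WEIGHTS["trust"] * r.get("publisher_trust", 0.0)
            + WEIGHTS["completeness"] * complete
        )

    return sorted(results, key=score, reverse=True)
```

Log-scaling the citation count keeps a single landmark paper from outranking everything else regardless of relevance. The relevance weight still dominates when the reliability signals are similar.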
Developer Tools
The new Playground:
- Visualizes results by modality (text, images, etc.)
- Lets you toggle between raw and structured output
- Supports time filtering, source targeting, and more
We also added Google Auth so teams can get started faster (GitHub coming soon).
What You Can Build
This release is designed for teams building awesome things:
For Knowledge Work & Research
- Context-aware tools with inline citations and structured references
- Study assistants that surface summaries, authors, and relevant citations
- Generative video tutorials that need diagrams: 3Blue1Brown-style videos for any paper (please build this)
For Finance
- Assistants that answer with real-time data and editorial context
- Research workflows that trace historical pricing alongside market commentary
- Event-driven agents that retrieve and explain volatility spikes with cited data
For Advanced Retrieval Systems
- Semantic search engines that return structured context
- Tool-calling chains that can follow references, not just find results
Let us know what you need next.
We Build 🛠️
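Frequently Asked Questions (FAQ)
How does DeepSearch API v2.0 help my LLM agent produce more reliable citations?
API v2.0 returns structured citations and references, including inline citation strings and full text organized by section. As a result, the answers your agent generates carry traceable attribution that plugs directly into your workflow.
What kinds of specialized content can the new version retrieve?
It supports multimodal retrieval across:
- Wiley academic journals and textbooks
- PubMed biomedical articles (2022 to present)
- the full arXiv archive
- financial data
- enhanced web search

Together these cover academic, biomedical, and financial use cases.
What does API v2.0 improve for agent workflows?
It offers a cleaner, more controllable interface, smarter reranking, and better support for tool calling. The goal is to close the gap between the information you retrieve and the information your system can actually use, so you can build systems that reason over and compose information.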