How Have AI Search Tools Evolved? A 2023-2025 Comparison Guide to OpenAI, Gemini, and Perplexity | Geoz.com.cn
English Summary: The article evaluates the evolution of AI-powered search tools from 2023 to 2025, highlighting significant improvements in accuracy and usability. It compares implementations from OpenAI (o3/o4-mini), Google Gemini, and Perplexity, noting OpenAI's real-time reasoning with search integration as particularly effective. The author shares practical use cases including code porting and technical research, concluding that AI search has become genuinely useful for research tasks while raising questions about the future economic model of the web.
Introduction
For the past two and a half years, the feature I’ve most wanted from Large Language Models (LLMs) is the ability to autonomously handle search-based research tasks. We saw the first glimpses of this capability in early 2023 with the launch of Perplexity and the GPT-4-powered Microsoft Bing. Since then, numerous players, most notably Google Gemini and ChatGPT Search, have attempted to tackle this challenge.
Those early 2023-era versions were promising but ultimately disappointing. They had a strong tendency to hallucinate details not present in the search results, to the point where their outputs were fundamentally untrustworthy.
In the first half of 2025, however, I believe these systems finally crossed a critical threshold and became genuinely useful tools.
Key Developments in AI-Powered Search
Deep Research: The First Wave
First came the "Deep Research" implementations. Google Gemini, followed by OpenAI and Perplexity, launched products under that name. These were impressive in their scope: they could take a query, process for several minutes, and assemble lengthy reports with dozens, sometimes hundreds, of citations. Gemini's version received a significant upgrade a few weeks ago when it switched to using the Gemini 2.5 Pro model, yielding some outstanding results.
However, waiting several minutes for a 10+ page report is not an ideal workflow for many users, myself included. The need for speed and interactivity remained unaddressed.
A Leap Forward: Reasoning-Integrated Search
Last week, OpenAI released search-enabled versions of its o3 and o4-mini models through ChatGPT. On the surface, these appear similar to previous concepts: LLMs with the option to call a search tool. The critical difference lies in their execution: these models can run searches as an integral part of their chain-of-thought reasoning process, before formulating a final answer.
This shift has proved transformative. In my testing, I've posed diverse questions to ChatGPT (using o3 or o4-mini) and received genuinely useful, search-grounded answers. Hallucinations appear significantly reduced, and the search behavior feels more intuitive and better aligned with user intent.
Example Queries Handled Effectively:
- Get me specs including VRAM for RTX 5090 and RTX PRO 6000, plus release dates and prices.
- Find me a website tool that lets me paste a URL and get a word count and estimated reading time.
- Figure out what search engine ChatGPT is using for o3 and o4-mini.
- Look up Cloudflare R2 pricing and use Python to calculate the cost from this dashboard screenshot.
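As a rough illustration of the kind of calculation the last query asks for, the sketch below estimates a monthly Cloudflare R2 bill in Python. It is not the code ChatGPT produced. The rates and free-tier allowances are assumed values that should be checked against Cloudflare's current pricing page, the usage figures are invented rather than read from the screenshot, and `R2Rates` and `monthly_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R2Rates:
    """Assumed R2 rates in USD; verify against the current pricing page."""
    storage_per_gb_month: float = 0.015   # assumed storage rate
    class_a_per_million: float = 4.50     # assumed rate for write-style operations
    class_b_per_million: float = 0.36     # assumed rate for read-style operations
    free_storage_gb: float = 10.0         # assumed monthly free allowance
    free_class_a_ops: int = 1_000_000     # assumed monthly free allowance
    free_class_b_ops: int = 10_000_000    # assumed monthly free allowance


def billable(used: float, free: float) -> float:
    """Return the usage that exceeds the free allowance (never negative)."""
    return max(used - free, 0.0)


def monthly_cost(storage_gb: float, class_a_ops: int, class_b_ops: int,
                 rates: R2Rates = R2Rates()) -> float:
    """Estimate one month's bill in USD, rounded to cents."""
    storage = billable(storage_gb, rates.free_storage_gb) * rates.storage_per_gb_month
    class_a = billable(class_a_ops, rates.free_class_a_ops) / 1_000_000 * rates.class_a_per_million
    class_b = billable(class_b_ops, rates.free_class_b_ops) / 1_000_000 * rates.class_b_per_million
    return round(storage + class_a + class_b, 2)


if __name__ == "__main__":
    # Hypothetical usage: 110 GB stored, 3M Class A and 20M Class B operations.
    print(monthly_cost(110, 3_000_000, 20_000_000))
```

With these assumed rates, the hypothetical usage above works out to 1.50 for storage, 9.00 for Class A operations, and 3.60 for Class B operations, for a total of 14.10 USD.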
Interacting with o3 feels akin to using a Deep Research tool in real-time, without the lengthy wait for an overly verbose report.
My hypothesis is that excelling at this task requires a robust reasoning model. Evaluating web search results is inherently difficult due to the prevalence of spam and deceptive information. Previous implementations often faltered because the web is saturated with low-quality content. Perhaps o3, o4-mini, and Gemini 2.5 Pro are the first models to cross a critical "gullibility-resistance" threshold, enabling them to perform this task effectively.
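The pattern described in this section, where the model may run searches as part of its reasoning before it commits to an answer, can be sketched as a small loop. This is only an illustration under stated assumptions, not OpenAI's implementation: `research`, `Search`, `Answer`, `scripted_think`, and `fake_search` are hypothetical names, and the "model" and "search engine" here are hard-coded stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Search:
    """The model wants to run a search before answering."""
    query: str


@dataclass
class Answer:
    """The model is ready to give its final answer."""
    text: str


Step = Union[Search, Answer]


def research(question: str,
             think: Callable[[str, List[str]], Step],
             search: Callable[[str], str],
             max_searches: int = 5) -> str:
    """Alternate between reasoning and searching until the model answers."""
    notes: List[str] = []  # search results gathered so far
    for _ in range(max_searches):
        step = think(question, notes)  # reason over the question plus the notes
        if isinstance(step, Answer):
            return step.text
        # The model asked for a search: run it and feed the result back in.
        notes.append(f"{step.query} -> {search(step.query)}")
    return f"Search budget exhausted; notes so far: {notes}"


def fake_search(query: str) -> str:
    """Stand-in for a real search tool (hypothetical one-entry index)."""
    index = {"new google javascript sdk": "@google/genai supersedes @google/generative-ai"}
    return index.get(query, "no results")


def scripted_think(question: str, notes: List[str]) -> Step:
    """Stand-in for a reasoning model: search once, then answer from the notes."""
    if not notes:
        return Search("new google javascript sdk")
    return Answer(f"Based on {len(notes)} search(es): {notes[-1]}")


if __name__ == "__main__":
    print(research("Which SDK replaces @google/generative-ai?", scripted_think, fake_search))
```

The point of the sketch is the control flow: the decision to search again or to answer is made inside the loop, after each result has been read, rather than as a single search step bolted on before generation.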
The Competitive Landscape: Who's Leading?
While OpenAI's recent releases show strong progress, the competitive field is dynamic.
Google Gemini: The user-facing Gemini app incorporates search but often operates as a black box, showing neither its search queries nor its sources. That lack of transparency undermines trust and is a significant missed opportunity, given Google's unparalleled search index. Furthermore, the AI-assisted answers in Google's main search interface have been prone to high-profile hallucinations, which risks damaging the company's brand reputation.
Anthropic Claude: Claude added web search about a month ago, but its implementation doesn't feel as polished as OpenAI's. It relies on the Brave search index, which may not be as comprehensive as Bing's or Google's. More importantly, its searches don't seem to be integrated as seamlessly into a powerful, step-by-step reasoning flow.
A Practical Breakthrough: Code Migration via AI Search
The most compelling demonstration of this technology's utility occurred recently during a code migration task.
My Gemini image segmentation tool was using the deprecated @google/generative-ai library, which has been superseded by the new @google/genai SDK. Lacking the motivation to manually perform the upgrade, I pasted the full HTML/JavaScript code into ChatGPT (using the o4-mini-high model) with the prompt:
"This code needs to be upgraded to the new recommended JavaScript library from Google. Figure out what that is and then look up enough documentation to port this code to it."
The model processed for 21 seconds, executed multiple searches, identified the new library (which was released after its training data cutoff), located the relevant upgrade instructions, and successfully produced a fully functional ported version of the code. This was achieved from a mobile device out of idle curiosity, and the result was both impressive and surprising in its accuracy.
Implications and Future Considerations
Shifting User Behavior and Economic Models
I'm documenting this now because reliable AI-powered research has been a personal benchmark for over two years. These tools appear to have finally become useful as research assistants, reducing the need to meticulously fact-check every piece of information they provide.
While I don't yet trust them to be error-free, I am confident enough to skip my own fact-checking on lower-stakes tasks. This leap in usability, however, brings long-predicted problems closer. A central question emerges: why visit websites directly if a chatbot can provide synthesized answers instantly?
Legal battles over training data and content use began when LLMs were relatively primitive. The stakes are far higher now that the models are effective. My own use of traditional Google search is already declining significantly. The web is likely in for a turbulent period as it adapts to a new, yet-to-be-defined economic model in the age of capable AI agents.