What Is WebMCP? How Google and Microsoft's New Standard Lets AI Agents Call Website Tools Directly | Geoz.com.cn

2026/2/15
AI Summary (BLUF)

WebMCP (Web Model Context Protocol) is a proposed web standard developed jointly by Google and Microsoft. It lets websites expose structured, callable tools directly to AI agents through a browser API, replacing inefficient screen-scraping and DOM parsing with single structured function calls. This can significantly reduce costs, improve reliability, and speed up development for enterprise AI deployments.

Introduction: AI Agents as "Tourists" on the Web

When an AI agent visits a website, it’s essentially a tourist who doesn’t speak the local language. Whether built on LangChain, Claude Code, or the increasingly popular OpenClaw framework, the agent is reduced to guessing which buttons to press: scraping raw HTML, firing off screenshots to multimodal models, and burning through thousands of tokens just to figure out where a search bar is.

That era may be ending. Earlier this week, the Google Chrome team launched WebMCP — Web Model Context Protocol — as an early preview in Chrome 146 Canary. WebMCP, which was developed jointly by engineers at Google and Microsoft and incubated through the W3C's Web Machine Learning community group, is a proposed web standard that lets any website expose structured, callable tools directly to AI agents through a new browser API: navigator.modelContext.

The implications for enterprise IT are significant. Instead of building and maintaining separate back-end MCP servers in Python or Node.js to connect their web applications to AI platforms, development teams can now wrap their existing client-side JavaScript logic into agent-readable tools — without re-architecting a single page.

AI Agents: The Web's Expensive, Fragile "Tourists"

Anyone who has deployed web agents (also called browser agents) at scale understands the cost and reliability problems of today's interaction methods. The two dominant approaches, visual screen-scraping and DOM parsing, both suffer from fundamental inefficiencies that hit enterprise budgets directly.

With screenshot-based approaches, agents pass images to multimodal models (such as Claude and Gemini) and hope the model can identify not only what is on the screen but also where buttons, form fields, and other interactive elements are located. Each image consumes thousands of tokens and can add significant latency. With DOM-based approaches, agents ingest raw HTML and JavaScript: a foreign language full of tags, CSS rules, and structural markup that is irrelevant to the task at hand but still consumes context-window space and inference cost.

In both cases, the agent is translating between what the website was designed for (human eyes) and what the model needs (structured data about available actions). A single product search that a human completes in seconds can require dozens of sequential agent interactions — clicking filters, scrolling pages, parsing results — each one an inference call that adds latency and cost.

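The cost argument is easiest to see as arithmetic. Total tokens scale with the number of sequential steps multiplied by the payload per step. The sketch below models that relationship. Every input number is a hypothetical placeholder, chosen only to match the rough magnitudes in the text ("thousands of tokens" per image, "dozens" of steps). None of them is a measurement.

```typescript
// Back-of-the-envelope token model for a single agent task.
// Every number plugged in below is a hypothetical placeholder, not a measurement.

interface FlowProfile {
  steps: number;               // sequential inference calls needed to finish the task
  inputTokensPerStep: number;  // screenshot or HTML payload plus prompt, per call
  outputTokensPerStep: number; // model response, per call
}

// Linear model: it ignores context that accumulates across steps,
// so it likely understates the cost of long multi-step flows.
function totalTokens(p: FlowProfile): number {
  return p.steps * (p.inputTokensPerStep + p.outputTokensPerStep);
}

// Hypothetical: a screenshot-driven product search taking 24 steps at ~2,000 input tokens each.
const screenshotFlow: FlowProfile = { steps: 24, inputTokensPerStep: 2000, outputTokensPerStep: 100 };

// Hypothetical: one structured tool call whose schema, arguments, and JSON result total ~1,000 tokens.
const toolCallFlow: FlowProfile = { steps: 1, inputTokensPerStep: 800, outputTokensPerStep: 200 };

const ratio = totalTokens(screenshotFlow) / totalTokens(toolCallFlow);
console.log(`screenshot flow: ${totalTokens(screenshotFlow)} tokens`);
console.log(`tool-call flow: ${totalTokens(toolCallFlow)} tokens`);
console.log(`ratio: ${ratio.toFixed(1)}x`);
```

With these made-up inputs, the single tool call comes out roughly 50 times cheaper. The exact ratio means nothing. What carries over is the structure: the screenshot path multiplies its per-step cost by the step count, while the tool path pays roughly once.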

How WebMCP Works: Two APIs, One Standard

WebMCP proposes two complementary APIs that serve as a bridge between websites and AI agents.

Declarative API

The Declarative API handles standard actions that can be defined directly in existing HTML forms. For organizations with well-structured forms already in production, this pathway requires minimal additional work; by adding tool names and descriptions to existing form markup, developers can make those forms callable by agents. If your HTML forms are already clean and well-structured, you are probably already 80% of the way there.

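To make the Declarative API concrete, the sketch below models how a browser might derive a tool definition from an annotated form. It uses plain objects instead of DOM nodes, so it runs outside a browser. The `toolname` and `tooldescription` names follow the WebMCP explainer's proposed form attributes as we understand them. Treat those names, the per-field `description`, and the mapping rules as assumptions, not as the specified algorithm.

```typescript
// Plain-object stand-in for an annotated HTML form, so the sketch runs without a DOM.
// ASSUMPTION: "toolname" / "tooldescription" mirror the explainer's proposed
// form attributes; the exact names and mapping rules may change.
interface FormField {
  name: string;            // the input's existing name attribute
  type: "text" | "email" | "number" | "checkbox";
  required: boolean;       // the input's existing required attribute
  description?: string;    // hypothetical per-field hint for the agent
}

interface AnnotatedForm {
  toolname: string;
  tooldescription: string;
  fields: FormField[];
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

// HTML input types mapped onto JSON Schema primitive types.
const jsonType: Record<FormField["type"], string> = {
  text: "string",
  email: "string",
  number: "number",
  checkbox: "boolean",
};

// Derive a tool definition from markup the form already carries.
function formToTool(form: AnnotatedForm): ToolDefinition {
  const properties: ToolDefinition["inputSchema"]["properties"] = {};
  for (const field of form.fields) {
    properties[field.name] = field.description
      ? { type: jsonType[field.type], description: field.description }
      : { type: jsonType[field.type] };
  }
  return {
    name: form.toolname,
    description: form.tooldescription,
    inputSchema: {
      type: "object",
      properties,
      required: form.fields.filter((f) => f.required).map((f) => f.name),
    },
  };
}

// Hypothetical newsletter form that already exists on a page.
const signupForm: AnnotatedForm = {
  toolname: "subscribeNewsletter",
  tooldescription: "Subscribe an email address to the weekly newsletter",
  fields: [
    { name: "email", type: "email", required: true, description: "Address to subscribe" },
    { name: "digest", type: "checkbox", required: false },
  ],
};

console.log(JSON.stringify(formToTool(signupForm), null, 2));
```

Most of the schema comes from what a well-built form already declares: field names, input types, and `required` flags.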

Imperative API

The Imperative API handles more complex, dynamic interactions that require JavaScript execution. Here developers define richer tool schemas. These are conceptually similar to the tool definitions sent to the OpenAI or Anthropic API endpoints, but they run entirely client-side in the browser. Through navigator.modelContext.registerTool(), a website can expose functions such as searchProducts(query, filters) or orderPrints(copies, page_size), complete with parameter schemas and natural-language descriptions.

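Below is a minimal sketch of an imperative registration for the `searchProducts(query, filters)` example. The `{ name, description, inputSchema, execute }` object is modeled on the WebMCP explainer. Because `navigator.modelContext` exists only in Chrome Canary behind a flag, the code registers against `FakeModelContext`, a hypothetical in-memory stand-in. The catalog, the filter fields, and the plain-array return value are assumptions for illustration.

```typescript
// Minimal tool shape, modeled on the object the WebMCP explainer passes to registerTool().
interface Tool {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema for the arguments
  execute: (args: any) => unknown;
}

// Hypothetical in-memory stand-in for navigator.modelContext, so the sketch runs outside Chrome Canary.
class FakeModelContext {
  readonly tools = new Map<string, Tool>();
  registerTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }
}

interface SearchArgs {
  query: string;
  filters?: { category?: string; maxPrice?: number };
}

// Hypothetical data the page already holds for its own UI.
const catalog = [
  { id: "p1", title: "Linen shirt", category: "shirts", price: 45 },
  { id: "p2", title: "Wool coat", category: "coats", price: 220 },
  { id: "p3", title: "Cotton shirt", category: "shirts", price: 30 },
];

const modelContext = new FakeModelContext(); // in the browser: navigator.modelContext

modelContext.registerTool({
  name: "searchProducts",
  description: "Search the catalog by keyword, optionally filtered by category and maximum price.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Keywords matched against product titles" },
      filters: {
        type: "object",
        properties: {
          category: { type: "string" },
          maxPrice: { type: "number", description: "Upper price bound, inclusive" },
        },
      },
    },
    required: ["query"],
  },
  // Returns matching products as plain JSON-serializable objects.
  execute: ({ query, filters = {} }: SearchArgs) =>
    catalog.filter(
      (p) =>
        p.title.toLowerCase().includes(query.toLowerCase()) &&
        (filters.category === undefined || p.category === filters.category) &&
        (filters.maxPrice === undefined || p.price <= filters.maxPrice),
    ),
});
```

The explainer's own examples appear to return MCP-style `content` objects rather than a bare array. Check the current draft for the expected result shape before copying this pattern.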

The key insight is that a single tool call through WebMCP can replace what might have been dozens of browser-use interactions. An e-commerce site that registers a searchProducts tool lets the agent make one structured function call and receive structured JSON results, rather than having the agent click through filter dropdowns, scroll through paginated results, and screenshot each page.

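On the agent side, the dozens of clicks collapse into two moves. First, the agent reads the tools the page published. Then it makes one call. The sketch simulates both steps with an in-memory tool list. `listToolsForModel` and `callTool` are hypothetical helper names, not WebMCP APIs, and the product data is invented.

```typescript
interface Tool {
  name: string;
  description: string;
  inputSchema: object;
  execute: (args: Record<string, unknown>) => unknown;
}

// Hypothetical page-side tool list holding a single search tool.
const tools: Tool[] = [
  {
    name: "searchProducts",
    description: "Search products by keyword; returns id, title and price as JSON.",
    inputSchema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
    execute: (args) => {
      const query = String(args.query).toLowerCase();
      const rows = [
        { id: "p1", title: "Trail running shoe", price: 120 },
        { id: "p2", title: "Road running shoe", price: 95 },
        { id: "p3", title: "Hiking boot", price: 150 },
      ];
      return rows.filter((r) => r.title.toLowerCase().includes(query));
    },
  },
];

// What the agent hands its model: names, descriptions and schemas only, no HTML or screenshots.
function listToolsForModel(ts: Tool[]): string {
  return JSON.stringify(ts.map(({ name, description, inputSchema }) => ({ name, description, inputSchema })));
}

// One structured call replaces clicking filters, paginating and screenshotting.
function callTool(ts: Tool[], name: string, args: Record<string, unknown>): string {
  const tool = ts.find((t) => t.name === name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return JSON.stringify(tool.execute(args));
}

console.log(listToolsForModel(tools));
console.log(callTool(tools, "searchProducts", { query: "running" }));
```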

The Enterprise Case: Cost, Reliability, and the End of Brittle Scraping

For IT decision makers evaluating agentic AI deployments, WebMCP addresses three persistent pain points simultaneously.

Cost reduction is the most immediately quantifiable benefit. By replacing sequences of screenshot captures, multimodal inference calls, and iterative DOM parsing with single structured tool calls, organizations can expect significant reductions in token consumption.

Reliability improves because agents are no longer guessing about page structure. When a website explicitly publishes a tool contract — "here are the functions I support, here are their parameters, here is what they return" — the agent operates with certainty rather than inference. Failed interactions due to UI changes, dynamic content loading, or ambiguous element identification are largely eliminated for any interaction covered by a registered tool.
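An explicit contract also lets the page reject malformed calls before any UI code runs. The agent then receives a precise error instead of a silent misclick. The hypothetical validator below checks only required fields and primitive types, which is a tiny subset of JSON Schema. A production site would more likely use an established JSON Schema validator.

```typescript
type Primitive = "string" | "number" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: Primitive }>;
  required?: string[];
}

// Returns human-readable problems; an empty list means the call matches the published contract.
function validateArgs(schema: ObjectSchema, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in args)) problems.push(`missing required argument "${key}"`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties[key];
    if (!prop) {
      problems.push(`unexpected argument "${key}"`);
    } else if (typeof value !== prop.type) {
      problems.push(`argument "${key}" should be ${prop.type}, got ${typeof value}`);
    }
  }
  return problems;
}

// Hypothetical contract for the orderPrints(copies, page_size) example.
const orderPrintsSchema: ObjectSchema = {
  type: "object",
  properties: { copies: { type: "number" }, page_size: { type: "string" } },
  required: ["copies", "page_size"],
};

console.log(validateArgs(orderPrintsSchema, { copies: 2, page_size: "A4" }));
console.log(validateArgs(orderPrintsSchema, { copies: "two" }));
```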

Development velocity accelerates because web teams can leverage their existing front-end JavaScript rather than standing up separate backend infrastructure. The specification emphasizes that any task a user can accomplish through a page's UI can be made into a tool by reusing much of the page's existing JavaScript code. Teams do not need to learn new server frameworks or maintain separate API surfaces for agent consumers.
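The reuse claim is easiest to see in code: the function behind a button click can also sit behind a tool. In this sketch, `applyFilters` stands for logic a page already has. The `registry` map and its `registerTool` helper are a hypothetical stand-in for `navigator.modelContext`, and the filter fields are invented.

```typescript
interface FilterState {
  color?: string;
  size?: string;
}

// Existing page logic, already called by the UI's "Apply" button handler.
let currentFilters: FilterState = {};
function applyFilters(next: FilterState): FilterState {
  currentFilters = { ...currentFilters, ...next };
  // A real page would re-render its product grid here.
  return currentFilters;
}

// Existing UI wiring, shown as a plain function instead of a DOM event listener.
function onApplyButtonClick(color: string, size: string): void {
  applyFilters({ color, size });
}

// Hypothetical stand-in for navigator.modelContext.
const registry = new Map<string, (args: FilterState) => unknown>();
function registerTool(name: string, execute: (args: FilterState) => unknown): void {
  registry.set(name, execute);
}

// The only new code for agents: a thin wrapper around the same function, with no new backend.
registerTool("applyFilters", (args) => ({ activeFilters: applyFilters(args) }));
```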

Design Intent: Human-in-the-Loop Collaboration, Not Full Autonomy

A critical architectural decision separates WebMCP from the fully autonomous agent paradigm that has dominated recent headlines. The standard is explicitly designed around cooperative, human-in-the-loop workflows — not unsupervised automation.

According to Khushal Sagar, a staff software engineer for Chrome, the WebMCP specification identifies three pillars that underpin this philosophy.

  • Context: All the data agents need to understand what the user is doing, including content that is not currently visible on screen.
  • Capabilities: Actions the agent can take on the user's behalf, from answering questions to filling out forms.
  • Coordination: Controlling the handoff between user and agent when the agent encounters situations it cannot resolve autonomously.
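Of the three pillars, coordination is the least familiar. One way a tool might honor it is to pause before an irreversible step and hand the decision back to the user. In the sketch, the `confirm` callback is a hypothetical hook. It stands in for whatever handoff mechanism the final specification provides. `placeOrder` is an invented example.

```typescript
type Confirm = (message: string) => Promise<boolean>;

interface CheckoutArgs {
  cartId: string;
  total: number;
}

// Hypothetical tool body: the agent can prepare an order, but the user decides whether to place it.
async function placeOrder(args: CheckoutArgs, confirm: Confirm): Promise<{ status: string }> {
  const approved = await confirm(`Place order ${args.cartId} for $${args.total.toFixed(2)}?`);
  if (!approved) {
    // Hand control back: the agent learns the user declined and should not retry silently.
    return { status: "declined_by_user" };
  }
  // A real page would submit the order through its existing checkout code here.
  return { status: "placed" };
}

// Simulated users for the demo: one approves, one declines.
const alwaysYes: Confirm = async () => true;
const alwaysNo: Confirm = async () => false;

placeOrder({ cartId: "c42", total: 59.9 }, alwaysYes).then((r) => console.log(r.status));
placeOrder({ cartId: "c42", total: 59.9 }, alwaysNo).then((r) => console.log(r.status));
```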

The specification's authors at Google and Microsoft illustrate this with a shopping scenario: a user named Maya asks her AI assistant to help find an eco-friendly dress for a wedding. The agent suggests vendors, opens a browser to a dress site, and discovers the page exposes WebMCP tools like getDresses() and showDresses(). When Maya's criteria go beyond the site's basic filters, the agent calls those tools to fetch product data, uses its own reasoning to filter for "cocktail-attire appropriate," and then calls showDresses() to update the page with only the relevant results. It's a fluid loop of human taste and agent capability, exactly the kind of collaborative browsing that WebMCP is designed to enable.

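The Maya example reduces to a loop: fetch data, reason over it, then display the result. The sketch mimics that loop. `getDresses` and `showDresses` are the tool names from the specification's scenario. Their parameters, the dress data, and the tag check that stands in for the agent's own reasoning are all assumptions.

```typescript
interface Dress {
  id: string;
  name: string;
  ecoCertified: boolean;
  tags: string[];
}

// Hypothetical page state.
const inventory: Dress[] = [
  { id: "d1", name: "Recycled-satin midi", ecoCertified: true, tags: ["cocktail", "midi"] },
  { id: "d2", name: "Organic cotton sundress", ecoCertified: true, tags: ["casual", "beach"] },
  { id: "d3", name: "Polyester sequin gown", ecoCertified: false, tags: ["cocktail", "evening"] },
];
let visibleIds: string[] = inventory.map((d) => d.id);

// Page-side tools (names from the spec's scenario; signatures are assumptions).
const getDresses = (args: { ecoOnly: boolean }): Dress[] =>
  inventory.filter((d) => !args.ecoOnly || d.ecoCertified);

const showDresses = (args: { ids: string[] }): { shown: number } => {
  visibleIds = args.ids; // a real page would re-render its grid here
  return { shown: visibleIds.length };
};

// Agent side: the site can filter for "eco-friendly" but not "cocktail-attire appropriate",
// so the agent applies that criterion itself. A tag check stands in for model reasoning.
function agentFindWeddingDress(): { shown: number } {
  const candidates = getDresses({ ecoOnly: true });
  const suitable = candidates.filter((d) => d.tags.includes("cocktail"));
  return showDresses({ ids: suitable.map((d) => d.id) });
}

console.log(agentFindWeddingDress(), visibleIds);
```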

This is not a headless browsing standard. The specification explicitly states that headless and fully autonomous scenarios are non-goals. For those use cases, the authors point to existing protocols like Google's Agent-to-Agent (A2A) protocol. WebMCP is about the browser — where the user is present, watching, and collaborating.

Complementing MCP, Not Replacing It

WebMCP is not a replacement for Anthropic's Model Context Protocol, despite sharing a conceptual lineage and a portion of its name. It does not follow the JSON-RPC specification that MCP uses for client-server communication. Where MCP operates as a back-end protocol connecting AI platforms to service providers through hosted servers, WebMCP operates entirely client-side within the browser.

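The protocol difference is concrete. A back-end MCP client wraps each tool invocation in a JSON-RPC 2.0 `tools/call` request and sends it to a server. A WebMCP tool, by contrast, runs as ordinary JavaScript inside the page. For comparison, the sketch builds an MCP request object and also calls an in-page function directly. The `bookRoom` tool and its arguments are hypothetical, and no transport is implemented.

```typescript
// Hypothetical booking logic.
function bookRoom(args: { hotelId: string; nights: number }): { confirmation: string } {
  return { confirmation: `${args.hotelId}-${args.nights}N` };
}

// Back-end MCP: the client serializes a JSON-RPC 2.0 request and sends it to a server
// over a transport such as stdio or HTTP (transport omitted here).
const mcpRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "bookRoom", arguments: { hotelId: "h7", nights: 3 } },
};
const wire = JSON.stringify(mcpRequest);

// WebMCP: no JSON-RPC envelope and no separate server; the tool is ordinary page JavaScript
// running inside the user's existing session.
const inPageResult = bookRoom({ hotelId: "h7", nights: 3 });

console.log(wire);
console.log(inPageResult);
```

Because `bookRoom` is an ordinary function, a company could put the same logic behind both an MCP server and a WebMCP tool.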

The relationship is complementary. A travel company might maintain a back-end MCP server for direct API integrations with AI platforms like ChatGPT or Claude, while simultaneously implementing WebMCP tools on its consumer-facing website so that browser-based agents can interact with its booking flow in the context of a user's active session. The two standards serve different interaction patterns without conflict.

The distinction matters for enterprise architects. Back-end MCP integrations are appropriate for service-to-service automation where no browser UI is needed. WebMCP is appropriate when the user is present and the interaction benefits from shared visual context — which describes the majority of consumer-facing web interactions that enterprises care about.

Looking Ahead: From Experimental Flag to Formal Standard

WebMCP is currently available in Chrome 146 Canary behind the "WebMCP for testing" flag at chrome://flags. Developers can join the Chrome Early Preview Program for access to documentation and demos. Other browsers have not yet announced implementation timelines, though Microsoft's active co-authorship of the specification suggests Edge support is likely.

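The API currently exists only behind a flag in Canary, so pages can treat it as a progressive enhancement. They register tools when `navigator.modelContext` is present and change nothing otherwise. The sketch takes the navigator object as a parameter so it can run outside a browser. `ModelContextLike` and `ToolSpec` are our own minimal types, not the specification's IDL.

```typescript
interface ToolSpec {
  name: string;
  description: string;
  inputSchema: object;
  execute: (args: Record<string, unknown>) => unknown;
}

// Only the member this sketch relies on; the real interface may differ.
interface ModelContextLike {
  registerTool(tool: ToolSpec): void;
}

// Register tools only when the browser exposes the API; otherwise the page behaves exactly as before.
function registerIfSupported(nav: { modelContext?: ModelContextLike }, tools: ToolSpec[]): boolean {
  if (!nav.modelContext) return false;
  for (const tool of tools) nav.modelContext.registerTool(tool);
  return true;
}

const pingTool: ToolSpec = {
  name: "ping",
  description: "Health check that returns 'pong'",
  inputSchema: { type: "object", properties: {} },
  execute: () => "pong",
};

// In a page this would be: registerIfSupported(navigator as any, [pingTool]);
const registered: string[] = [];
const withApi = { modelContext: { registerTool: (t: ToolSpec) => { registered.push(t.name); } } };
console.log(registerIfSupported({}, [pingTool]));      // false: API absent
console.log(registerIfSupported(withApi, [pingTool])); // true: tool registered
```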

Industry observers expect formal browser announcements by mid-to-late 2026, with Google Cloud Next and Google I/O as probable venues for broader rollout announcements. The specification is transitioning from community incubation within the W3C to a formal draft — a process that historically takes months but signals serious institutional commitment.

The comparison Sagar has drawn is instructive. WebMCP aims to become the USB-C of AI agent interactions with the web: a single, standardized interface that any agent can plug into, replacing today's tangle of bespoke scraping strategies and fragile automation scripts.

Whether that vision is realized depends on adoption — by both browser vendors and web developers. But with Google and Microsoft jointly shipping code, the W3C providing institutional scaffolding, and Chrome 146 already running the implementation behind a flag, WebMCP has cleared the most difficult hurdle any web standard faces: getting from proposal to working software.
