How does Cloudflare's Markdown for Agents save AI crawlers 80% of their tokens?

Introduction: the shifting landscape of web traffic

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines, and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

For a business that wants to stay ahead, now is the time to look beyond human visitors and traditional SEO wisdom and start treating agents as first-class citizens.

Why markdown matters

Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside. A simple ## About Us on a page in markdown costs roughly 3 tokens; its HTML equivalent – <h2 class="section-title" id="about">About Us</h2> – burns 12-15, and that's before you account for the <div> wrappers, nav bars, and script tags that pad every real web page and have zero semantic value.

This blog post you’re reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown. That’s an 80% reduction in token usage.
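
As a quick check of that figure, here is a minimal sketch of the arithmetic. The function name tokenReductionPercent is made up for illustration; only the two token counts come from the text above.

```typescript
// Percentage reduction in tokens when moving from HTML to markdown.
// tokenReductionPercent is a hypothetical helper, not part of any Cloudflare API.
function tokenReductionPercent(htmlTokens: number, markdownTokens: number): number {
  if (htmlTokens <= 0) {
    throw new RangeError("htmlTokens must be positive");
  }
  return (1 - markdownTokens / htmlTokens) * 100;
}

// Token counts quoted for this blog post.
const reduction = tokenReductionPercent(16180, 3150);
console.log(`${reduction.toFixed(1)}% fewer tokens`); // prints "80.5% fewer tokens"
```

The exact value is about 80.5%, which the post rounds to 80%.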

Markdown has quickly become the lingua franca for agents and AI systems as a whole. The format’s explicit structure makes it ideal for AI processing, ultimately resulting in better results while minimizing token waste.

The problem is that the Web is made of HTML, not markdown, and page weight has been steadily increasing over the years, making pages hard to parse. An agent's goal is to filter out all non-essential elements and scan only the relevant content.

The conversion of HTML to markdown is now a common step for any AI pipeline. Still, this process is far from ideal: it wastes computation, adds costs and processing complexity, and above all, it may not be how the content creator intended their content to be used in the first place.

What if AI agents could bypass the complexities of intent analysis and document conversion, and instead receive structured markdown directly from the source?

Converting HTML to markdown, automatically

Cloudflare's network now supports real-time content conversion at the source for enabled zones, using content negotiation headers. When AI systems request pages from any website that uses Cloudflare and has Markdown for Agents enabled, they can express a preference for text/markdown in the request. Our network will then convert the HTML to markdown on the fly, automatically and efficiently, whenever possible.

How it works

Here’s how it works. To fetch the markdown version of any page from a zone with Markdown for Agents enabled, the client needs to add the Accept negotiation header with text/markdown as one of the options. Cloudflare will detect this, fetch the original HTML version from the origin, and convert it to markdown before serving it to the client.

Here's a curl example with the Accept negotiation header requesting a page from our developer documentation:

curl https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
  -H "Accept: text/markdown"

Or if you’re building an AI Agent using Workers, you can use TypeScript:

const r = await fetch(
  `https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/`,
  {
    headers: {
      Accept: "text/markdown, text/html",
    },
  },
);
const tokenCount = r.headers.get("x-markdown-tokens");
const markdown = await r.text();
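
The Accept value in the request above lists text/markdown first, with text/html as a fallback. As a rough illustration of content negotiation seen from the receiving side, the sketch below checks whether an Accept header lists text/markdown with a non-zero q-value. The function name acceptsMarkdown is hypothetical, wildcards such as */* are deliberately ignored, and this is not a description of how Cloudflare's network implements detection.

```typescript
// Returns true when the Accept header explicitly lists text/markdown
// with a q-value greater than zero. Hypothetical helper for illustration.
function acceptsMarkdown(acceptHeader: string): boolean {
  for (const entry of acceptHeader.split(",")) {
    // Each entry looks like "type/subtype" or "type/subtype;q=0.8".
    const parts = entry.split(";").map((s) => s.trim().toLowerCase());
    if (parts[0] !== "text/markdown") continue;

    let q = 1; // q defaults to 1 when the parameter is omitted
    for (const param of parts.slice(1)) {
      if (param.indexOf("q=") === 0) {
        q = parseFloat(param.slice(2));
      }
    }
    return isFinite(q) && q > 0;
  }
  return false;
}

console.log(acceptsMarkdown("text/markdown, text/html")); // prints true
console.log(acceptsMarkdown("text/html"));                // prints false
```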

We already see some of the most popular coding agents today – like Claude Code and OpenCode – send these accept headers with their requests for content. Now, the response to this request is formatted in markdown. It's that simple.

HTTP/2 200
date: Wed, 11 Feb 2026 11:44:48 GMT
content-type: text/markdown; charset=utf-8
content-length: 2899
vary: accept
x-markdown-tokens: 725
content-signal: ai-train=yes, search=yes, ai-input=yes

---
title: Markdown for Agents · Cloudflare Agents docs
---

## What is Markdown for Agents

The ability to parse and convert HTML to Markdown has become foundational for AI.
...

Note that we include an x-markdown-tokens header with the converted response that indicates the estimated number of tokens in the markdown document. You can use this value in your flow, for example to calculate the size of a context window or to decide on your chunking strategy.
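
Because conversion happens only when possible, a client may want to confirm that it actually received markdown before trusting the token estimate. The sketch below shows one way to do that and then turn the estimate into a chunk count. The names inspectMarkdownResponse and planChunks, and the per-chunk budget, are assumptions made for illustration; only the content-type and x-markdown-tokens headers come from the response shown above.

```typescript
// Minimal header accessor, satisfied by a fetch Headers object
// or by any object that exposes get(name).
interface HeaderSource {
  get(name: string): string | null;
}

interface MarkdownInfo {
  isMarkdown: boolean;            // true when the body is text/markdown
  estimatedTokens: number | null; // parsed x-markdown-tokens, or null if absent/invalid
}

// Hypothetical helper: read the two headers of interest from a response.
function inspectMarkdownResponse(headers: HeaderSource): MarkdownInfo {
  const contentType = (headers.get("content-type") ?? "").toLowerCase();
  const mediaType = (contentType.split(";")[0] ?? "").trim();
  const raw = headers.get("x-markdown-tokens");
  const parsed = raw === null ? NaN : parseInt(raw, 10);
  return {
    isMarkdown: mediaType === "text/markdown",
    estimatedTokens: isFinite(parsed) && parsed >= 0 ? parsed : null,
  };
}

// Hypothetical helper: number of chunks needed so that each chunk
// stays within a caller-chosen token budget.
function planChunks(estimatedTokens: number, budgetPerChunk: number): number {
  if (budgetPerChunk <= 0) {
    throw new RangeError("budgetPerChunk must be positive");
  }
  return Math.max(1, Math.ceil(estimatedTokens / budgetPerChunk));
}
```

With the sample response above (x-markdown-tokens: 725) and an assumed budget of 500 tokens per chunk, planChunks(725, 500) returns 2. If isMarkdown is false, the body is presumably the original HTML and the client can fall back to its own conversion.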

Here’s a diagram of how it works:

[Figure: diagram of how Markdown for Agents works]

Content Signals policy

During our last Birthday Week, Cloudflare announced Content Signals — a framework that allows anyone to express their preferences for how their content can be used after it has been accessed.

When you return markdown, you want to make sure your content is being used by the Agent or AI crawler. That’s why Markdown for Agents converted responses include the Content-Signal: ai-train=yes, search=yes, ai-input=yes header, which signals that the content can be used for AI training, search results, and AI input, including agentic use. Markdown for Agents will provide options to define custom Content Signal policies in the future.

Check our dedicated Content Signals page for more information on this framework.
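
A client that wants to respect these preferences needs to read the header first. The following is a minimal sketch that parses a Content-Signal value into booleans. The function name parseContentSignal is hypothetical, and the sketch assumes each entry has the form key=yes or key=no, as in the header shown above; anything else is skipped.

```typescript
type ContentSignals = Record<string, boolean>;

// Parse a value such as "ai-train=yes, search=yes, ai-input=yes".
// Assumption: entries are comma-separated "key=yes" or "key=no" pairs.
// Entries in any other shape are ignored rather than guessed at.
function parseContentSignal(headerValue: string): ContentSignals {
  const signals: ContentSignals = {};
  for (const part of headerValue.split(",")) {
    const pieces = part.split("=");
    if (pieces.length !== 2) continue;
    const key = (pieces[0] ?? "").trim().toLowerCase();
    const value = (pieces[1] ?? "").trim().toLowerCase();
    if (key === "" || (value !== "yes" && value !== "no")) continue;
    signals[key] = value === "yes";
  }
  return signals;
}

const signals = parseContentSignal("ai-train=yes, search=yes, ai-input=yes");
console.log(signals["ai-input"]); // prints true
```

An agent could then check signals["ai-input"] before feeding the page into a model, and signals["ai-train"] before adding it to a training set.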

Try it on the Cloudflare Blog and Developer Documentation

We enabled this feature in our Developer Documentation and our Blog, inviting all AI crawlers and agents to consume our content using markdown instead of HTML.

Try it out now by requesting this blog with Accept: text/markdown.

curl https://blog.cloudflare.com/markdown-for-agents/ \
  -H "Accept: text/markdown"

The result is:

---
description: The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown.
title: Introducing Markdown for Agents
image: https://blog.cloudflare.com/images/markdown-for-agents.png
---

# Introducing Markdown for Agents

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

...

Other ways to convert to markdown

If you’re building AI systems that require arbitrary document conversion from outside Cloudflare, or if Markdown for Agents is not available from the content source, we provide other ways to convert documents to markdown for your applications:

Workers AI AI.toMarkdown() supports multiple document types, not just HTML, and summarization.
Browser Rendering /markdown REST API supports markdown conversion if you need to render a dynamic page or application in a real browser before converting it.

To provide a clearer comparison between these two alternative solutions, here are their key characteristics:


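| Feature | Workers AI `AI.toMarkdown()` | Browser Rendering `/markdown` API |
| --- | --- | --- |
| Core capability | Document conversion and summarization | Conversion after rendering a dynamic page |
| Supported input | Multiple formats (HTML, PDF, DOCX, TXT, and others) | URL (rendered in a browser) |
| Dynamic content | Limited to the static content of the input document | Excellent (can execute JavaScript) |
| Extra features | Text summarization | Full browser environment emulation |
| Best suited for | Processing existing document files that need content distillation | Scraping modern single-page applications (SPAs) or interactive sites |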
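
As a sketch of the Browser Rendering option, the helper below prepares a POST request for the /markdown REST API. The endpoint path, the JSON body with a url field, and bearer-token authentication reflect our reading of the Browser Rendering documentation and should be verified there before use. The function name buildMarkdownRequest and the PreparedRequest type are hypothetical.

```typescript
// Plain description of a request, suitable for passing to fetch(url, init).
interface PreparedRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper. Assumptions to verify against the official docs:
// the endpoint path, the { url } JSON body, and Bearer token auth.
function buildMarkdownRequest(
  accountId: string,
  apiToken: string,
  pageUrl: string,
): PreparedRequest {
  const endpoint =
    "https://api.cloudflare.com/client/v4/accounts/" +
    encodeURIComponent(accountId) +
    "/browser-rendering/markdown";
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // The page to render in a real browser before conversion.
      body: JSON.stringify({ url: pageUrl }),
    },
  };
}
```

A caller would then run something like fetch(prepared.url, prepared.init) and read the converted markdown from the JSON response. Keeping request construction separate from the network call makes the assumed request shape easy to inspect and test.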

Tracking markdown usage

Anticipating a shift in how AI systems browse the Web, Cloudflare Radar now includes content type insights for AI bot and crawler traffic, both globally on the AI Insights page and in the individual bot information pages.

The new content_type dimension and filter shows the distribution of content types returned to AI agents and crawlers, grouped by MIME type category.

[Figure: distribution of content types returned to AI agents and crawlers on Cloudflare Radar]

You can also see the requests for markdown filtered by a specific agent or crawler. Here are the requests that return markdown to OAI-Searchbot, the crawler used by OpenAI to power ChatGPT’s search:

[Figure: requests that return markdown to OAI-Searchbot on Cloudflare Radar]

This new data will allow us to track the evolution of how AI bots, crawlers, and agents are consuming Web content over time. As always, everything on Radar is freely accessible via the public APIs and the Data Explorer.

Get started today

To enable Markdown for Agents for your zone, log into the Cloudflare dashboard, select your account, select the zone, look for Quick Actions and toggle the Markdown for Agents button to enable. This feature is available today in Beta at no cost for Pro, Business and Enterprise plans, as well as SSL for SaaS customers.

[Figure: the Markdown for Agents toggle under Quick Actions in the Cloudflare dashboard]

You can find more information about Markdown for Agents on our Developer Docs. We welcome your feedback as we continue to refine and enhance this feature. We’re curious to see how AI crawlers and agents navigate and adapt to the unstructured nature of the Web as it evolves.

Frequently asked questions (FAQ)

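How much AI processing cost can Cloudflare's Markdown for Agents actually save?

The feature converts HTML to markdown in real time at the network edge and can reduce token usage by as much as 80%. For example, the HTML version of one blog post takes 16,180 tokens, while the converted version takes only 3,150, which makes content processing far more efficient for AI agents.

How do AI agents get content in markdown format?

AI systems express a preference for text/markdown through content negotiation headers in their requests. For zones with the feature enabled, Cloudflare's network converts HTML to markdown in real time, so agents receive structured content directly from the source without having to parse it themselves.

Why is markdown better suited to AI processing than HTML?

Markdown has explicit structure and no redundant markup, so nearly every token carries meaning. For example, a ## heading costs roughly 3 tokens in markdown, while the equivalent HTML needs 12-15. This minimizes token waste, improves AI processing results, and has made markdown a common format across AI systems.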
