llms.txt: A New Website Content Standard Tailored for Large Language Models

2026/1/26
AI Summary (BLUF)

The llms.txt proposal introduces a standardized markdown file at the website root that provides LLM-friendly content. It works around context window limits by offering curated, structured information with links to more detailed resources.

Background: The LLM Web Content Challenge

Large Language Models (LLMs) are increasingly reliant on web-based information. However, they face a critical limitation: their context windows are often too small to process the full content of most websites. Converting complex HTML pages—filled with navigation elements, advertisements, and JavaScript—into LLM-friendly plain text is both difficult and imprecise.

While websites serve both human readers and LLMs, the latter benefit significantly from concise, focused information centralized in an easily accessible location. This is especially crucial for use cases like development environments, where an LLM needs rapid access to programming documentation and API references.

The Proposal: A Dedicated llms.txt File

The Core Concept

We propose the addition of an /llms.txt markdown file to websites to provide LLM-friendly content. This file offers brief background, guidance, and links to detailed markdown documents. While human-readable, the llms.txt file also adheres to a precise format that allows for deterministic processing using standard programming techniques like parsers and regular expressions.

Furthermore, we suggest that pages on a website which might be useful to an LLM should provide a clean markdown version of that page by appending .md to the original URL. (URLs without a filename should instead append index.html.md.)
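
The URL convention above can be sketched as a small helper. This is a minimal illustration, not part of the proposal: the function name `markdown_url` is hypothetical, and it assumes that any path not ending in `/` names a file.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(url: str) -> str:
    """Return the URL where a clean markdown version of `url` would be expected."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        # No filename in the path: append index.html.md instead of .md
        if not path.endswith("/"):
            path += "/"
        path += "index.html.md"
    else:
        # Assumption: a path that does not end in "/" names a file
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(markdown_url("https://example.com/docs/quickstart.html"))
# https://example.com/docs/quickstart.html.md
print(markdown_url("https://example.com/docs/"))
# https://example.com/docs/index.html.md
```

A site that serves extensionless pages (for example `/docs/intro`) would need its own rule for deciding whether such a path counts as a filename.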

Implementation and Versatility

The FastHTML project has adopted both proposals in its documentation: the FastHTML docs publish an llms.txt file, and its regular HTML documentation pages are also available in a version with .md appended to the URL.

This proposal does not prescribe how the llms.txt file should be processed, since that depends on the application. For example, the FastHTML project automatically expands its llms.txt into two markdown files containing the content of the linked URLs, using an XML-based structure suitable for LLMs such as Claude. These files are created with the llms_txt2ctx command-line application.
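
To make the idea of expansion concrete, here is a rough sketch of inlining linked documents into a single XML-style context string. It is not the llms_txt2ctx implementation: the function name `build_context`, the `<project>`, `<section>`, and `<doc>` tag names, and the injected `fetch` callable are all assumptions made for illustration.

```python
from typing import Callable
from xml.sax.saxutils import quoteattr


def build_context(
    title: str,
    sections: dict[str, list[tuple[str, str]]],
    fetch: Callable[[str], str],
) -> str:
    """Inline the content behind each link into one XML-style context string.

    `sections` maps a section name to (link title, url) pairs.
    The tag names used here are illustrative only.
    """
    parts = [f"<project title={quoteattr(title)}>"]
    for name, links in sections.items():
        parts.append(f"<section name={quoteattr(name)}>")
        for link_title, url in links:
            body = fetch(url).strip()  # fetch is supplied by the caller
            parts.append(
                f"<doc title={quoteattr(link_title)} url={quoteattr(url)}>\n{body}\n</doc>"
            )
        parts.append("</section>")
    parts.append("</project>")
    return "\n".join(parts)
```

Passing `fetch` in as a parameter keeps the sketch free of network code; a real tool might back it with an HTTP client and a cache.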

The versatility of llms.txt files means they can serve multiple purposes—from helping developers navigate software documentation to providing businesses with a way to outline their structure, or even breaking down complex regulations for stakeholders.

Format Specification

Why Markdown?

Currently, the most widely and easily understood format for language models is Markdown. Simply indicating the location of key Markdown files is an excellent first step. Providing some basic structure helps language models locate the source of the information they need.

The llms.txt file is unusual because it uses Markdown to organize information rather than a traditional structured format like XML. The reason is that we anticipate many such files will be read by language models and agents. Nevertheless, the information within llms.txt follows a specific format that can be read using standard programmatic tools.

File Structure

The llms.txt file specification applies to files located at the website root path /llms.txt (or optionally, in a subpath). A file conforming to this specification contains the following Markdown sections, in this specific order:

  1. A single H1 heading containing the name of the project or website. This is the only mandatory section.
  2. A blockquote containing a short summary of the project, including key information necessary to understand the rest of the file.
  3. Zero or more markdown sections of any type except headings (e.g., paragraphs, lists) containing more detailed information about the project and how to interpret the provided files.
  4. Zero or more markdown sections delimited by H2 headings, containing "file lists" of URLs that provide further detail.
    • Each "file list" is a markdown list containing a required markdown hyperlink [name](url), optionally followed by a : and a note about the file.

Example Structure

# Title

> Optional description goes here

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)

Note that the "Optional" section has special meaning—if included, the URLs provided within it can be skipped when a shorter context is needed. Use it for secondary information that can typically be omitted.

Relationship with Existing Web Standards

Coexistence, Not Replacement

llms.txt is designed to coexist with current web standards. While a sitemap lists all pages for search engines, llms.txt provides a curated overview for LLMs. It can complement robots.txt by providing context for allowed content. The file can also reference structured data markup used on the site, helping LLMs understand how to interpret that information in context.

The approach of standardizing on a file path follows the precedent set by /robots.txt and /sitemap.xml. The purposes of robots.txt and llms.txt differ—robots.txt is typically used to inform automated tools what access to a website is considered acceptable, e.g., for search indexing bots. llms.txt, on the other hand, is information typically used on-demand when a user explicitly requests information on a topic.

We anticipate llms.txt will be primarily useful for inference (when a user is seeking help) rather than for training. However, if the use of llms.txt becomes widespread, future training processes may also leverage the information in llms.txt files.

Distinction from Sitemaps

sitemap.xml is a list of all indexable, human-readable information on a website. It is not a replacement for llms.txt because it:

  • Usually does not list LLM-readable versions of pages.
  • Does not contain URLs to external websites, even if those might be helpful for understanding the information.
  • Typically contains a volume of documents too large to fit into an LLM's context window and includes much information non-essential for understanding the site.

Practical Example

Here is an example llms.txt, a condensed version of the file used for the FastHTML project:

# FastHTML

> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications.

Important notes:
- Although parts of its API are inspired by FastAPI, it is *not* compatible with FastAPI syntax and is not targeted at creating API services
- FastHTML is compatible with JS-native web components and any vanilla JS library, but not with React, Vue, or Svelte.

## Docs
- [FastHTML quick start](https://fasthtml.cn/docs/tutorials/quickstart_for_web_devs.html.md): A brief overview of many FastHTML features
- [HTMX reference](https://github.com/bigskysoftware/htmx/blob/master/www/content/reference.md): Brief description of all HTMX attributes, CSS classes, headers, events, extensions, js lib methods, and config options

## Examples
- [Todo list application](https://github.com/AnswerDotAI/fasthtml/blob/main/examples/adv_app.py): Detailed walk-thru of a complete CRUD app in FastHTML showing idiomatic use of FastHTML and HTMX patterns.

## Optional
- [Starlette full documentation](https://gist.githubusercontent.com/jph00/809e4a4808d4510be0e3dc9565e9cbd3/raw/9b717589ca44cedc8aaf00b2b8cacef922964c0f/starlette-sml.md): A subset of the Starlette documentation useful for FastHTML development.

Guidelines for Creation

To create an effective llms.txt file, consider the following guidelines:

  • Use concise, clear language.
  • When linking to resources, include a short, informative description.
  • Avoid ambiguous terms or unexplained jargon.
  • Run a tool that expands your llms.txt file into an LLM context file and test multiple language models to see if they can answer questions about your content.
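
Some of these guidelines can be checked mechanically before testing with real models. The sketch below flags links that lack a description, plus a missing title or summary. The function name `lint_llms_txt` and the three-word threshold are arbitrary assumptions, not part of the proposal, and a warning is only a hint since the summary and link notes are optional in the format.

```python
import re

# A list item holding [name](url), optionally followed by ": note"
LINK_ITEM_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)\s*(?::\s*(.*))?$")


def lint_llms_txt(text: str, min_note_words: int = 3) -> list[str]:
    """Return human-readable warnings for an llms.txt file."""
    warnings = []
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("# "):
        warnings.append("file should start with an H1 title")
    if not any(line.startswith(">") for line in lines):
        warnings.append("no blockquote summary found")
    for line in lines:
        m = LINK_ITEM_RE.match(line)
        if not m:
            continue
        title, _url, note = m.groups()
        # Assumption: fewer than min_note_words words is not informative enough
        if len((note or "").split()) < min_note_words:
            warnings.append(f"link '{title}' has no informative description")
    return warnings
```

An empty list means none of these checks fired; it does not show that language models can actually answer questions from the content, which still needs the hands-on testing described in the last guideline.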

Integration and Community

The llms.txt ecosystem is supported by a growing set of tools and a collaborative community.

Directories list available llms.txt files across the web, such as llmstxt.site and directory.llmstxt.cloud.

Integration Tools facilitate adoption across various tech stacks:

  • llms_txt2ctx - A CLI and Python module for parsing llms.txt and generating LLM contexts.
  • vitepress-plugin-llms - A VitePress plugin for auto-generating LLM-friendly docs.
  • docusaurus-plugin-llms - A similar plugin for Docusaurus.
  • Drupal LLM Support - A Drupal Recipe for comprehensive llms.txt support.
  • llms-txt-php - A PHP library for reading/writing llms.txt files.

Next Steps: The llms.txt specification is open to community input. A GitHub repository hosts this informal overview for versioning and public discussion, and a Community Discord channel is available for sharing implementation experiences.

Conclusion

The llms.txt proposal offers a pragmatic, lightweight path to make web content more accessible and useful for Large Language Models. By providing a structured, curated entry point and linking to clean, focused markdown versions of key content, it addresses the core challenges of context window limits and noisy HTML. As a community-driven standard designed to complement existing web protocols, it has the potential to significantly improve how LLMs interact with and retrieve information from the vast resources of the web.
