llms.txt: A New Web Content Standard Tailored for Large Language Models
The llms.txt proposal introduces a standardized markdown file at website roots to provide LLM-friendly content, addressing context window limitations by offering curated, structured information with links to detailed resources.
Background: The LLM Web Content Challenge
Large Language Models (LLMs) are increasingly reliant on web-based information. However, they face a critical limitation: their context windows are often too small to process the full content of most websites. Converting complex HTML pages—filled with navigation elements, advertisements, and JavaScript—into LLM-friendly plain text is both difficult and imprecise.
While websites serve both human readers and LLMs, the latter benefit significantly from concise, focused information centralized in an easily accessible location. This is especially crucial for use cases like development environments, where an LLM needs rapid access to programming documentation and API references.
The Proposal: A Dedicated llms.txt File
The Core Concept
We propose the addition of an /llms.txt markdown file to websites to provide LLM-friendly content. This file offers brief background, guidance, and links to detailed markdown documents. While human-readable, the llms.txt file also adheres to a precise format that allows for deterministic processing using standard programming techniques like parsers and regular expressions.
Furthermore, we suggest that pages on a website which might be useful to an LLM should provide a clean markdown version of that page by appending .md to the original URL. (URLs without a filename should instead append index.html.md.)
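The URL rule above can be sketched as a small helper. This is a minimal illustration, not part of the proposal: the function name `markdown_url` is hypothetical, and it assumes that a path ending in `/` (or an empty path) is what "a URL without a filename" means.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the URL where a clean markdown version of the page would live.

    Assumption: a path that is empty or ends in "/" has no filename.
    """
    parts = urlsplit(page_url)
    path = parts.path
    if path == "" or path.endswith("/"):
        # No filename: append index.html.md to the directory path.
        path = (path or "/") + "index.html.md"
    else:
        # Filename present: append .md to it.
        path = path + ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(markdown_url("https://example.com/docs/quickstart.html"))
print(markdown_url("https://example.com/docs/"))
```

A path such as `/docs` with no trailing slash is treated as a filename here; a real site may need a different rule for that case.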
Implementation and Versatility
The FastHTML project has adopted both proposals in its documentation: it publishes an llms.txt for the FastHTML docs, and its regular HTML documentation pages each have a version with .md appended.
This proposal does not prescribe specific handling methods for the llms.txt file, as this will depend on the application. For example, the FastHTML project uses an XML-based structure to automatically expand llms.txt into two markdown files containing the content from linked URLs, suitable for use with LLMs like Claude. These files are created using the llms_txt2ctx command-line application.
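To make the idea of expanding linked URLs into an XML-based context concrete, here is a rough sketch. It does not reproduce what llms_txt2ctx does: the element names (`project`, `section`, `doc`), the function `build_context`, and the injected `fetch` callable are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple


def build_context(
    project: str,
    sections: Dict[str, List[Tuple[str, str]]],
    fetch: Callable[[str], str],
) -> str:
    """Wrap the content of each linked URL in a simple XML structure.

    `sections` maps a section name to (link title, url) pairs.
    `fetch` returns the text behind a URL; it is passed in so the
    caller decides how (and whether) to hit the network.
    """
    root = ET.Element("project", title=project)
    for section_name, links in sections.items():
        section = ET.SubElement(root, "section", name=section_name)
        for title, url in links:
            doc = ET.SubElement(section, "doc", title=title, url=url)
            doc.text = fetch(url)
    return ET.tostring(root, encoding="unicode")


# Usage with a stand-in fetcher instead of real HTTP requests.
fake_pages = {"https://example.com/quickstart.md": "Install, then run."}
context = build_context(
    "Demo",
    {"Docs": [("Quick start", "https://example.com/quickstart.md")]},
    fake_pages.__getitem__,
)
print(context)
```

Passing `fetch` as a parameter keeps the sketch free of network code, so it can be run and tested offline.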
The versatility of llms.txt files means they can serve multiple purposes—from helping developers navigate software documentation to providing businesses with a way to outline their structure, or even breaking down complex regulations for stakeholders.
Format Specification
Why Markdown?
Currently, the most widely and easily understood format for language models is Markdown. Simply indicating the location of key Markdown files is an excellent first step. Providing some basic structure helps language models locate the source of the information they need.
The llms.txt file is unusual because it uses Markdown to organize information rather than a traditional structured format like XML. The reason is that we anticipate many such files will be read by language models and agents. Nevertheless, the information within llms.txt follows a specific format that can be read using standard programmatic tools.
File Structure
The llms.txt file specification applies to files located at the website root path /llms.txt (or optionally, in a subpath). A file conforming to this specification contains the following Markdown sections, in this specific order:
- A single H1 heading containing the name of the project or website. This is the only mandatory section.
- A blockquote containing a short summary of the project, including key information necessary to understand the rest of the file.
- Zero or more markdown sections of any type except headings (e.g., paragraphs, lists) containing more detailed information about the project and how to interpret the provided files.
- Zero or more markdown sections delimited by H2 headings, containing "file lists" of URLs that provide further detail.
- Each "file list" is a markdown list containing a required markdown hyperlink [name](url), optionally followed by a : and a note about the file.
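Because the format is fixed, the structure listed above can be read with a few string checks and one regular expression. The following is a minimal sketch under stated assumptions, not a reference parser: the function name `parse_llms_txt` and the shape of the returned dictionary are hypothetical, and it assumes a single-line blockquote and URLs that contain no closing parenthesis.

```python
import re

# Matches "- [name](url)" optionally followed by ": note".
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    result = {"title": None, "summary": None, "details": [], "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            # Inside an H2 section only file-list items are collected.
            m = LINK_RE.match(line)
            if m:
                result["sections"][current].append(
                    {"title": m["title"], "url": m["url"], "note": m["note"] or ""}
                )
        elif line.startswith(">") and result["summary"] is None:
            result["summary"] = line[1:].strip()
        else:
            result["details"].append(line)
    return result


sample = """# Title

> Optional description goes here

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)
"""
parsed = parse_llms_txt(sample)
print(parsed["title"], list(parsed["sections"]))
```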
Example Structure
# Title
> Optional description goes here
Optional details go here
## Section name
- [Link title](https://link_url): Optional link details
## Optional
- [Link title](https://link_url)
Note that the "Optional" section has special meaning—if included, the URLs provided within it can be skipped when a shorter context is needed. Use it for secondary information that can typically be omitted.
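One way a consumer might honor the "Optional" section when a shorter context is needed is to filter it out before expanding links. This is only a sketch: `drop_optional` is a hypothetical helper, and it assumes the heading is spelled exactly `## Optional` and that the section runs until the next H2 heading or the end of the file.

```python
def drop_optional(text: str) -> str:
    """Return the llms.txt text without its "Optional" H2 section."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Each H2 heading decides whether the lines that follow are kept.
            skipping = line[3:].strip() == "Optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    "# Title",
    "## Docs",
    "- [Guide](https://example.com/guide.md)",
    "## Optional",
    "- [Extras](https://example.com/extras.md)",
])
print(drop_optional(sample))
```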
Relationship with Existing Web Standards
Coexistence, Not Replacement
llms.txt is designed to coexist with current web standards. While a sitemap lists all pages for search engines, llms.txt provides a curated overview for LLMs. It can complement robots.txt by providing context for allowed content. The file can also reference structured data markup used on the site, helping LLMs understand how to interpret that information in context.
The approach of standardizing on a file path follows the precedent set by /robots.txt and /sitemap.xml. The purposes of robots.txt and llms.txt differ—robots.txt is typically used to inform automated tools what access to a website is considered acceptable, e.g., for search indexing bots. llms.txt, on the other hand, is information typically used on-demand when a user explicitly requests information on a topic.
We anticipate llms.txt will be primarily useful for inference (when a user is seeking help) rather than for training. However, if the use of llms.txt becomes widespread, future training processes may also leverage the information in llms.txt files.
Distinction from Sitemaps
sitemap.xml is a list of all indexable, human-readable information on a website. It is not a replacement for llms.txt because it:
- Usually does not list LLM-readable versions of pages.
- Does not contain URLs to external websites, even if those might be helpful for understanding the information.
- Typically contains a volume of documents too large to fit into an LLM's context window and includes much information non-essential for understanding the site.
Practical Example
Here is an example llms.txt, a condensed version of the file used for the FastHTML project (see also the full version):
# FastHTML
> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications.
Important notes:
- Although parts of its API are inspired by FastAPI, it is *not* compatible with FastAPI syntax and is not targeted at creating API services
- FastHTML is compatible with JS-native web components and any vanilla JS library, but not with React, Vue, or Svelte.
## Docs
- [FastHTML quick start](https://fasthtml.cn/docs/tutorials/quickstart_for_web_devs.html.md): A brief overview of many FastHTML features
- [HTMX reference](https://github.com/bigskysoftware/htmx/blob/master/www/content/reference.md): Brief description of all HTMX attributes, CSS classes, headers, events, extensions, js lib methods, and config options
## Examples
- [Todo list application](https://github.com/AnswerDotAI/fasthtml/blob/main/examples/adv_app.py): Detailed walk-thru of a complete CRUD app in FastHTML showing idiomatic use of FastHTML and HTMX patterns.
## Optional
- [Starlette full documentation](https://gist.githubusercontent.com/jph00/809e4a4808d4510be0e3dc9565e9cbd3/raw/9b717589ca44cedc8aaf00b2b8cacef922964c0f/starlette-sml.md): A subset of the Starlette documentation useful for FastHTML development.
Guidelines for Creation
To create an effective llms.txt file, consider the following guidelines:
- Use concise, clear language.
- When linking to resources, include a short, informative description.
- Avoid ambiguous terms or unexplained jargon.
- Run a tool that expands your llms.txt file into an LLM context file and test multiple language models to see if they can answer questions about your content.
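The guideline about describing linked resources lends itself to a quick automated check. The helper below is a hypothetical sketch (the name `links_missing_descriptions` is not from any existing tool): it lists the link titles in an llms.txt whose list item carries no note after the hyperlink.

```python
import re

# A file-list item: "- [title](url)" followed by anything (possibly nothing).
LINK_ITEM = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)(.*)$")


def links_missing_descriptions(text: str) -> list:
    """Return the titles of links that have no description after them."""
    missing = []
    for line in text.splitlines():
        m = LINK_ITEM.match(line.strip())
        # Strip the ": " separator; an empty remainder means no description.
        if m and not m.group(3).lstrip(": ").strip():
            missing.append(m.group(1))
    return missing


sample = "\n".join([
    "## Docs",
    "- [Quick start](https://example.com/quickstart.md): A brief overview",
    "- [Bare link](https://example.com/bare.md)",
])
print(links_missing_descriptions(sample))
```

A check like this only covers the mechanical part of the guidelines; whether a description is actually informative still needs a human or model review.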
Integration and Community
The llms.txt ecosystem is supported by a growing set of tools and a collaborative community.
Directories list available llms.txt files across the web, such as llmstxt.site and directory.llmstxt.cloud.
Integration Tools facilitate adoption across various tech stacks:
- llms_txt2ctx: A CLI and Python module for parsing llms.txt and generating LLM contexts.
- vitepress-plugin-llms: A VitePress plugin for auto-generating LLM-friendly docs.
- docusaurus-plugin-llms: A similar plugin for Docusaurus.
- Drupal LLM Support: A Drupal Recipe for comprehensive llms.txt support.
- llms-txt-php: A PHP library for reading/writing llms.txt files.
Next Steps: The llms.txt specification is open to community input. A GitHub repository hosts this informal overview for versioning and public discussion, and a Community Discord channel is available for sharing implementation experiences.
Conclusion
The llms.txt proposal offers a pragmatic, lightweight path to make web content more accessible and useful for Large Language Models. By providing a structured, curated entry point and linking to clean, focused markdown versions of key content, it addresses the core challenges of context window limits and noisy HTML. As a community-driven standard designed to complement existing web protocols, it has the potential to significantly improve how LLMs interact with and retrieve information from the vast resources of the web.