Llms.txt
Overview
llms.txt is a proposed convention for a Markdown file placed at a website's root (/llms.txt) that gives large language models a concise, curated map of the site's most important content. Jeremy Howard proposed it in 2024 as a way to help AI systems find and use a site's primary information at inference time, since a model's finite context window cannot hold an entire website.[1]
llms.txt is an emerging proposal rather than a ratified standard; it has no RFC or W3C status, and its adoption and interpretation by AI vendors are not uniform.
How it works
The file is human- and machine-readable Markdown with a defined structure: an H1 site title, a blockquote summary, optional notes, and H2 sections listing key URLs with short descriptions. A companion /llms-full.txt may contain expanded content. It expresses what content matters, and is distinct from access-control files that express what may be crawled.
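To make the structure concrete, the following Python sketch parses an llms.txt body into its title, summary, notes, and H2 link sections. It assumes link lines follow the `- [title](url): description` pattern used in the proposal's examples. The `parse_llms_txt`, `LlmsTxt`, and `Link` names and the sample file are hypothetical, and real files may deviate from the pattern, so this is an illustration rather than a conformant parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    notes: list[str] = field(default_factory=list)
    sections: dict[str, list[Link]] = field(default_factory=dict)


# Assumed link syntax: "- [Title](url)" optionally followed by ": description".
LINK_RE = re.compile(r"^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> LlmsTxt:
    """Hypothetical parser: H1 title, blockquote summary, notes, H2 link sections."""
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and current is None:
            # Blockquote lines before the first H2 form the summary.
            doc.summary = (doc.summary + " " + line[2:].strip()).strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                doc.sections[current].append(
                    Link(m.group(1), m.group(2), (m.group(3) or "").strip())
                )
        else:
            # Anything else before the first H2 is treated as an optional note.
            doc.notes.append(line)
    return doc


# Hypothetical sample file for illustration.
SAMPLE = """# Example Docs

> Example Docs is a hypothetical library used to illustrate the format.

Notes: the API is versioned.

## Docs

- [API reference](https://example.com/api.md): Endpoints and parameters
- [Quickstart](https://example.com/quickstart.md)
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title)  # Example Docs
print(doc.sections["Docs"][0].url)  # https://example.com/api.md
```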
| File | Purpose | Governs |
|---|---|---|
| llms.txt | Curated content guide for AI | Which content to prioritize |
| robots.txt | Crawler access control | Whether a path may be crawled |
| ai.txt and similar proposals | Usage/training permissions | Whether content may be used for AI training |
| XML sitemap | Exhaustive URL inventory | Discovery of all pages |
llms.txt is not an access-control mechanism: it does not block crawling or training (that is the role of robots.txt and usage directives), and it does not guarantee that any model will read or honor it.
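The division of roles can be illustrated with a hypothetical client, sketched in Python with the standard library's `urllib.robotparser`. The client treats URLs read from llms.txt only as suggestions and keeps those that robots.txt permits for its user agent. The helper names, the `ExampleBot` agent, and the example.com URLs are assumptions; no particular AI vendor is known to follow this exact logic.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def llms_txt_url(site: str) -> str:
    """Hypothetical helper: llms.txt lives at the site's root path by convention."""
    return urljoin(site, "/llms.txt")


def allowed_urls(robots_txt: str, agent: str, candidates: list[str]) -> list[str]:
    """Hypothetical helper: keep only candidates that robots.txt permits.

    llms.txt can suggest which pages matter, but it cannot grant access;
    permission is decided here from robots.txt alone.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in candidates if rp.can_fetch(agent, u)]


ROBOTS = """User-agent: *
Disallow: /private/
"""

# Hypothetical URLs a client might have read from a site's llms.txt.
suggested = [
    "https://example.com/docs/api.md",
    "https://example.com/private/roadmap.md",
]

print(llms_txt_url("https://example.com/docs/intro"))  # https://example.com/llms.txt
print(allowed_urls(ROBOTS, "ExampleBot", suggested))  # ['https://example.com/docs/api.md']
```

Even though the second URL is listed as important, the robots.txt rule excludes it.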
Examples
- A documentation site publishes /llms.txt linking its API reference and key guides, so that AI assistants surface the canonical pages.
- This site's own llms.txt indexes its primary reference pages.
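A documentation site like the one described above might generate its file from a list of pages. This Python sketch, whose `build_llms_txt` helper and example.com URLs are hypothetical, renders an H1 title, a blockquote summary, and H2 sections of `- [title](url): description` links; it omits optional notes and any /llms-full.txt companion.

```python
def build_llms_txt(
    title: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]
) -> str:
    """Hypothetical generator: H1 title, blockquote summary, H2 link sections.

    Each section maps to (link title, URL, description) tuples; an empty
    description omits the ": ..." suffix.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, desc in links:
            item = f"- [{name}]({url})"
            if desc:
                item += f": {desc}"
            lines.append(item)
        lines.append("")
    return "\n".join(lines)


text = build_llms_txt(
    "Example Docs",
    "Reference documentation for a hypothetical Example API.",
    {
        "Docs": [
            ("API reference", "https://example.com/docs/api.md", "Endpoints and parameters"),
            ("Quickstart", "https://example.com/docs/quickstart.md", ""),
        ],
    },
)
print(text)
```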
See also
References
- Howard, J. (2024). "The /llms.txt file." https://llmstxt.org/