markdownify
Markdownify is a Drupal module that generates Markdown versions of your site's content. Through any of six supported request patterns, bots, AI agents, and developers can retrieve a lightweight Markdown representation of your site’s content that is easier to parse and consume.
Why Markdown?
While modern LLMs can ingest and parse raw HTML, Markdown offers significant advantages:
- Cost Efficiency: AI services typically charge by token usage, with prices usually quoted per million tokens. In our testing, Markdown used roughly one tenth as many tokens as the equivalent HTML (about a 10:1 reduction), which lowers costs.
- Faster Processing: The reduction in token count results in quicker processing of your content in Markdown format.
- Zero Distractions: The Markdown output omits headers, footers, ads, and other irrelevant page elements, providing a clean and concise context for AI models.
- Universal Format: Markdown is widely supported and easily understood by AI models, making it the lingua franca for structured text.
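To make the cost argument concrete, here is a minimal arithmetic sketch. The price and the page size are made-up illustrative values, not real quotes or measurements; only the 10:1 ratio comes from the testing mentioned above, and `token_cost` is a hypothetical helper name.

```python
def token_cost(tokens, price_per_million):
    """Return the cost of `tokens` at a price quoted per one million tokens."""
    return tokens / 1_000_000 * price_per_million


# Assumptions for illustration only (not real pricing or measurements).
HYPOTHETICAL_PRICE = 3.00            # currency units per million input tokens
html_tokens = 50_000                 # made-up token count for one HTML page
markdown_tokens = html_tokens // 10  # the roughly 10:1 reduction reported above

html_cost = token_cost(html_tokens, HYPOTHETICAL_PRICE)
markdown_cost = token_cost(markdown_tokens, HYPOTHETICAL_PRICE)

if __name__ == "__main__":
    print(f"HTML:     {html_cost:.3f}")
    print(f"Markdown: {markdown_cost:.3f}")
```

Whatever the actual price, the cost scales linearly with the token count, so a tenfold reduction in tokens yields a tenfold reduction in input cost.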
Six Ways to Access Markdown Content
Once enabled, Markdownify provides six ways to access Markdown-formatted content. One of them, appending .md to human-readable path aliases, requires the Markdownify Path submodule.
- Appending `.md` to the canonical entity URL: `curl -I https://yourwebsite.com/node/1.md`
- Appending `.md` to path aliases (via the Markdownify Path submodule): `curl -I https://yourwebsite.com/en/articles/give-your-oatmeal-the-ultimate-makeover.md`
- Using the `/markdownify` path prefix: `curl -I https://yourwebsite.com/markdownify/node/1`
- Using the `_format` query parameter: `curl -I "https://yourwebsite.com/node/1?_format=markdown"` See https://www.drupal.org/node/2501221 for more information about the `_format` query parameter.
- Using the `Accept` header: `curl -I https://yourwebsite.com/node/1 -H "Accept: text/markdown"`
- Using the `Content-Type` header: `curl -I https://yourwebsite.com/node/1 -H "Content-Type: text/markdown"`
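For client code, the six patterns above can be expressed as plain data. The following is a minimal sketch, not part of the module: the function name `markdown_requests` and the dictionary keys are hypothetical, and the host is the placeholder used in the curl examples.

```python
def markdown_requests(base, entity_path, alias_path=None):
    """Build a (url, headers) pair for each access pattern.

    base        -- site root, e.g. "https://yourwebsite.com"
    entity_path -- canonical entity path, e.g. "/node/1"
    alias_path  -- optional path alias; this pattern needs the
                   Markdownify Path submodule on the server
    """
    canonical = f"{base}{entity_path}"
    requests = {
        "md_suffix": (f"{canonical}.md", {}),
        "path_prefix": (f"{base}/markdownify{entity_path}", {}),
        "format_param": (f"{canonical}?_format=markdown", {}),
        "accept_header": (canonical, {"Accept": "text/markdown"}),
        "content_type_header": (canonical, {"Content-Type": "text/markdown"}),
    }
    if alias_path is not None:
        requests["alias_md_suffix"] = (f"{base}{alias_path}.md", {})
    return requests


if __name__ == "__main__":
    variants = markdown_requests(
        "https://yourwebsite.com",
        "/node/1",
        "/en/articles/give-your-oatmeal-the-ultimate-makeover",
    )
    for name, (url, headers) in variants.items():
        print(name, url, headers)
```

Each pair can be passed to any HTTP client. With the standard library, for instance, `urllib.request.Request(url, headers=headers, method="HEAD")` mirrors the `curl -I` calls above.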
How it Works
The Markdownify module uses the League HTML-to-Markdown Library to convert HTML-rendered content into clean Markdown.
The required library, league/html-to-markdown, is installed via Composer as part of the module’s dependencies.
When a request for a `.md` version of an entity is received, the module:
- Renders the Entity: The module uses the standard Drupal render pipeline to generate the HTML representation of the requested entity.
- Converts HTML to Markdown: Using the `League\HTMLToMarkdown\HtmlConverter`, the module processes the HTML output and transforms it into Markdown format.
- Delivers Markdown Output: The Markdown version of the content is served to the client, omitting unnecessary elements like headers, footers, or advertisements.
As a result, the output remains concise and AI-friendly.
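To illustrate the convert step, here is a toy HTML-to-Markdown converter built on Python's standard `html.parser`. It is a stand-in for illustration only: the module itself does this in PHP with `League\HTMLToMarkdown\HtmlConverter`, and the class name, the function name, and the list of skipped tags below are assumptions, not the module's actual behavior.

```python
import re
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Toy converter: headings, paragraphs, lists, links, bold, italics."""

    # Assumption: tags treated as page chrome and dropped with their content.
    SKIP = {"header", "footer", "nav", "aside", "script", "style"}

    def __init__(self):
        super().__init__()
        self.out = []        # collected Markdown fragments
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = ""       # target of the link currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in ("h1", "h2", "h3"):
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "li":
            self.out.append("- ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n\n")
        elif tag in ("li", "ul", "ol"):
            self.out.append("\n")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href})")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs, as a browser would when rendering.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Convert an HTML fragment to Markdown with the toy converter."""
    parser = TinyMarkdownConverter()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).splitlines()]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return text.strip() + "\n"


if __name__ == "__main__":
    sample = (
        '<header><a href="/">Home</a></header>'
        "<article><h1>Oatmeal</h1><p>Give it a <strong>makeover</strong>.</p></article>"
        "<footer>Example Inc.</footer>"
    )
    print(html_to_markdown(sample))
```

The sketch covers only a handful of tags. A production converter such as league/html-to-markdown also handles tables, images, code blocks, nested lists, and character escaping.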