
AI File to Text automatically extracts content from uploaded document files and converts it to plain text, HTML, or Markdown. If you use the Drupal AI module and need to process uploaded documents (for example, extracting the text of a PDF so an AI agent can summarize it, or converting a Word file into HTML for a text field), this module handles it without any external service. Extraction runs in PHP on your own server.

Features

  • Extracts content from Word (.docx, .doc), OpenDocument (.odt, .ods), PDF, CSV, TXT, and Markdown (.md) files.
  • Three output formats: plain text, styled HTML, or Markdown.
  • HTML output preserves headings, bold, italic, underline, font sizes, colors, links, lists, and tables.
  • Provides AI Automator plugins for text_long and string_long fields — upload a file and the text is extracted automatically.
  • Registers a file_to_text function call for AI Agents, so agents can read and process documents on their own.
  • No external services, APIs, or server-side applications required. Everything is handled by PHP libraries bundled via Composer.
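
As a rough illustration of the supported file types and output formats listed above, the check can be sketched as follows. This is Python written for illustration only, not the module's PHP code; the names is_supported and check_request and the lowercase format keys are assumptions.

```python
from pathlib import Path

# Extensions taken from the Features list above.
SUPPORTED_EXTENSIONS = {".docx", ".doc", ".odt", ".ods", ".pdf", ".csv", ".txt", ".md"}

# Output formats from the Features list; the lowercase keys are an assumption.
OUTPUT_FORMATS = {"text", "html", "markdown"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one the module lists as supported."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def check_request(filename: str, output_format: str = "text") -> None:
    """Raise ValueError for an unsupported file type or output format."""
    if not is_supported(filename):
        raise ValueError(f"Unsupported file type: {filename}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unknown output format: {output_format}")
```

The extension comparison is case-insensitive, so an upload named Report.PDF is treated the same as report.pdf.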

Post-Installation

  1. On any content type, add a File field (to accept uploads) and a Text (formatted, long) or Text (plain, long) field (to store the output).
  2. On the destination field, enable AI Automators, select "File to Text" as the automator type, and choose your default output format (text, HTML, or Markdown).
  3. Create or edit content of that type and upload a file to the File field.
  4. Save the content. The document text is extracted automatically and placed into the destination field.

Additional Requirements

AI module

PHP libraries (installed automatically via Composer):

Optional system package for improved PDF extraction:

  • poppler-utils: When installed on the server, the module uses pdftohtml for higher-quality PDF output with better table, link, and style detection. If it is not available, the module falls back to smalot/pdfparser.

Optional modules:

  • AI Agents: Enables AI agent workflows where agents can call file_to_text to read and process uploaded documents autonomously.
  • AI Context: Provides additional context to AI operations, useful when combining file extraction with other AI tasks.
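
The pdftohtml fallback described in this section amounts to checking whether the binary is on the server's PATH. A minimal sketch of that decision follows, in Python for illustration rather than the module's PHP. The function names and the pdftohtml flags shown (-stdout, -noframes, -i) are assumptions; the flags exist in poppler-utils, but the page does not say which ones the module passes.

```python
import shutil


def choose_pdf_backend(which=shutil.which) -> str:
    """Pick pdftohtml when it is on PATH, otherwise the bundled PHP parser."""
    if which("pdftohtml") is not None:
        return "pdftohtml"
    return "smalot/pdfparser"


def pdftohtml_command(pdf_path: str) -> list[str]:
    """Build a hypothetical pdftohtml call that writes one HTML page to stdout."""
    # -stdout: print to standard output; -noframes: single document;
    # -i: skip images. Flag choice here is an assumption.
    return ["pdftohtml", "-stdout", "-noframes", "-i", pdf_path]
```

The which parameter can be swapped for a stub in tests. To confirm the binary is visible to the web server user, run `which pdftohtml` as that user.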

Similar projects

  • AI Simple PDF to Text — Handles PDF files, converts to plain text only.
  • Unstructured — Supports a similar range of file types but requires an external Unstructured API server or a SaaS account.

Supporting this Module

Contributions, bug reports, and feature requests are welcome in the issue queue.

Community Documentation

Documentation and usage examples are included in the module's README.md file.

Releases

Version     Type   Release date
1.0.x-dev   Dev    Feb 11, 2026