ai_file_to_text
No security coverage
AI File to Text automatically extracts content from uploaded document files and converts them to plain text, HTML, or Markdown. If you use the Drupal AI module and need to process uploaded documents — for example, extracting the text of a PDF so an AI agent can summarize it, or converting a Word file into HTML for a text field — this module handles it without needing any external service. Everything runs in pure PHP on your server.
Features
- Extracts content from Word (.docx, .doc), OpenDocument (.odt, .ods), Excel (.xlsx, .xls), PDF, CSV, TXT, and Markdown (.md) files.
- Three output formats: plain text, styled HTML, or Markdown.
- HTML output preserves headings, bold, italic, underline, font sizes, colors, links, lists, and tables.
- Provides AI Automator plugins for `text_long` and `string_long` fields: upload a file and the text is extracted automatically.
- Registers a `file_to_text` function call for AI Agents, so agents can read and process documents on their own.
- No external services, APIs, or server-side applications required. Everything is handled by PHP libraries bundled via Composer.
Post-Installation
- On any content type, add a File field (to accept uploads) and a Text (formatted, long) or Text (plain, long) field (to store the output).
- On the destination field, enable AI Automators, select "File to Text" as the automator type, and choose your default output format (text, HTML, or Markdown).
- Upload a file.
- When content is saved with a file attachment, the document text is automatically extracted and placed into the target field.
Additional Requirements
PHP libraries (installed automatically via Composer):
- phpoffice/phpword — Word (.docx, .doc) and ODT extraction
- phpoffice/phpspreadsheet — Spreadsheet (.xlsx, .xls, .ods, .csv) extraction
- smalot/pdfparser — PDF extraction (pure PHP)
- league/html-to-markdown — Markdown output conversion
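As a rough illustration of how the libraries above divide the work, the sketch below maps a filename to the Composer package that the list names for its extension. The function `library_for_file()` is hypothetical and is not taken from the module's code. Returning `null` for TXT and Markdown assumes those files need no parser library, which the list suggests but does not state.

```php
<?php

/**
 * Hypothetical helper: maps a filename to the Composer package that the
 * requirements list names for that file type. Not the module's actual code.
 */
function library_for_file(string $filename): ?string
{
    // Extension groups copied from the requirements list.
    $map = [
        'phpoffice/phpword' => ['docx', 'doc', 'odt'],
        'phpoffice/phpspreadsheet' => ['xlsx', 'xls', 'ods', 'csv'],
        'smalot/pdfparser' => ['pdf'],
    ];

    // Compare case-insensitively so "Report.PDF" matches "pdf".
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    foreach ($map as $package => $extensions) {
        if (in_array($extension, $extensions, true)) {
            return $package;
        }
    }

    // Assumption: TXT and Markdown are read without a parser library;
    // any other extension is treated as unsupported in this sketch.
    return null;
}

echo library_for_file('Report.PDF'), PHP_EOL;   // smalot/pdfparser
echo library_for_file('budget.ods'), PHP_EOL;   // phpoffice/phpspreadsheet
var_dump(library_for_file('notes.txt'));        // NULL
```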
Optional system package for improved PDF extraction:
- poppler-utils: when installed on the server, the module uses `pdftohtml` for higher-quality PDF output with better table, link, and style detection. Falls back to smalot/pdfparser if not available.
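The fallback described above could be sketched roughly as follows. This is not the module's implementation: `binary_available()` and `choose_pdf_backend()` are hypothetical names, and the PATH check assumes a POSIX shell where `command -v` works and PHP's `shell_exec()` is enabled.

```php
<?php

/**
 * Hypothetical helper: returns true when an executable is found on the PATH.
 * Assumes a POSIX shell that supports `command -v`.
 */
function binary_available(string $binary): bool
{
    $output = shell_exec('command -v ' . escapeshellarg($binary) . ' 2>/dev/null');

    // shell_exec() returns a non-string value when there is no output
    // or when the command could not be run.
    return is_string($output) && trim($output) !== '';
}

/**
 * Picks the PDF backend: pdftohtml when present, smalot/pdfparser otherwise.
 * The $probe parameter exists only so the decision can be exercised
 * without depending on what is installed on the real system.
 */
function choose_pdf_backend(?callable $probe = null): string
{
    $probe ??= 'binary_available';

    return $probe('pdftohtml') ? 'pdftohtml' : 'smalot/pdfparser';
}

// Prints whichever backend this machine would use.
echo choose_pdf_backend(), PHP_EOL;
```

Passing the probe as a parameter keeps the decision separate from the system check, so both branches can be verified on a machine with or without poppler-utils installed.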
Recommended modules/libraries
- AI Agents: enables AI agent workflows where agents can call `file_to_text` to read and process uploaded documents autonomously.
- AI Context: provides additional context to AI operations, useful when combining file extraction with other AI tasks.
Similar projects
- AI Simple PDF to Text — Handles PDF files, converts to plain text only.
- Unstructured — Supports a similar range of file types but requires an external Unstructured API server or a SaaS account.
Supporting this Module
Contributions, bug reports, and feature requests are welcome in the issue queue.
Community Documentation
Documentation and usage examples are included in the module's README.md file.