ai_automator_pandoc
An AI Automators plugin that converts uploaded Word (.docx), PDF, ODT, and RTF files to clean HTML using the pandoc command-line tool.
Once configured, the automator runs automatically when a node is saved, writing the converted HTML into any text_long target field — no custom code required.
Features
- Converts Word (.docx / .doc), PDF, ODT, RTF, HTML, and plain text files to HTML5 or HTML4
- Input format is detected automatically from the file MIME type and extension
- Configurable output options: text wrapping, standalone document wrapper, embedded resources (base64 images), section numbering, table of contents, and freeform extra pandoc arguments
- Uses proc_open() with an argument array, so there is no shell injection risk
- Settings form at /admin/config/content/pandoc with live pandoc version display and validation
- Appears in the Drupal status report with OK / Warning / Error indicators
- Post-install warning banner with a direct link to the configuration page
- Drush command drush ai-automator-pandoc:test for command-line testing and diagnosis
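The argument-array approach mentioned in the feature list can be illustrated outside PHP. The following is a minimal sketch in POSIX shell, not the module's actual code: the helper name pandoc_args and its parameter order are hypothetical, and the set of options the module really passes is an assumption. The flags themselves (--from, --to, --wrap, --standalone, --toc) are standard pandoc options. The point is that each value, including the file name, travels as one discrete argument and is never re-parsed as shell syntax.

```shell
#!/bin/sh
# Hypothetical helper, not part of the module: prints the pandoc argument
# list, one argument per line, for a given set of options.
# Usage: pandoc_args FROM TO WRAP STANDALONE TOC INPUT_FILE
pandoc_args() {
  pa_from=$1; pa_to=$2; pa_wrap=$3
  pa_standalone=$4; pa_toc=$5; pa_input=$6

  # Rebuild the positional parameters as the argument array.
  set -- "--from=$pa_from" "--to=$pa_to" "--wrap=$pa_wrap"

  # Optional flags are appended as whole elements.
  if [ "$pa_standalone" = yes ]; then set -- "$@" --standalone; fi
  if [ "$pa_toc" = yes ]; then set -- "$@" --toc; fi

  # The file name is appended as a single element, whatever it contains.
  set -- "$@" "$pa_input"

  printf '%s\n' "$@"
}

# Example: a file name with a space and a semicolon stays one argument.
pandoc_args docx html5 none no yes 'report; final.docx'
```

A real invocation would hand the same list to the binary directly, for example "$pandoc_path" "$@", rather than joining it into a single command string.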
Requirements
- Drupal 10.4, 11, or 12
- AI module with the AI Automators sub-module enabled
- pandoc installed on the server and the binary path configured in the module settings
Installation
drush en ai_automator_pandoc
After enabling, visit Administration → Content authoring → Pandoc settings and enter the full path to the pandoc binary (e.g. /usr/local/bin/pandoc). Run which pandoc on the server to find the correct path.
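Before entering the path in the settings form, it can help to confirm that the binary exists and runs. The following is a minimal sketch, assuming a POSIX shell on the server; the check_binary helper is hypothetical and not part of the module. Note that the web server user may have a different PATH from your login shell, which is one reason to configure the full path.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a path points to an executable file.
check_binary() {
  if [ -x "$1" ] && [ ! -d "$1" ]; then
    echo "ok: $1"
  else
    echo "not executable: $1"
    return 1
  fi
}

# Resolve pandoc on the current PATH (empty if it is not found).
pandoc_path=$(command -v pandoc) || pandoc_path=

if [ -n "$pandoc_path" ]; then
  # Print the check result, then the first line of the version banner.
  check_binary "$pandoc_path" && "$pandoc_path" --version | head -n 1
else
  echo "pandoc not found on PATH"
fi
```

The path printed after "ok:" is the value to enter in the settings form.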
Installing pandoc
- macOS (Homebrew): brew install pandoc
- Ubuntu / Debian: sudo apt-get install pandoc
- Alpine Linux / Docker: apk add --no-cache pandoc
- RHEL / CentOS: sudo yum install pandoc
- Windows: download the installer from pandoc.org/installing.html
Docker note: images based on Alpine Linux use apk, not apt-get. Use apk add --no-cache pandoc or add it to your Dockerfile.
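If you are unsure which base image a container uses, the distribution ID in /etc/os-release usually settles it. The following is a minimal sketch, assuming a Linux system that ships /etc/os-release; the install_hint helper is hypothetical, and the commands it prints are the ones listed above. Systems without that file, such as macOS and Windows, fall through to the download page.

```shell
#!/bin/sh
# Hypothetical helper: maps an os-release ID to the matching install command.
install_hint() {
  case "$1" in
    alpine)        echo "apk add --no-cache pandoc" ;;
    debian|ubuntu) echo "sudo apt-get install pandoc" ;;
    rhel|centos)   echo "sudo yum install pandoc" ;;
    *)             echo "see pandoc.org/installing.html" ;;
  esac
}

# Read the distribution ID in a subshell so no variables leak out.
if [ -r /etc/os-release ]; then
  os_id=$(. /etc/os-release && printf '%s' "${ID:-unknown}")
else
  os_id=unknown
fi

echo "Suggested command: $(install_hint "$os_id")"
```

Inside a container you are typically already root, so the sudo prefix can be dropped there.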
Drush test command
The module ships with a Drush command for end-to-end testing of the conversion without going through the UI:
drush ai-automator-pandoc:test /path/to/document.docx
The command prints the resolved pandoc path, version, exit code, conversion time, and an HTML preview. Use --save to write the full HTML output next to the source file.
Related modules
- AI — required base module and automator framework
- Doc HTML Chunker — splits the converted HTML into JSON-encoded chunks for AI processing
- AI Document Proofreader — full node-based AI proofreading workflow using this module for document conversion