
An AI Automators plugin that converts uploaded Word (.docx), PDF, ODT, and RTF files to clean HTML using the pandoc command-line tool.

Once configured, the automator runs whenever a node is saved and writes the converted HTML into any text_long target field, with no custom code required.

Features

  • Converts Word (.docx / .doc), PDF, ODT, RTF, HTML, and plain text files to HTML5 or HTML4
  • Input format is detected automatically from the file MIME type and extension
  • Configurable output options: text wrapping, standalone document wrapper, embedded resources (base64 images), section numbering, table of contents, and freeform extra pandoc arguments
  • Uses proc_open() with an argument array, so arguments never pass through a shell and shell injection is avoided
  • Settings form at /admin/config/content/pandoc with live pandoc version display and validation
  • Appears in the Drupal status report with OK / Warning / Error indicators
  • Post-install warning banner with a direct link to the configuration page
  • Drush command drush ai-automator-pandoc:test for command-line testing and diagnosis
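
The module itself is PHP and, per the list above, calls proc_open() with an argument array. As a rough shell illustration of the same two ideas (picking the input format from the file extension, and passing every argument as its own word so a filename is never re-parsed by a shell), a minimal sketch might look like this. The function names, the PANDOC_BIN variable, and the extension map are assumptions made for illustration, not the module's code; the sketch uses only documented pandoc reader names (docx, odt, rtf, html) and options (--from, --to, --wrap).

```shell
#!/bin/sh
# Hypothetical helper: map a file extension to a pandoc reader name.
# Only a few extensions are covered; the module's real detection
# (MIME type plus extension) is not reproduced here.
pandoc_reader_for() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    docx)     echo docx ;;
    odt)      echo odt ;;
    rtf)      echo rtf ;;
    html|htm) echo html ;;
    *)        return 1 ;;
  esac
}

# Hypothetical wrapper: convert one file to HTML5 on stdout.
# PANDOC_BIN is an assumed override for the binary path.
convert_to_html() {
  src=$1
  reader=$(pandoc_reader_for "$src") || {
    echo "unsupported input: $src" >&2
    return 1
  }
  # Each argument is a separate word, so spaces or shell
  # metacharacters in "$src" are passed through literally.
  set -- --from "$reader" --to html5 --wrap=none "$src"
  "${PANDOC_BIN:-pandoc}" "$@"
}
```

With pandoc installed, a call such as convert_to_html "/path/to/my report.docx" > report.html would run the conversion; the quoting around the path is what keeps the space from splitting the argument.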

Requirements

  • Drupal 10.4, 11, or 12
  • AI module with the AI Automators sub-module enabled
  • pandoc installed on the server and the binary path configured in the module settings

Installation

drush en ai_automator_pandoc

After enabling, visit Administration → Content authoring → Pandoc settings and enter the full path to the pandoc binary (e.g. /usr/local/bin/pandoc). Run which pandoc on the server to find the correct path.
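
To confirm a path before entering it in the settings form, a small shell check along these lines can help. check_pandoc is a hypothetical helper name, not something the module provides; it relies only on the standard --version flag of pandoc.

```shell
#!/bin/sh
# Hypothetical helper: verify that the given path is an executable
# file and print the first line of its --version output.
check_pandoc() {
  if [ ! -x "$1" ]; then
    echo "not executable: $1" >&2
    return 1
  fi
  "$1" --version | head -n 1
}
```

A typical call would be check_pandoc "$(command -v pandoc)". If pandoc is not on the PATH, command -v prints nothing and the helper reports the empty path as not executable.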

Installing pandoc

  • macOS (Homebrew): brew install pandoc
  • Ubuntu / Debian: sudo apt-get install pandoc
  • Alpine Linux / Docker: apk add --no-cache pandoc
  • RHEL / CentOS: sudo yum install pandoc
  • Windows: download the installer from pandoc.org/installing.html

Docker note: images based on Alpine Linux use apk, not apt-get. Use apk add --no-cache pandoc or add it to your Dockerfile.
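
For provisioning scripts that run on more than one base image, the choice of install command can be sketched as below. The two function names are hypothetical, and the commands are simply the ones listed above; treat it as a starting point rather than a tested installer.

```shell
#!/bin/sh
# Hypothetical helper: print the install command for a package manager.
pandoc_install_cmd() {
  case "$1" in
    brew)    echo "brew install pandoc" ;;
    apt-get) echo "sudo apt-get install pandoc" ;;
    apk)     echo "apk add --no-cache pandoc" ;;
    yum)     echo "sudo yum install pandoc" ;;
    *)       return 1 ;;
  esac
}

# Hypothetical helper: print the first package manager found on PATH.
detect_pkg_manager() {
  for pm in apk apt-get yum brew; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}
```

Running pandoc_install_cmd "$(detect_pkg_manager)" prints the matching command without executing it, so it can be reviewed before use.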

Drush test command

The module ships with a Drush command for end-to-end testing of the conversion without going through the UI:

drush ai-automator-pandoc:test /path/to/document.docx

The command prints the resolved pandoc path, version, exit code, conversion time, and an HTML preview. Use --save to write the full HTML output next to the source file.

Related modules

  • AI — required base module and automator framework
  • Doc HTML Chunker — splits the converted HTML into JSON-encoded chunks for AI processing
  • AI Document Proofreader — full node-based AI proofreading workflow using this module for document conversion

Activity

  • Total releases: 2
  • First release: Mar 2026
  • Latest release: 5 days ago
  • Release cadence: 0 days
  • Stability: 0% stable

Releases

Version      Type         Release date
1.0.0-rc1    Pre-release  Mar 9, 2026
1.0.x-dev    Dev          Mar 9, 2026