document_loader_phpword
No security coverage
This module allows extracting content from Word and RTF documents for use with Document Loader, using the phpoffice/phpword PHP library.
Supported Input Formats:
- Word 2007+ (.docx)
- Word 2003 (.doc)
- OpenDocument Text (.odt)
- Rich Text Format (.rtf)
Supported Output Formats:
- text
- html
- markdown
Note on RTF: RTF support is best-effort because PHPWord's RTF reader has limitations. It does not preserve headings or lists, and it may drop special characters such as smart quotes, accented letters, and dashes.
Requirements
This module requires the following modules:
Installation
composer require drupal/document_loader_phpword
Configuration
- Enable the module at Administration > Extend
- Confirm that PHPWord appears as an available plugin in the Document Loader configuration at admin/config/media/document-loader
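As a rough sketch, the install and enable steps can also be run from a shell. This assumes a Composer-managed Drupal site with Drush installed, and that the module's machine name matches the project name, document_loader_phpword; neither assumption is stated on this page. The `run` helper, the `install_module` function, and the `DRY_RUN` variable are illustrative names only. With the default `DRY_RUN=1` the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: install and enable the module from the command line.
# Assumptions: Composer-managed Drupal site, Drush available on PATH,
# module machine name equals the project name.

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_module() {
  # Download the module and its dependencies.
  run composer require drupal/document_loader_phpword
  # Enable the module (the Drush equivalent of Administration > Extend).
  run drush en document_loader_phpword -y
}

install_module
```

To apply the changes for real, run the script from the Drupal project root with `DRY_RUN=0`. The plugin check at admin/config/media/document-loader still has to be done in the browser.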
Similar Projects
- AI File To Text: Leverages the AI module to improve the output of loaded documents