file_extractor

344 sites Security covered

Synopsis

This module adds a new computed field to the File entity: "File extractor: extracted file".

This field exposes the extracted content of the file:

  • in webservices like JSON:API
  • in a field formatter (file field)
  • in Search API

The module provides the following extraction methods:

  • Docconv binary
  • Pdftotext binary
  • Python Pdf2txt binary
  • Solr built-in extractor (Search API Solr)
  • Tika App JAR
  • Tika Server JAR

History

This project is a fork of Search API Attachments. For more on the module's origins, see issue #3126845: Version 2.0.0.

Requirements

Each extractor plugin may require different modules or libraries. If a plugin's requirements are not satisfied, the plugin does not appear in the settings.

Each extractor plugin may also require a different binary on your server. When you configure the extraction, the module runs a test to check that the extraction works. See the module documentation for installation instructions for each extractor plugin.

Configuration

  • Enable the File Extractor module on your site.
  • Go to the configuration page (/admin/config/media/file-extractor) and configure the extraction settings.

The module provides its own cache bin, 'file_extractor', so you can override the cache backend for this bin in your settings.php file. For example, to use the File Cache module:

$settings['cache']['bins']['file_extractor'] = 'cache.backend.file_system';
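
The same override pattern should work with other cache backends. The fragment below is a minimal sketch, not taken from the module documentation: it assumes either core's database backend (service 'cache.backend.database') or, as a labeled alternative, the contributed Redis module with its 'cache.backend.redis' service already installed and configured.

```php
// settings.php fragment: pick ONE assignment for the 'file_extractor' bin.

// Option A: store extracted content in Drupal core's database cache backend.
$settings['cache']['bins']['file_extractor'] = 'cache.backend.database';

// Option B (assumption: the contributed Redis module is enabled and its
// connection settings are already defined elsewhere in settings.php):
// $settings['cache']['bins']['file_extractor'] = 'cache.backend.redis';
```

Only the 'file_extractor' bin is affected; other bins keep using the site's default backend.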

Activity

Total releases: 3
First release: Mar 2025
Latest release: 3 months ago
Release cadence: 143 days
Stability: 67% stable

Releases

Version   Type     Release date
5.0.0     Stable   Dec 20, 2025
5.x-dev   Dev      Dec 5, 2025
4.2.1     Stable   Mar 10, 2025