File Extractor


Synopsis

This module adds a computed field to the File entity: "File extractor: extracted file".

This field gives access to the extracted content of the file:

in web services such as JSON:API
in a field formatter (file field)
in Search API

The module provides the following extraction methods:

Docconv binary
Pdftotext binary
Python Pdf2txt binary
Solr built-in extractor (Search API Solr)
Tika App JAR
Tika Server JAR

History

This project is a fork of Search API Attachments. For more information on the module's origins, see issue #3126845: Version 2.0.0.

Requirements

Each extractor plugin can require different modules or libraries. If the requirements are not satisfied, the plugin does not show up in the settings.

Each extractor plugin can also require a different binary on your server. When you configure the extraction, a test is run to check that the extraction works. See the module documentation for installation instructions for the extractor plugins.

Configuration

Enable the File Extractor module on your site.
Go to the configuration page (/admin/config/media/file-extractor) and configure the extraction settings.

The module provides its own cache bin, 'file_extractor', so you can override the cache backend for this bin in your settings.php file. For example, to use the File Cache module:

$settings['cache']['bins']['file_extractor'] = 'cache.backend.file_system';
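
The same override mechanism can point the bin at any other cache backend service. As a sketch only, the fragment below shows a local development setup that routes the bin to Drupal core's null backend, so nothing is stored in it while you test extractor settings. The null backend service is defined in core's sites/development.services.yml. That bypassing this bin causes extraction to run again on each request is an assumption here, not something the module documentation above states.

```php
// settings.local.php (local development only): a sketch, not a production setup.

// Load core's development services, which define the 'cache.backend.null' service.
$settings['container_yamls'][] = DRUPAL_ROOT . '/sites/development.services.yml';

// Assumption: with the null backend, the 'file_extractor' bin never stores
// anything, so cached extraction results are not reused.
$settings['cache']['bins']['file_extractor'] = 'cache.backend.null';
```

Remove these lines before deploying; extraction with external binaries or a Tika server can be slow, which is presumably why the module caches its results in the first place.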

Maintainers

Florent Torregrosa (Grimreaper)

Version    Type      Release date
5.0.0      Stable    Dec 20, 2025
5.x-dev    Dev       Dec 5, 2025
4.2.1      Stable    Mar 10, 2025
