ai_text_cleaner
AI Text Cleaner
AI Text Cleaner is a small Drupal module that provides a text filter and a Drush command to clean formatting quirks from ChatGPT or other LLM outputs.
It removes hidden or problematic characters, converts non-breaking spaces, normalizes dashes and quotes, converts ellipses, and strips Markdown-style headings and stray asterisks.
Features
- Text filter plugin that can be enabled on input formats to clean content on save.
- Options supported:
  - Remove hidden/control characters
  - Convert non-breaking spaces to regular spaces
  - Normalize dashes (hyphen/en dash/em dash) to a standard hyphen
  - Normalize quotes (curly “smart” quotes → straight quotes)
  - Convert ellipses (…) to three periods (...)
  - Remove trailing whitespace
  - Remove stray asterisks and stray list artifacts
  - Strip Markdown-style headings (leading # characters)
- Drush command ai:text-clean (alias ai-text-clean) to run the cleaner across nodes in bulk.
- Per-option statistics exported for reporting or CSV-like table output.
- Analysis mode (dry run) that reports what would change without saving.
- Ability to limit by content type (bundle), language, and maximum node count.
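The cleaning options listed above can be illustrated with a minimal sketch. This is written in Python purely for illustration and is not the module's PHP implementation. The function name clean_text, the exact character sets, and the order of the steps are assumptions made for the example.

```python
import re

# Hypothetical illustration of the cleaning options; names and character
# sets are assumptions, not taken from the module's source.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
})


def clean_text(text: str) -> str:
    # Remove hidden characters (zero-width spaces/joiners, word joiner,
    # BOM, soft hyphen) and control characters other than tab, LF, CR.
    text = re.sub(r"[\u200b\u200c\u200d\u2060\ufeff\u00ad]", "", text)
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    # Convert non-breaking spaces to regular spaces.
    text = text.replace("\u00a0", " ")
    # Normalize Unicode hyphen, en dash, and em dash to an ASCII hyphen.
    text = re.sub(r"[\u2010\u2013\u2014]", "-", text)
    # Normalize curly quotes to straight quotes.
    text = text.translate(QUOTE_MAP)
    # Convert the single-character ellipsis to three periods.
    text = text.replace("\u2026", "...")
    # Strip Markdown-style heading markers at the start of a line.
    text = re.sub(r"^[ \t]*#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Drop leading "* " list markers, then any remaining asterisks
    # (a simplification: every asterisk is treated as stray).
    text = re.sub(r"^[ \t]*\*[ \t]+", "", text, flags=re.MULTILINE)
    text = text.replace("*", "")
    # Remove trailing spaces and tabs on each line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    sample = "## Title\u00a0here\n\u201cHello\u201d \u2014 it\u2019s **fine**\u2026  \n"
    print(clean_text(sample))
```

A real filter would expose each step as a separate toggle and count replacements per option. The sketch applies them all unconditionally to keep the example short.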
Dependencies
- Drupal core (module developed for Drupal 11)
- PHP 8.1 or newer
- Optional: Drush (to use the ai:text-clean command)
- Filter system: integrates with Drupal's core filter subsystem (ensure input formats are configured).
Installation
- Place the module in web/modules/custom/ai_text_cleaner.
- Enable the module (example using Drush):
    drush en ai_text_cleaner -y
To automatically apply the filter, go to Configuration → Text formats and editors, edit the target input format, and enable the "AI Text Cleaner" filter at the desired weight.
Usage with Drush
Analyze (dry run) all nodes of type article:
    drush ai:text-clean --types=article --analysis=TRUE
Apply cleaning and save changes for up to 50 nodes across all languages:
    drush ai:text-clean --limit=50 --analysis=FALSE
You can supply multiple content types or languages as comma-separated lists, for example --types=article,page or --languages=en,de.