fuzzy_term_merge

Fuzzy Term Merge helps site administrators find and clean up near-duplicate taxonomy terms. It uses fuzzy string matching to surface likely duplicates within a vocabulary, then walks the administrator through a guided three-step wizard to review candidates and merge them. The actual merge — re-pointing all entity references to the surviving term and deleting the duplicate — is delegated to the Term Merge module.

A Drush command is also provided for command-line audits, useful as a first pass before opening the UI.

Use case

Taxonomy vocabularies on long-lived sites accumulate duplicate or near-duplicate terms over time — especially when terms are created by multiple editors or imported from external sources. Examples include "Aluminium" vs "Aluminum", "T-Shirt" vs "Tshirt", or "New York" vs "New York City". These duplicates fragment content, break faceted search, and make reporting unreliable. Fuzzy Term Merge automates the discovery step and provides a safe, reviewable workflow for resolving them.

Features

Fuzzy duplicate detection using PHP's built-in similar_text() (percentage similarity) and levenshtein() (edit distance) — no external libraries required.
Configurable similarity threshold and minimum term length to tune how aggressively candidates are surfaced.
Three-step wizard UI: configure analysis → choose merge direction per pair → confirm and execute.
AJAX-driven results table — adjust parameters without a full page reload.
Node usage count displayed per term to help identify which term is more canonical.
Multilingual support — analyze any installed content language; term labels fall back to the default translation when no translation exists in the selected language.
Batch API execution: merges, and the fuzzy analysis for vocabularies with more than 500 terms, run through Drupal's batch system, so large vocabularies can be processed without hitting PHP timeouts.
Drush command (ftm-tfd) for command-line audits, including exact case-insensitive duplicate detection.
No custom database schema — wizard state uses PrivateTempStore (session-scoped, per-user, discarded on logout).
No custom permissions — access is gated on the existing merge taxonomy terms permission from Term Merge Manager.
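
The two scoring functions named in the first feature can be tried in plain PHP. The following is a minimal sketch, not the module's actual code; the helper name ftm_example_score() is hypothetical and exists only for illustration.

```php
<?php

/**
 * Scores two term names with PHP's built-in string functions.
 *
 * Hypothetical helper for illustration; not taken from the module.
 *
 * @return array{percent: float, distance: int}
 */
function ftm_example_score(string $a, string $b): array {
  // similar_text() returns the number of matching bytes and writes the
  // similarity percentage into its third (by-reference) argument.
  $percent = 0.0;
  similar_text($a, $b, $percent);

  // levenshtein() returns the minimum number of single-byte insertions,
  // deletions, and substitutions needed to turn $a into $b.
  return [
    'percent' => round($percent, 1),
    'distance' => levenshtein($a, $b),
  ];
}

$score = ftm_example_score('aluminium', 'aluminum');
// Prints "94.1% similar, edit distance 1".
printf("%.1f%% similar, edit distance %d\n", $score['percent'], $score['distance']);
```

Both functions operate on bytes rather than characters, so names containing multibyte characters may score slightly differently than a character-by-character comparison would suggest.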

Requirements

| Requirement | Notes |
| --- | --- |
| Drupal 10 or 11 | |
| PHP 8.1+ | |
| term_merge | Performs the actual merge and reference re-pointing |
| term_merge_manager | Ships with term_merge; provides the merge taxonomy terms permission |
| Drush 9, 10, or 12 | Optional; only required for the ftm-tfd Drush command |

similar_text() and levenshtein() are part of PHP core; no additional extensions are required.

Installation

Install via Composer (recommended):

composer require drupal/fuzzy_term_merge

Enable the module alongside its dependencies:

drush en term_merge term_merge_manager fuzzy_term_merge -y
drush cr

Grant the merge taxonomy terms permission to any role that should access the wizard. This permission is provided by the term_merge_manager module and is typically assigned to the administrator role or a dedicated content editor role.

Where it appears in Drupal

Tab on each vocabulary admin page

After enabling the module, a Fuzzy merge tab appears alongside the standard tabs (List terms, Manage fields, etc.) on every vocabulary administration page at /admin/structure/taxonomy/manage/{vocabulary}.

Operation link on the vocabulary list

A Fuzzy merge link is added to the per-vocabulary operations dropdown at /admin/structure/taxonomy, providing a direct shortcut to the analysis step for that vocabulary.

Both entry points are hidden from users who lack the merge taxonomy terms permission.

The wizard

Step 1 — Analysis

Select the vocabulary to analyze and configure three detection parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| Language | Current content language | Only terms with a translation in this language are compared. |
| Similarity threshold | 80% | Pairs below this percentage are excluded. Lower values surface more, and looser, candidates. |
| Minimum term length | 3 | Terms shorter than this are excluded. Prevents short codes or abbreviations from flooding results. |

Changing any parameter rebuilds the results table via AJAX. Each row shows both term names, their term IDs (tids), similarity percentage, and Levenshtein edit distance. Check the pairs you want to act on, then click Review selected pairs.
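
How the threshold and minimum length shape the results can be sketched in plain PHP. This is an illustration under assumptions, not the module's implementation: the function name, the tid => name input array, and the result keys are all hypothetical.

```php
<?php

/**
 * Returns candidate pairs at or above a similarity threshold.
 *
 * Hypothetical sketch; $terms maps term ID => term name.
 */
function ftm_example_find_candidates(array $terms, float $threshold = 80.0, int $minLength = 3): array {
  // Drop terms shorter than the minimum length before comparing anything.
  $terms = array_filter($terms, fn(string $name): bool => mb_strlen(trim($name)) >= $minLength);
  $tids = array_keys($terms);
  $pairs = [];

  // Visit every unordered pair exactly once.
  for ($i = 0, $n = count($tids); $i < $n; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
      $a = mb_strtolower(trim($terms[$tids[$i]]));
      $b = mb_strtolower(trim($terms[$tids[$j]]));
      if ($a === $b) {
        // Identical names are exact duplicates, not fuzzy candidates.
        continue;
      }
      similar_text($a, $b, $percent);
      if ($percent >= $threshold) {
        $pairs[] = [
          'tid_a' => $tids[$i],
          'tid_b' => $tids[$j],
          'percent' => round($percent, 1),
          'distance' => levenshtein($a, $b),
        ];
      }
    }
  }

  // Strongest matches first.
  usort($pairs, fn(array $x, array $y): int => $y['percent'] <=> $x['percent']);
  return $pairs;
}

$pairs = ftm_example_find_candidates([
  1 => 'Aluminium',
  2 => 'Aluminum',
  3 => 'T-Shirt',
  4 => 'Tshirt',
  5 => 'NY',
]);
// Two pairs remain: (1, 2) and (3, 4). "NY" is dropped by the length filter.
```

Lowering $threshold in this sketch admits looser pairs, and raising $minLength removes short names before any comparison takes place.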

Large vocabularies: For vocabularies with more than 500 terms, the pairwise comparison runs as a Drupal batch job rather than a synchronous request. A progress bar is displayed while analysis runs; results are presented in the same table once complete.
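
One way a pairwise comparison can be divided into batch-sized pieces is to split the outer loop into row ranges, with each range handled by its own operation. The sketch below shows that idea only; it is an assumption about a possible approach, and the function names and chunk size are hypothetical rather than taken from the module.

```php
<?php

/**
 * Splits the outer loop of a pairwise comparison into row ranges.
 *
 * Hypothetical sketch: each range could become one batch operation.
 *
 * @return array<int, array{start: int, end: int}>
 */
function ftm_example_row_chunks(int $termCount, int $rowsPerOperation = 50): array {
  $chunks = [];
  for ($start = 0; $start < $termCount; $start += $rowsPerOperation) {
    $chunks[] = [
      'start' => $start,
      'end' => min($start + $rowsPerOperation, $termCount),
    ];
  }
  return $chunks;
}

/**
 * Compares rows [$start, $end) against every later name in the list.
 *
 * $names must be a zero-indexed list of already normalised names.
 */
function ftm_example_process_chunk(array $names, int $start, int $end, float $threshold): array {
  $matches = [];
  $n = count($names);
  for ($i = $start; $i < $end; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
      similar_text($names[$i], $names[$j], $percent);
      if ($percent >= $threshold) {
        $matches[] = [$i, $j, round($percent, 1)];
      }
    }
  }
  return $matches;
}

// Running the chunks one after another visits every unordered pair once.
$names = ['aluminium', 'aluminum', 't-shirt', 'tshirt'];
$all = [];
foreach (ftm_example_row_chunks(count($names), 2) as $chunk) {
  $all = array_merge($all, ftm_example_process_chunk($names, $chunk['start'], $chunk['end'], 80.0));
}
```

Early row ranges do more work than late ones because each row is compared only against the rows after it, so equal-sized ranges are a simplification.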

Step 2 — Direction

For each selected pair, choose one of three actions:

| Option | Effect |
| --- | --- |
| Keep A | Term A survives; Term B is deleted and all its references re-pointed to A. |
| Keep B | Term B survives; Term A is deleted and all its references re-pointed to B. |
| Skip | Exclude this pair from the current merge run. |

Each term displays its tid and node usage count — the number of nodes that reference it — to help identify the more canonical term worth keeping.
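
The three options reduce to a simple mapping from a choice to a keep/delete pair. The following sketch illustrates that mapping; the choice keys 'keep_a', 'keep_b', and 'skip' and the function name are assumptions made for the example, not the module's form values.

```php
<?php

/**
 * Translates a wizard choice into a merge plan entry.
 *
 * Hypothetical sketch; the choice keys are assumed for illustration.
 *
 * @return array{keep: int, delete: int}|null
 *   The planned merge, or NULL when the pair is skipped.
 */
function ftm_example_resolve_direction(int $tidA, int $tidB, string $choice): ?array {
  return match ($choice) {
    'keep_a' => ['keep' => $tidA, 'delete' => $tidB],
    'keep_b' => ['keep' => $tidB, 'delete' => $tidA],
    'skip' => NULL,
    default => throw new InvalidArgumentException("Unknown choice: $choice"),
  };
}

$selections = [
  ['a' => 12, 'b' => 47, 'choice' => 'keep_a'],
  ['a' => 31, 'b' => 90, 'choice' => 'keep_b'],
  ['a' => 5, 'b' => 6, 'choice' => 'skip'],
];

// Skipped pairs resolve to NULL and are filtered out of the plan.
$plan = array_values(array_filter(array_map(
  fn(array $s): ?array => ftm_example_resolve_direction($s['a'], $s['b'], $s['choice']),
  $selections
)));
```

Rejecting unknown choices with an exception, rather than treating them as a skip, keeps a malformed submission from silently dropping a pair.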

Step 3 — Confirm

A summary table lists all planned merges (term to delete → term to keep). A prominent warning makes clear that the operation is permanent and cannot be undone through this UI. Click Merge terms to proceed or Cancel to return to the direction step.

Merges execute through Drupal's Batch API, one operation per pair, so large sets complete safely without hitting PHP timeouts. On completion, the wizard redirects to the vocabulary overview and displays a status message with the count of merged and skipped terms.
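
A batch with one operation per pair can be described as a plain array, which is the shape Drupal's Batch API accepts. The sketch below is illustrative only: the callback names are placeholders, the merge step is replaced by a stub, and the loop at the end stands in for the batch runner so the example can run outside Drupal.

```php
<?php

/**
 * Builds a batch definition with one operation per planned merge.
 *
 * Hypothetical sketch; the callback names are placeholders.
 */
function ftm_example_build_merge_batch(array $plan): array {
  $operations = [];
  foreach ($plan as $merge) {
    // Each operation is [callable, arguments]; Drupal appends $context.
    $operations[] = [
      'ftm_example_merge_operation',
      [$merge['keep'], $merge['delete']],
    ];
  }
  return [
    'title' => 'Merging terms',
    'operations' => $operations,
    'finished' => 'ftm_example_merge_finished',
  ];
}

/**
 * Batch operation stub: records the pair instead of merging it.
 */
function ftm_example_merge_operation(int $keepTid, int $deleteTid, array &$context): void {
  // A real operation would hand the pair to Term Merge at this point.
  $context['results']['merged'][] = [$keepTid, $deleteTid];
}

/**
 * Batch finished callback: reports how many pairs were processed.
 */
function ftm_example_merge_finished(bool $success, array $results, array $operations): void {
  printf("Merged %d pair(s).\n", count($results['merged'] ?? []));
}

$batch = ftm_example_build_merge_batch([
  ['keep' => 12, 'delete' => 47],
  ['keep' => 90, 'delete' => 31],
]);
// Inside Drupal, the definition would be queued with batch_set($batch).

// Stand-in for the batch runner so the sketch also runs outside Drupal.
$context = ['results' => []];
foreach ($batch['operations'] as [$callback, [$keepTid, $deleteTid]]) {
  $callback($keepTid, $deleteTid, $context);
}
$batch['finished'](TRUE, $context['results'], []);
```

Keeping each pair in its own operation means a failure or timeout affects a single merge rather than the whole set.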

Drush command

The fuzzy_term_merge:taxonomy-fuzzy-duplicates command (alias: ftm-tfd) audits one or all vocabularies from the command line and prints a report without making any changes.

# Audit all vocabularies at the default 80% threshold
drush ftm-tfd

# Audit a single vocabulary
drush ftm-tfd --vocabulary=tags

# Loosen the threshold to surface more candidates
drush ftm-tfd --vocabulary=categories --threshold=70

# Exclude short terms from comparison
drush ftm-tfd --min-length=5

# Audit a non-default language
drush ftm-tfd --langcode=fr

Options

| Option | Default | Description |
| --- | --- | --- |
| --vocabulary | all vocabularies | Limit analysis to a single vocabulary machine name. |
| --threshold | 80 | Minimum similarity percentage (0–100). |
| --min-length | 3 | Exclude terms shorter than this many characters. |
| --langcode | site default language | Language code to analyze. |

The report includes two sections per vocabulary:

Exact duplicates (case-insensitive) — terms with identical names after normalisation. These are unambiguous duplicates and strong candidates for immediate cleanup.
Fuzzy matches — pairs above the threshold, sorted by similarity descending, with percentage, edit distance, term name, and tid.
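
Exact case-insensitive duplicates can be found by grouping term IDs under their normalised name. The following is a minimal sketch of that idea, assuming a tid => name input array; the function name is hypothetical and the command's real implementation may differ.

```php
<?php

/**
 * Groups term IDs whose names are identical after normalisation.
 *
 * Hypothetical sketch; $terms maps term ID => term name.
 *
 * @return array
 *   Normalised name => list of term IDs, for groups of two or more.
 */
function ftm_example_exact_duplicates(array $terms): array {
  $groups = [];
  foreach ($terms as $tid => $name) {
    // Trim whitespace and lowercase so "Tshirt" and " TSHIRT" collide.
    $groups[mb_strtolower(trim($name))][] = $tid;
  }
  // A name held by a single term is not a duplicate.
  return array_filter($groups, fn(array $tids): bool => count($tids) > 1);
}

$dupes = ftm_example_exact_duplicates([
  3 => 'Tshirt',
  8 => ' tshirt',
  9 => 'TSHIRT',
  11 => 'Aluminium',
]);
// $dupes is ['tshirt' => [3, 8, 9]].
```

Grouping by key takes a single pass over the terms, so this check stays cheap even when the fuzzy comparison does not.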

When findings are present, the command also prints the URL of the UI wizard for that vocabulary so you can jump straight to the merge workflow.

Permissions

This module defines no permissions of its own. All access is controlled by the merge taxonomy terms permission, which is provided by the term_merge_manager module (part of the Term Merge project).

Known limitations

Node usage counts are drawn from Drupal core's taxonomy_index table, which covers nodes only. References from other entity types (commerce products, paragraphs, media, etc.) are not reflected in the count.
Merges are permanent. There is no undo — ensure you have a database backup before running a large merge set.
Edit distance is shown as "n/a" for term names longer than 255 bytes (the similarity percentage is unaffected). The limit comes from PHP's levenshtein(), which rejected longer strings before PHP 8.0.
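
The 255-byte guard on edit distance can be sketched as a small wrapper. This is an assumption about how such a guard might look, not the module's code; the function name and the $maxBytes parameter are hypothetical.

```php
<?php

/**
 * Returns the edit distance for display, or "n/a" for over-long names.
 *
 * Hypothetical sketch; 255 is the byte limit named in the limitation above.
 */
function ftm_example_edit_distance(string $a, string $b, int $maxBytes = 255): string {
  // strlen() counts bytes, so multibyte names reach the limit sooner
  // than their character count suggests.
  if (strlen($a) > $maxBytes || strlen($b) > $maxBytes) {
    return 'n/a';
  }
  return (string) levenshtein($a, $b);
}

echo ftm_example_edit_distance('tshirt', 't-shirt'), "\n";
echo ftm_example_edit_distance(str_repeat('a', 256), 'a'), "\n";
```

The first call prints 1 and the second prints n/a, since only the second exceeds the byte limit.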

Technical notes

The matching algorithm normalises term names with mb_strtolower(trim()) before comparison and skips pairs whose normalised names are identical (those appear in the exact duplicates section of the Drush report instead).
For vocabularies over 500 terms, the O(n²) pairwise analysis runs as a Drupal batch job to avoid PHP timeouts on large term sets.
Wizard state is stored in PrivateTempStore, keyed per user. No state leaks between users; state is automatically discarded on logout.
The module installs no database tables. It reads from core's taxonomy_term_field_data and taxonomy_index tables directly.
All merge execution is handled by the term_merge.term_merger service from the Term Merge module — this module only orchestrates discovery and user workflow.