
Search API Japanese Normalizer is a module that provides a processor for the Drupal Search API module. This processor standardizes variations in Japanese text, improving search accuracy.

Features

This module normalizes Japanese text variations according to the following rules:

  • Convert full-width alphanumeric characters to half-width.
  • Convert half-width Katakana to full-width Katakana.
  • Replace characters similar to the hyphen-minus with a hyphen-minus (-).
  • Replace characters similar to the long vowel mark with the long vowel mark (ー).
  • Replace consecutive long vowel marks with a single one.
  • Remove characters similar to the tilde (~).
  • Convert full-width symbols commonly used in half-width form to half-width.
  • Convert half-width symbols commonly used in full-width form to full-width.
  • Convert full-width spaces to half-width spaces.
  • Replace multiple consecutive half-width spaces with a single one.
  • Remove half-width spaces between two characters that each belong to one of these classes: Hiragana, full-width Katakana, half-width Katakana, Kanji, and full-width symbols.
  • Remove half-width spaces between a character in one of those classes and a half-width alphanumeric character.

The implementation is based on the normalization rules used in NEologd, a dictionary for morphological analyzers. For the detailed conversion rules, see the NEologd Normalization Rules.
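As a rough illustration of the character-level rules above, the Python sketch below folds full-width alphanumerics and half-width Katakana, unifies hyphen-like and long-vowel-like characters, collapses repeated long vowel marks, and drops tilde-like characters. It is not the module's own code: the function name normalize_chars is hypothetical, the character sets are assumptions borrowed from the NEologd sample, and the symbol-width rules are left out.

```python
import re
import unicodedata

# Assumed character sets, borrowed from the NEologd sample code.
# The module's own tables may differ.
FOLDABLE = "\uff10-\uff19\uff21-\uff3a\uff41-\uff5a\uff61-\uff9f"
HYPHEN_LIKE = "\u02d7\u058a\u2010-\u2013\u2043\u207b\u208b\u2212"
LONG_VOWEL_LIKE = "\ufe63\uff0d\uff70\u2014\u2015\u2500\u2501\u30fc"
TILDE_LIKE = "~\u223c\u223e\u301c\u3030\uff5e"


def normalize_chars(text: str) -> str:
    """Hypothetical helper covering a subset of the rules listed above."""
    # Full-width alphanumerics -> half-width, half-width Katakana -> full-width.
    # NFKC is applied only to matched runs so other characters stay untouched.
    text = re.sub(
        f"[{FOLDABLE}]+",
        lambda m: unicodedata.normalize("NFKC", m.group()),
        text,
    )
    # Hyphen look-alikes become a plain hyphen-minus
    # (a run collapses to one, as in the NEologd sample).
    text = re.sub(f"[{HYPHEN_LIKE}]+", "-", text)
    # Long-vowel look-alikes become U+30FC; a run collapses to a single mark.
    text = re.sub(f"[{LONG_VOWEL_LIKE}]+", "\u30fc", text)
    # Tilde look-alikes are removed.
    return re.sub(f"[{TILDE_LIKE}]+", "", text)


print(normalize_chars("スーーパーーー"))  # スーパー
```

Restricting NFKC to the matched ranges is a design choice of this sketch: applying it to the whole string would also fold full-width symbols that the rules above treat separately.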

Example Conversions

Before → After
ドルーパル → ドルーパル
スーーパーーー → スーパー
アルゴリズム C → アルゴリズムC
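The last example above, where the space between アルゴリズム and C disappears, follows from the spacing rules. A minimal Python sketch of those rules is shown here for illustration only. The function name normalize_spaces is hypothetical, and the Unicode ranges standing in for "Hiragana, Katakana, Kanji, and full-width symbols" are assumptions taken from the NEologd sample, not from the module.

```python
import re

# Assumed ranges: Hiragana, Katakana, CJK ideographs, CJK symbols and
# punctuation, half-width and full-width forms.
JA = "\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff\u3000-\u303f\uff00-\uffef"
ALNUM = "0-9A-Za-z"


def normalize_spaces(text: str) -> str:
    """Hypothetical helper for the space-handling rules."""
    # Full-width space -> half-width space.
    text = text.replace("\u3000", " ")
    # Collapse runs of half-width spaces.
    text = re.sub(" {2,}", " ", text)
    # Drop a space that follows a Japanese character and precedes
    # either another Japanese character or an alphanumeric.
    text = re.sub(f"(?<=[{JA}]) (?=[{JA}{ALNUM}])", "", text)
    # Drop a space between an alphanumeric and a Japanese character.
    text = re.sub(f"(?<=[{ALNUM}]) (?=[{JA}])", "", text)
    return text


print(normalize_spaces("アルゴリズム C"))  # アルゴリズムC
```

Lookarounds are used so that neighbouring characters are not consumed by a match, which lets a sequence such as "あ い う" be cleaned in a single pass. Spaces between two alphanumeric words are left alone.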

Post-Installation

After installation, the "Japanese Normalizer" processor appears on the "Processors" tab of the Search API index settings. Enabling it normalizes variations in Japanese text automatically, which improves search accuracy.

Additional Requirements

The Search API module is required for this module to function. For setup instructions, please refer to the Search API module documentation.

Similar Projects

  • Search API Kana Convert - A module specializing in converting between Hiragana, Katakana, and Romaji representations.

A description in Japanese is available here.

Activity

Total releases: 2
First release: Jan 2025
Latest release: 1 year ago
Release cadence: 2 days
Stability: 0% stable

Releases

Version      Type         Release date
1.0.x-dev    Dev          Feb 2, 2025
1.0.0-beta1  Pre-release  Jan 31, 2025