search_api_japanese_tokenizer
Search API Japanese Tokenizer is a Drupal module that segments and indexes Japanese text at the word level. By default, Drupal's standard search and the Search API module use N-gram segmentation, which can be imprecise for Japanese. This module improves search accuracy through word-level natural language processing, without requiring external search engines such as Apache Solr or Elasticsearch.
Japanese differs from English and many other languages in that it has no spaces between words. In English, "This is a pen." clearly separates words with spaces, making tokenization straightforward. The Japanese equivalent, "これはペンです。", contains no spaces, so determining word boundaries is difficult. A specialized natural language processing technique is therefore required to segment the text properly.
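The difference is easy to see with naive whitespace splitting (a minimal Python illustration, not part of the module):

```python
# English: splitting on whitespace yields one token per word.
english = "This is a pen."
print(english.split())   # ['This', 'is', 'a', 'pen.']

# Japanese: the same sentence has no spaces, so whitespace
# splitting returns the entire sentence as a single token.
japanese = "これはペンです。"
print(japanese.split())  # ['これはペンです。']
```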
By default, Drupal's standard search and the Search API module use N-gram segmentation, which has the following issues:
- It is practically impossible to index and search for single-character words.
- Unintended matches may appear in search results.
- It only supports exact phrase matching.
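The first two issues can be sketched with a naive character-bigram segmenter (a hypothetical Python illustration; Drupal's actual N-gram processor is configurable and implemented in PHP):

```python
def bigrams(text: str) -> list[str]:
    """Naive character-bigram segmentation (illustrative sketch only)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

# "東京都" (Tokyo Metropolis) produces the bigram "京都" (Kyoto),
# so a bigram index matches a search for 京都 against 東京都 —
# an example of an unintended match.
print(bigrams("東京都"))  # ['東京', '京都']

# A single-character word such as "木" (tree) yields no bigrams
# at all, so it cannot be indexed or found.
print(bigrams("木"))      # []
```

A word-level tokenizer would instead split 東京都 into meaningful units, so a search for 京都 would not match it.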
The Search API Japanese Tokenizer module addresses these issues, enhancing Drupal’s search capabilities and improving the accuracy of Japanese search queries without relying on external search engines.
Features
- Enables indexing and searching at the single-character level
- Improves search accuracy by indexing at the word level
- Uses machine learning-based tokenization
- Supports index exclusion based on character type (TinySegmenter only)
- Resolves variations in spelling when using morphological analysis
- Allows exclusion of index entries based on part of speech when using morphological analysis
Post-Installation
- Enable the Search API module
  This module requires the Search API module to be enabled.
- Install the module
  Enable the `search_api_japanese_tokenizer` module.
- Configure the search server and index
  Go to `/admin/config/search/search-api` and configure the search server and search index.
- Select the tokenizer
  In the Search API processor settings, choose one of the following tokenizers:
  - TinySegmenter tokenizer
  - MeCab tokenizer
  - Sudachi tokenizer
- Disable the default Tokenizer processor
  The default Tokenizer processor should be disabled, as enabling it may cause incorrect indexing.
Additional Requirements
This module requires the following components:
- Search API module
- Optional morphological analysis engines:
- TinySegmenter (no additional installation required)
- MeCab (must be installed on the server)
- Sudachi (must be installed on the server; since it is written in Java, a JRE is required)
If using MeCab or Sudachi, they must be installed on the server beforehand.
Recommended modules/libraries
- Search API Japanese Normalizer: helps improve search accuracy by normalizing text, including unifying hiragana and katakana and converting between full-width and half-width characters.
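As an illustration of the kinds of normalization described above: Unicode NFKC normalization unifies full-width and half-width forms, and hiragana and katakana are separated by a fixed code-point offset (a Python sketch of the general technique; the module itself is PHP and its exact behavior may differ):

```python
import unicodedata

# Half-width katakana is folded to full-width by NFKC normalization.
print(unicodedata.normalize("NFKC", "ﾍﾟﾝ"))          # ペン

# Full-width Latin letters and digits are folded to half-width.
print(unicodedata.normalize("NFKC", "Ｄｒｕｐａｌ１０"))  # Drupal10

def hira_to_kata(text: str) -> str:
    """Convert hiragana to katakana; the scripts are offset by 0x60."""
    return "".join(
        chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch
        for ch in text
    )

print(hira_to_kata("ぺん"))  # ペン
```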