ai_vdb_provider_elasticsearch

Elasticsearch AI VDB Provider

An Elasticsearch backend for the Drupal AI module and its AI Search submodule. Stores and retrieves embeddings using Elasticsearch's native dense_vector field with HNSW kNN, so semantic search and RAG work directly against your existing Elastic Stack — no separate vector database to operate.

Why this module

As of April 2026, this is the only Drupal AI VDB provider that runs hybrid search (kNN + BM25) in a single Elasticsearch request, merged with Reciprocal Rank Fusion (RRF).

Capability	This module	Other VDB providers
Approximate kNN (HNSW)	Yes	Yes
Access-control pre-filtering	Yes	Yes
Hybrid search (kNN + BM25 + RRF)	Yes	No
Reuses an existing Elastic Stack	Yes	No

Requirements

Drupal 10.3 or 11
AI module (includes the AI Search submodule)
Key module for credential storage
An Elasticsearch 8.x cluster with dense_vector support — 8.8+ for hybrid search (RRF)
An AI Provider module that supports embeddings (OpenAI, Ollama, etc.)

Install

composer require drupal/ai_vdb_provider_elasticsearch
drush en ai_vdb_provider_elasticsearch

Configure

Store credentials. At /admin/config/system/keys, create a Key entity holding either an Elasticsearch API key (recommended) or a Basic Auth password. Skip for unauthenticated local clusters.
Configure the provider. At /admin/config/ai/vdb_providers/elasticsearch, set the host URL, select your Key entity, and choose an Index Prefix (e.g. drupal_) and a Similarity Metric. Cosine is the safe default: it suits the normalized embeddings that most commercial embedding APIs return.
Wire up Search API. At /admin/config/search/search-api, add a Server with AI Search (VDB) as the backend, select Elasticsearch as the VDB Provider, and attach your AI Provider for embeddings. Then add an Index on that server and index your content.
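
As an illustration of the three credential modes in the first step, the sketch below builds the HTTP headers a client would send to Elasticsearch. It is not the module's own code (the module is PHP); the function name build_auth_headers and its parameters are hypothetical, and the API key is assumed to be the base64 "encoded" value Elasticsearch returns when the key is created.

```python
import base64


def build_auth_headers(api_key=None, username=None, password=None):
    """Return request headers for API key, Basic Auth, or no authentication.

    Hypothetical helper for illustration only. `api_key` is assumed to be
    the base64 "encoded" value returned by Elasticsearch at key creation.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Elasticsearch API keys use the "ApiKey" authorization scheme.
        headers["Authorization"] = f"ApiKey {api_key}"
    elif username and password:
        # Standard HTTP Basic Auth: base64 of "user:password".
        raw = f"{username}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    # With no credentials (unauthenticated local cluster) no header is added.
    return headers
```

Calling build_auth_headers() with no arguments corresponds to the unauthenticated local cluster case, where the Key entity is skipped.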

The Elasticsearch index is created automatically on first use. The similarity metric and embedding dimensions are fixed in the mapping at creation time, so changing either means deleting the Elasticsearch index and re-indexing all content.

Hybrid search (kNN + BM25)

Pure vector search is strong at conceptual matching but often misses exact lexical lookups such as product SKUs, acronyms, and proper names. With hybrid search enabled, every query runs both a kNN vector search and a BM25 keyword match in one Elasticsearch request, then merges the two ranked lists with RRF. Because RRF combines ranks rather than raw scores, no manual weight tuning is needed.
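
To make the merge step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. Elasticsearch performs this server-side; the function name rrf_merge, the sample document IDs, and the alphabetical tie-break are assumptions for illustration. The rank constant of 60 matches the documented Elasticsearch default.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, rank_constant=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (rank_constant + rank)) over the lists it
    appears in, with ranks starting at 1. Ties are broken by ID here, which
    is an arbitrary choice made for this sketch.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from kNN, one from BM25.
knn_hits = ["node:12", "node:7", "node:3"]
bm25_hits = ["node:7", "node:42", "node:12"]
merged = rrf_merge([knn_hits, bm25_hits])
print(merged)  # ['node:7', 'node:12', 'node:42', 'node:3']
```

Documents that appear in both lists (node:7 and node:12) rise above documents found by only one retriever, and only ranks enter the formula, so the incompatible kNN and BM25 score scales never need to be normalized.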

Content profile	Recommendation
Mostly narrative prose, queried by concept	Pure kNN (hybrid off)
Codes, acronyms, proper names mixed in	Hybrid on
RAG agent over a mixed knowledge base	Hybrid on
Elasticsearch < 8.8	Hybrid not available
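
The difference between hybrid on and hybrid off can be sketched as the shape of the search request body. The example below is an assumption about what such a request could look like, not the module's actual output: it uses the top-level rank syntax available from Elasticsearch 8.8, the field names vector, content and langcode from the index mapping, and a hypothetical helper named build_search_body.

```python
def build_search_body(query_vector, query_text, hybrid=True, k=10,
                      num_candidates=100, langcode=None):
    """Build an Elasticsearch _search body for pure kNN or hybrid search.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    lang_filter = {"term": {"langcode": langcode}} if langcode else None

    # Approximate kNN clause against the dense_vector field.
    knn = {
        "field": "vector",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
    if lang_filter:
        # Pre-filter: applied during the vector search, not afterwards.
        knn["filter"] = lang_filter

    body = {"knn": knn, "size": k}
    if hybrid:
        # BM25 side of the hybrid request, restricted by the same filter.
        match = {"match": {"content": query_text}}
        if lang_filter:
            body["query"] = {"bool": {"must": [match], "filter": [lang_filter]}}
        else:
            body["query"] = match
        # Ask Elasticsearch to merge both result lists with RRF defaults.
        body["rank"] = {"rrf": {}}
    return body
```

With hybrid=False the body contains only the knn clause, which is the form that does not depend on RRF.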

License note. RRF is a commercial Elastic feature on 8.x. On the basic (free) license, hybrid requests fail with HTTP 403 and an error reporting that the license is non-compliant for [Reciprocal Rank Fusion (RRF)]; either start a Platinum trial or leave the hybrid toggle off. Pure kNN works on the basic license.

Documentation

Detailed walkthroughs live in the project repository:

Indexing PDFs, DOCX and Markdown — the Search API Attachments + bundled php_pdfparser_extractor pipeline.
Building a RAG chatbot — AI Assistant + AI Chatbot wired to this module.
Local development with DDEV — Elasticsearch + Kibana sidecars, smoke tests.
Troubleshooting — common errors and fixes.

Index field mapping

The module maps these fields explicitly at index creation; everything else ai_search sends through is accepted via dynamic: true.

Field	Type	Purpose
vector	dense_vector	Embedding (HNSW-indexed)
entity_id	keyword	Drupal entity ID
entity_type	keyword	Pre-filter (e.g. node)
bundle	keyword	Bundle pre-filter
langcode	keyword	Language pre-filter
chunk_id	keyword	Chunk identifier within an entity
content	text	Plain text used by BM25 in hybrid mode
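
As a rough sketch, the mapping described in this section could be expressed as the following index-creation body. The helper names build_index_mapping and requires_reindex are hypothetical, and the exact options the module sets on each field are an assumption; only the field names, types, and dynamic: true come from this section.

```python
def build_index_mapping(dims, similarity="cosine"):
    """Return an index-creation body with the explicitly mapped fields.

    Illustrative only: `dims` must match the embedding model's output size.
    """
    if not isinstance(dims, int) or dims <= 0:
        raise ValueError("dims must be a positive integer")
    properties = {
        # HNSW-indexed embedding; dims and similarity are fixed at creation.
        "vector": {
            "type": "dense_vector",
            "dims": dims,
            "index": True,
            "similarity": similarity,
        },
        # Plain text scored by BM25 in hybrid mode.
        "content": {"type": "text"},
    }
    for name in ("entity_id", "entity_type", "bundle", "langcode", "chunk_id"):
        properties[name] = {"type": "keyword"}
    # dynamic: True lets any other field sent by ai_search be accepted.
    return {"mappings": {"dynamic": True, "properties": properties}}


def requires_reindex(existing_mapping, dims, similarity):
    """True if the desired vector settings differ from the existing index."""
    vector = existing_mapping["mappings"]["properties"]["vector"]
    return vector["dims"] != dims or vector["similarity"] != similarity
```

For example, requires_reindex(build_index_mapping(768), 384, "cosine") returns True, which mirrors the rule that switching to an embedding model with a different dimension count means recreating the index.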

Maintainers

Ricardo Amaro
Looking for co-maintainers — open an issue or reach out.

Version	Type	Release date
1.0.0-rc1	Pre-release	May 9, 2026
1.0.0-alpha5	Pre-release	May 2, 2026
1.0.0-alpha4	Pre-release	Mar 28, 2026
1.0.x-dev	Dev	Mar 25, 2026
1.0.0-alpha3	Pre-release	Mar 23, 2026
1.0.0-alpha2	Pre-release	Mar 23, 2026
1.0.0-alpha1	Pre-release	Mar 23, 2026