ai_vdb_provider_elasticsearch
An Elasticsearch backend for the Drupal AI module and its AI Search submodule. Stores and retrieves embeddings using Elasticsearch's native dense_vector field with HNSW kNN, so semantic search and RAG work directly against your existing Elastic Stack — no separate vector database to operate.
Why this module
As of April 2026, this is the only Drupal AI VDB provider that runs hybrid search (kNN + BM25) in a single Elasticsearch request, merged with Reciprocal Rank Fusion (RRF).
| Capability | This module | Other VDB providers |
|---|---|---|
| Approximate kNN (HNSW) | Yes | Yes |
| Access-control pre-filtering | Yes | Yes |
| Hybrid search (kNN + BM25 + RRF) | Yes | No |
| Reuses an existing Elastic Stack | Yes | No |

Requirements
- Drupal 10.3 or 11
- AI module (includes the AI Search submodule)
- Key module for credential storage
- An Elasticsearch 8.x cluster with dense_vector support (8.8+ for hybrid search with RRF)
- An AI Provider module that supports embeddings (for example OpenAI or Ollama)
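To check an existing cluster against these requirements, you can read the version string Elasticsearch reports under version.number from GET /. The sketch below is a hypothetical helper, not part of the module: it encodes only the two version rules stated above (8.x required, 8.8+ for RRF) and deliberately ignores licensing.

```python
def parse_version(number: str) -> tuple[int, int, int]:
    """Turn "8.11.3" or "8.15.0-SNAPSHOT" into (major, minor, patch)."""
    core = number.split("-", 1)[0]                 # drop suffixes like -SNAPSHOT
    parts = (core.split(".") + ["0", "0"])[:3]     # pad "8.8" to three parts
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def check_cluster(number: str) -> dict[str, bool]:
    """Apply the Requirements version rules (hypothetical helper)."""
    major, minor, _ = parse_version(number)
    supported = major == 8                 # the module targets Elasticsearch 8.x
    hybrid = supported and minor >= 8      # RRF needs 8.8+; the license is a separate check
    return {"supported": supported, "hybrid": hybrid}
```

For example, check_cluster("8.7.1") reports a supported cluster without hybrid search, while "8.8.0" and later 8.x releases report both.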
Install
composer require drupal/ai_vdb_provider_elasticsearch
drush en ai_vdb_provider_elasticsearch
Configure
- Store credentials. At /admin/config/system/keys, create a Key entity holding either an Elasticsearch API key (recommended) or a Basic Auth password. Skip this step for unauthenticated local clusters.
- Configure the provider. At /admin/config/ai/vdb_providers/elasticsearch, set the host URL, point it at your Key entity, and choose an Index Prefix (e.g. drupal_) and a Similarity Metric. cosine is the safe default for the normalized embeddings returned by commercial embedding APIs.
- Wire up Search API. At /admin/config/search/search-api, add a Server with AI Search (VDB) as the backend, select Elasticsearch as the VDB Provider, and attach your AI Provider for embeddings. Then add an Index and run it.
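For reference, the two credential types from the first step end up as standard HTTP Authorization headers on Elasticsearch requests. This is a minimal sketch of those headers, not the module's code; the function names are hypothetical, and it assumes the Basic Auth username is supplied separately from the password stored in the Key entity.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> dict[str, str]:
    """Authorization header for an Elasticsearch API key (hypothetical helper)."""
    # Elasticsearch expects base64("<id>:<api_key>") after the ApiKey scheme.
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Standard HTTP Basic credentials; the username is assumed to come from config."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

If you created the API key with Elasticsearch's create API key endpoint, the encoded field in its response already holds the base64 id:api_key value, so it can follow the ApiKey scheme as-is.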
The Elasticsearch index is created automatically on first use. The similarity metric and embedding dimensions are baked into the mapping at creation time, so changing either requires dropping the index and re-indexing your content.
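The Similarity Metric fixed in the mapping matters because Elasticsearch's dot_product similarity requires unit-length float vectors, while cosine does not. On unit-length vectors the two give the same value, which is why cosine is a safe default for normalized embeddings. The sketch below checks that arithmetic in plain Python and includes the (1 + cosine) / 2 transform Elasticsearch documents for turning cosine similarity into a non-negative _score; it is an illustration, not the module's scoring code.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: list[float]) -> float:
    return math.sqrt(dot(v, v))


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine ignores vector length, so it behaves the same for raw and normalized vectors.
    return dot(a, b) / (norm(a) * norm(b))


def normalize(v: list[float]) -> list[float]:
    n = norm(v)
    return [x / n for x in v]


def es_cosine_score(query: list[float], doc: list[float]) -> float:
    # Elasticsearch maps cosine similarity in [-1, 1] to _score = (1 + cosine) / 2.
    return (1 + cosine(query, doc)) / 2


q, d = [3.0, 4.0], [4.0, 3.0]
print(cosine(q, d))                       # 0.96
print(dot(normalize(q), normalize(d)))    # same value (up to rounding) once both are unit length
print(dot(q, d))                          # 24.0: the raw dot product depends on vector length
```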
Hybrid search (kNN + BM25)
Pure vector search is strong at conceptual matching but often misses exact lexical lookups such as product SKUs, acronyms and proper names. With hybrid search enabled, every query runs both a kNN vector search and a BM25 keyword match in one Elasticsearch request, then merges the two ranked lists with RRF. No manual weight tuning is needed.
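RRF merges the lists using only each document's rank, which is why no weights have to be tuned: every list contributes 1 / (rank_constant + rank) for each document it contains, and the documents are sorted by the summed score. Elasticsearch computes this server side; the Python below is a minimal sketch of the formula with Elasticsearch's default rank_constant of 60 and made-up document IDs.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], rank_constant: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over several ranked lists of document IDs."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1; only the position matters, never the raw score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)


knn_hits = ["faq-12", "guide-3", "blog-7"]         # vector similarity order
bm25_hits = ["sku-ab-100", "guide-3", "faq-12"]    # keyword match order
print(rrf([knn_hits, bm25_hits]))
# ['faq-12', 'guide-3', 'sku-ab-100', 'blog-7']
```

Here the SKU document appears only in the BM25 list, yet it still ranks ahead of blog-7, which appears only near the bottom of the kNN list.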
| Content profile | Recommendation |
|---|---|
| Mostly narrative prose, queried by concept | Pure kNN (hybrid off) |
| Codes, acronyms, proper names mixed in | Hybrid on |
| RAG agent over a mixed knowledge base | Hybrid on |
| Elasticsearch < 8.8 | Hybrid not available |

License note. RRF is a commercial Elastic feature on 8.x. On the basic (free) license, Elasticsearch rejects hybrid requests with 403 license non-compliant for [Reciprocal Rank Fusion (RRF)]; either start a Platinum trial or leave the hybrid toggle off. Pure kNN works on the basic license.
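On Elasticsearch 8.8 and later, a single search request can carry a top-level knn clause, a BM25 query and rank.rrf, and Elasticsearch fuses the two result sets. The sketch below builds such a request body as a Python dict. It is an assumption about the general shape, not the exact request this module sends: the function name, the k and num_candidates defaults and the filter layout are hypothetical, while the field names vector, content, langcode and entity_type come from the index field mapping listed in this README. The same filter clauses go on both legs so that access-control pre-filtering applies to kNN and BM25 alike.

```python
def build_search_body(
    query_text: str,
    query_vector: list[float],
    filters: list[dict],
    k: int = 10,
    num_candidates: int = 100,
    hybrid: bool = True,
) -> dict:
    """Hypothetical request body; field names follow the index field mapping."""
    body = {
        "size": k,
        "knn": {
            "field": "vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            # Pre-filter applied during the HNSW search, not after it.
            "filter": {"bool": {"filter": filters}},
        },
    }
    if hybrid:
        # BM25 leg over the plain-text content field, with the same filter clauses.
        body["query"] = {"bool": {"must": [{"match": {"content": query_text}}], "filter": filters}}
        # Ask Elasticsearch to fuse the kNN and BM25 result sets with RRF (8.8+).
        body["rank"] = {"rrf": {}}
    return body


filters = [{"term": {"langcode": "en"}}, {"term": {"entity_type": "node"}}]
# Toy 3-dimensional vector; real query vectors must match the index dims.
body = build_search_body("SKU AB-100 warranty", [0.12, -0.03, 0.88], filters)
```

With hybrid=False the body contains only the knn clause, which corresponds to pure kNN and does not need the RRF license.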
Documentation
Detailed walkthroughs live in the project repository:
- Indexing PDFs, DOCX and Markdown: the Search API Attachments + bundled php_pdfparser_extractor pipeline.
- Building a RAG chatbot: AI Assistant + AI Chatbot wired to this module.
- Local development with DDEV: Elasticsearch + Kibana sidecars, smoke tests.
- Troubleshooting: common errors and fixes.
Index field mapping
The module maps the following fields explicitly when it creates the index; any other fields that ai_search sends are accepted through dynamic mapping (dynamic: true).
| Field | Type | Purpose |
|---|---|---|
| vector | dense_vector | Embedding (HNSW-indexed) |
| entity_id | keyword | Drupal entity ID |
| entity_type | keyword | Pre-filter (e.g. node) |
| bundle | keyword | Bundle pre-filter |
| langcode | keyword | Language pre-filter |
| chunk_id | keyword | Chunk identifier within an entity |
| content | text | Plain text used by BM25 in hybrid mode |
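As a rough picture of what gets created, the sketch below builds a create-index body with these mappings, plus a check for the case described under Configure, where a changed embedding dimension or similarity metric means the index has to be dropped and rebuilt. It is a hypothetical helper, not the module's code: dims and similarity stand for whatever your embedding model and Similarity Metric setting produce, and existing_mapping is assumed to be the per-index object returned by GET <index>/_mapping (the part under the index name).

```python
KEYWORD_FIELDS = ("entity_id", "entity_type", "bundle", "langcode", "chunk_id")


def build_index_body(dims: int, similarity: str = "cosine") -> dict:
    """Create-index body mirroring the field table (hypothetical helper)."""
    properties = {
        # index=True builds the HNSW graph used for approximate kNN.
        "vector": {"type": "dense_vector", "dims": dims, "index": True, "similarity": similarity},
        # Analyzed text for BM25 matching in hybrid mode.
        "content": {"type": "text"},
    }
    for name in KEYWORD_FIELDS:
        # Exact-match fields used for pre-filtering and identification.
        properties[name] = {"type": "keyword"}
    # dynamic=True lets any extra fields from ai_search through.
    return {"mappings": {"dynamic": True, "properties": properties}}


def needs_reindex(existing_mapping: dict, dims: int, similarity: str) -> bool:
    """True when the stored vector mapping no longer matches the configuration."""
    vector = existing_mapping["mappings"]["properties"]["vector"]
    return vector.get("dims") != dims or vector.get("similarity") != similarity
```

For example, an index built with build_index_body(1536) would need rebuilding if the embedding model were switched to one with a different dimension, or if the Similarity Metric were changed to dot_product.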
Maintainers
- Ricardo Amaro
- Looking for co-maintainers — open an issue or reach out.