
rag_search


RAG Search introduces advanced Retrieval-Augmented Generation (RAG) capabilities to your Drupal site, bridging the gap between your content architecture and Large Language Models (LLMs).

By leveraging vector embeddings and semantic search, this module allows developers and site builders to create highly contextual, conversational search experiences that go beyond traditional keyword matching.

A user question is matched against a Search API index backed by a vector database, the retrieved chunks are injected as context into an LLM prompt, and the answer is returned to the user and cached (with optional L1 exact-match and L2 semantic layers).

The module ships with an optional submodule — AI Search - Semantic Chunking (ai_search_sc) — that adds an embedding-similarity chunking strategy to AI Search as a drop-in alternative to token-based chunking. Recommended for long-form prose where fixed-token windows produce awkward splits mid-paragraph. See modules/ai_search_sc/README.md.
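
The idea, roughly: embed each text unit, then start a new chunk wherever the cosine similarity between neighboring embeddings drops below a threshold. A minimal standalone sketch of that idea (function names and the 0.75 threshold are illustrative, not the plugin's actual code):

  <?php

  // Hypothetical illustration of embedding-similarity chunking. $sentences is
  // an ordered list of text units; $embed returns a float vector per unit.
  function semantic_chunks(array $sentences, callable $embed, float $threshold = 0.75): array {
    $chunks = [];
    $current = [];
    $prev = NULL;
    foreach ($sentences as $sentence) {
      $vector = $embed($sentence);
      // Break whenever the sentence drifts semantically from its predecessor.
      if ($prev !== NULL && cosine_similarity($prev, $vector) < $threshold) {
        $chunks[] = implode(' ', $current);
        $current = [];
      }
      $current[] = $sentence;
      $prev = $vector;
    }
    if ($current) {
      $chunks[] = implode(' ', $current);
    }
    return $chunks;
  }

  function cosine_similarity(array $a, array $b): float {
    $dot = $na = $nb = 0.0;
    foreach ($a as $i => $v) {
      $dot += $v * $b[$i];
      $na += $v ** 2;
      $nb += $b[$i] ** 2;
    }
    return $dot / ((sqrt($na) * sqrt($nb)) ?: 1.0);
  }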

Two entry points, one pipeline. Both paths converge on RagSearchProcessService::handleQuestion():

  • Dynamic route — enable it at RAG Search Settings, set a path and a title, and a public form becomes available at the chosen path. Toggling the route or changing its path/title requires a router rebuild (drush cr or drush router:rebuild).
  • Block — place a RAG Search block on any page. Every setting can be overridden at the block level; empty fields fall back to module config.

Context-aware responses. Drupal-indexed content is fed to the LLM as context so answers stay grounded in your site's data. The pipeline (a condensed code sketch follows the list):

  1. Run a Search API ->keys($question) query against the configured index.
  2. Concatenate chunk content into a single context blob.
  3. Substitute {{ context }}, {{ empty }}, and {{ format }} in the main prompt and hand the resulting System + User messages to the default chat provider.
  4. Return the LLM's text answer (or the configured empty fallback on error / no results).
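
A condensed sketch of those four steps, assuming the AI module's documented chat API (ChatInput/ChatMessage) and Search API's query builder; the chunk-text extra-data key is an assumption, and the real service wraps caching, rate limiting, and access checks around this core:

  <?php

  use Drupal\ai\OperationType\Chat\ChatInput;
  use Drupal\ai\OperationType\Chat\ChatMessage;
  use Drupal\search_api\Entity\Index;

  function rag_answer_sketch(string $question, string $index_id, string $prompt, string $empty, string $format, int $chunk_limit): string {
    // 1. Vector-backed Search API query, capped at the configured chunk limit.
    $results = Index::load($index_id)->query()
      ->keys($question)
      ->range(0, $chunk_limit)
      ->execute();

    // 2. Concatenate chunk text into a single context blob.
    $context = '';
    foreach ($results as $item) {
      $context .= $item->getExtraData('chunk_text') . "\n\n"; // Key name assumed.
    }

    // 3. Substitute the tokens into the main prompt.
    $system = strtr($prompt, [
      '{{ context }}' => $context,
      '{{ empty }}' => $empty,
      '{{ format }}' => $format,
    ]);

    // 4. Hand System + User messages to the default chat provider.
    $manager = \Drupal::service('ai.provider');
    $defaults = $manager->getDefaultProviderForOperationType('chat');
    $provider = $manager->createInstance($defaults['provider_id']);
    $input = new ChatInput([
      new ChatMessage('system', $system),
      new ChatMessage('user', $question),
    ]);
    return $provider->chat($input, $defaults['model_id'])->getNormalized()->getText();
  }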

Two-layer cache. Both layers have independent enable flags and TTLs (in minutes); the L1 key derivation is sketched after the list.

  • L1 — exact-match cache. Keyed on a normalized question (transliterated + non-alphanumeric stripped) together with the format hash, the source index ID, and the current user's roles. Stored in cache.default and tagged with search_api_index:<id> so it invalidates when the source index is cleared.
  • L2 — semantic cache. Backed by a dedicated l2_cache_entry content entity. A new question is matched semantically against previous questions via a second Search API index; on a hit, the cached answer is returned (and promoted into L1). Access to every source chunk is re-verified against the current user before a cached answer is served, so cache hits cannot leak content the user has lost access to. Expired entries are purged in bounded batches on cron.
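
A sketch of the L1 side, using core's transliteration and cache APIs (the key prefix and helper name are illustrative):

  <?php

  // Derive the L1 cache ID exactly as described above: normalized question
  // + format hash + source index ID + the current user's roles.
  function l1_cid_sketch(string $question, string $format_hash, string $index_id, array $roles): string {
    $normalized = \Drupal::transliteration()->transliterate($question);
    $normalized = preg_replace('/[^a-z0-9]/', '', strtolower($normalized));
    sort($roles);
    return 'rag_search:l1:' . hash('sha256', implode(':', [$normalized, $format_hash, $index_id, implode(',', $roles)]));
  }

  $cid = l1_cid_sketch($question, $format_hash, $index_id, $account->getRoles());
  $cache = \Drupal::cache(); // The cache.default bin.
  if ($hit = $cache->get($cid)) {
    return $hit->data;
  }
  // ... run the pipeline, then store tagged so clearing the source index
  // invalidates the entry.
  $expire = \Drupal::time()->getRequestTime() + $l1_ttl_minutes * 60;
  $cache->set($cid, $answer, $expire, ['search_api_index:' . $index_id]);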

Chunk limit. Cap the number of chunks passed to the LLM per request so you can trade context breadth against prompt cost and latency.

Formatting options. A free-form formatting instruction (Markdown, bullet list, tone, etc.) is automatically injected into the main prompt via the dedicated formatting-options prompt, keeping formatting behavior editable from the AI prompt UI rather than hard-coded.

Rate limiting. An optional site-wide fixed-window rate limiter protects the question endpoint from spam and abuse. The limit is shared across the block and the dynamic route — the same bucket is consulted regardless of entry point, so an abuser cannot sidestep the ceiling by switching surfaces. It's backed by Drupal core's FloodInterface and is off by default; admins opt in from the settings form. When the limit is exceeded, a configurable (translatable) message is shown in place of the answer, no embedding or LLM call is made, and the blocked response is never written to the L1/L2 caches.
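
How that maps onto core's FloodInterface, roughly (the event name comes from the FAQ below; threshold, window, and the blocked message come from settings):

  <?php

  use Drupal\Core\Session\AccountInterface;

  function rag_search_gate_allows_sketch(AccountInterface $account, int $threshold, int $window): bool {
    if ($account->hasPermission('bypass rag_search rate limit')) {
      return TRUE; // Trusted roles skip the gate entirely.
    }
    if ($threshold === 0 || $window === 0) {
      return TRUE; // 0 no-ops the gate (fail-open).
    }
    $flood = \Drupal::flood();
    $identifier = $account->id() . ':' . \Drupal::request()->getClientIp();
    if (!$flood->isAllowed('rag_search.question', $threshold, $window, $identifier)) {
      return FALSE; // Caller shows the configured message; nothing is cached.
    }
    // Count this request against the window.
    $flood->register('rag_search.question', $window, $identifier);
    return TRUE;
  }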

Requirements

This module requires the following modules:

  • AI (ai) — including its AI Search submodule for vector indexing
  • Search API (search_api)

In addition, you will need an AI provider and a vector database provider enabled and pre-configured. The module is provider-agnostic — it uses whichever provider is set as the default for the chat operation type via the ai.provider service.
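
In code terms, resolution looks roughly like this (a sketch of the AI module's documented pattern; no provider ID is hard-coded):

  <?php

  $manager = \Drupal::service('ai.provider');
  $defaults = $manager->getDefaultProviderForOperationType('chat');
  if (empty($defaults['provider_id'])) {
    // No default chat provider configured: the pipeline falls back to the
    // configured empty message instead of attempting an LLM call.
  }
  else {
    $provider = $manager->createInstance($defaults['provider_id']);
    // $provider->chat(...) is then invoked with $defaults['model_id'].
  }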

Tested provider combinations:

Optional submodule shipped with this project (already mentioned in the Introduction):

  • AI Search - Semantic Chunking (ai_search_sc) — adds a Semantic Embedding Strategy to AI Search that splits chunks at embedding-similarity breakpoints instead of fixed token windows. Preferable for long-form prose. See modules/ai_search_sc/README.md.

Installation

Install as you would normally install a contributed Drupal module. For further information, see Installing Drupal Modules.

Configuration

Prerequisites on other modules:

  • Set up AI provider at AI Providers.
  • Set up AI settings (default provider for the chat operation type) at AI Settings.
  • Set up Search API server + index + fields + AI processor at Search API.

Module configuration:

  • Main settings form: RAG Search Settings.
  • Every setting available in the settings form can be overridden on a per-block basis when placing a RAG Search block. Leaving a block field empty means "inherit from the module settings", as in the sketch below.
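
The fallback rule, roughly (setting name illustrative):

  <?php

  // An empty block value means "inherit from rag_search.settings".
  $defaults = \Drupal::config('rag_search.settings');
  $raw = $block_settings['chunk_limit'] ?? '';
  $chunk_limit = $raw === '' ? (int) $defaults->get('chunk_limit') : (int) $raw;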

Permissions

The module defines two permissions:

  • Administer RAG Search settings (administer rag_search settings) — required to configure the module, edit the dynamic route, manage block defaults, and administer L2 cache entries. Restricted access.
  • Bypass RAG Search rate limit (bypass rag_search rate limit) — grants the account a full pass on the rate-limiter gate. Intended for trusted operator/QA roles that need to burst through the limit while testing. Restricted access; not granted to any role by default.

Prompts

The two prompts used by the module are shipped as install config and exposed through the AI module's prompt UI, so you can edit them without a code deploy:

  • ai.ai_prompt_type.rag — the prompt type that groups the RAG prompts.
  • ai.ai_prompt.rag__rag_search_main_prompt — the main system prompt. Supports {{ context }}, {{ empty }}, and {{ format }}.
  • ai.ai_prompt.rag__rag_search_formatting_options — the optional instruction appended to the main prompt when formatting options are provided. Supports {{ format }}.
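
Loading and assembling them at runtime looks roughly like this (a sketch; the key that holds the prompt body inside the ai.ai_prompt.* config objects is an assumption):

  <?php

  $main = \Drupal::config('ai.ai_prompt.rag__rag_search_main_prompt')->get('prompt');
  if ($format !== '') {
    // The formatting instruction is appended only when formatting options are set.
    $formatting = \Drupal::config('ai.ai_prompt.rag__rag_search_formatting_options')->get('prompt');
    $main .= "\n" . strtr($formatting, ['{{ format }}' => $format]);
  }
  $system = strtr($main, [
    '{{ context }}' => $context,
    '{{ empty }}' => $empty_fallback,
    '{{ format }}' => $format,
  ]);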

L2 semantic cache

L2 requires its own Search API server and index configured against the l2_cache_entry entity. Recommended indexing settings per base field:

  • Question — Main Content
  • Answer — Ignore
  • Chunk IDs — Ignore
  • Expires — Filterable attributes
  • Format hash — Filterable attributes
  • Source index ID — Filterable attributes
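
An L2 lookup then looks roughly like this (field machine names are inferred from the list above; the access-check helper is hypothetical):

  <?php

  use Drupal\search_api\Entity\Index;

  $results = Index::load($l2_index_id)->query()
    ->keys($question)
    ->addCondition('format_hash', $format_hash)
    ->addCondition('source_index_id', $source_index_id)
    ->range(0, 1)
    ->execute();

  foreach ($results as $item) {
    // Each indexed item wraps an l2_cache_entry entity.
    $entry = $item->getOriginalObject()->getValue();
    if ((int) $entry->get('expires')->value < \Drupal::time()->getRequestTime()) {
      continue; // Expired; cron purges these in bounded batches.
    }
    // Re-verify the user can still access every source chunk before serving.
    if (rag_search_chunks_accessible($entry->get('chunk_ids')->getValue(), $account)) {
      return $entry->get('answer')->value; // The real code also promotes this hit into L1.
    }
  }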

Rate limiting

Configured from the Rate limiting fieldset on the main settings form. The limit is a single site-wide policy (not per-block overridable) and ships disabled.

  • Enable rate limiting — master on/off toggle. Disabled by default.
  • Max requests per window — questions allowed per uid:ip bucket in the window. 0 no-ops the gate (fail-open).
  • Window (seconds) — length of the rate-limit window. 0 no-ops the gate.
  • Rate limit exceeded message — message shown to the user when the gate fires. Translatable per language via Config Translation.

Grant the Bypass RAG Search rate limit permission to any role that should skip the limiter (e.g. an internal operator/QA role). Block events are logged at notice level on the rag_search watchdog channel with the uid:ip identifier, so operators can distinguish throttling from "no results" in dblog.

Uninstall

On uninstall, all stored L2 cache entries (l2_cache_entry) are deleted in bounded batches so leftover rows don't persist after the module is removed.
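
A bounded-batch cleanup of that kind typically looks like this (batch size illustrative):

  <?php

  // hook_uninstall()-style cleanup: delete l2_cache_entry rows 100 at a time.
  $storage = \Drupal::entityTypeManager()->getStorage('l2_cache_entry');
  do {
    $ids = $storage->getQuery()->accessCheck(FALSE)->range(0, 100)->execute();
    if ($ids) {
      $storage->delete($storage->loadMultiple($ids));
    }
  } while ($ids);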

Troubleshooting

The dynamic route path doesn't work after I enable or change it: Route registration is not hot-reloaded. Run drush cr (or drush router:rebuild) after toggling the route state, changing its path, or changing its title.

Every question returns the empty fallback message: Confirm a default chat provider is set at AI Settings, the Search API index returns results for the query, and the index has the AI Search backend with vector embeddings configured.

L2 cache never hits: Confirm L2 is enabled in module settings, its TTL is non-zero, and the dedicated Search API index for l2_cache_entry exists and is populated. Entries are promoted to L1 only after an L2 hit, so L1 misses alone don't indicate L2 is broken.

Every test question returns the rate-limit message: You've enabled the rate limiter and are hitting the ceiling from the same uid:ip bucket used across tests. Either raise the ceiling in RAG Search Settings, grant your test account the Bypass RAG Search rate limit permission, or clear the flood table (DELETE FROM flood WHERE event = 'rag_search.question').

FAQ

Q: Do I need both L1 and L2 caches?

A: No. Each layer has an independent enable flag and TTL. L1 alone gives exact-match caching; L2 adds semantic recall but requires a second Search API index against l2_cache_entry.

Q: Can I use a different LLM provider for RAG than for the rest of my site?

A: The module uses whichever provider is set as the default for the chat operation type. Provider selection is delegated to the AI module rather than wrapped here, so RAG-specific routing is configured at the AI layer.

Q: How do I edit the prompt without a code deploy?

A: The prompts ship as install config and are editable through the AI module's prompt UI. See Prompts under Configuration for the config object names.

Q: Is the rate limiter shared between the block and the dynamic route?

A: Yes. Both entry points reach the same handleQuestion() chokepoint, which consults one flood event (rag_search.question) with a uid:ip identifier. A user cannot sidestep the limit by switching from the block to the route (or vice-versa). Grant Bypass RAG Search rate limit to roles that legitimately need to burst through the gate.

Maintainers

Supporting this Module

Buy me a hot chocolate :)

Activity

  • Total releases: 7
  • First release: Mar 2026
  • Latest release: 2 weeks ago
  • Release cadence: 5 days
  • Stability: 86% stable

Releases

Version Type Release date
1.0.5 Stable Apr 17, 2026
1.0.4 Stable Apr 16, 2026
1.0.3 Stable Mar 29, 2026
1.0.2 Stable Mar 28, 2026
1.0.1 Stable Mar 24, 2026
1.0.0 Stable Mar 24, 2026
1.0.x-dev Dev Mar 16, 2026