drupal_rag

Drupal RAG

Transforms your Drupal site into a Retrieval-Augmented Generation (RAG) system. Content entities are indexed as vector embeddings and retrieved at query time to provide relevant context to large language models - all without sending your data to third-party services.

𝗪𝗵𝘆 𝘆𝗼𝘂'𝗱 𝘂𝘀𝗲 𝗶𝘁. Internal knowledge bases, technical documentation, public administration records, document archives, customer support. Anywhere you have Drupal content that needs to be queried in natural language, without sending data to third parties.

👇 Check out the “Coming Soon” section to see what's in store for upcoming releases.

Requirements

Drupal 11
PostgreSQL with the pgvector extension
Ollama running locally or on your network

How it works

Indexing pipeline

When content is created, updated, or deleted, the module intercepts the entity hook events and pushes them into a processing queue:

Entity event - entity_insert, entity_update, or entity_delete fires (node, media, file, etc.)
Filter - only entity types selected in the configuration form are queued; unpublished entities are skipped
Extract - the queue worker extracts plain text from the entity. File entities (PDF, DOCX, TXT, etc.) are parsed by the FileTextExtractor. All other entities are rendered using the configured view mode and stripped of HTML.
Chunk - text is split into overlapping chunks (configurable size and overlap). The chunker respects sentence boundaries to keep logical units intact.
Embed - each chunk is sanitized, prepended with a search_document: prefix, and sent to Ollama's /api/embed endpoint using the configured model (e.g. nomic-embed-text) to generate a vector embedding.
Store - chunks and their embeddings are stored in a pgvector-enabled PostgreSQL table with a native vector column and HNSW index for fast similarity search. Old embeddings for the same entity are deleted before insertion (upsert pattern).

Query pipeline

Three API endpoints are available:

POST /api/rag/query - accepts a query string and returns the most semantically similar chunks with similarity scores.

The query text is sanitized, prepended with a search_query: prefix, and embedded using the same Ollama model
A cosine similarity search is executed against the vector store using pgvector's <=> operator
Results are filtered by min_score and limited by the limit parameter
Each result includes entity_type, entity_id, entity_label, bundle, chunk text, embedding model, similarity score, and language code

POST /api/rag/prompt - returns an assembled prompt ready to be sent to any LLM.

Retrieves relevant chunks (same as /api/rag/query)
Formats each chunk as [Document: entity_type:entity_id `label`] followed by the text
Loads the configurable prompt template from the settings form
Replaces the {{context}} and {{query}} placeholders
Returns the assembled prompt string plus source metadata

POST /api/rag/augment - assembles the prompt, sends it to Ollama for generation, and returns the LLM response along with source metadata and the assembled prompt.

Builds the prompt (same as /api/rag/prompt)
Sends the prompt to Ollama's /api/chat endpoint using the configured chat model (or embedding model as fallback)
Returns the generated response plus source metadata and the full prompt

Configuration

The admin can configure everything via the settings form at /admin/config/search/drupal-rag:

Enabled entity types - select which content types should be indexed
Chunk size - maximum characters per chunk (100–10000)
Chunk overlap - characters shared between consecutive chunks (0–5000)
Ollama base URL - the Ollama server address
Embedding model - the model used for generating embeddings (fetched live from Ollama)
Chat model - the model used for response generation via /api/rag/augment (optional, defaults to embedding model)
View mode - how entities are rendered for text extraction
RAG prompt template - custom template for the augmented prompt with {{context}} and {{query}} placeholders

Status page

A read-only status page at /admin/reports/drupal-rag shows indexed entities grouped by type with chunk counts.

Drush commands

drupal-rag:queue-all (alias rag:qa) - queue all published entities of enabled types for indexing

File text extraction

Supported file formats for text extraction from file/media entities:

Plain text: txt, csv, json, xml, md, markdown, log, yml, yaml
Office documents: docx, xlsx, pptx, odt, ods, odp
PDF: pdf (via prinsfrank/pdf-parser)

Database

The module requires a PostgreSQL connection named 'pgvector' in settings.php. It uses two tables:

drupal_rag_embeddings - stores entity metadata, chunk text, and native pgvector embeddings with an HNSW index for cosine similarity search
drupal_rag_queue - dedicated queue table for entity processing, isolated from the default Drupal queue

Deduplication

Entity types are filtered by the enabled_entity_types configuration before queuing
Only published entities are indexed (entities implementing EntityPublishedInterface)
The entity_presave hook is disabled to prevent double queuing on updates
storeEmbedding() deletes existing chunks for an entity before inserting new ones, so reprocessing is idempotent

Permissions

access rag query - allows access to the API endpoints
administer drupal rag - allows configuration of the module (restricted)

Services

The module is fully object-oriented with registered Drupal services:

OllamaClient - HTTP client for the Ollama API (embed, chat, model listing)
FileTextExtractor - parses files into plain text (TXT, DOCX, XLSX, PPTX, ODF, PDF, and more)
EntityExtractor - converts Drupal entities into text (file entities via FileTextExtractor, others via view mode)
Chunker - splits text into overlapping chunks with sentence boundary awareness
EmbeddingService - sanitizes text and generates embeddings via Ollama
VectorStorage - pgvector database operations (table management, store, similarity search)
RagQueryService - retrieval logic: query → embed → similarity search → results
AugmentService - prompt assembly with configurable template + Ollama chat generation
EntityHooks - event-to-queue bridge with entity type and published status filtering

Coming soon... (1.0.0-alpha6)

Caching

The module implements a query embedding cache (Layer 1). When a user submits a query, the text is hashed together with the current model name and checked against the cache.drupal_rag_bin cache bin. On a hit, the embedding step is skipped entirely. The cache is invalidated automatically when the embedding model is changed in the settings form.

A second caching layer is planned: caching the extracted text of each entity to avoid re-rendering and re-parsing files (PDF, DOCX, etc.) on repeated processing.

Vector dimension

The vector dimension is no longer hardcoded. It is configurable via the drupal_rag.settings form (embedding_dimension), allowing the module to work with any embedding model regardless of output size. The dimension is stored in the vector(n) column definition at index time. The HNSW index is rebuilt automatically when the dimension changes.

Version	Type	Release date
1.0.0-alpha5	Pre-release	May 21, 2026
1.0.0-alpha4	Pre-release	May 21, 2026
1.0.0-alpha3	Pre-release	May 21, 2026
1.0.0-alpha2	Pre-release	May 21, 2026
1.0.0-alpha1	Pre-release	May 21, 2026