Drupal RAG
Transforms your Drupal site into a Retrieval-Augmented Generation (RAG) system. Content entities are indexed as vector embeddings and retrieved at query time to provide relevant context to large language models - all without sending your data to third-party services.
Requirements
- Drupal 11
- PostgreSQL with the pgvector extension
- Ollama running locally or on your network
How it works
Indexing pipeline
When content is created, updated, or deleted, the module catches the corresponding entity hook and queues the entity for processing. The full pipeline runs in six steps:
- Entity event - entity_insert, entity_update, or entity_delete fires (node, media, file, etc.)
- Filter - only entity types selected in the configuration form are queued; unpublished entities are skipped
- Extract - the queue worker extracts plain text from the entity. File entities (PDF, DOCX, TXT, etc.) are parsed by the FileTextExtractor. All other entities are rendered using the configured view mode and stripped of HTML.
- Chunk - text is split into overlapping chunks (configurable size and overlap). The chunker respects sentence boundaries to keep logical units intact.
- Embed - each chunk is sanitized, prepended with a search_document: prefix, and sent to Ollama's /api/embed endpoint using the configured model (e.g. nomic-embed-text) to generate a vector embedding.
- Store - chunks and their embeddings are stored in a pgvector-enabled PostgreSQL table with a native vector column and an HNSW index for fast similarity search. Existing embeddings for the same entity are deleted before the new ones are inserted, so re-indexing replaces them rather than duplicating them.
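The Chunk and Embed steps above can be sketched as follows. This is a minimal illustration, not the module's Chunker or EmbeddingService: the function names, the regex-based sentence splitter, and the default sizes are assumptions, and the sanitization step is left out because its rules are not described here. Only the `search_document: ` prefix and the model name come from the steps above; `model` and `input` are the request fields of Ollama's /api/embed endpoint.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Pack whole sentences into chunks of roughly chunk_size characters.

    Each new chunk starts with the trailing sentences of the previous chunk,
    as long as they fit into `overlap` characters. A single sentence longer
    than chunk_size is kept whole, so it can exceed the limit.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if current and len(" ".join(current + [sentence])) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences over as overlap, newest first.
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap:
                    break
                carried.insert(0, prev)
            # Drop carried sentences that would push the new chunk over the limit.
            while carried and len(" ".join(carried + [sentence])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_embed_payload(chunk: str, model: str = "nomic-embed-text") -> dict:
    # JSON body for Ollama's /api/embed, with the document prefix applied.
    return {"model": model, "input": "search_document: " + chunk}
```

Splitting on sentences instead of fixed offsets means a chunk never ends mid-sentence, at the cost of chunks that vary in length.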
Query pipeline
Three API endpoints are available:
POST /api/rag/query - accepts a query string and returns the most semantically similar chunks with similarity scores.
- The query text is sanitized, prepended with a search_query: prefix, and embedded using the same Ollama model
- A cosine similarity search is executed against the vector store using pgvector's <=> operator
- Results are filtered by min_score and limited by the limit parameter
- Each result includes entity_type, entity_id, entity_label, bundle, chunk text, embedding model, similarity score, and language code
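To make the scoring concrete, here is a small in-memory sketch of the retrieval steps above. pgvector's `<=>` operator returns cosine distance; the sketch assumes the similarity score is `1 - distance`, which is the usual convention but is not confirmed for this module. The function names, the row layout, and the default values for `min_score` and `limit` are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # The quantity pgvector's <=> operator computes: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_vec: list[float], rows: list[dict],
           min_score: float = 0.5, limit: int = 5) -> list[dict]:
    # Score every row, drop those below min_score, return the best `limit` rows.
    scored = [
        {**row, "score": 1.0 - cosine_distance(query_vec, row["embedding"])}
        for row in rows
    ]
    hits = [row for row in scored if row["score"] >= min_score]
    hits.sort(key=lambda row: row["score"], reverse=True)
    return hits[:limit]
```

In the database this work is done by the HNSW index, which returns approximate nearest neighbours without scanning every row; the brute-force loop here only illustrates the arithmetic.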
POST /api/rag/prompt - returns an assembled prompt ready to be sent to any LLM.
- Retrieves relevant chunks (same as /api/rag/query)
- Formats each chunk as [Document: entity_type:entity_id `label`] followed by the text
- Loads the configurable prompt template from the settings form
- Replaces the {{context}} and {{query}} placeholders
- Returns the assembled prompt string plus source metadata
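The prompt assembly described above amounts to a formatting step and a placeholder substitution. The sketch below assumes each retrieved chunk is a dictionary with `entity_type`, `entity_id`, `entity_label`, and a hypothetical `chunk_text` key, and that chunks are separated by a blank line; the function names are illustrative, not the module's API.

```python
import re


def format_context(chunks: list[dict]) -> str:
    # One "[Document: type:id `label`]" header per chunk, followed by its text.
    blocks = []
    for chunk in chunks:
        header = (
            f"[Document: {chunk['entity_type']}:{chunk['entity_id']} "
            f"`{chunk['entity_label']}`]"
        )
        blocks.append(f"{header}\n{chunk['chunk_text']}")
    return "\n\n".join(blocks)


def assemble_prompt(template: str, query: str, chunks: list[dict]) -> str:
    values = {"context": format_context(chunks), "query": query}
    # Single-pass substitution, so placeholder-like text inside the query or
    # the context is never expanded a second time.
    return re.sub(r"\{\{(context|query)\}\}", lambda m: values[m.group(1)], template)
```

A single regex pass is used instead of two chained `str.replace` calls so that a user query containing the literal text `{{context}}` cannot pull the retrieved documents into an unexpected position.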
POST /api/rag/augment - assembles the prompt, sends it to Ollama for generation, and returns the LLM response along with source metadata and the assembled prompt.
- Builds the prompt (same as /api/rag/prompt)
- Sends the prompt to Ollama's /api/chat endpoint using the configured chat model (or the embedding model if no chat model is set)
- Returns the generated response plus source metadata and the full prompt
Configuration
Administrators configure the module through the settings form at /admin/config/search/drupal-rag:
- Enabled entity types - select which entity types should be indexed
- Chunk size - maximum characters per chunk (100–10000)
- Chunk overlap - characters shared between consecutive chunks (0–5000)
- Ollama base URL - the Ollama server address
- Embedding model - the model used for generating embeddings (fetched live from Ollama)
- Chat model - the model used for response generation via /api/rag/augment (optional, defaults to embedding model)
- View mode - how entities are rendered for text extraction
- RAG prompt template - custom template for the augmented prompt with {{context}} and {{query}} placeholders
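The constraints on these settings can be summarized in a small validation sketch. The numeric ranges and the two placeholders come from the list above; the class name, the default values, and the rule that the overlap must be smaller than the chunk size are assumptions for illustration, not the module's actual form validation.

```python
from dataclasses import dataclass


@dataclass
class RagSettings:
    # Default values are hypothetical; only the allowed ranges come from the list above.
    chunk_size: int = 1000
    chunk_overlap: int = 200
    prompt_template: str = "Context:\n{{context}}\n\nQuestion: {{query}}"

    def validate(self) -> list[str]:
        errors = []
        if not 100 <= self.chunk_size <= 10000:
            errors.append("chunk_size must be between 100 and 10000")
        if not 0 <= self.chunk_overlap <= 5000:
            errors.append("chunk_overlap must be between 0 and 5000")
        # Assumption: an overlap as large as the chunk itself makes no progress.
        if self.chunk_overlap >= self.chunk_size:
            errors.append("chunk_overlap must be smaller than chunk_size")
        for placeholder in ("{{context}}", "{{query}}"):
            if placeholder not in self.prompt_template:
                errors.append(f"prompt_template is missing {placeholder}")
        return errors
```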
Status page
A read-only status page at /admin/reports/drupal-rag shows indexed entities grouped by type with chunk counts.
Drush commands
- drupal-rag:queue-all (alias rag:qa) - queue all published entities of enabled types for indexing
File text extraction
Supported file formats for text extraction from file/media entities:
- Plain text: txt, csv, json, xml, md, markdown, log, yml, yaml
- Office documents: docx, xlsx, pptx, odt, ods, odp
- PDF: pdf (via prinsfrank/pdf-parser)
Database
The module requires a PostgreSQL connection named 'pgvector' in settings.php. It uses two tables:
- drupal_rag_embeddings - stores entity metadata, chunk text, and native pgvector embeddings with an HNSW index for cosine similarity search
- drupal_rag_queue - dedicated queue table for entity processing, isolated from the default Drupal queue
Deduplication
- Entity types are filtered by the enabled_entity_types configuration before queuing
- Only published entities are indexed (entities implementing EntityPublishedInterface)
- The entity_presave hook is disabled to prevent double queuing on updates
- storeEmbedding() deletes existing chunks for an entity before inserting new ones, so reprocessing is idempotent
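The delete-before-insert behaviour is what makes reprocessing safe. The in-memory sketch below mimics it with a plain list standing in for the drupal_rag_embeddings table; the class, method signature, and row fields are illustrative assumptions, not the module's VectorStorage API.

```python
class InMemoryVectorStore:
    """Stand-in for the embeddings table, used only to show idempotent writes."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def store_embedding(self, entity_type: str, entity_id: int,
                        chunks: list[tuple[str, list[float]]]) -> None:
        # 1. Remove every existing chunk for this entity.
        self.rows = [
            row for row in self.rows
            if (row["entity_type"], row["entity_id"]) != (entity_type, entity_id)
        ]
        # 2. Insert the fresh chunks.
        for index, (text, vector) in enumerate(chunks):
            self.rows.append({
                "entity_type": entity_type,
                "entity_id": entity_id,
                "chunk_index": index,
                "chunk_text": text,
                "embedding": vector,
            })
```

Processing the same queue item twice leaves the store in the same state as processing it once, and an entity that shrinks from three chunks to one does not leave stale chunks behind.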
Permissions
- access rag query - allows access to the API endpoints
- administer drupal rag - allows configuration of the module (restricted)
Services
The module's logic is split across services registered in the Drupal service container:
- OllamaClient - HTTP client for the Ollama API (embed, chat, model listing)
- FileTextExtractor - parses files into plain text (TXT, DOCX, XLSX, PPTX, ODF, PDF, and more)
- EntityExtractor - converts Drupal entities into text (file entities via FileTextExtractor, others via view mode)
- Chunker - splits text into overlapping chunks with sentence boundary awareness
- EmbeddingService - sanitizes text and generates embeddings via Ollama
- VectorStorage - pgvector database operations (table management, store, similarity search)
- RagQueryService - retrieval logic: query → embed → similarity search → results
- AugmentService - prompt assembly with configurable template + Ollama chat generation
- EntityHooks - event-to-queue bridge with entity type and published status filtering