ai_pageindex

AI PageIndex integrates the VectifyAI PageIndex approach into Drupal 11's AI module ecosystem. It indexes site content as hierarchical tree structures rather than embedding vectors, then retrieves answers to natural-language queries using LLM reasoning over that tree. No vector database required.

It registers as a Vector Database (VDB) provider for the AI module and works with Search API and the AI Search Block.

Features

Vectorless retrieval. Documents are indexed as hierarchical JSON trees (titles, section ranges, summaries). It generates no embeddings and needs no vector database.
LLM reasoning for search. When a query arrives, an LLM navigates the tree, identifies relevant sections, and returns scored excerpts with page-level citations.
Plugs into the AI module. Implements AiVdbProviderInterface and AiVdbProviderSearchApiInterface, appearing in the VDB provider list and working as a Search API backend.
Multi-LLM support. The microservice uses LiteLLM. OpenAI, Anthropic Claude, and local Ollama models all work.
Self-hostable. The required FastAPI microservice ships with a Dockerfile and a DDEV sidecar compose file in the module repository. It has no external SaaS dependency.
Entity map table. A lightweight ai_pageindex_entity_map table maps Drupal Search API item IDs to PageIndex document IDs. The table handles deletions and re-indexing cleanly.

When to use this module: Good for Drupal sites that host structured documents (PDFs, reports, manuals, legal filings, policy documents) where visitors ask natural-language questions and need answers with explicit section or page citations. It works best on content with clear structural hierarchy. It is not a replacement for general keyword search across standard short-form node content.

Post-installation

Installation requires two parts: the Drupal module and the PageIndex FastAPI microservice. The microservice source code (main.py, Dockerfile, requirements.txt) lives in the pageindex-service/ directory of the module's git repository.

Part 1: Start the microservice

1. Add your LLM API key to the DDEV environment

Edit .ddev/docker-compose.pageindex.yaml and set your key under environment:

# .ddev/docker-compose.pageindex.yaml
environment:
  PAGEINDEX_WORKSPACE_ROOT: /data/workspaces
  LLM_MODEL:                gpt-4o-2024-11-20
  RETRIEVE_MODEL:           gpt-4o-2024-11-20
  OPENAI_API_KEY:           "sk-..."   # or ANTHROPIC_API_KEY, etc.
  PAGEINDEX_API_KEY:        ""         # set a secret token in production

2. Start the service

ddev restart

The service starts at http://pageindex:8765 inside the DDEV network.

3. Confirm it is running

ddev exec curl -s http://pageindex:8765/health
# expected: {"status":"ok"}

Self-hosted / production: Use docker compose up -d with the included Dockerfile. Expose the service only to your Drupal application server, place it behind HTTPS, and set a strong PAGEINDEX_API_KEY bearer token.

Part 2: Configure the module

Go to Administration → Configuration → AI → VDB Providers → AI PageIndex (/admin/config/ai/vdb_providers/ai_pageindex) and set:

Service URL: http://pageindex:8765 for DDEV local; your Docker service hostname in production. No trailing slash.
API bearer token: select the Key module entry holding your PAGEINDEX_API_KEY value (leave blank to disable auth in development).
Indexing model and Retrieval model: LiteLLM-format model strings, e.g. gpt-4o-2024-11-20, anthropic/claude-sonnet-4-6, or ollama/mistral.

A Service status: Connected / Unreachable badge confirms the microservice is reachable from Drupal.

Part 3: Create a Search API server and index

Go to Administration → Configuration → Search API → Add server. Choose AI Search as the backend, then choose PageIndex (Vectorless Reasoning) as the vector database provider.
Create a Search API index pointing at your content datasource. Add the fields you want searchable (Title, Body, etc.) and set their Indexing option: Main content for body text, Contextual content for title.
Run indexing: drush search-api:index or via cron. Each item's text is sent to the microservice, which builds a tree index and stores a doc_id in the ai_pageindex_entity_map table.
Expose search via a Drupal View using Index [your index] as the data source with a fulltext search exposed filter, or use the ai_search_block submodule of the AI module for a ready-made Q&A block.

Requirements

Drupal modules

AI module: provides the VDB provider plugin system and the ai_search submodule used as the Search API backend.
Search API: required for the Search API backend integration.
Key: stores the microservice bearer token.

External infrastructure

PageIndex FastAPI microservice: a Python service that must run alongside Drupal. Source code, Dockerfile, and DDEV sidecar compose file are in the module's git repository under pageindex-service/. Requires Python 3.11+, FastAPI, and LiteLLM.
An LLM provider API key for tree-building and reasoning. OpenAI (gpt-4o recommended), Anthropic, or a self-hosted Ollama instance for fully offline deployments.

Recommended modules

AI Search Block (ai_search_block submodule of the AI module): a ready-made Q&A block that works with any VDB provider including this one.
LiteLLM AI Provider: if you run a LiteLLM proxy, this lets Drupal route all LLM calls through it.

Fully offline / air-gapped deployments: Ollama running a local model (e.g. Mistral, Llama 3) removes all dependency on external APIs. Set LLM_MODEL=ollama/mistral in the microservice environment and point OLLAMA_BASE_URL at your Ollama container.

How AI PageIndex differs from embedding-based VDB providers

The AI module ecosystem includes VDB providers for traditional vector database backends (Pinecone, Weaviate, Milvus, Chroma). These store document chunks as high-dimensional embedding vectors and retrieve results using mathematical similarity. AI PageIndex works differently:

Capability Embedding-based VDB providers AI PageIndex Vector database required Yes No Embeddings generated at index time Yes No Retrieval mechanism Cosine / inner-product similarity LLM reasoning over document tree Section / page citations in results No Yes Why a result is relevant Mathematical distance score only LLM explains its reasoning Best for High-volume short-form content Structured long-form documents FinanceBench accuracy (VectifyAI) ~60-75% 98.7% Query latency ~100-500 ms 2-12 s (LLM-bound) Index time per large document Seconds Minutes (LLM-bound)

AI PageIndex is not a replacement for general keyword search or high-volume real-time retrieval. Use it when accuracy on structured documents and traceable citations matter more than speed.

Supporting this module

This module is maintained as a personal contribution project. There is no funding page at this time.

Testing it, filing issues in the queue, and submitting merge requests are the most useful contributions.

AI Disclosure.

AI coding assistance helped write this module. The plugin structure, FastAPI microservice, and documentation were largely AI-generated. All decisions about what to build, code review, and testing were done by the code committer.

Version	Type	Release date
1.0.0-alpha1	Pre-release	Jun 27, 2026
1.0.x-dev	Dev	Jun 27, 2026