ai_pageindex
AI PageIndex integrates the VectifyAI PageIndex approach into Drupal 11's AI module ecosystem. It indexes site content as hierarchical tree structures rather than embedding vectors, then retrieves answers to natural-language queries using LLM reasoning over that tree. No vector database required.
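To give a rough picture of what "hierarchical tree structure" means here, the sketch below represents one indexed document as nested nodes (title, page range, summary, children). It then flattens the tree into the kind of outline an LLM can reason over. The field names and the sample document are hypothetical. They were chosen to match the fields listed under Features, and the microservice's actual JSON schema may differ.

```python
from typing import Iterator

# Hypothetical tree for one indexed document. Field names are illustrative.
TREE = {
    "node_id": "0000", "title": "Annual Report 2024",
    "start_page": 1, "end_page": 40,
    "summary": "Company overview, financial statements and risks.",
    "nodes": [
        {
            "node_id": "0001", "title": "Financial Statements",
            "start_page": 10, "end_page": 25,
            "summary": "Income statement, balance sheet and cash flow.",
            "nodes": [
                {
                    "node_id": "0002", "title": "Balance Sheet",
                    "start_page": 14, "end_page": 16,
                    "summary": "Assets, liabilities and equity.",
                    "nodes": [],
                },
            ],
        },
        {
            "node_id": "0003", "title": "Risk Factors",
            "start_page": 26, "end_page": 40,
            "summary": "Market, credit and operational risks.",
            "nodes": [],
        },
    ],
}


def walk(node: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, node) pairs in document order (pre-order traversal)."""
    yield depth, node
    for child in node.get("nodes", []):
        yield from walk(child, depth + 1)


def outline(tree: dict) -> list[str]:
    """One indented line per node: ID, title and page range, no body text."""
    return [
        f"{'  ' * depth}[{n['node_id']}] {n['title']} (pp. {n['start_page']}-{n['end_page']})"
        for depth, n in walk(tree)
    ]


if __name__ == "__main__":
    print("\n".join(outline(TREE)))
```

Running it prints a four-line indented outline, from "[0000] Annual Report 2024 (pp. 1-40)" down to "[0003] Risk Factors (pp. 26-40)". Because the outline holds only titles and page ranges, a long document's structure can fit in one prompt, and the page ranges remain available for citations.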
It registers as a Vector Database (VDB) provider for the AI module and works with Search API and the AI Search Block.
Features
- Vectorless retrieval. Documents are indexed as hierarchical JSON trees (titles, section ranges, summaries). It generates no embeddings and needs no vector database.
- LLM reasoning for search. When a query arrives, an LLM navigates the tree, identifies relevant sections, and returns scored excerpts with page-level citations.
- Plugs into the AI module. Implements AiVdbProviderInterface and AiVdbProviderSearchApiInterface, appearing in the VDB provider list and working as a Search API backend.
- Multi-LLM support. The microservice uses LiteLLM, so OpenAI, Anthropic Claude, and local Ollama models all work.
- Self-hostable. The required FastAPI microservice ships with a Dockerfile and a DDEV sidecar compose file in the module repository. It has no external SaaS dependency.
- Entity map table. A lightweight ai_pageindex_entity_map table maps Drupal Search API item IDs to PageIndex document IDs. The table handles deletions and re-indexing cleanly.
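The retrieval step works roughly like this. The model sees the document's section outline rather than its full text and picks the sections it judges relevant. Each pick then maps back to a page range for the citation. This is a minimal single-pass sketch under stated assumptions. SECTIONS, build_prompt, select_sections, the JSON reply shape, and fake_llm (a stand-in for the real LiteLLM call) are all hypothetical. The actual microservice may walk the tree over several passes and also extract excerpts.

```python
import json
from typing import Callable

# Hypothetical section index built from the tree: node_id -> (title, first page, last page).
SECTIONS = {
    "0001": ("Financial Statements", 10, 25),
    "0002": ("Balance Sheet", 14, 16),
    "0003": ("Risk Factors", 26, 40),
}


def build_prompt(question: str) -> str:
    """Show the model the outline (not the full text) and ask for scored picks."""
    toc = "\n".join(
        f"[{node_id}] {title} (pp. {start}-{end})"
        for node_id, (title, start, end) in SECTIONS.items()
    )
    return (
        "Table of contents:\n" + toc + "\n\n"
        f"Question: {question}\n"
        'Reply with JSON only: {"sections": [{"node_id": "...", "score": 0.0}], '
        '"reasoning": "..."}'
    )


def select_sections(question: str, ask_llm: Callable[[str], str]) -> dict:
    """One reasoning pass: parse the model's picks into scored page citations."""
    answer = json.loads(ask_llm(build_prompt(question)))
    citations = []
    for pick in answer.get("sections", []):
        node_id = pick.get("node_id")
        if node_id not in SECTIONS:  # ignore IDs the model invented
            continue
        title, start, end = SECTIONS[node_id]
        citations.append({
            "node_id": node_id,
            "title": title,
            "pages": f"{start}-{end}",
            "score": float(pick.get("score", 0.0)),
        })
    citations.sort(key=lambda c: c["score"], reverse=True)
    return {"citations": citations, "reasoning": answer.get("reasoning", "")}


def fake_llm(prompt: str) -> str:
    """Stand-in for the real LiteLLM call, returning a canned answer."""
    return json.dumps({
        "sections": [
            {"node_id": "0003", "score": 0.4},
            {"node_id": "0002", "score": 0.9},
            {"node_id": "9999", "score": 1.0},
        ],
        "reasoning": "Liabilities are reported on the balance sheet.",
    })


if __name__ == "__main__":
    print(select_sections("What were total liabilities?", fake_llm))
```

Validating node IDs against the known tree matters here. An LLM can return an ID that does not exist, and a citation must always point at real pages.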
When to use this module: Good for Drupal sites that host structured documents (PDFs, reports, manuals, legal filings, policy documents) where visitors ask natural-language questions and need answers with explicit section or page citations. It works best on content with clear structural hierarchy. It is not a replacement for general keyword search across standard short-form node content.
Post-installation
Installation involves two components: the Drupal module and the PageIndex FastAPI microservice. The microservice source code (main.py, Dockerfile, requirements.txt) lives in the pageindex-service/ directory of the module's git repository.
Part 1: Start the microservice
1. Add your LLM API key to the DDEV environment
Edit .ddev/docker-compose.pageindex.yaml and set your key under environment:
# .ddev/docker-compose.pageindex.yaml
environment:
  PAGEINDEX_WORKSPACE_ROOT: /data/workspaces
  LLM_MODEL: gpt-4o-2024-11-20
  RETRIEVE_MODEL: gpt-4o-2024-11-20
  OPENAI_API_KEY: "sk-..."   # or ANTHROPIC_API_KEY, etc.
  PAGEINDEX_API_KEY: ""      # set a secret token in production
2. Start the service
ddev restart
The service starts at http://pageindex:8765 inside the DDEV network.
3. Confirm it is running
ddev exec curl -s http://pageindex:8765/health
# expected: {"status":"ok"}
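The same check can be scripted, for example from a deploy hook. Below is a minimal sketch that uses only the Python standard library. health_request, is_healthy, and service_is_up are hypothetical names. Sending the bearer token to /health is an assumption, and it does no harm if the endpoint does not require one. The expected body {"status":"ok"} comes from the step above.

```python
from __future__ import annotations

import http.client
import json
import urllib.request


def health_request(base_url: str, token: str = "") -> urllib.request.Request:
    """Build a GET request for <base_url>/health, tolerating a trailing slash."""
    req = urllib.request.Request(base_url.rstrip("/") + "/health")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def is_healthy(body: bytes) -> bool:
    """True only for a JSON object whose "status" is "ok"."""
    try:
        data = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def service_is_up(base_url: str, token: str = "", timeout: float = 5.0) -> bool:
    """Connection errors, HTTP error statuses and unexpected bodies all count as down."""
    try:
        with urllib.request.urlopen(health_request(base_url, token), timeout=timeout) as resp:
            return is_healthy(resp.read())
    except (OSError, http.client.HTTPException):  # URLError/HTTPError subclass OSError
        return False
```

Inside the DDEV network, service_is_up("http://pageindex:8765") should return True once the container is running.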
Self-hosted / production: Use docker compose up -d with the included Dockerfile. Expose the service only to your Drupal application server, place it behind HTTPS, and set a strong PAGEINDEX_API_KEY bearer token.
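The environment variables from step 1 and the production bearer token fit together roughly as follows. The service reads its configuration from the environment. When PAGEINDEX_API_KEY is non-empty, it rejects requests that lack a matching Authorization: Bearer header. This is a sketch, not the shipped main.py. The Settings class, load_settings, authorized, the fallback defaults, and the rule that an empty key disables auth are all assumptions. Provider keys such as OPENAI_API_KEY are not read here, because LiteLLM generally picks them up from the environment itself.

```python
from __future__ import annotations

import hmac
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    workspace_root: str
    llm_model: str
    retrieve_model: str
    api_key: str  # "" = auth disabled; acceptable only in local development


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Read the compose-file variables; the fallback defaults are illustrative."""
    env = os.environ if env is None else env
    llm_model = env.get("LLM_MODEL", "gpt-4o-2024-11-20")
    return Settings(
        workspace_root=env.get("PAGEINDEX_WORKSPACE_ROOT", "/data/workspaces"),
        llm_model=llm_model,
        retrieve_model=env.get("RETRIEVE_MODEL", llm_model),
        api_key=env.get("PAGEINDEX_API_KEY", ""),
    )


def authorized(auth_header: str | None, settings: Settings) -> bool:
    """Check an 'Authorization: Bearer <token>' header against the configured key."""
    if not settings.api_key:
        return True
    if not auth_header:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), settings.api_key.encode())
```

In a FastAPI app, a check like authorized would typically run in a dependency attached to every route except /health.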
Part 2: Configure the module
Go to Administration → Configuration → AI → VDB Providers → AI PageIndex (/admin/config/ai/vdb_providers/ai_pageindex) and set:
- Service URL: http://pageindex:8765 for DDEV local; your Docker service hostname in production. No trailing slash.
- API bearer token: select the Key module entry holding your PAGEINDEX_API_KEY value (leave blank to disable auth in development).
- Indexing model and Retrieval model: LiteLLM-format model strings, e.g. gpt-4o-2024-11-20, anthropic/claude-sonnet-4-6, or ollama/mistral.
A Service status badge (Connected / Unreachable) shows whether Drupal can reach the microservice.
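Two of these settings have format rules that are easy to get wrong. The sketch below shows the kind of validation they imply. It strips a trailing slash from the service URL and reads the provider from the prefix of a LiteLLM model string. Both helpers are hypothetical and not part of the module. The fallback to OpenAI for unprefixed names is a simplification, since LiteLLM itself recognizes many model names that carry no prefix.

```python
from urllib.parse import urlsplit


def normalize_service_url(url: str) -> str:
    """Trim whitespace and trailing slashes; require an http(s) URL with a host."""
    url = url.strip().rstrip("/")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    return url


def model_provider(model: str) -> str:
    """'provider/model' -> provider; unprefixed names are assumed to be OpenAI."""
    provider, sep, _ = model.partition("/")
    return provider if sep else "openai"
```

For example, normalize_service_url("http://pageindex:8765/") returns "http://pageindex:8765", and model_provider("ollama/mistral") returns "ollama".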
Part 3: Create a Search API server and index
- Go to Administration → Configuration → Search API → Add server. Choose AI Search as the backend, then choose PageIndex (Vectorless Reasoning) as the vector database provider.
- Create a Search API index pointing at your content datasource. Add the fields you want searchable (Title, Body, etc.) and set their Indexing option: Main content for body text, Contextual content for title.
- Run indexing with drush search-api:index or via cron. Each item's text is sent to the microservice, which builds a tree index; the returned doc_id is stored in the ai_pageindex_entity_map table.
- Expose search via a Drupal View using Index [your index] as the data source with an exposed fulltext search filter, or use the ai_search_block submodule of the AI module for a ready-made Q&A block.
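The entity map keeps deletions and re-indexing consistent. When an item is re-indexed, its old doc_id becomes stale and can be removed from the microservice. When an item is deleted, its doc_id tells the module which remote document to drop. The sketch below models that bookkeeping with Python's sqlite3. Only the table name comes from the module. The column names and the record_indexed / record_deleted flow are assumptions, and the real module does this in PHP through Drupal's database API.

```python
from __future__ import annotations

import sqlite3

# Hypothetical column layout; only the table name comes from the module.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ai_pageindex_entity_map (
    item_id TEXT PRIMARY KEY,  -- Search API item ID, e.g. entity:node/42:en
    doc_id  TEXT NOT NULL      -- PageIndex document ID returned by the service
)
"""


def _lookup(conn: sqlite3.Connection, item_id: str) -> str | None:
    row = conn.execute(
        "SELECT doc_id FROM ai_pageindex_entity_map WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


def record_indexed(conn: sqlite3.Connection, item_id: str, doc_id: str) -> str | None:
    """Point item_id at a fresh doc_id; return the stale doc_id to delete remotely."""
    with conn:  # commit on success, roll back on error
        old = _lookup(conn, item_id)
        conn.execute(
            "INSERT OR REPLACE INTO ai_pageindex_entity_map (item_id, doc_id) VALUES (?, ?)",
            (item_id, doc_id),
        )
    return old if old != doc_id else None


def record_deleted(conn: sqlite3.Connection, item_id: str) -> str | None:
    """Forget item_id; return its doc_id so the remote document can be deleted."""
    with conn:
        old = _lookup(conn, item_id)
        conn.execute("DELETE FROM ai_pageindex_entity_map WHERE item_id = ?", (item_id,))
    return old
```

Both functions return the doc_id that the caller still has to remove from the microservice, so the local map and the remote workspace cannot silently drift apart.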
Requirements
Drupal modules
- AI module: provides the VDB provider plugin system and the ai_search submodule used as the Search API backend.
- Search API: required for the Search API backend integration.
- Key: stores the microservice bearer token.
External infrastructure
- PageIndex FastAPI microservice: a Python service that must run alongside Drupal. Source code, Dockerfile, and DDEV sidecar compose file are in the module's git repository under pageindex-service/. Requires Python 3.11+, FastAPI, and LiteLLM.
- An LLM provider for tree-building and reasoning: an OpenAI API key (gpt-4o recommended), an Anthropic API key, or a self-hosted Ollama instance for fully offline deployments.
Recommended modules
- AI Search Block (ai_search_block submodule of the AI module): a ready-made Q&A block that works with any VDB provider, including this one.
- LiteLLM AI Provider: if you run a LiteLLM proxy, this lets Drupal route all LLM calls through it.
Fully offline / air-gapped deployments: Ollama running a local model (e.g. Mistral, Llama 3) removes all dependency on external APIs. Set LLM_MODEL=ollama/mistral in the microservice environment and point OLLAMA_BASE_URL at your Ollama container.
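Here is a sketch of how those two variables could become LiteLLM call arguments. completion_kwargs and its fallback defaults are hypothetical. LiteLLM's completion() does accept model and api_base, but it is an assumption that the shipped microservice wires OLLAMA_BASE_URL exactly this way.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def completion_kwargs(env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Model and endpoint arguments for a litellm.completion() call."""
    env = os.environ if env is None else env
    model = env.get("LLM_MODEL", "gpt-4o-2024-11-20")
    kwargs = {"model": model}
    if model.startswith("ollama/"):
        # Send requests to the local Ollama server (11434 is Ollama's default port).
        kwargs["api_base"] = env.get("OLLAMA_BASE_URL", "http://localhost:11434")
    return kwargs
```

The service would then call litellm.completion(messages=[...], **completion_kwargs()). No provider API key is involved, so no request leaves your network.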
How AI PageIndex differs from embedding-based VDB providers
The AI module ecosystem includes VDB providers for traditional vector database backends (Pinecone, Weaviate, Milvus, Chroma). These store document chunks as high-dimensional embedding vectors and retrieve results using mathematical similarity. AI PageIndex works differently:
| Capability | Embedding-based VDB providers | AI PageIndex |
|---|---|---|
| Vector database required | Yes | No |
| Embeddings generated at index time | Yes | No |
| Retrieval mechanism | Cosine / inner-product similarity | LLM reasoning over document tree |
| Section / page citations in results | No | Yes |
| Why a result is relevant | Mathematical distance score only | LLM explains its reasoning |
| Best for | High-volume short-form content | Structured long-form documents |
| FinanceBench accuracy (VectifyAI) | ~60-75% | 98.7% |
| Query latency | ~100-500 ms | 2-12 s (LLM-bound) |
| Index time per large document | Seconds | Minutes (LLM-bound) |

AI PageIndex is not a replacement for general keyword search or high-volume real-time retrieval. Use it when accuracy on structured documents and traceable citations matter more than speed.
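The "Retrieval mechanism" row can be made concrete. An embedding-based provider scores each stored chunk by the cosine similarity between its vector and the query vector, then returns the highest scores. In the sketch below, toy 2-D vectors stand in for real embeddings, which have hundreds or thousands of dimensions. The function names are illustrative. The score says how close two vectors are but not why a chunk answers the question, and that explanation is what AI PageIndex gets from LLM reasoning over the tree.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank stored chunks by similarity to the query; the score is all that comes back."""
    scored = [(name, cosine(query, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    toy_chunks = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
    print(top_k([1.0, 1.0], toy_chunks))
```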
Supporting this module
This module is maintained as a personal contribution project. There is no funding page at this time.
Testing it, filing issues in the queue, and submitting merge requests are the most useful contributions.
AI Disclosure. AI coding assistance helped write this module. The plugin structure, FastAPI microservice, and documentation were largely AI-generated. The code committer made all decisions about what to build and carried out all code review and testing.