ai_provider_llama_cpp
This module integrates llama.cpp with the AI module for Drupal, enabling local and self-hosted AI inference without any external service or API key.
It connects to a running llama-server instance through its OpenAI-compatible /v1 HTTP API, which means it works with any model in GGUF format that llama.cpp supports.
Supported operations:
- Chat completions
- Embeddings
Features
- Auto-discovers available models from the server at /v1/models
- Caches the model list in Drupal State so the site stays functional if the server is temporarily offline
- No API key required — designed for local development and self-hosted deployments
- Works in DDEV and Docker environments via http://host.docker.internal
Additional Requirements
- AI module 1.2 or later
- A running llama-server (included in llama.cpp, default port 8080)