⚠️ This module is no longer actively developed.

Development has moved to AI Provider Universal, which supports llama.cpp along with Ollama, vLLM, LM Studio, LiteLLM, Fireworks, OpenAI and any other OpenAI-compatible server — with multiple server instances, models as config entities and per-model capability overrides.

No new features will be added here. Only critical security fixes will be considered until a stable release of AI Provider Universal is available.

Existing users: you can keep using this module for now, but we recommend migrating to AI Provider Universal. Configure your llama.cpp server at Configuration → AI → AI Servers (/admin/config/ai/providers/universal).

llama.cpp AI Provider (Multi-instance)

1.2.0-beta1 is now available! This is the first beta of the 1.2 series with full multi-server support.

A powerful, multi-instance provider for the AI module. While built with llama.cpp as its primary focus, it natively supports any OpenAI-compatible /v1 server (Ollama, vLLM, LiteLLM, LM Studio, etc.).

Key Features

Multi-server architecture: Configure and use multiple servers at the same time. Each server appears as an independent provider (e.g. llama_cpp:gpu, llama_cpp:embeddings).
Secure credentials: API keys are stored using the Key module (no more plaintext).
Broad operation support: Chat, Embeddings, Speech-to-Text, Rerank, Moderation (LlamaGuard3 + ShieldGemma with multiple safety policies), and Text-to-Image.
Per-server control: Model filtering with glob patterns, manual capability overrides, and configurable timeout + authentication per server.
Smart model detection: Auto-detects capabilities from server metadata, Hugging Face tags, and name heuristics.
Robust & cache-friendly: Model lists and types are cached per server in Drupal State.

Supported Operations

Chat completions (/v1/chat/completions)
Embeddings (/v1/embeddings)
Speech to Text (Whisper and similar)
Rerank
Moderation (native LlamaGuard3 and ShieldGemma support)
Text to Image (/v1/images/generations)

Installation

Install via Composer:

composer require drupal/ai_provider_llama_cpp

Important: Composer installs the latest stable release by default (currently 1.0.0). To install the new 1.2.0-alpha1 (recommended for new projects), run:

composer require 'drupal/ai_provider_llama_cpp:^2.0@alpha'

Then enable the module:

drush pm:enable ai_provider_llama_cpp

Requirements

Drupal 10.2, 11 or 12
AI module ^1.2
Key module ^1.18 (for secure API key storage)
An OpenAI-compatible server (Ollama, vLLM, llama.cpp, LiteLLM...)

See the release notes for full details and migration information from previous versions.

Note: This is a beta release. Please test in development/staging

Version	Type	Release date
2.0.0-alpha1	Pre-release	Jul 1, 2026
1.2.0-beta1	Pre-release	Jun 22, 2026
1.2.x-dev	Dev	Jun 22, 2026
1.1.0-beta1	Pre-release	Jun 15, 2026
1.0.0	Stable	Jun 1, 2026

llama.cpp

llama.cpp AI Provider (Multi-instance)

Key Features

Supported Operations

Installation

Requirements

Activity

Release Timeline

Releases