
ai_eval

No security coverage

AI Eval measures and improves the quality of your AI integrations in Drupal. Define test datasets in YAML, run them against your agents or any AI provider, and get scored results with pass/fail quality gates.

Two evaluation modes

Agent mode invokes ai_agents plugins end-to-end, testing the full loop: tool calls, reasoning, and response quality.

Chat mode sends prompts directly to any AI provider (for example Anthropic, OpenAI, or Ollama), with no agent framework needed. It is useful for evaluating system prompts, RAG pipelines, Q&A bots, or classification tasks.

Pluggable graders

The module ships with seven graders. Four are LLM-based judges (relevance, completeness, accuracy, actionability) that score responses on a 1-5 scale. Three are deterministic (format validation, route matching, structured field matching) and run locally with no API cost. You can add your own graders as plugins in any module.
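To illustrate what a deterministic grader does, here is a minimal Python sketch. It is not the module's API: the module's graders are Drupal plugins, and every name below (GRADERS, grader, json_format, field_match, required_keys, fields) is hypothetical. The 0.0 to 1.0 score range is also an assumption made for the example.

```python
import json
from typing import Callable, Dict

# Hypothetical grader signature: (response text, expected values) -> score in [0.0, 1.0].
Grader = Callable[[str, dict], float]

# Simple registry standing in for a plugin system (assumption, not the module's design).
GRADERS: Dict[str, Grader] = {}


def grader(name: str):
    """Register a grader function under a name."""
    def register(fn: Grader) -> Grader:
        GRADERS[name] = fn
        return fn
    return register


def _parse_object(response: str):
    """Return the response parsed as a JSON object, or None if it is not one."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


@grader("json_format")
def json_format(response: str, expected: dict) -> float:
    """Format validation: 1.0 if the response is a JSON object with all required keys."""
    data = _parse_object(response)
    if data is None:
        return 0.0
    required = expected.get("required_keys", [])
    return 1.0 if all(key in data for key in required) else 0.0


@grader("field_match")
def field_match(response: str, expected: dict) -> float:
    """Structured field matching: fraction of expected fields that match exactly."""
    data = _parse_object(response)
    if data is None:
        return 0.0
    fields = expected.get("fields", {})
    if not fields:
        return 1.0
    hits = sum(1 for key, value in fields.items() if data.get(key) == value)
    return hits / len(fields)
```

Both functions run locally and make no network calls, which is the property that keeps deterministic graders free of API cost.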

Quality gates

Each target defines a threshold and a gate type. Hard gates fail the run if the score is below threshold. Soft gates log a warning. Use hard gates in CI to block deployments when quality drops.
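The gate logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the module's implementation: the Gate, TargetResult, and evaluate_gates names are hypothetical, and the message format is invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Gate(Enum):
    HARD = "hard"  # a score below threshold fails the run
    SOFT = "soft"  # a score below threshold only logs a warning


@dataclass
class TargetResult:
    name: str
    score: float
    threshold: float
    gate: Gate


def evaluate_gates(results: List[TargetResult]) -> Tuple[bool, List[str]]:
    """Return (run_passed, messages) for a list of scored targets."""
    run_passed = True
    messages: List[str] = []
    for result in results:
        # A score equal to the threshold is not "below" it, so it passes.
        if result.score >= result.threshold:
            continue
        detail = f"{result.name}: {result.score:.2f} < {result.threshold:.2f}"
        if result.gate is Gate.HARD:
            run_passed = False
            messages.append(f"FAIL {detail}")
        else:
            messages.append(f"WARN {detail}")
    return run_passed, messages
```

In a CI job, a wrapper would typically turn a False result into a non-zero exit code so the pipeline stops. How the module itself reports a failed run is not stated here.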

Prompt optimization

When a target fails its gate, the optimizer analyzes the failure patterns and proposes an improved system prompt. Proposals can auto-apply or go through an admin review workflow before taking effect.
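The two paths for a proposal (auto-apply versus admin review) can be pictured as a small state machine. The Python sketch below is illustrative only. Status, Proposal, submit, and review are hypothetical names, and a plain dict of prompts stands in for wherever the module actually stores system prompts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Status(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass
class Proposal:
    target: str
    proposed_prompt: str
    status: Status = Status.PENDING


def review(proposal: Proposal, prompts: Dict[str, str], approve: bool) -> Proposal:
    """Resolve a pending proposal; only an approval changes the live prompt."""
    if proposal.status is not Status.PENDING:
        raise ValueError("proposal already resolved")
    if approve:
        prompts[proposal.target] = proposal.proposed_prompt
        proposal.status = Status.APPLIED
    else:
        proposal.status = Status.REJECTED
    return proposal


def submit(proposal: Proposal, prompts: Dict[str, str], auto_apply: bool) -> Proposal:
    """Apply immediately when auto_apply is set, otherwise leave pending for review."""
    if auto_apply:
        return review(proposal, prompts, approve=True)
    return proposal
```

Keeping the prompt change inside review means a rejected or still-pending proposal never alters the prompt in use.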

Admin UI

  • Configure targets, graders, and scoring thresholds
  • Browse eval run history with per-question breakdowns
  • Review and apply/reject optimization proposals
  • Settings for judge provider, rate limiting, and dataset paths

Drush commands

drush ai-eval:run        Run all targets
drush ai-eval:optimize   Optimize failing prompts
drush ai-eval:distill    Summarize results as markdown

Requirements

Drupal 11.2+, PHP 8.3+, AI module. Optional: AI Agents for agent mode.

Activity

Total releases: 2
First release: Apr 2026
Latest release: 1 day ago
Release cadence: 0 days
Stability: 0% stable

Releases

Version        Type         Release date
1.0.0-alpha2   Pre-release  Apr 5, 2026
1.0.0-alpha1   Pre-release  Apr 5, 2026