
ai_autoevals

Reported usage: 1 site. Not covered by the Drupal security advisory policy.

Automated factuality evaluation of AI responses for Drupal 10/11.

⚠️ Not Production Ready

This module is currently under development and is not ready for production use. Use only for testing and evaluation purposes.

What This Module Does

AI AutoEvals evaluates the factual accuracy of AI-generated responses using a two-step LLM process:

  1. Fact Extraction - Analyzes user input to determine what a correct answer should contain
  2. Response Evaluation - Compares the AI response against extracted criteria

Evaluation criteria are derived only from the user's question and context, never from the AI response itself, so the response being judged cannot shape the standard it is judged against.
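The two-step flow can be sketched in plain PHP, assuming the LLM is a simple callable that takes a prompt and returns text. The function names `extract_expected_facts()` and `evaluate_response()`, the prompt wording and the one-word verdicts are hypothetical, not this module's API. What the sketch shows is the separation: step 1 never receives the AI response, and step 2 is the first place it appears.

```php
<?php

/**
 * Step 1 (hypothetical): derive the expected facts from the question and
 * context only. The AI response is deliberately not a parameter.
 *
 * @param callable $llm  Stand-in for an LLM call: fn(string $prompt): string.
 * @return string[]      One expected fact per non-empty line of the reply.
 */
function extract_expected_facts(callable $llm, string $question, string $context): array {
  $prompt = "Context:\n{$context}\n\nQuestion:\n{$question}\n\n"
    . "List, one per line, the facts a correct answer must contain.";
  $lines = array_map('trim', explode("\n", $llm($prompt)));
  // Drop blank lines and re-index the list.
  return array_values(array_filter($lines, fn(string $line): bool => $line !== ''));
}

/**
 * Step 2 (hypothetical): compare the response with the expected facts and
 * return a lowercase verdict label.
 */
function evaluate_response(callable $llm, string $question, array $facts, string $response): string {
  $prompt = "Question: {$question}\nExpected facts:\n- " . implode("\n- ", $facts)
    . "\nResponse: {$response}\n"
    . "Reply with one word: exact, superset, subset or disagreement.";
  return strtolower(trim($llm($prompt)));
}

// Usage with a stub LLM that records every prompt it receives.
$prompts = [];
$stub_llm = function (string $prompt) use (&$prompts): string {
  $prompts[] = $prompt;
  return str_contains($prompt, 'Expected facts:')
    ? "Superset\n"
    : "Paris is the capital of France\n\n";
};

$question = 'What is the capital of France?';
$answer = 'Paris, a city of about two million people.';
$facts = extract_expected_facts($stub_llm, $question, '');
$verdict = evaluate_response($stub_llm, $question, $facts, $answer);
```

Because the first prompt is built before the response is ever passed in, the expected facts cannot be biased by the answer under test.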

Installation

composer require drupal/ai_autoevals
drush en ai_autoevals

Configure at /admin/config/ai/autoevals and view results at /admin/content/ai-autoevals.

Usage

Quick Start

  1. Configure provider - Set the AI provider and model for evaluations at /admin/config/ai/autoevals
  2. Enable auto-tracking - Check "Auto-track requests" to evaluate all AI responses automatically
  3. Process evaluations - Run drush queue:run ai_autoevals_evaluation_worker
  4. View results - Check the dashboard at /admin/content/ai-autoevals

Manual Tracking

Tag specific requests for evaluation when auto-tracking is disabled:

$ai_provider->chat($input, $model, ['ai_autoevals:track' => TRUE]);

Add context via tags:

$ai_provider->chat($input, $model, [
  'ai_autoevals:track' => TRUE,
  'category' => 'support',
]);
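How the "Auto-track requests" setting, the `ai_autoevals:track` tag and any extra tags might combine can be sketched as a standalone function. The function name `resolve_tracking()`, the `$auto_track` flag and the rule that every other tag is kept as evaluation context are assumptions made for illustration, not documented behavior of the module.

```php
<?php

/**
 * Hypothetical: decide whether a request is evaluated and which tags
 * become evaluation context.
 *
 * @param array $tags        Tags passed as the third argument of chat().
 * @param bool  $auto_track  Value of the "Auto-track requests" setting.
 * @return array{track: bool, context: array}
 */
function resolve_tracking(array $tags, bool $auto_track): array {
  // Only an explicit TRUE opts a single request in.
  $explicit = ($tags['ai_autoevals:track'] ?? FALSE) === TRUE;
  // Everything except the tracking flag is kept as context.
  $context = $tags;
  unset($context['ai_autoevals:track']);
  return ['track' => $auto_track || $explicit, 'context' => $context];
}

$result = resolve_tracking(['ai_autoevals:track' => TRUE, 'category' => 'support'], FALSE);
// $result['track'] is TRUE and $result['context'] is ['category' => 'support'].
```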

Evaluation Scores

Score   Meaning        Description
1.0     Exact Match    Response fully meets the expected criteria
0.6     Superset       Response includes all expected info plus more
0.4     Subset         Response has some expected info but is missing the rest
0.0     Disagreement   Response contradicts the expected facts
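The score table reads naturally as a lookup from verdict to score. Below is a minimal sketch, assuming the evaluator yields one of four labels; the label keys, `verdict_score()` and the `$overrides` argument (a stand-in for the per-set "Scoring" customization) are hypothetical. Averaging is shown only as one way to summarize many results.

```php
<?php

// Default mapping from the score table; the keys are hypothetical labels.
const DEFAULT_SCORES = [
  'exact_match' => 1.0,
  'superset' => 0.6,
  'subset' => 0.4,
  'disagreement' => 0.0,
];

/**
 * Hypothetical: convert a verdict label to a score, with optional overrides.
 */
function verdict_score(string $verdict, array $overrides = []): float {
  $scores = array_replace(DEFAULT_SCORES, $overrides);
  if (!array_key_exists($verdict, $scores)) {
    throw new InvalidArgumentException("Unknown verdict: {$verdict}");
  }
  return (float) $scores[$verdict];
}

// Mean score over several evaluations, e.g. for a dashboard summary.
$verdicts = ['exact_match', 'superset', 'subset', 'disagreement'];
$mean = array_sum(array_map('verdict_score', $verdicts)) / count($verdicts);
// (1.0 + 0.6 + 0.4 + 0.0) / 4 = 0.5
```

Note that a superset scores higher than a subset: extra information is penalized less than missing information.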

Advanced Usage

Evaluation Sets

Create evaluation sets to customize evaluation behavior for different content types:

  • Operation types: Which AI operations to evaluate (chat, chat_completion)
  • Fact extraction method: AI-generated, rule-based, or hybrid
  • Custom knowledge: Domain-specific context for specialized extraction
  • Custom prompts: Override evaluation prompts per set
  • Keywords: Trigger evaluations based on query/response keywords
  • Scoring: Customize score mapping per evaluation set
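The "Operation types" and "Keywords" options can be pictured as a gate in front of each evaluation set. This is a minimal sketch under stated assumptions: the array shape of `$set`, the function name `set_applies()`, the case-insensitive substring match, and treating an empty keyword list as "always applies" are all hypothetical, not the module's documented matching rules.

```php
<?php

/**
 * Hypothetical: decide whether an evaluation set applies to a request.
 *
 * @param array $set  Assumed shape: ['operations' => string[], 'keywords' => string[]].
 */
function set_applies(array $set, string $operation, string $query, string $response): bool {
  // The operation type (e.g. 'chat') must be one the set evaluates.
  if (!in_array($operation, $set['operations'], TRUE)) {
    return FALSE;
  }
  // Assumption: a set without keywords applies to every matching operation.
  if ($set['keywords'] === []) {
    return TRUE;
  }
  // A keyword may appear in either the query or the response.
  // stripos() is case-insensitive for ASCII letters only.
  foreach ($set['keywords'] as $keyword) {
    if ($keyword !== '' && (stripos($query, $keyword) !== FALSE || stripos($response, $keyword) !== FALSE)) {
      return TRUE;
    }
  }
  return FALSE;
}

$weather = ['operations' => ['chat'], 'keywords' => ['weather', 'forecast']];
```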

Programmatic Creation

Use the builder pattern to create evaluation sets:

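use Drupal\ai_autoevals\Entity\EvaluationSet;

// Machine name and human-readable label of the new set.
$set = EvaluationSet::builder('weather_eval', 'Weather Evaluation')
  ->withDescription('Evaluates weather-related AI responses')
  // Evaluate chat operations only.
  ->forOperations(['chat'])
  // Trigger evaluation when the query or response contains these keywords.
  ->triggerOnKeywords(['weather', 'forecast'])
  // Domain-specific context for fact extraction.
  ->withCustomKnowledge('Current temp: 72°F, Humidity: 45%')
  // Let the LLM generate the expected facts.
  ->withFactExtractionMethod('ai_generated')
  ->build();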
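To show what the chained calls accumulate, here is a self-contained toy builder that mirrors a few of the method names above and returns a plain array. It is illustrative only: `ToyEvaluationSetBuilder` is hypothetical, and what the real `EvaluationSet::builder()` returns and does internally is not described here.

```php
<?php

/**
 * Hypothetical toy builder: each setter stores a value and returns $this so
 * calls can be chained; build() hands back everything collected so far.
 */
final class ToyEvaluationSetBuilder {

  private array $values;

  public function __construct(string $id, string $label) {
    $this->values = ['id' => $id, 'label' => $label, 'operations' => [], 'keywords' => []];
  }

  public function forOperations(array $operations): self {
    $this->values['operations'] = $operations;
    return $this;
  }

  public function triggerOnKeywords(array $keywords): self {
    $this->values['keywords'] = $keywords;
    return $this;
  }

  public function withFactExtractionMethod(string $method): self {
    $this->values['fact_extraction_method'] = $method;
    return $this;
  }

  public function build(): array {
    return $this->values;
  }

}

$values = (new ToyEvaluationSetBuilder('weather_eval', 'Weather Evaluation'))
  ->forOperations(['chat'])
  ->triggerOnKeywords(['weather', 'forecast'])
  ->withFactExtractionMethod('ai_generated')
  ->build();
```

The appeal of the pattern is that optional settings can be given in any order or left out, and `build()` is the single point where the result is produced.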

Services

Access the module's main services programmatically:

// Create and queue evaluations.
$manager = \Drupal::service('ai_autoevals.evaluation_manager');
// [...] stands for the evaluation's field values, which are not listed here.
$evaluation = $manager->createEvaluation([...]);
$manager->queueEvaluation($evaluation->id());

// Extract facts. $input, $context and $evaluationSet are supplied by the caller.
$extractor = \Drupal::service('ai_autoevals.fact_extractor');
$facts = $extractor->extractFacts($input, $context, $evaluationSet);

// Evaluate a response ($output) against the extracted facts.
$evaluator = \Drupal::service('ai_autoevals.evaluator');
$result = $evaluator->evaluate($evaluationSet, $input, $facts, $output);

Events

React to evaluation lifecycle events:

  • ai_autoevals.pre_evaluation - Before the evaluation request is sent to the LLM
  • ai_autoevals.post_evaluation - After the evaluation completes
  • ai_autoevals.evaluation_failed - When an evaluation fails
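The order in which the three events fire can be sketched with a tiny standalone dispatcher. This is not Drupal's event system (in a Drupal module you would register an event subscriber service); `run_evaluation()`, the listener registry and the payload arrays are hypothetical. The sketch only shows that `pre_evaluation` precedes the LLM call and that exactly one of `post_evaluation` or `evaluation_failed` follows it.

```php
<?php

/**
 * Hypothetical: run one evaluation and notify listeners at each step.
 *
 * @param array<string, callable[]> $listeners  Event name => listeners.
 * @param callable $llm                         Stand-in: fn(array $payload): array.
 */
function run_evaluation(array $listeners, callable $llm, array $payload): ?array {
  $notify = function (string $event, array $data) use ($listeners): void {
    foreach ($listeners[$event] ?? [] as $listener) {
      $listener($data);
    }
  };

  $notify('ai_autoevals.pre_evaluation', $payload);
  try {
    $result = $llm($payload);
  }
  catch (Throwable $e) {
    $notify('ai_autoevals.evaluation_failed', $payload + ['error' => $e->getMessage()]);
    return NULL;
  }
  $notify('ai_autoevals.post_evaluation', $payload + ['result' => $result]);
  return $result;
}

// Register one listener per event that records the event name.
$log = [];
$listeners = [];
foreach (['ai_autoevals.pre_evaluation', 'ai_autoevals.post_evaluation', 'ai_autoevals.evaluation_failed'] as $event) {
  $listeners[$event][] = function (array $data) use ($event, &$log): void {
    $log[] = $event;
  };
}

// A successful evaluation.
run_evaluation($listeners, fn(array $p): array => ['score' => 1.0], ['id' => 1]);
$ok_log = $log;

// A failing evaluation, e.g. a provider error.
$log = [];
run_evaluation($listeners, function (array $p): array {
  throw new RuntimeException('Provider timeout');
}, ['id' => 2]);
$failed_log = $log;
```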

Hooks

Filter evaluation sets via hook_ai_autoevals_evaluation_sets_alter() for conditional evaluation based on language, user roles, or custom business rules.
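Below is a sketch of an alter implementation for a hypothetical module named `mymodule`. The parameter list is an assumption, because the hook's signature is not documented here: it supposes the hook receives the candidate sets by reference, keyed by ID, plus a context array holding the language code and the current user's roles. Plain arrays stand in for evaluation set entities so the sketch runs on its own.

```php
<?php

/**
 * Implements hook_ai_autoevals_evaluation_sets_alter().
 *
 * Assumed signature: (array &$evaluation_sets, array $context).
 */
function mymodule_ai_autoevals_evaluation_sets_alter(array &$evaluation_sets, array $context): void {
  // Language rule: apply the weather set to English requests only.
  if (($context['langcode'] ?? NULL) !== 'en') {
    unset($evaluation_sets['weather_eval']);
  }
  // Role rule: skip all evaluations for administrators.
  if (in_array('administrator', $context['roles'] ?? [], TRUE)) {
    $evaluation_sets = [];
  }
}

$sets = ['weather_eval' => ['label' => 'Weather Evaluation'], 'default' => ['label' => 'Default']];
mymodule_ai_autoevals_evaluation_sets_alter($sets, ['langcode' => 'fr', 'roles' => ['authenticated']]);
// $sets now contains only 'default'.
```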

Custom Fact Extractors

Create plugins for specialized extraction:

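// Assumed location: my_module/src/Plugin/FactExtractor/MyCustomExtractor.php
namespace Drupal\my_module\Plugin\FactExtractor;

// Add a use statement for FactExtractorPluginBase, provided by ai_autoevals.

/**
 * @FactExtractor(
 *   id = "my_custom",
 *   label = @Translation("My Custom Extractor")
 * )
 */
class MyCustomExtractor extends FactExtractorPluginBase {

  /**
   * Returns the facts a correct answer to $input should contain.
   */
  public function extract(string $input, array $context = []): array {
    $facts = [];
    // Custom logic: populate $facts from $input and $context.
    return $facts;
  }

}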
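One possible body for the custom logic is a rule-based extractor. The sketch below is a standalone function so it runs without Drupal; in a plugin it would sit inside `extract()`. The rule (keep "Key: value" pairs from custom knowledge whose key is mentioned in the question), the function name `rule_based_extract()` and the return shape (a flat list of strings) are assumptions, since the expected structure of the returned array is not documented here.

```php
<?php

/**
 * Hypothetical rule-based extraction: keep "Key: value" pairs from the
 * custom knowledge whose key is mentioned in the user's input.
 *
 * @return string[]  Expected facts, e.g. ['Humidity: 45%'].
 */
function rule_based_extract(string $input, string $knowledge): array {
  $facts = [];
  // Split knowledge such as "Current temp: 72°F, Humidity: 45%" on commas.
  foreach (preg_split('/\s*,\s*/', $knowledge, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
    if (!str_contains($pair, ':')) {
      continue;
    }
    [$key, $value] = array_map('trim', explode(':', $pair, 2));
    // Match on the key's last word ("temp" in "Current temp"), ASCII case-insensitively.
    $words = preg_split('/\s+/', $key);
    $last = end($words);
    if ($last !== '' && stripos($input, $last) !== FALSE) {
      $facts[] = "{$key}: {$value}";
    }
  }
  return $facts;
}

$facts = rule_based_extract('What is the humidity today?', 'Current temp: 72°F, Humidity: 45%');
// ['Humidity: 45%']
```

A rule like this is cheap and deterministic, which is the usual argument for rule-based or hybrid extraction over calling an LLM for every question.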

Documentation

Bug reports and feature suggestions: use the project's issue queue on drupal.org.

License

GPL-2.0-or-later

Activity

Total releases: 2
First release: Jan 2026
Latest release: 1 month ago
Release cadence: 0 days
Stability: 0% stable

Releases

Version        Type          Release date
1.0.x-dev      Dev           Jan 18, 2026
1.0.0-alpha1   Pre-release   Jan 18, 2026