# ai_autoevals
Automated factuality evaluation of AI responses for Drupal 10/11.
## ⚠️ Not Production Ready

This module is currently under development and is not ready for production use. Use it only for testing and evaluation purposes.
## What This Module Does
AI AutoEvals evaluates the factual accuracy of AI-generated responses using a two-step LLM process:
1. Fact extraction - Analyzes the user input to determine what a correct answer should contain
2. Response evaluation - Compares the AI response against the extracted criteria

Evaluation criteria are derived solely from the user's question and context, not from the AI response itself, ensuring an objective factuality check.
## Installation

```shell
composer require drupal/ai_autoevals
drush en ai_autoevals
```

Configure the module at /admin/config/ai/autoevals and view results at /admin/content/ai-autoevals.
## Usage

### Quick Start

1. Configure provider - Set the AI provider and model for evaluations at /admin/config/ai/autoevals.
2. Enable auto-tracking - Check "Auto-track requests" to evaluate all AI responses automatically.
3. Process evaluations - Run `drush queue:run ai_autoevals_evaluation_worker`.
4. View results - Check the dashboard at /admin/content/ai-autoevals.
### Manual Tracking

Tag specific requests for evaluation when auto-tracking is disabled:

```php
$ai_provider->chat($input, $model, ['ai_autoevals:track' => TRUE]);
```

Add context via tags:

```php
$ai_provider->chat($input, $model, [
  'ai_autoevals:track' => TRUE,
  'category' => 'support',
]);
```
## Evaluation Scores
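The table below defines a fixed four-level rubric. As an illustration only (a hypothetical helper, not part of the module's API), the default mapping could be expressed as:

```php
<?php

/**
 * Maps an evaluation outcome to the module's default score.
 *
 * Hypothetical helper for illustration; the module itself allows
 * per-evaluation-set score customization.
 */
function ai_autoevals_default_score(string $outcome): float {
  $map = [
    'exact_match'  => 1.0,  // Response fully meets expected criteria.
    'superset'     => 0.6,  // All expected info, plus more.
    'subset'       => 0.4,  // Some expected info, but missing some.
    'disagreement' => 0.0,  // Response contradicts expected facts.
  ];
  return $map[$outcome] ?? 0.0;
}
```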
| Score | Meaning      | Description                                          |
|-------|--------------|------------------------------------------------------|
| 1.0   | Exact Match  | Response fully meets expected criteria               |
| 0.6   | Superset     | Response includes all expected info plus more        |
| 0.4   | Subset       | Response has some expected info but is missing some  |
| 0.0   | Disagreement | Response contradicts expected facts                  |

## Advanced Usage
### Evaluation Sets
Create evaluation sets to customize evaluation behavior for different content types:
- Operation types: Which AI operations to evaluate (chat, chat_completion)
- Fact extraction method: AI-generated, rule-based, or hybrid
- Custom knowledge: Domain-specific context for specialized extraction
- Custom prompts: Override evaluation prompts per set
- Keywords: Trigger evaluations based on query/response keywords
- Scoring: Customize score mapping per evaluation set
### Programmatic Creation

Use the builder pattern to create evaluation sets:

```php
use Drupal\ai_autoevals\Entity\EvaluationSet;

$set = EvaluationSet::builder('weather_eval', 'Weather Evaluation')
  ->withDescription('Evaluates weather-related AI responses')
  ->forOperations(['chat'])
  ->triggerOnKeywords(['weather', 'forecast'])
  ->withCustomKnowledge('Current temp: 72°F, Humidity: 45%')
  ->withFactExtractionMethod('ai_generated')
  ->build();
```
### Services

Access core services programmatically:

```php
// Create and queue evaluations.
$manager = \Drupal::service('ai_autoevals.evaluation_manager');
$evaluation = $manager->createEvaluation([...]);
$manager->queueEvaluation($evaluation->id());

// Extract facts.
$extractor = \Drupal::service('ai_autoevals.fact_extractor');
$facts = $extractor->extractFacts($input, $context, $evaluationSet);

// Evaluate responses.
$evaluator = \Drupal::service('ai_autoevals.evaluator');
$result = $evaluator->evaluate($evaluationSet, $input, $facts, $output);
```
### Events

React to evaluation lifecycle events:

- `ai_autoevals.pre_evaluation` - Before the evaluation is sent to the LLM
- `ai_autoevals.post_evaluation` - After the evaluation completes
- `ai_autoevals.evaluation_failed` - When an evaluation fails
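As a sketch of how these events could be consumed: the event names above come from this document, but the event payload class is not documented here, so the listener below types it loosely. The `mymodule` namespace and the usual `event_subscriber`-tagged service registration in `mymodule.services.yml` are assumptions.

```php
<?php

namespace Drupal\mymodule\EventSubscriber;

use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Hypothetical subscriber that logs failed AI AutoEvals evaluations.
 *
 * Register it as an event_subscriber-tagged service in
 * mymodule.services.yml, as with any Drupal event subscriber.
 */
class EvaluationFailureLogger implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    // Event name taken from this README.
    return [
      'ai_autoevals.evaluation_failed' => 'onEvaluationFailed',
    ];
  }

  public function onEvaluationFailed(object $event): void {
    // The payload class is not documented here, so only log the fact
    // that a failure occurred.
    \Drupal::logger('mymodule')->warning('An AI AutoEvals evaluation failed.');
  }

}
```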
### Hooks
Filter evaluation sets via hook_ai_autoevals_evaluation_sets_alter() for conditional evaluation based on language, user roles, or custom business rules.
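A minimal sketch of such a hook implementation follows. The hook's exact signature is not documented in this README; the sketch assumes it receives the candidate evaluation sets by reference, keyed by set ID, and the `mymodule` name and `weather_eval` set ID are illustrative.

```php
<?php

/**
 * Implements hook_ai_autoevals_evaluation_sets_alter().
 *
 * Hypothetical sketch: the assumed signature passes the candidate
 * evaluation sets by reference, keyed by evaluation set ID.
 */
function mymodule_ai_autoevals_evaluation_sets_alter(array &$evaluation_sets): void {
  // Example business rule: skip weather evaluations for anonymous users.
  if (\Drupal::currentUser()->isAnonymous()) {
    unset($evaluation_sets['weather_eval']);
  }
}
```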
### Custom Fact Extractors

Create plugins for specialized extraction:

```php
/**
 * @FactExtractor(
 *   id = "my_custom",
 *   label = @Translation("My Custom Extractor")
 * )
 */
class MyCustomExtractor extends FactExtractorPluginBase {

  public function extract(string $input, array $context = []): array {
    // Custom logic producing the extracted facts.
    return [];
  }

}
```
## Documentation

To submit bug reports and feature suggestions, or to track changes, visit:
https://www.drupal.org/project/issues/ai_autoevals
## License

GPL-2.0-or-later