Reinforcement Learning (RL) is an A/B and multivariate testing framework for Drupal where every visitor click is treated as human feedback (RLHF-style). Each page view is a trial, each conversion is a reward, and the algorithm continuously shifts traffic to whichever variant is winning. No fixed test horizons. No manual winner picking. No third-party SaaS.

RL is part of the DXPR marketing CMS stack and ships in DXPR CMS.

What you can A/B test with RL

RL: A/B Test Views Content (rl_sorting): the order of items in any Drupal View
RL: A/B Test Page Titles (rl_page_title, bundled): page titles for nodes, View pages, and any controller
RL: A/B Test Menu Links (rl_menu_link, bundled): labels in any menu link
DXPR Builder integration: variant slots inside builder blocks

Where experimentation fits in your AI workflow

Human review catches what's obviously off-brand, off-message, or factually wrong. It can't catch what merely fails to convert; only visitors can do that. Harvard Business School research on enterprise gen AI concludes that "designing targeted experiments and using scientific methods to test, refine, and scale promising solutions" is the layer between review and full rollout.

RL is that layer for Drupal. After your team approves a variant, RL tests it in production against the alternatives, shifts traffic toward what works, and surfaces the result in the admin report. Variants can be hand-written, AI-generated, or both; RL is indifferent to authorship.

Source: Berndt et al., A Systematic Approach to Experimenting with Gen AI, Harvard Business Review, January-February 2026.

Why RL instead of fixed-horizon A/B testing?

Traditional A/B tests run for a fixed window (say two weeks) and split traffic 50/50 the whole time, even when one variant is obviously losing. RL turns the experiment into a feedback loop: every click adjusts the model, traffic shifts toward the leader as soon as evidence emerges, and the test never has to "end". You can run dozens or thousands of variants simultaneously (true multivariate testing), and a newly added variant is in play on the next render with no manual setup.

How it works

RL uses a multi-armed bandit algorithm (Thompson Sampling). Each variant has a reward distribution; on each render the algorithm draws one sample from every variant's distribution and shows the variant with the highest sample. A reward shifts that variant's distribution toward higher values; a trial without a reward shifts it lower. The math is in ThompsonCalculator.php.
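
The sampling step described above can be sketched in a few lines. This is a minimal illustration that assumes a Beta-Bernoulli model (a uniform Beta(1, 1) prior, one success per reward, one failure per unrewarded trial), which is a common choice for click/no-click data. The module's actual math is in ThompsonCalculator.php and may differ. The function names and the shape of the arms object are hypothetical.

```javascript
// Sample Gamma(k, 1) for a positive integer k as a sum of k exponentials.
function sampleGammaInt(k, rng) {
  let sum = 0;
  for (let i = 0; i < k; i++) {
    sum -= Math.log(1 - rng()); // 1 - rng() lies in (0, 1], so the log is finite
  }
  return sum;
}

// Sample Beta(a, b) for positive integers a and b via two Gamma draws.
function sampleBeta(a, b, rng) {
  const x = sampleGammaInt(a, rng);
  const y = sampleGammaInt(b, rng);
  return x / (x + y);
}

// arms: { armId: { turns, rewards } } (hypothetical shape).
// Returns the arm id whose sampled conversion rate is highest, or null if empty.
function thompsonPick(arms, rng = Math.random) {
  let bestId = null;
  let bestSample = -1;
  for (const [id, { turns, rewards }] of Object.entries(arms)) {
    // Assumed Beta(1, 1) prior: successes = rewards, failures = turns - rewards.
    const sample = sampleBeta(1 + rewards, 1 + (turns - rewards), rng);
    if (sample > bestSample) {
      bestSample = sample;
      bestId = id;
    }
  }
  return bestId;
}
```

An arm with little data has a wide distribution, so it still wins some draws and keeps getting explored; an arm with many unrewarded trials has a narrow, low distribution and rarely wins. Summing exponentials costs one logarithm per recorded trial, which is fine for a sketch but not for high-traffic arms.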

Features

Multivariate by default: 2 to thousands of variants, no manual configuration
Real-time RLHF loop: visitor clicks update the model on every page
Fast HTTP REST API: optimized JSON endpoint for tracking and decisions
Admin reports: per-experiment performance, traffic, and confidence
Service-based architecture: extensible decorators, custom variant selectors
Data sovereignty: no cloud, no third-party SaaS, all data stays in your Drupal database
GDPR-friendly tracking: only anonymous interaction counts, no user IDs or cookies

You need RL if

You want to A/B or multivariate test any part of your site without third-party SaaS
You want continuous optimization rather than fixed-horizon experiments
You want to add or remove variants on the fly without restarting tests
You want a core API you can call from any module, View, or block

Prefer a turnkey demo site?

Spin up DXPR CMS, Drupal pre-configured with DXPR Builder, DXPR Theme, RL, and security best practices.


Installation

composer require drupal/rl
drush en rl

Verify rl.php access

RL ships an .htaccess file that allows direct access to rl.php (same pattern as Drupal core's statistics.php). Test it:

curl -X POST -d "action=ping" http://example.com/modules/contrib/rl/rl.php

If the test fails:

Apache: ensure .htaccess files are processed (AllowOverride All)
Nginx: copy the rewrite rules from .htaccess to your server config
Security modules: whitelist /modules/contrib/rl/rl.php

If server policies prevent direct access, use the Drupal Routes API instead.
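
The curl check above can also be scripted, for example from a deployment smoke test. The sketch below builds the same form-encoded POST (a single action=ping field) against the path used in the curl example. The function names are hypothetical, and the response body is not inspected because its format is not described here.

```javascript
// Path taken from the curl example; adjust it if the module lives elsewhere.
const RL_ENDPOINT_PATH = '/modules/contrib/rl/rl.php';

// Build the same request the curl command sends: a form-encoded POST
// with one field, action=ping.
function buildPingRequest(baseUrl) {
  return {
    url: new URL(RL_ENDPOINT_PATH, baseUrl).toString(),
    options: {
      method: 'POST',
      body: new URLSearchParams({ action: 'ping' }),
    },
  };
}

// Resolves to true when the server answers with a 2xx status.
// fetchImpl is injectable so the function can be exercised without a network.
async function pingRl(baseUrl, fetchImpl = fetch) {
  const { url, options } = buildPingRequest(baseUrl);
  const response = await fetchImpl(url, options);
  return response.ok;
}
```

Calling pingRl('http://example.com') resolves to false when the server rejects the request, which is the cue to work through the checklist above.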

Drush command reference

Category	Commands	Description
Discovery	rl:list, rl:status, rl:performance, rl:trends	List A/B tests, check phase/confidence, arm-level stats, historical trends
Analysis	rl:analyze, rl:export	Full analysis with recommendations, export experiment data
Experiment CRUD	rl:experiment:create, rl:experiment:update, rl:experiment:delete	Create, update, and delete A/B tests with --dry-run support
Configuration	rl:config:get, rl:config:set, rl:config:list, rl:config:reset	Get/set module settings, list all with current values, reset to defaults
Setup	rl:setup-ai	Install AI assistant skill files for Claude Code, Codex, Gemini, Copilot, Cursor

AI coding assistant integration

RL ships a built-in Agent Skills file that teaches AI coding assistants how to manage A/B tests through natural language. It is compatible with Claude Code, Codex CLI, Gemini CLI, GitHub Copilot, Cursor, and any other tool that supports the Agent Skills standard.

After installing the module, run drush rl:setup-ai to enable AI assistant support. Your assistant will then respond to prompts like:

"List all running A/B tests"
"Analyze the hero_cta_test experiment"
"Create a new A/B test for the homepage banner"
"What's the conversion rate for variant B?"

API

// Get the experiment manager
$experiment_manager = \Drupal::service('rl.experiment_manager');

// Record a trial (content shown)
$experiment_manager->recordTurn('my-experiment', 'variant-a');

// Record a reward (user clicked)
$experiment_manager->recordReward('my-experiment', 'variant-a');

// Get scores for the variants currently in play
$scores = $experiment_manager->getThompsonScores('my-experiment', NULL, ['variant-a', 'variant-b']);

// Pick a winner
$ts_calculator = \Drupal::service('rl.ts_calculator');
$best_arm = $ts_calculator->selectBestArm($scores);

JavaScript API

Attach the rl/api library to get Drupal.rl on the page:

Drupal.rl.turn('hero_cta', 'v0');
Drupal.rl.reward('hero_cta', 'v0');

Drupal.rl.decide('hero_cta', ['v0', 'v1', 'v2']).then(function (armId) {
  showVariant(armId);
});
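
One way to wire the three calls into a full trial-and-reward cycle is sketched below. It assumes that decide() only returns an arm and does not itself record the trial, so turn() is called explicitly; check the README before relying on that. The runExperiment helper and its parameters are hypothetical, and the rl argument is expected to be Drupal.rl or an object with the same three methods.

```javascript
// rl: an object shaped like Drupal.rl (decide, turn, reward).
// showVariant: callback that renders the chosen arm.
// target: the element whose click counts as a conversion.
function runExperiment(rl, experimentId, arms, showVariant, target) {
  return rl.decide(experimentId, arms).then(function (armId) {
    showVariant(armId);
    // Assumption: decide() does not count the trial, so record it here.
    rl.turn(experimentId, armId);
    // Count a conversion on the first click only.
    target.addEventListener('click', function () {
      rl.reward(experimentId, armId);
    }, { once: true });
    return armId;
  });
}
```

In a theme or module script this could be called as runExperiment(Drupal.rl, 'hero_cta', ['v0', 'v1', 'v2'], showVariant, document.querySelector('.hero-cta')), where the selector is a placeholder.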

All three methods feed a shared 500 ms batch window, so every A/B test on the page rides one POST to rl.php. See the README for the HTTP wire format and server-side patterns.
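
To illustrate the idea of a shared batch window, here is a minimal sketch of a queue that opens a window on the first event and sends everything collected when the window closes. It is not the module's implementation; createBatcher, the event shape, and the send callback are hypothetical.

```javascript
// Collect events for a short window, then hand them to send() in one batch.
// schedule is injectable (defaults to setTimeout) so the timing can be tested.
function createBatcher(send, windowMs = 500, schedule = setTimeout) {
  let queue = [];
  let timerActive = false;

  function flush() {
    const batch = queue;
    queue = [];
    timerActive = false;
    if (batch.length > 0) {
      send(batch); // one call, for example one POST, per window
    }
  }

  return {
    push(event) {
      queue.push(event);
      // The first event opens the window; later events join it.
      if (!timerActive) {
        timerActive = true;
        schedule(flush, windowMs);
      }
    },
  };
}
```

With this shape, three experiments that each push an event within the same 500 ms produce a single send() call carrying all three events.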

FAQ

Does RL store my A/B test's variants?

It stores their performance data, not the authoritative list. Every variant that has received traffic has a row in rl_arm_data with turn and reward counts. But "which variants are in play right now" is owned by your module, not RL.

Different consumer modules keep the live variant list in different places:

rl_sorting: the content returned by a View
rl_page_title: fields on a content entity
rl_menu_link: labels on a menu link
DXPR Builder: slots inside a block component

On each call your module passes its current list (getThompsonScores($id, NULL, $arms) in PHP or Drupal.rl.decide(id, arms) in JS) and RL matches it against the stored stats to pick a winner. A newly added variant is in play on the next render; a removed one stops appearing. There is no second save step that could drift out of sync with your module's UI.
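
The matching step can be pictured as a lookup of the caller's live list against the stored counts. The sketch below is an illustration only: statsForCurrentArms is a hypothetical helper, the object shapes are assumed, and starting a new arm at zero turns and zero rewards is an assumption rather than documented behavior.

```javascript
// currentArms: the live list the consumer module passes on this call.
// storedStats: { armId: { turns, rewards } } for arms that have received traffic.
function statsForCurrentArms(currentArms, storedStats) {
  const result = {};
  for (const armId of currentArms) {
    const known = Object.prototype.hasOwnProperty.call(storedStats, armId);
    result[armId] = known
      ? { turns: storedStats[armId].turns, rewards: storedStats[armId].rewards }
      : { turns: 0, rewards: 0 }; // new arm: no history yet (assumed default)
  }
  // Arms that have stored stats but are not in currentArms are left out,
  // so a removed variant can no longer be picked.
  return result;
}
```

Because the list is supplied on every call, adding or removing a variant in the consumer module takes effect without touching the stored counts.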

When do I pick a winner and end an A/B test?

Only when you want to. RL has no fixed horizon and no significance gate to wait out. It just shifts traffic to whatever variant is winning right now and keeps adapting as evidence changes.

Two patterns, depending on what you're testing:

Converging tests: a better page title, a clearer checkout button, a stronger hero image. Once the report shows a confident winner, lock it in and move on.
Evergreen experiments: blog post lists, banner ads that fade as returning visitors tune them out, seasonal calls to action. Leave them running. RL follows the winner as it shifts.

In both cases the loser of a pair just stops receiving traffic on its own, so there's no urgency to declare a winner by hand. If you're used to fixed-horizon A/B tools, this is the biggest mental shift: there's no "test complete" flag to chase.

Related modules

RL: A/B Test Views Content (rl_sorting): A/B test the order of any Drupal View
Analyze: content analysis and quality scoring for Drupal
AI Content Strategy: AI-driven content strategy recommendations

Version	Type	Release date
1.1.7	Stable	Jun 30, 2026
1.1.6	Stable	May 15, 2026
1.1.5	Stable	May 7, 2026
1.1.4	Stable	May 7, 2026
1.1.3	Stable	May 6, 2026
1.1.2	Stable	May 6, 2026
1.1.1	Stable	May 6, 2026
1.1.0	Stable	May 6, 2026
1.0.0	Stable	Apr 6, 2026
1.0.0-rc1	Pre-release	Jan 29, 2026
1.0.0-beta7	Pre-release	Aug 25, 2025
1.0.0-beta6	Pre-release	Aug 25, 2025
1.0.0-beta5	Pre-release	Aug 21, 2025
1.0.0-beta3	Pre-release	Aug 11, 2025
1.0.0-beta2	Pre-release	Aug 5, 2025
1.x-dev	Dev	Aug 5, 2025
1.0.0-beta1	Pre-release	Aug 5, 2025

Reinforcement Learning (or A/B & Multivariate Testing)