ai_single_page_importer

AI-powered content import module for Drupal 10.1+ and 11 that automatically extracts content from external URLs and populates node fields, using a configured AI provider (for example, OpenAI).

AI-created

Be aware that most of the code in this module was created by AI prompts, using Claude Sonnet 4.5.

Why would I need this?

This module is useful when you need to:

  • Migrate content efficiently - Save time by automatically extracting and formatting content instead of manual copy-pasting
  • Republish content from other sources - Import articles, blog posts, or news from external websites into your Drupal site
  • Curate content collections - Build a content library by importing relevant articles from across the web
  • Create summaries or references - Quickly populate fields with extracted metadata (title, intro, tags) from external content
  • Streamline editorial workflows - Enable content editors to import and adapt external content without technical knowledge

Features

  • Intelligent Content Extraction: Uses AI to analyze web pages and extract structured content
  • Dynamic Field Mapping: Automatically maps extracted content to appropriate Drupal fields
  • Security Focused: URL validation, domain blacklisting, HTML sanitization, and flood control
  • Configurable: Admin interface for AI model selection, content limits, and security settings
  • CKEditor 5 Integration: Seamlessly populates formatted content into CKEditor fields
  • Permission-Based: Granular access control via Drupal permissions

Requirements

  • Drupal 10.1+ or 11
  • PHP 8.1+
  • AI module (drupal/ai)
  • An AI provider module (e.g., drupal/ai_provider_openai, drupal/ai_provider_anthropic)
  • API key configured for your chosen provider

Installation

  1. Enable the module:

    ddev drush pm:enable ai_single_page_importer
    
  2. Grant permissions at /admin/people/permissions:

    • "Use AI Single Page Importer" - for content editors
    • "Administer AI Single Page Importer settings" - to access and configure settings
  3. Configure at /admin/config/content/ai-content-importer:

    • Select AI model (gpt-4o recommended)
    • Set content length limits
    • Configure allowed content types
    • Add blocked domains

Usage

  1. Create or edit a node (Article, Page, etc.)
  2. Expand the "AI Content Import" section at the top
  3. Paste the URL of the content you want to import
  4. Click "Import Content with AI"
  5. Review and adjust the populated fields
  6. Save the node

Security Features

URL Validation

  • Whitelist: Only HTTP/HTTPS URLs allowed
  • Blacklist: Loopback and private IP ranges blocked (127.0.0.1, 10.x.x.x, 192.168.x.x, etc.)
  • Domain filtering: Configurable domain blacklist (supports wildcards)
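
The checks above can be sketched in plain PHP. This is a minimal illustration, not the module's actual url_validator service: the function name example_url_is_allowed() and the $blocked_domains parameter are hypothetical, and hostnames are not resolved through DNS here, so only literal IP addresses are checked against the loopback and private ranges.

```php
<?php

/**
 * Hypothetical helper: returns TRUE if a URL passes basic safety checks.
 *
 * Assumption: wildcard patterns use shell-style globbing (e.g. *.example.com).
 */
function example_url_is_allowed(string $url, array $blocked_domains = []): bool {
  $parts = parse_url($url);
  if ($parts === FALSE || empty($parts['scheme']) || empty($parts['host'])) {
    return FALSE;
  }

  // Only HTTP and HTTPS are accepted.
  if (!in_array(strtolower($parts['scheme']), ['http', 'https'], TRUE)) {
    return FALSE;
  }

  $host = strtolower($parts['host']);
  if ($host === 'localhost') {
    return FALSE;
  }

  // Literal IP hosts: reject private and reserved (including loopback) ranges.
  // parse_url() keeps the brackets around IPv6 literals, so strip them first.
  $ip = trim($host, '[]');
  if (filter_var($ip, FILTER_VALIDATE_IP) !== FALSE) {
    $flags = FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    if (filter_var($ip, FILTER_VALIDATE_IP, $flags) === FALSE) {
      return FALSE;
    }
  }

  // Domain blacklist with wildcard support.
  foreach ($blocked_domains as $pattern) {
    if (fnmatch(strtolower(trim($pattern)), $host)) {
      return FALSE;
    }
  }

  return TRUE;
}
```

A production validator would also need to resolve the hostname and check the resulting address, since a public-looking domain can point at an internal IP.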

Rate Limiting

  • Flood control: 5 requests per hour per user by default (configurable on the settings page)
  • Prevents API abuse and cost overruns

Content Sanitization

  • HTML purification for long text fields
  • Tag stripping for short text fields
  • Removal of dangerous attributes (onclick, javascript:, etc.)
  • CKEditor-compatible output
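
As a rough illustration of the two sanitization modes, the sketch below strips all tags for short text and removes scripts, event-handler attributes, and javascript: URLs for long text. The function names and the allowed tag list are assumptions made for this example; the module's html_sanitizer service may work differently. The sketch ignores character encoding, so it is only suitable for ASCII or entity-encoded input.

```php
<?php

/**
 * Hypothetical helper: plain text for short fields.
 */
function example_sanitize_short_text(string $html): string {
  // Drop every tag, then collapse runs of whitespace.
  return trim(preg_replace('/\s+/', ' ', strip_tags($html)));
}

/**
 * Hypothetical helper: safe HTML subset for long text fields.
 */
function example_sanitize_long_text(string $html): string {
  $doc = new DOMDocument();
  $previous = libxml_use_internal_errors(TRUE);
  // Wrap in a <div> so the fragment has a single known container.
  $doc->loadHTML('<div>' . $html . '</div>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($doc);

  // Remove script and style elements together with their content.
  foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
    $node->parentNode->removeChild($node);
  }

  // Remove on* handlers and javascript: URLs from the remaining elements.
  foreach ($xpath->query('//*') as $element) {
    foreach (iterator_to_array($element->attributes) as $attribute) {
      $name = strtolower($attribute->name);
      $is_handler = str_starts_with($name, 'on');
      $is_js_url = in_array($name, ['href', 'src'], TRUE)
        && preg_match('/^\s*javascript:/i', $attribute->value) === 1;
      if ($is_handler || $is_js_url) {
        $element->removeAttribute($attribute->name);
      }
    }
  }

  // Serialize the children of the wrapper <div>.
  $out = '';
  $wrapper = $xpath->query('//body/div')->item(0);
  if ($wrapper !== NULL) {
    foreach ($wrapper->childNodes as $child) {
      $out .= $doc->saveHTML($child);
    }
  }

  // Keep only a small set of formatting tags (assumed list).
  $allowed = ['p', 'a', 'strong', 'em', 'ul', 'ol', 'li', 'h2', 'h3', 'blockquote', 'br'];
  return strip_tags($out, $allowed);
}
```

On a real site, a maintained sanitizer or Drupal's own text format filters are a safer choice than a hand-rolled filter like this one.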

Error Handling

  • Sanitized error messages (no sensitive info exposed)
  • Generic messages for API/configuration errors
  • Comprehensive logging for debugging

Configuration Options

Content Types

  • Allowed Content Types: Select which node types can use the AI import feature
  • If no content types are selected, the feature is available on all content types
  • Useful for restricting AI imports to specific types like Articles or Blog posts
  • Configured as checkboxes at /admin/config/content/ai-content-importer

Content Limits

  • Max Content Length: Character limit sent to AI (default: 30,000)
  • Request Timeout: HTTP timeout in seconds (default: 30)

Rate Limits

  • Maximum Requests: Number of import requests allowed per time window (default: 5)
  • Time Window: Time period in seconds for rate limiting (default: 3600, i.e. 1 hour)
  • Once the limit is reached, users must wait for the time window to expire before making additional requests, or configure a higher limit on the settings page.
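
To make the interaction between Maximum Requests and Time Window concrete, here is a small in-memory sketch of a per-user window limiter. The class ExampleWindowLimiter is hypothetical and exists only for illustration; the module's actual flood control may be implemented differently (for instance on top of Drupal's flood service) and will persist its state between requests.

```php
<?php

/**
 * Hypothetical in-memory limiter: at most N requests per window per user.
 */
final class ExampleWindowLimiter {

  /**
   * Request timestamps keyed by user identifier.
   *
   * @var array<string, int[]>
   */
  private array $events = [];

  public function __construct(
    private int $maxRequests = 5,
    private int $window = 3600,
  ) {}

  /**
   * Records a request and returns TRUE if it is within the limit.
   */
  public function attempt(string $identifier, int $now): bool {
    // Keep only the timestamps that still fall inside the window.
    $cutoff = $now - $this->window;
    $recent = array_values(array_filter(
      $this->events[$identifier] ?? [],
      static fn (int $time): bool => $time > $cutoff
    ));

    $allowed = count($recent) < $this->maxRequests;
    if ($allowed) {
      $recent[] = $now;
    }
    $this->events[$identifier] = $recent;
    return $allowed;
  }

}
```

With the defaults (5 requests, 3600 seconds), a sixth import within the same hour is rejected, and the user can import again once the oldest request is more than an hour old.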

Access Control

  • Allowed Content Types: Limit which content types show the import feature (empty = all)
  • Domain Blacklist: Block specific domains from importing (one per line, wildcards supported)

Field Type Support

The module automatically detects and populates these field types:

Field Type                        Category     Handling
--------------------------------  -----------  ---------------------------------
text_long, text_with_summary      Long Text    Full HTML content with formatting
string, text                      Short Text   Brief summaries (1-2 sentences)
entity_reference (taxonomy)       Taxonomy     Array of suggested terms
datetime, daterange, timestamp    Date         ISO 8601 format
link                              Link         URL strings
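
The mapping from field type to category can be pictured as a simple lookup. The function below is a hypothetical sketch for illustration, not the module's code; the category identifiers are the ones listed in the Extending section, and the taxonomy_term target type check is an assumption about how taxonomy references are distinguished from other entity references.

```php
<?php

/**
 * Hypothetical helper: maps a Drupal field type to an extraction category.
 *
 * Returns NULL for unsupported field types.
 */
function example_field_category(string $field_type, ?string $target_type = NULL): ?string {
  return match (TRUE) {
    in_array($field_type, ['text_long', 'text_with_summary'], TRUE) => 'long_text',
    in_array($field_type, ['string', 'text'], TRUE) => 'short_text',
    // Assumption: only references to taxonomy terms are supported.
    $field_type === 'entity_reference' && $target_type === 'taxonomy_term' => 'taxonomy',
    in_array($field_type, ['datetime', 'daterange', 'timestamp'], TRUE) => 'date',
    $field_type === 'link' => 'link',
    default => NULL,
  };
}
```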

Architecture

Services

  • ai_single_page_importer.content_extractor

    • Fetches and extracts content using AI
    • Sanitizes HTML output
    • Maps fields dynamically
  • ai_single_page_importer.url_validator

    • Validates URLs for security
    • Checks domain blacklist
    • Blocks private IP ranges
  • ai_single_page_importer.html_sanitizer

    • Removes dangerous HTML/JavaScript
    • Preserves safe formatting tags
    • CKEditor-compatible output

Troubleshooting

"Rate limit exceeded"

Wait until the time window expires or adjust the rate limiting settings at /admin/config/content/ai-content-importer (default: 5 requests per hour).

"Domain is blocked"

Check the domain blacklist configuration. Remove the domain from the list or use a different source URL.

"AI service configuration error"

Verify that the API key for your AI provider (for example, OpenAI) is configured correctly in the AI module settings.

Fields not populating

  1. Check browser console for JavaScript errors
  2. Verify field types are supported
  3. Check logs: ddev drush watchdog:show --count=50

Content quality issues

  • Try switching to the gpt-4o model for better extraction
  • Increase max content length for longer articles
  • Some pages with heavy JavaScript may not extract well

Performance Considerations

  • Each import makes an HTTP request to fetch the page
  • AI processing typically takes 5-15 seconds
  • Larger pages are truncated to configured max length
  • Consider using gpt-4o-mini for cost savings on simple content

Logging

All operations are logged to the ai_single_page_importer channel:

# View recent logs (Drush filters log messages by channel with --type)
ddev drush watchdog:show --type=ai_single_page_importer

# View with severity filter
ddev drush watchdog:show --type=ai_single_page_importer --severity=Error

Development

Code Quality

# PHPStan analysis
ddev exec vendor/bin/phpstan analyze web/modules/custom/ai_single_page_importer --level=1

# Coding standards (if phpcs is installed)
ddev exec vendor/bin/phpcs --standard=Drupal web/modules/custom/ai_single_page_importer

Extending

To add support for custom field types, implement hook_ai_single_page_importer_field_map_alter() in your custom module:

/**
 * Implements hook_ai_single_page_importer_field_map_alter().
 */
function mymodule_ai_single_page_importer_field_map_alter(array &$field_map, array $field_definitions) {
  // Add support for custom field types.
  foreach ($field_definitions as $field_name => $field_definition) {
    $field_type = $field_definition->getType();

    // Example: Add support for a custom color picker field, using a custom
    // category.
    if ($field_type === 'color_field_type') {
      $field_map[$field_name] = [
        'label' => (string) $field_definition->getLabel(),
        'type' => $field_type,
        'category' => 'color',
      ];
    }

    // Example: Override an existing field mapping, changing it from short to
    // long text. The isset() check avoids creating an incomplete entry when
    // the field has not been mapped.
    if ($field_name === 'field_custom_summary' && isset($field_map[$field_name])) {
      $field_map[$field_name]['category'] = 'long_text';
    }
  }
}

The category determines how the AI extracts and formats the content:

  • long_text - Full HTML content with formatting
  • short_text - Brief summaries (1-2 sentences)
  • taxonomy - Array of suggested terms
  • date - ISO 8601 format
  • link - URL strings
  • Custom categories - Define your own extraction logic
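
Because the values come from an AI model, it can help to check that each one matches the shape its category promises before assigning it to a field. The sketch below shows one way to do that in plain PHP; example_normalize_value() is a hypothetical helper for illustration and is not part of the module's API.

```php
<?php

/**
 * Hypothetical helper: coerces a raw AI value into the shape a category expects.
 *
 * Returns NULL when the value cannot be used.
 */
function example_normalize_value(string $category, mixed $value): mixed {
  switch ($category) {
    case 'date':
      if (!is_string($value) || trim($value) === '') {
        return NULL;
      }
      try {
        // Re-emit the date as ISO 8601 with an explicit UTC offset.
        return (new DateTimeImmutable($value))->format(DATE_ATOM);
      }
      catch (Exception) {
        return NULL;
      }

    case 'link':
      return is_string($value) && filter_var($value, FILTER_VALIDATE_URL) !== FALSE
        ? $value
        : NULL;

    case 'taxonomy':
      if (!is_array($value)) {
        return NULL;
      }
      // Trim each term, drop empty ones, and remove duplicates.
      $terms = array_map(static fn ($term): string => trim((string) $term), $value);
      $terms = array_filter($terms, static fn (string $term): bool => $term !== '');
      return array_values(array_unique($terms));

    default:
      // long_text, short_text, and custom categories: accept strings as-is.
      return is_string($value) ? $value : NULL;
  }
}
```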

Activity

  • Total releases: 4
  • First release: Jan 2026
  • Latest release: 1 month ago
  • Release cadence: 0 days
  • Stability: 0% stable

Releases

Version         Type          Release date
--------------  ------------  ------------
1.0.0-alpha3    Pre-release   Jan 17, 2026
1.0.0-alpha2    Pre-release   Jan 17, 2026
1.0.0-alpha1    Pre-release   Jan 17, 2026
1.x-dev         Dev           Jan 17, 2026