Overview

Utility for populating content entities from HTML using plugins.

On its own, this module does nothing. It is a tool to assist with writing custom code for tasks such as migration, where you need to modify an entity based on the contents of an HTML document.

To use this module you will need to write one or more plugins, and then call the service to execute the code within these plugins.

There is no user interface.

Usage

Step 1 - write a plugin

A plugin is responsible for taking a parsed HTML document and populating an entity in some way. Plugins typically populate a single content type, or populate one field across multiple content types. However, this module places no restrictions on what a plugin can do.

Example:

#[HtmlToEntity('node_title')]
class SetNodeTitleFromH1 extends HtmlToEntityPluginBase {

  // A plugin decides whether to act on a particular entity.
  // In this case, we can set the title of any node.
  public function appliesToEntity(ContentEntityInterface $entity): bool {
    return $entity->getEntityTypeId() === 'node';
  }

  // A plugin decides whether to act upon or to ignore a document
  // based on the document's URI. In this case we only act on a
  // subset of pages within a website being scraped.
  public function appliesToUri(string $uri): bool {
    return str_starts_with($uri, 'https://www.example.com/news/');
  }

  // A plugin takes a document and does something to the entity.
  // In this case we use the text within the <h1> for the node's title.
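  public function populate(ContentEntityInterface $entity, HTMLDocument $document): void {
    // Assumes HTMLDocument is PHP 8.4's Dom\HTMLDocument, which provides querySelector().
    $h1_text = trim($document->querySelector('h1')?->textContent ?? '');
    if ($h1_text !== '') {
      $entity->setTitle($h1_text);
    }
  }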

}
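The heading extraction inside populate() can be tried outside Drupal. The following is a minimal sketch, assuming the HTMLDocument type in the example is PHP 8.4's Dom\HTMLDocument (consistent with the PHP 8.4 requirement, but not confirmed by this page). The helper name extract_h1_text() is hypothetical and is not part of the module.

```php
<?php

declare(strict_types=1);

use Dom\HTMLDocument;

// Hypothetical helper, not part of the module: returns the trimmed text of
// the first <h1> in the document, or NULL when there is no usable heading.
function extract_h1_text(HTMLDocument $document): ?string {
  $h1 = $document->querySelector('h1');
  if ($h1 === NULL) {
    return NULL;
  }
  // Collapse runs of whitespace left over from markup indentation.
  $text = trim((string) preg_replace('/\s+/u', ' ', $h1->textContent ?? ''));
  return $text === '' ? NULL : $text;
}

$html = '<!DOCTYPE html><html><body><h1> Council  <em>news</em> </h1></body></html>';
$document = HTMLDocument::createFromString($html);
echo extract_h1_text($document), PHP_EOL;
```

Returning NULL for a missing or empty heading lets the plugin leave the existing title untouched, which matches the guard around setTitle() in the example above.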

Step 2 - apply the plugins

Call the service with an existing entity (which may or may not be already saved) and an HTML document:

$entity   = ... ;  // load or create a ContentEntityInterface
$document = ... ;  // obtain an HTMLDocument object
$logger   = ... ;  // optional LoggerInterface. If set, this is passed to plugins

\Drupal::service(\Drupal\html_to_entity\HtmlToEntityInterface::class)
  ->populate($entity, $document, $logger);
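The page does not say how to obtain the $document object. One possibility, sketched here on the assumption that the module expects PHP 8.4's Dom\HTMLDocument, is to parse an HTML string that has already been fetched. The helper name parse_html() is hypothetical and is not part of the module.

```php
<?php

declare(strict_types=1);

use Dom\HTMLDocument;

// Hypothetical helper, not part of the module: parses an HTML string into a
// Dom\HTMLDocument using PHP 8.4's HTML5 parser.
function parse_html(string $html): HTMLDocument {
  // LIBXML_NOERROR suppresses the parse warnings that malformed markup,
  // common on scraped pages, would otherwise emit.
  return HTMLDocument::createFromString($html, LIBXML_NOERROR);
}

$document = parse_html('<html><head><title>News</title></head><body><h1>Hello</h1></body></html>');

// The parsed document can be queried with CSS selectors.
echo $document->querySelector('h1')?->textContent, PHP_EOL;
```

If the assumption about the document type holds, the resulting object could be passed as the $document argument of populate(). Dom\HTMLDocument::createFromFile() offers the same parsing for a file path.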

Requirements

PHP 8.4 or higher.

Related modules

You may find it useful to combine this module with:

Activity

Total releases: 1
First release: May 2026
Latest release: 1 day ago
Stability: 0% stable

Releases

Version Type Release date
1.0.x-dev Dev May 8, 2026