
postlight_parser

The Link content parser module extracts essential information from any URL you give it.
That includes article content, titles, authors, excerpts, lead images, and more.

This is useful when you want to copy content from a URL. Instead of copying the title, saving the images, and writing the summary by hand, you let the module do it for you quickly.

Post-Installation

Install the module with Composer, which also adds the required libraries to the vendor directory.
Run:

composer require drupal/postlight_parser

This installs all the libraries in one step. If you have a problem with Composer, you can install the readability library separately first:

composer require fivefilters/readability.php
composer require drupal/postlight_parser

If your server supports Node.js, you can:

  • Install Postlight Parser (it works well and returns clean content):
    yarn global add @postlight/parser
    or
    npm -g install @postlight/parser
    
  • Or install Mercury Parser in your environment:
    yarn global add @postlight/mercury-parser
  • Or install the Postlight / Mercury Parser API.

Otherwise, the module fetches the content from the URL with PHP readability, which works fine in most cases.

  • To use the Graby library, you have to install php-tidy on your server manually:
    sudo apt-get install php-tidy
    composer require j0k3r/graby
  • Embera is supported for media pages such as YouTube, Dailymotion, and TikTok (CKEditor 5 must allow the iframe, blockquote, ... tags).

Configuration

  • Create a link field.
  • In the field widget, select Postlight parser (PHP readability is the default).
  • Map your fields to receive the extracted content.
  • CKEditor 5 is supported: you can add the "Url get Content" button in the CKEditor 5 configuration.
  • The "Save images inline" option saves all images in the content fetched from the URL to public://inline-images. It can also save base64 images. This option doesn't work with breakpoint (responsive) pictures.
  • To embed podcast audio or video from social media, enable the iframe tag (YouTube, Dailymotion), the blockquote tag (TikTok), and so on.

For developers
You can use /parser/readability?url=LINK_TO_GET_CONTENT
It returns JSON: {title,content,excerpt,lead_image_url,author}
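As a rough sketch of calling this endpoint from plain PHP, the snippet below builds the request URL and decodes a response. The site address https://example.com, both function names, and the sample JSON body are assumptions for illustration. The sketch also assumes the response is a flat object with the five keys listed above.

```php
<?php

/**
 * Builds the endpoint URL, URL-encoding the article address.
 *
 * $base_url is a hypothetical site address.
 */
function build_parser_endpoint(string $base_url, string $article_url): string {
  return rtrim($base_url, '/') . '/parser/readability?' . http_build_query(['url' => $article_url]);
}

/**
 * Decodes the JSON body and keeps only the documented keys.
 *
 * Missing keys are returned as NULL; invalid JSON throws JsonException.
 */
function decode_parser_response(string $json): array {
  $data = json_decode($json, TRUE, 512, JSON_THROW_ON_ERROR);
  if (!is_array($data)) {
    throw new UnexpectedValueException('Expected a JSON object.');
  }
  $article = [];
  foreach (['title', 'content', 'excerpt', 'lead_image_url', 'author'] as $key) {
    $article[$key] = $data[$key] ?? NULL;
  }
  return $article;
}

$endpoint = build_parser_endpoint('https://example.com', 'https://lemonde.fr/article_xxx.html');

// A made-up response body, used here instead of a live HTTP request.
$sample = '{"title":"Sample title","content":"<p>Body</p>","excerpt":"Body","lead_image_url":null,"author":"Jane Doe"}';
$article = decode_parser_response($sample);

echo $endpoint, PHP_EOL;
echo $article['title'], PHP_EOL;
```

In a real call, the body would come from an HTTP GET request to $endpoint rather than from the $sample string.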
To crawl content, you can use the postlight_parser.url_parser service.
Example:

    use Drupal\node\Entity\Node;

    // Options passed to the parser service.
    $argument = [
      'url' => 'https://lemonde.fr/article_xxx.html',
      // Other parsers: graby, postlight, mercury, embera.
      'parser' => 'readability',
      'save_image' => TRUE,
    ];
    $output = \Drupal::service('postlight_parser.url_parser')->parser($argument);
    // Create an article node only when the parser returned data.
    if (!empty($article = $output['data'])) {
      $new_article = Node::create(['type' => 'article']);
      $new_article->set('title', $article['title']);
      $new_article->set('body', $article['content']);
      $new_article->enforceIsNew();
      $new_article->save();
    }
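The mapping from parser output to node values can also be kept in a small, separately testable function. The sketch below is one possible approach, not part of the module. The function name node_values_from_parser_output() is hypothetical, and the 'full_html' text format is an assumption about the site. The sketch also assumes $output['data'] may carry an 'excerpt' key like the JSON endpoint does.

```php
<?php

/**
 * Turns the parser service output into values for Node::create().
 *
 * Hypothetical helper; returns NULL when the title or content is missing,
 * so the caller can skip creating an empty node.
 */
function node_values_from_parser_output(array $output, string $bundle = 'article', string $format = 'full_html'): ?array {
  $article = $output['data'] ?? [];
  if (!is_array($article) || empty($article['title']) || empty($article['content'])) {
    return NULL;
  }
  return [
    'type' => $bundle,
    'title' => trim((string) $article['title']),
    'body' => [
      'value' => $article['content'],
      // Assumption: an excerpt may be present; fall back to an empty summary.
      'summary' => $article['excerpt'] ?? '',
      // The extracted content is HTML, so it needs a format that allows it.
      'format' => $format,
    ],
  ];
}

// Made-up service output, standing in for a real parser call.
$output = ['data' => ['title' => ' Sample title ', 'content' => '<p>Body</p>']];
$values = node_values_from_parser_output($output);
// Inside Drupal: if ($values) { Node::create($values)->save(); }
echo $values['title'], PHP_EOL;
```

Setting an explicit text format matters because the extracted content is HTML. The iframe and blockquote tags mentioned under Configuration only render if the chosen format allows them.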

In progress:
- Insert image / video into the entity's image field widget (the remove button is still missing, help is welcome)

Do you like this module? Show your appreciation by buying me ☕.

Activity

Total releases: 1
First release: Jun 2025
Latest release: 8 months ago
Stability: 100% stable

Releases

Version   Type     Release date
1.0.2     Stable   Jun 20, 2025