postlight_parser

The Link content parser module extracts essential information from any URL you give it.
That includes article content, titles, authors, excerpts, lead images, and more.

It is useful when you want to copy content from a URL. Instead of copying the title, saving the images, and writing the summary by hand, you let the module do it for you.

Post-Installation

Install the module with Composer, which also adds the required libraries to vendor. Run:

    composer require drupal/postlight_parser

This installs all the required libraries in one step. If Composer has trouble resolving them, install Readability separately first:

    composer require fivefilters/readability.php
    composer require drupal/postlight_parser

If your server supports Node.js, you can install Postlight Parser, which returns clean content:
```
yarn global add @postlight/parser
# or, with npm:
npm -g install @postlight/parser
```
Or install Mercury Parser in your environment:

    yarn global add @postlight/mercury-parser

Or install the Postlight / Mercury Parser API.

Otherwise, the module fetches the content from the URL with PHP Readability, which works well in most cases.
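
Before choosing a parser in the widget, it can help to confirm what the server actually has installed. The sketch below is a manual check, not the module's own detection logic. It assumes the global install exposes a `postlight-parser` command on the PATH; verify the command name on your system. The function name `detect_backend` is made up for illustration.

```shell
#!/bin/sh
# Report which extraction backend this server could use.
# Assumption: the Node.js parser is reachable as "postlight-parser".
detect_backend() {
  if command -v postlight-parser >/dev/null 2>&1; then
    echo "postlight"
  else
    # No Node.js parser found: PHP Readability is the fallback.
    echo "readability"
  fi
}

detect_backend
```

Run it as the same system user that runs PHP, since a global yarn or npm install may only be on that user's PATH.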

To use the Graby library, you must install php-tidy on your server manually:
```
sudo apt-get install php-tidy
composer require j0k3r/graby
```
Embera is supported for media pages such as YouTube, Dailymotion, and TikTok (CKEditor 5 must allow the iframe and blockquote tags, among others).

Configuration

- Create a link field.
- In the field widget, select Postlight parser (PHP Readability is the default parser).
- Map your fields to the extracted content.
- For CKEditor 5, you can add the "Url get Content" button in the CKEditor 5 configuration.
- The "Save images inline" option saves all images in the fetched content to public://inline-images; it can also save base64-encoded images. This option does not work with breakpoint-based picture images.
- To embed podcast audio or video from social media, enable the iframe tag (YouTube, Dailymotion) and the blockquote tag (TikTok).

For developers
You can call /parser/readability?url=LINK_TO_GET_CONTENT
It returns JSON with the keys {title, content, excerpt, lead_image_url, author}.
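
The endpoint can be tried from the command line. This is a minimal sketch, assuming your site is reachable at a base URL such as https://example.com (a placeholder) and that your user is allowed to access the route. The function name `fetch_parsed` is made up for illustration.

```shell
#!/bin/sh
# Query the readability endpoint and print the JSON response.
#   $1: base URL of your Drupal site (placeholder, e.g. https://example.com)
#   $2: URL of the article to parse
fetch_parsed() {
  site="$1"
  target="$2"
  # --get moves the --data-urlencode payload into the query string,
  # so the article URL is percent-encoded for us.
  # ${site%/} drops one trailing slash to avoid "//parser/...".
  curl --silent --show-error --fail --get \
    --data-urlencode "url=${target}" \
    "${site%/}/parser/readability"
}

# Usage (not run here, because it needs a live site):
#   fetch_parsed "https://example.com" "https://lemonde.fr/article_xxx.html"
```

If jq is installed, piping the output through `jq -r '.title'` prints only the title.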
To crawl content from code, use the postlight_parser.url_parser service.
Example:

    use Drupal\node\Entity\Node;

    // Describe what to fetch and which parser backend to use.
    $argument = [
      'url' => 'https://lemonde.fr/article_xxx.html',
      // Other parsers: graby, postlight, mercury, embera.
      'parser' => 'readability',
      'save_image' => TRUE,
    ];
    $output = \Drupal::service('postlight_parser.url_parser')->parser($argument);
    // Only create the node when the parser returned data.
    $article = $output['data'] ?? [];
    if (!empty($article)) {
      $new_article = Node::create(['type' => 'article']);
      $new_article->set('title', $article['title']);
      $new_article->set('body', $article['content']);
      $new_article->enforceIsNew();
      $new_article->save();
    }

In progress:
- Inserting an image or video into an entity's image field widget (the remove button is missing; help is welcome)

Do you like this module? Show your appreciation by buying me ☕.
