postlight_parser
The Link content parser module extracts essential information from any URL you give it.
That includes article content, titles, authors, excerpts, lead images, and more.
It is useful when you want to copy content from a URL: instead of copying the title, saving the images, and writing the summary by hand, this module does it for you quickly.
Post-Installation
Install the module with Composer; it adds the required libraries to the vendor directory.
Run:
composer require drupal/postlight_parser
This installs all required libraries in one step. If you have problems with Composer, you can install Readability separately:
composer require fivefilters/readability.php
composer require drupal/postlight_parser
If your server supports Node.js, you can:
- Install Postlight Parser (it gives good results and the extracted content is clean):
yarn global add @postlight/parser
or
npm -g install @postlight/parser
- Or install Mercury Parser in your environment:
yarn global add parse-server
- Or install the Postlight / Mercury Parser API.
Otherwise, the module fetches content from the URL with PHP Readability (it works fine in most cases).
- If you want to use the Graby library, you have to install php-tidy on your server manually:
sudo apt-get install php-tidy
composer require j0k3r/graby
- Embera is supported for media pages such as YouTube, Dailymotion, and TikTok (CKEditor 5 must allow the iframe, blockquote, ... tags).
Configuration
- Create a link field.
- In the widget settings, select Postlight parser (PHP Readability is the default).
- Map your fields to receive the extracted content.
- CKEditor 5 support: you can add the "Url get Content" button in the CKEditor 5 configuration.
- The "Save images inline" option saves all images in the content from the URL to public://inline-images; it can also save base64 images. This option doesn't work with picture breakpoints.
- If you want to embed podcast audio or video from social media, you can enable the iframe tag (YouTube, Dailymotion), the blockquote tag (TikTok), and so on.
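To give a feel for what the "Save images inline" option has to deal with, here is a minimal sketch that lists the image sources in an HTML fragment and flags base64 data URIs. It is not the module's own implementation; the function name `list_inline_images` is hypothetical and the code only uses PHP's standard DOM extension.

```php
<?php

/**
 * Lists the image sources found in an HTML fragment.
 *
 * Hypothetical helper for illustration only. Returns a list of
 * ['src' => string, 'is_base64' => bool] entries.
 */
function list_inline_images(string $html): array {
  $document = new DOMDocument();
  // Collect libxml warnings internally; real-world markup is rarely valid.
  $previous = libxml_use_internal_errors(TRUE);
  // The XML prolog hints that the fragment is UTF-8 encoded.
  $document->loadHTML('<?xml encoding="UTF-8">' . $html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $images = [];
  foreach ($document->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if ($src === '') {
      // Skip images without a usable source.
      continue;
    }
    $images[] = [
      'src' => $src,
      // Base64 images are embedded as data URIs instead of remote URLs.
      'is_base64' => strncmp($src, 'data:image/', 11) === 0
        && strpos($src, ';base64,') !== FALSE,
    ];
  }
  return $images;
}
```

Remote sources would need to be downloaded, while base64 sources would need to be decoded before being written to public://inline-images.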
For developers
You can call /parser/readability?url=LINK_TO_GET_CONTENT
It returns JSON with the keys {title, content, excerpt, lead_image_url, author}.
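The endpoint can be called from any HTTP client. The sketch below shows two small helpers, one that builds the request URL and one that decodes the JSON body into the documented keys. The function names and the `$site_base_url` parameter are hypothetical, and the sketch assumes the response is a flat JSON object with the keys listed above.

```php
<?php

/**
 * Builds the request URL for the /parser/readability endpoint.
 *
 * $site_base_url is a placeholder for your own Drupal site's base URL.
 */
function build_parser_endpoint(string $site_base_url, string $target_url): string {
  // Percent-encode the target so its own query string is not lost.
  return rtrim($site_base_url, '/') . '/parser/readability?url=' . rawurlencode($target_url);
}

/**
 * Decodes the endpoint's JSON body and keeps only the documented keys.
 *
 * Assumes a flat JSON object; throws JsonException on malformed JSON.
 */
function decode_parser_response(string $json): array {
  $data = json_decode($json, TRUE, 512, JSON_THROW_ON_ERROR);
  if (!is_array($data)) {
    $data = [];
  }
  $keys = ['title', 'content', 'excerpt', 'lead_image_url', 'author'];
  // Missing keys default to NULL so callers can rely on a fixed shape.
  return array_merge(
    array_fill_keys($keys, NULL),
    array_intersect_key($data, array_flip($keys))
  );
}
```

For example, `decode_parser_response(file_get_contents(build_parser_endpoint('https://example.com', $link)))` would fetch and decode one article, assuming the route is reachable for the current user on your site.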
To crawl content, you can use the postlight_parser.url_parser service.
Example:
use Drupal\node\Entity\Node;

// Describe what to fetch and which parser to use.
$argument = [
  'url' => 'https://lemonde.fr/article_xxx.html',
  // Other parsers: graby, postlight, mercury, embera.
  'parser' => 'readability',
  // Also save the images found in the content.
  'save_image' => TRUE,
];
$output = \Drupal::service('postlight_parser.url_parser')->parser($argument);

// Create an article node only when the parser returned data.
$article = $output['data'] ?? NULL;
if (!empty($article)) {
  $new_article = Node::create(['type' => 'article']);
  $new_article->set('title', $article['title']);
  $new_article->set('body', $article['content']);
  $new_article->enforceIsNew();
  $new_article->save();
}
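Extracted content is usually HTML, so the body field generally needs a text format that allows the tags you want to keep. The sketch below maps the parser output to values for Node::create(). The function name `article_values_from_parser` is hypothetical, the 'full_html' format ID is an assumption about your site, and the output shape is taken from the example above.

```php
<?php

/**
 * Maps parser output to values suitable for Node::create().
 *
 * Assumes $output['data'] holds 'title' and 'content', as in the example
 * above. Returns NULL when nothing usable came back.
 */
function article_values_from_parser(array $output, string $text_format = 'full_html'): ?array {
  $article = $output['data'] ?? [];
  if (!is_array($article) || empty($article['title']) || empty($article['content'])) {
    return NULL;
  }
  return [
    'type' => 'article',
    'title' => trim($article['title']),
    'body' => [
      'value' => $article['content'],
      // Assumed format ID; use one that exists on your site.
      'format' => $text_format,
    ],
  ];
}
```

Inside Drupal this could be used as `if ($values = article_values_from_parser($output)) { Node::create($values)->save(); }`.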
In progress:
- Inserting images / videos into an entity's image field widget (the remove button is still missing; help is welcome)
Do you like this module? Show your appreciation by buying me ☕.