localgov_publications_importer
Import PDFs into your LocalGov Drupal site as HTML publications automatically.
Please join the #feature-publications-importer channel on LGD Slack to learn more about this.
You can fund the development of this feature via the LocalGov Drupal Community Fund.
How to try this out:
- Enable the module.
- Choose "Content" -> "Imports" from the admin menu.
- Open the form linked from the top right, upload a PDF file and submit it.
- You'll be redirected back to the list of imports on submission.
- Imports are processed in the background. Once processed, a link to the imported publication will show on the imports screen.
Using AI to format the imported PDF
If you'd like to use AI to clean up the text, you can. A submodule, localgov_publications_importer_ai, is included. To enable this submodule you will need to install the Drupal AI module and at least one AI provider module. The default AI chat provider will be used if one is configured. The steps below illustrate how to configure one using OpenAI's ChatGPT models. Similar steps can be used with other LLM providers:
- Enable the localgov_publications_importer_ai submodule.
- Download and install the OpenAI provider module.
- Get an API key from OpenAI (requires an OpenAI account).
- Choose "Configuration" -> "AI" -> "Provider Settings" -> "OpenAI Authentication" from the admin menu.
- Click the link saying "create a new key".
- Add your API key here. Key name and description can be whatever makes sense to you. Key type should be "Authentication". Key provider can be "Configuration" if you're just testing locally. Value is the key itself.
- Save the key and head to "Configuration" -> "AI" -> "Provider Settings" -> "OpenAI Authentication" again.
- This time you can choose your key from the dropdown. The key will be verified on save, so if you put in a key that's incorrect, you'll be notified here.
- Once the key is saved, head to "Configuration" -> "AI" -> "AI Default Settings". Scroll down to chat and ensure OpenAI is selected. Choose the model you'd like to use. GPT-4o seems to work.
- Now repeat the earlier steps to upload a PDF. You'll notice that the import takes longer and that the results are cleaner than before.
Plugin structure:
This module is designed to be customisable. You can either write your own plugins to affect how content is imported, or use Drupal modules that provide plugins.
Plugins operate on an instance of ImportInterface, which is passed from one plugin to the next. There's a default implementation called Import, but you can use your own if you like.
Operations are what happen to an Import. Each operation is one of three types:
- Extract: Plugin/LocalGovImporter/Extract
- Transform: Plugin/LocalGovImporter/Transform
- Save: Plugin/LocalGovImporter/Save
Content is extracted from the uploaded file by an Extract plugin, and placed on an Import object. It's then transformed by any number of Transform plugins, and saved by a Save plugin.
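The flow described above can be pictured with a small, language-neutral sketch. It is written in Python purely for illustration and is not the module's PHP API: the `Import` dataclass, the three step functions and `run_import` are all hypothetical names, and the "extracted" text is hard-coded instead of being read from a PDF.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Import:
    """Hypothetical stand-in for the object passed between plugins."""
    source_path: str
    title: str = ""
    pages: List[str] = field(default_factory=list)
    saved: bool = False


# Every operation takes an Import and returns it, so steps can be chained.
Operation = Callable[[Import], Import]


def extract_text(item: Import) -> Import:
    """Extract step: place raw page text on the Import (hard-coded here)."""
    item.title = "Example publication"
    item.pages = ["  Page one   text ", "Page two\ttext  "]
    return item


def normalise_whitespace(item: Import) -> Import:
    """Transform step: collapse runs of whitespace on every page."""
    item.pages = [" ".join(page.split()) for page in item.pages]
    return item


def save_publication(item: Import) -> Import:
    """Save step: a real plugin would create content; this only sets a flag."""
    item.saved = True
    return item


def run_import(
    item: Import,
    extract: Operation,
    transforms: List[Operation],
    save: Operation,
) -> Import:
    """Run one Extract step, any number of Transform steps, then one Save step."""
    item = extract(item)
    for transform in transforms:
        item = transform(item)
    return save(item)


result = run_import(
    Import("report.pdf"),
    extract_text,
    [normalise_whitespace],
    save_publication,
)
print(result.pages)
```

Because every step receives and returns the same object, adding another clean-up stage in this sketch only means appending one more function to the list of transforms.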