dkan_dataset_archiver

This module is under development, following a contrib-first approach. It is currently incomplete but functional. The APIs are stable, so it can be used with caution. See the issue queue for the remaining "major" issues.

Purpose

This module creates an individual archive copy of any DKAN dataset's resource files when the dataset is published. The archive persists even if the original resources are deleted or updated. Archive files are stored in files/dataset-archives/. The archive entity contains the archive date and the keywords and themes that belonged to the dataset when it was published.

This module can optionally create aggregated DKAN dataset archives. Aggregations can be by theme, by keyword, or annual. During aggregation, a zip file is created that contains the aggregated CSVs and a manifest that connects each file to its title, id, and original modified date. All archive generation takes place in a queue that runs on cron, so it should not degrade site performance.
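
To make the shape of an aggregated archive concrete, here is a minimal Python sketch that builds a zip holding two CSVs and a manifest in memory, then reads the manifest back. The manifest file name (manifest.json), its JSON layout, the helper names, and the sample values are all assumptions made for illustration; the module's actual manifest format is not described here.

```python
import io
import json
import zipfile


def build_aggregate(files: list[dict]) -> bytes:
    """Zip each CSV and add a manifest linking file -> title, id, modified."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        manifest = []
        for item in files:
            archive.writestr(item["filename"], item["csv"])
            manifest.append({
                "filename": item["filename"],
                "title": item["title"],
                "id": item["id"],
                "modified": item["modified"],
            })
        # Hypothetical manifest name and layout, chosen for this sketch only.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_manifest(zip_bytes: bytes) -> list[dict]:
    """Load the manifest entries back out of the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return json.loads(archive.read("manifest.json"))


# Made-up sample data; ids and dates are placeholders.
sample = [
    {"filename": "mallard.csv", "csv": "year,count\n2024,12\n",
     "title": "Mallard", "id": "example-id-1", "modified": "2025-07-01"},
    {"filename": "teal.csv", "csv": "year,count\n2024,7\n",
     "title": "Teal", "id": "example-id-2", "modified": "2025-07-01"},
]

zip_bytes = build_aggregate(sample)
for entry in read_manifest(zip_bytes):
    print(entry["filename"], entry["title"], entry["id"], entry["modified"])
```

Keeping the manifest inside the zip means a downloaded aggregate stays self-describing even when it is separated from the site that produced it.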

Dependencies

  • DKAN, a Drupal-based, open-source open data portal for the storage and distribution of open data.
  • League Flysystem and flysystem-aws-s3-v3 are used for writing files to external file storage, if that is enabled.

DEPENDENCY ISSUE

Currently, the module depends on some DKAN submodules that cannot be discovered properly, so they have been removed from the info.yml file and need to be put back with a patch using cweagans/composer-patches. See the README in the patches directory for more details.

Submodule Archive Remote Storage

This optional submodule lets you maintain archives in remote storage or copy them there. Only AWS is supported at this time.

API

The following API endpoints exist. They can be used to populate front-end pages of archives. All the APIs respect the permissions of the user: public repos are listed if the user has permission to see them, and private repos are listed if the user has permission to see those.

  • /api/1/archive/individual/{link_type}/{filter_by}/{filter} This endpoint lists all individual archives, with optional filters for link_type (absolute|relative), filter_by (keyword|theme), and filter matching any item in either Keyword or Theme.
  • /api/1/archive/aggregate/keyword/{keyword}/{link_type} This endpoint lists all published archives of type 'keyword' and 'annual_keyword' sorted by theme and modified_date. To control whether the links are 'relative' or 'absolute' set the link_type. A keyword of "none" is a special case that will not include the annuals.
  • /api/1/archive/aggregate/current/{aggregate_of}/{theme_or_keyword}/{link_type} This endpoint lists all 'current' archives for current downloads. The results can be filtered by aggregate_of and theme_or_keyword, and are sorted by theme and modified_date. To control whether the links are 'relative' or 'absolute', set the link_type.
  • /api/1/archive/aggregate/theme/{theme}/{link_type} This endpoint lists all published archives of type 'theme' and 'annual_theme' sorted by theme and modified_date. To control whether the links are 'relative' or 'absolute' set the link_type. A theme of "none" is a special case that will not include the annuals.
  • /api/1/archive/aggregate/annual/{aggregate_of}/{filter}/{link_type} This endpoint lists all published archives of type 'annual', with optional filtering by aggregate_of (keyword|theme|all|none) sorted by modified_date and filter. To control whether the links are 'relative' or 'absolute' set the link_type. A filter of "none" is a special case that will not include the annual_keyword or annual_theme.
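
As a rough illustration of how the path patterns above fit together, the following Python sketch fills in the placeholders for two of the endpoints. The base URL https://example.org and the helper names are hypothetical; only the path templates and the link_type values come from the list above. The response format is not described here, so the sketch stops at building the URL.

```python
from urllib.parse import quote

# Path templates copied from the endpoint list above.
INDIVIDUAL = "/api/1/archive/individual/{link_type}/{filter_by}/{filter}"
AGGREGATE_THEME = "/api/1/archive/aggregate/theme/{theme}/{link_type}"

LINK_TYPES = {"absolute", "relative"}


def build_path(template: str, **params: str) -> str:
    """Fill a path template, percent-encoding each placeholder value."""
    link_type = params.get("link_type")
    if link_type is not None and link_type not in LINK_TYPES:
        raise ValueError(f"link_type must be one of {sorted(LINK_TYPES)}")
    encoded = {name: quote(value, safe="") for name, value in params.items()}
    return template.format(**encoded)


def build_url(base_url: str, template: str, **params: str) -> str:
    """Join a (hypothetical) site base URL with a filled-in path."""
    return base_url.rstrip("/") + build_path(template, **params)


if __name__ == "__main__":
    # Individual archives filtered by theme, with relative links.
    print(build_url("https://example.org", INDIVIDUAL,
                    link_type="relative", filter_by="theme", filter="Ducks"))
    # Theme aggregates with the special "none" theme, with absolute links.
    print(build_url("https://example.org", AGGREGATE_THEME,
                    theme="none", link_type="absolute"))
```

Percent-encoding each value keeps keywords or themes that contain spaces or slashes from being read as extra path segments.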

FAQs

  • Can I have both keyword archives AND theme archives? - Yes, you can enable each of those separately.
  • Can I store archives in local files and in an S3 bucket? - Yes, those operations are independent of each other.
  • Can I edit an archive if I need to? - Yes, you can edit the archive entity's data, or you can edit the files in the zip file and re-upload the zip. How you do that depends on where your files are stored.
  • Does an archive contain datasets that are currently published? - Yes, an archive will only contain dataset content that was just created or updated. Example: If a dataset was updated on July 2nd, the archive created will contain the dataset as it existed on July 1st.
  • Does an archive of a theme or keyword contain all the files that are part of that theme or keyword? - No, when a theme archive is created, it only contains the dataset files belonging to the theme that were just updated. Example: The theme "Ducks" contains datasets Mallard, Teal, Whistling, and Daffy. If only Mallard and Teal published new data, then the archive would only contain the old data for Mallard and Teal.
  • How are archives of private datasets handled? - If the dataset is marked private, and the option in the module settings for "Archive private datasets" is turned on, then the related archives will be created in private file storage if your site has $settings['file_private_path'] properly defined.
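
The "Ducks" example above can be restated as a small Python sketch. It only illustrates the selection rule described in the FAQ and is not the module's code; the function name, the dictionary layout, and the file names are made up for the example.

```python
def theme_archive_contents(previous_files: dict[str, str],
                           updated: set[str]) -> dict[str, str]:
    """Return the files a theme archive would hold under the FAQ's rule.

    previous_files maps each dataset in the theme to the file it had
    before the latest publish; updated names the datasets that just
    published new data. Only the old files of updated datasets are kept.
    """
    return {name: path for name, path in previous_files.items()
            if name in updated}


# Hypothetical file names for the four datasets in the "Ducks" theme.
ducks = {
    "Mallard": "mallard-old.csv",
    "Teal": "teal-old.csv",
    "Whistling": "whistling-old.csv",
    "Daffy": "daffy-old.csv",
}

archive = theme_archive_contents(ducks, updated={"Mallard", "Teal"})
print(sorted(archive))  # ['Mallard', 'Teal']
```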

Drush commands

  • dkan_dataset_archiver:create-current - This command is run as part of the initial deploy after installation. It can be run at any other time if the 'current' archives get out of sync. It should be safe to run multiple times.
  • dkan_dataset_archiver:create-annual [year] - This will create or update annual archives for any year. It should be safe to run multiple times.

Activity

Total releases: 32
First release: Aug 2025
Latest release: 3 weeks ago
Release cadence: 6 days
Stability: 0% stable

Releases

Version Type Release date
1.0.0-beta12 Pre-release Feb 11, 2026
1.0.0-beta11 Pre-release Dec 23, 2025
1.0.0-beta10 Pre-release Dec 15, 2025
1.0.0-beta9 Pre-release Dec 10, 2025
1.0.0-beta8 Pre-release Nov 25, 2025
1.0.0-beta7 Pre-release Nov 25, 2025
1.0.0-beta6 Pre-release Nov 24, 2025
1.0.0-beta5 Pre-release Nov 17, 2025
1.0.0-beta4 Pre-release Nov 16, 2025
1.0.0-beta3 Pre-release Nov 11, 2025
1.0.0-beta2 Pre-release Nov 9, 2025
1.0.0-beta1 Pre-release Nov 6, 2025
1.0.0-alpha19 Pre-release Nov 3, 2025
1.0.0-alpha18 Pre-release Nov 2, 2025
1.0.0-alpha17 Pre-release Nov 2, 2025
1.0.0-alpha16 Pre-release Oct 31, 2025
1.0.0-alpha15 Pre-release Oct 28, 2025
1.0.0-alpha14 Pre-release Oct 20, 2025
1.0.0-alpha13 Pre-release Oct 16, 2025
1.0.0-alpha12 Pre-release Oct 15, 2025
1.0.0-alpha11 Pre-release Oct 15, 2025
1.0.0-alpha10 Pre-release Oct 15, 2025
1.0.0-alpha9 Pre-release Oct 15, 2025
1.0.0-alpha8 Pre-release Oct 1, 2025
1.0.0-alpha7 Pre-release Sep 24, 2025
1.0.0-alpha6 Pre-release Sep 24, 2025
1.0.0-alpha5 Pre-release Sep 24, 2025
1.0.0-alpha4 Pre-release Sep 24, 2025
1.0.0-alpha3 Pre-release Sep 18, 2025
1.0.0-alpha2 Pre-release Sep 2, 2025
1.0.0-alpha1 Pre-release Aug 26, 2025
1.0.x-dev Dev Aug 8, 2025