
facet_bot_blocker

Used on 582 sites. Covered by the Drupal security advisory policy.

Website crawlers have been around for decades, but starting around 2024 and 2025, with the increasing presence of AI/LLM tools being trained on web content, we have seen a significant rise in traffic that does not:

  • Follow robots.txt directives
  • Respect rel=”nofollow” links
  • Identify itself via its user agent string; instead, it uses what appears to be a standard OS and browser combination.
  • Come from a single IP address; instead, requests are spread across many different addresses.

It is assumed that this traffic comes from crawlers building their own content repositories to train AI models, and that it is not necessarily malicious (perhaps just poorly designed). Even so, it can become a serious problem for websites in certain circumstances.

“Crawler trap”: a situation where a web crawler has access to a large number of links with many different variations. This setup often appears on search pages that use “Facets”. With sufficient complexity, crawlers can find a near-endless supply of unique URLs.

This module allows site administrators to configure a maximum number of facets that can be used before the user receives a blocked response.
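To see why facets form a crawler trap, and how a cap on the number of facets shrinks it, here is a rough count in Python. It assumes a hypothetical search page with 40 facet values where each set of selected values has its own URL, and it ignores parameter order (which can only add more URLs). `reachable_urls` is an illustrative helper, not part of the module.

```python
from math import comb


def reachable_urls(n_values: int, max_facets: int) -> int:
    """Count distinct sets of active facet values, from none up to max_facets.

    Illustrative only: assumes every set of selected values maps to one URL
    and ignores the order in which values were selected.
    """
    top = min(n_values, max_facets)
    return sum(comb(n_values, k) for k in range(top + 1))


if __name__ == "__main__":
    # Hypothetical page with 40 facet values (e.g. 4 facets x 10 values each).
    for cap in (2, 3, 40):
        print(cap, reachable_urls(40, cap))
    # 2 821
    # 3 10701
    # 40 1099511627776
```

With no effective cap the count is 2^40 (about 1.1 trillion URLs); a cap of 3 limits the pages the site will actually render to roughly ten thousand.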

Note on WAFs: If your site has access to a CDN with a "Web Application Firewall" (WAF) feature, it is recommended that you implement this block there instead. Requests that reach your site still count toward your host's rate limit, even when this module blocks them. This module was created to block such requests when a WAF is not available.

REQUIREMENTS

This module does not strictly require any other contributed modules.

Optional: Installing either the Memcache or Redis module allows storing tracking counters and config in memory (instead of the database), improving performance in high-traffic environments.

INSTALLATION

Install as you would normally install a contributed Drupal module.

CONFIGURATION

  • Enable the module: Enable the Facet Bot Blocker module from the Extend page (/admin/modules) or using Drush (drush en facet_bot_blocker).
  • Configure the module: Go to the module’s settings form (e.g., /admin/config/system/facet-bot-blocker).
  • Set the facet parameter limit, decide whether to return "410 Gone" or "403 Forbidden", and optionally customize the blocking message.
  • (Optional) Check the dashboard: a dashboard page (e.g., /admin/reports/facet-bot-blocker) displays counts of blocked and allowed requests, the last blocked IP, and other metrics. This data is stored in cache if Memcache or Redis is installed.
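The dashboard metrics can be pictured as a small counter record updated on every request. The sketch below is a hypothetical Python model of that bookkeeping; the class and field names are assumptions, not the module's storage schema. In the module itself these values are kept in the database or, when Memcache or Redis is installed, in cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockerStats:
    """Hypothetical model of the dashboard counters (not the module's schema)."""

    blocked: int = 0
    allowed: int = 0
    last_blocked_ip: Optional[str] = None

    def record(self, client_ip: str, was_blocked: bool) -> None:
        """Update the counters for one request."""
        if was_blocked:
            self.blocked += 1
            self.last_blocked_ip = client_ip
        else:
            self.allowed += 1


stats = BlockerStats()
stats.record("198.51.100.7", was_blocked=False)
stats.record("203.0.113.42", was_blocked=True)
print(stats)
# BlockerStats(blocked=1, allowed=1, last_blocked_ip='203.0.113.42')
```

With a shared cache backend, a production counter would typically use the backend's atomic increment rather than this read-modify-write pattern, so concurrent requests do not lose updates.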

Note: If you are using version 3.x of the Facets module with Views, you can use exposed filters as facets. These render as regular forms rather than links, which should reduce bot traffic crawling the facets. This is unlikely to be addressed in the "classic style" facet blocks, and these features will not be back-ported to Facets 2.x, although that module may implement a solution in the future. See #2937191: Render using theme input and select instead of lists with links for checkboxes and dropdown for more information.

Despite its name, this module does not distinguish between real humans and bots that misrepresent their user agent. It blocks any request whose URL contains an f[{limit}] parameter, where {limit} is configured by the site administrator.
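The rule described above can be sketched in Python as follows. This is an illustration, not the module's PHP code: `check_request`, its parameters, and the default message are hypothetical. It assumes the Facets module's usual zero-based parameters (f[0], f[1], …), so a URL containing f[{limit}] is using more than {limit} facets. Percent-encoded brackets such as f%5B2%5D are decoded by `parse_qsl` before the comparison.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def check_request(
    url: str,
    limit: int,
    use_410: bool = True,
    message: str = "Too many facet filters.",
) -> Optional[Tuple[int, str]]:
    """Return (status, body) if the URL should be blocked, else None.

    Hypothetical sketch: blocks when the query string contains the key
    f[<limit>], mirroring the rule described above.
    """
    blocked_key = f"f[{limit}]"
    # parse_qsl percent-decodes keys, so f%5B2%5D is compared as f[2].
    for key, _value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key == blocked_key:
            status = 410 if use_410 else 403  # "410 Gone" or "403 Forbidden"
            return status, message
    return None


print(check_request("/search?f[0]=type:article&f[1]=tag:drupal", limit=2))
# None
print(check_request("/search?f%5B0%5D=a&f%5B1%5D=b&f%5B2%5D=c", limit=2))
# (410, 'Too many facet filters.')
```

Counting every f[...] key instead of looking for one exact key would also catch URLs that skip indices (for example f[0] and f[7] with a limit of 3).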

Bots that make requests under your facet limit are likely still crawling your site. For additional bot blocking options, see the Bot Blocker module.

Activity

  • Total releases: 3
  • First release: Apr 2025
  • Latest release: 9 months ago
  • Release cadence: 26 days (average interval between releases)
  • Stability: 100% stable

Releases

  • 1.0.2 (Stable): released May 23, 2025
  • 1.0.1 (Stable): released Apr 9, 2025
  • 1.0.0 (Stable): released Apr 1, 2025