crawler_rate_limit
Crawler Rate Limit allows you to limit requests performed by web crawlers, bots, and spiders. It can also rate limit regular traffic and block requests based on autonomous system number (ASN).
Features
- rate limits web crawlers, bots, and spiders
- rate limits regular traffic (human visitors and bots that don't openly identify as bots) at the visitor level (IP address + User-Agent string) and/or at the autonomous system level
- blocks traffic at the ASN level
- the number of requests allowed in a given time interval (the limit) is configurable
- limits bot requests based on the User-Agent string - if the same crawler uses multiple IP addresses, all of its requests count towards a single limit
- each type of rate limiting can be configured independently from the other types
- minimal effect on performance
- rate limiting is performed in a very early phase of request handling
- uses Redis, Memcached or APCu as a rate limiter backend
There are a number of crawlers, bots, and spiders that are known for excessive crawling of websites. Such requests increase the server load and often affect the performance of the website for regular visitors. In extreme cases they can even bring the site down.
This module detects whether a request is made by a crawler/bot/spider by inspecting the User-Agent HTTP header, and then limits the number of requests the crawler is allowed to perform in a given time interval. Once the limit is reached, the server responds with HTTP code 429 (Too Many Requests). The crawler is unblocked at the end of the time interval, and a new cycle begins. If the crawler exceeds the rate limit again, it is blocked again for the duration of the same time interval.
Other types of rate limiting operate on the same principle.
Limitations
This module can't protect against DDoS attacks. Blocking and rate limiting are effective only if your web server can actually handle the volume of traffic it receives. Once the server gets overloaded with requests, it will start failing (dropping requests), and Drupal won't ever get a chance to process them and perform rate limiting or blocking.
However, rate limiting and blocking may help your server handle a much larger number of requests by significantly reducing the time required to process a single one. A request that is either blocked (403) or rate limited (429) by the Crawler Rate Limit module can be processed up to 10 times faster than a regular Drupal request that returns HTTP code 200.
Requirements
- One of the three supported backends: APCu, Redis or Memcached.
- jaybizzle/crawler-detect (installed along with the module)
- nikolaposa/rate-limit (installed along with the module)
Required only if you intend to use ASN-level rate limiting and/or blocking:
- geoip2/geoip2 composer package
- GeoLite2/GeoIP2 binary ASN Database
- Cron task or GeoIP Update to keep the ASN database up-to-date
Installation and configuration
Refer to the README.md file that comes with the module.
How to update to version 3
New features introduced in version 3 require changes to the module configuration in the settings.php file. Version 3 is backward compatible with versions 1 and 2 and will continue to work with the old settings. However, it's recommended to update your settings in order to take advantage of the new features. For details on how to update, refer to the README.md file that comes with the module.