Google Crawler hitting your site too aggressively?
If your Drupal site suffers occasional slowdowns or outages, check whether crawlers are hitting your site too hard.
Several clients have complained of this, and upon investigation we found that the culprit was Google's own crawler.
The telltale sign is many queries whose LIMIT clause has a large offset. Depending on your site's specifics, these may also show up as slow queries.
A large offset means that crawlers are requesting very old content (hundreds of pages back in a paged listing).
Here is an example from a recent client:
SELECT node.nid AS nid, ...
LIMIT 4213, 26

SELECT node.nid AS nid, ...
LIMIT 7489, 26

SELECT node.nid AS nid, ...
LIMIT 8893, 26
As you can see, Google's crawler is going back 340+ pages for the last query.
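The page depth can be estimated from the LIMIT clause itself. In MySQL, `LIMIT offset, row_count` skips `offset` rows before returning results, so dividing the offset by the rows per page gives an approximate pager page. The following is a minimal sketch; the function name `page_for_offset` is hypothetical, and it assumes the second LIMIT number (26) is the page size, which may not be exactly how the site's pager is configured.

```python
def page_for_offset(offset: int, rows_per_page: int) -> int:
    """Return the zero-based pager page whose rows start at `offset`.

    Assumption: rows_per_page is taken from the second number in the
    LIMIT clause; the site's real items-per-page setting may differ.
    """
    if rows_per_page <= 0:
        raise ValueError("rows_per_page must be positive")
    return offset // rows_per_page


# Offsets taken from the example queries above.
for offset in (4213, 7489, 8893):
    print(offset, "->", page_for_offset(offset, 26))
```

Under that assumption, the last query works out to page 342, which matches the "340+ pages" figure.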
Your web server's access log would show something like this:
1.2.3.4 - - [26/Feb/2012:07:26:59 -0800] "GET /blah-blah?page=621 HTTP/1.1" 200 10017 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Note the page= parameter in the URL, and Googlebot as the user agent.
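To check how often this happens, you can scan the access log for Googlebot requests with a high page= value. The sketch below assumes the log uses the combined log format shown above; the function name `deep_googlebot_hits` and the `min_page` threshold of 100 are illustrative choices, not part of any tool. It matches on the user agent string only, which can be spoofed, so treat the result as a first indication.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined log format:
# ip ident user [time] "METHOD url PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def deep_googlebot_hits(lines, min_page=100):
    """Return (url, page) pairs for Googlebot requests with page >= min_page."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines that do not parse or that come from other user agents.
        if not match or "Googlebot" not in match.group("agent"):
            continue
        query = parse_qs(urlsplit(match.group("url")).query)
        pages = query.get("page", [])
        if pages and pages[0].isdigit() and int(pages[0]) >= min_page:
            hits.append((match.group("url"), int(pages[0])))
    return hits


sample = (
    '1.2.3.4 - - [26/Feb/2012:07:26:59 -0800] '
    '"GET /blah-blah?page=621 HTTP/1.1" 200 10017 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(deep_googlebot_hits([sample]))
```

For a real log, pass an open file object instead of the one-element list, for example `deep_googlebot_hits(open(path))` with `path` pointing at your own access log.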
The solution is often to go into Google Webmaster Tools and reduce the crawl rate for the site, so that the crawler is not hitting too many pages at the same time. Start with 20%. You may need to go up to 40% in severe cases.
Either way, you need to experiment to find a value that fits your site's specific case.