Google Crawler hitting your site too aggressively?

If your Drupal site suffers occasional slowdowns or outages, check whether crawlers are hitting your site too hard.

We've seen several clients complain, and upon investigation we found that the culprit is Google's own crawler.

The telltale sign is that you will see many queries executing with large offsets in their LIMIT clauses. Depending on your site's specifics, these are likely to be slow queries too.

This means that crawlers are accessing very old content (hundreds of pages back).

Here is an example from a recent client:

SELECT node.nid AS nid, ...
LIMIT 4213, 26

SELECT node.nid AS nid, ...
LIMIT 7489, 26

SELECT node.nid AS nid, ...
LIMIT 8893, 26

As you can see, with 26 rows per page, the offset of 8893 in the last query means Google's crawler is going back 340+ pages.
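If you have collected such queries (for example from MySQL's slow query log), a small script can estimate how deep each one pages. The sketch below is a hypothetical Python helper: the names approx_page and deep_pages and the 100-page threshold are illustrative assumptions, and it assumes the page number is roughly the LIMIT offset divided by the row count, as with the 26-row pages above.

```python
import re

# MySQL accepts both "LIMIT offset, count" and "LIMIT count OFFSET offset".
LIMIT_RE = re.compile(
    r"LIMIT\s+(\d+)\s*,\s*(\d+)|LIMIT\s+(\d+)\s+OFFSET\s+(\d+)",
    re.IGNORECASE,
)


def approx_page(query):
    """Hypothetical helper: estimate the pager page a query serves, or None."""
    m = LIMIT_RE.search(query)
    if m is None:
        return None  # no LIMIT with an offset, so nothing to estimate
    if m.group(1) is not None:
        offset, count = int(m.group(1)), int(m.group(2))
    else:
        count, offset = int(m.group(3)), int(m.group(4))
    if count == 0:
        return None
    # Integer division: rows skipped divided by rows per page.
    return offset // count


def deep_pages(queries, threshold=100):
    """Yield (page, query) for queries paging at least `threshold` pages deep."""
    for q in queries:
        page = approx_page(q)
        if page is not None and page >= threshold:
            yield page, q
```

For the three queries above this gives pages 162, 288 and 342, so all of them would be flagged with the assumed threshold.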

Checking your web server's access log would show entries like this:

1.2.3.4 - - [26/Feb/2012:07:26:59 -0800] "GET /blah-blah?page=621 HTTP/1.1" 200 10017 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Note the page= query parameter, and Googlebot as the user agent.
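To find these requests in bulk, a short script can scan the access log. This is a minimal sketch assuming the Apache/Nginx "combined" log format shown above; the function name deep_googlebot_hits and the 100-page cut-off are hypothetical choices, not anything Drupal or Google defines.

```python
import re
from urllib.parse import parse_qs, urlsplit

# "Combined" log format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def deep_googlebot_hits(lines, min_page=100):
    """Hypothetical helper: yield (host, page, path) for deep Googlebot requests."""
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or "Googlebot" not in m.group("agent"):
            continue
        params = parse_qs(urlsplit(m.group("path")).query)
        pages = params.get("page")
        # Skip missing or non-numeric values (e.g. "0,3" from multiple pagers).
        if not pages or not pages[0].isdigit():
            continue
        page = int(pages[0])
        if page >= min_page:
            yield m.group("host"), page, m.group("path")
```

Keep in mind that the user agent string can be faked; Google documents a reverse DNS lookup for verifying that a request really comes from Googlebot.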

The solution is often to go into Google Webmaster Tools and reduce the crawl rate for the site, so that Googlebot does not request too many pages at the same time. Start with a 20% reduction; in severe cases you may need to go as far as 40%.

Either way, you will need to experiment to find a value that fits your site.
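One way to judge the effect of a new crawl rate is to count Googlebot requests per minute in your access log before and after the change. A rough sketch, again assuming the combined log format above; googlebot_hits_per_minute is a hypothetical helper name.

```python
import re
from collections import Counter

# Captures the bracketed timestamp and the last quoted field (the user agent).
LINE_RE = re.compile(r'\[(?P<time>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')


def googlebot_hits_per_minute(lines):
    """Hypothetical helper: count Googlebot requests per minute."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and "Googlebot" in m.group("agent"):
            # "26/Feb/2012:07:26:59 -0800" -> "26/Feb/2012:07:26"
            counts[m.group("time")[:17]] += 1
    return counts
```

The slice assumes the standard day/month/year:hour:minute:second timestamp layout; comparing the busiest minutes before and after a change gives a rough sense of whether the new value is low enough.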
