
I Got 99 Problems, But Drupal Crawl Errors Ain't One!

Crawl errors are the bane of every digital marketer: they seem to pop up overnight and multiply quickly. Luckily for Drupal marketers, there are a number of techniques you can use to minimize the crawl errors that occur and to resolve them more easily when they do. Before we get started, let's first review a few common crawl errors that you're likely to run into:

  • Page Not Found - Hard 404 Errors. The hard 404 error is the most common 404 error you'll encounter when reviewing your crawl errors. These types of crawl errors generally occur when a previously published piece of content is deleted or the content is moved to another location without creating a search engine friendly redirect (301 redirect).
     
  • Page Not Found - Soft 404 Errors. Soft 404 errors are not quite as common as hard 404s. A soft 404 occurs when a page is published but has very little content or duplicates content found elsewhere. The page returns a 200 (OK) status code, which indicates that it is accessible, but because there is so little content on it, search engines classify it as a soft 404 and will not index it. (A quick way to check which status code a URL actually returns is sketched just after this list.)
     
  • Access Denied - 403 Errors. This is one of the most frequent errors that we run into with Drupal websites. 403 errors typically occur when a previously published page is unpublished and no redirect is created to a relevant page.
     
  • Internal Server Errors - 500 Errors. Internal server errors, also known as 500 errors, occur when something unexpected happens on the website or on the server and the source of the error cannot be identified.
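
Because the right fix depends on the status code involved, it helps to confirm what a problem URL actually returns before deciding how to handle it. Here is a minimal sketch using the Guzzle HTTP client that ships with Drupal; the URL is a placeholder, and running it assumes a bootstrapped site (for example via drush php:script).

```php
<?php

// Minimal sketch: check which HTTP status code a URL actually returns, using
// Drupal's bundled Guzzle client. The URL below is a placeholder.
$url = 'https://example.com/some/old/path';

$response = \Drupal::httpClient()->request('GET', $url, [
  // Don't throw exceptions on 4xx/5xx responses; we want to inspect them.
  'http_errors' => FALSE,
  // Don't follow redirects, so 301/302 responses are visible as-is.
  'allow_redirects' => FALSE,
]);

// 200 with thin or duplicate content = potential soft 404; 403 = access
// denied; 404 = hard not found; 5xx = internal server error.
printf("%s returned HTTP %d\n", $url, $response->getStatusCode());
```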

How to Fix Crawl Errors Using Drupal

The best way to stop those pesky crawl errors from occurring is by configuring your Drupal website to defend against them. To do so, we recommend installing and configuring the following modules.

  1. The first thing you'll want to set up is the Pathauto module, which "provides a mechanism for modules to automatically generate aliases for the content they manage." In other words, you can configure Pathauto so that when you publish content, the content's address won't look like "/node/231." Instead, it will use a logical, human- and search-engine-friendly syntax based on the patterns that you set. (The first sketch after this list shows how to regenerate an alias in code after a pattern changes.)
     
  2. Perhaps the most important module you'll install is the Redirect module. The Redirect module can be configured to automatically create a redirect when you change the path of existing content. You can also use the Redirect module to manually fix crawl errors that are not fixed automatically. (The second sketch after this list shows how to create one of those redirects in code.)
     
  3. Finally, the Search 404 module is another important module for preventing 404 errors. When a visitor lands on a page that doesn't exist, the site would normally return a 404 error; the Search 404 module instead runs an internal site search based on the requested URL and presents the results to the visitor. The intent is to offer content similar to what the visitor was originally looking for, which can help reduce bounce rates and keep visitors on the site longer.
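
Pathauto patterns are defined in the admin UI (or in exported configuration), but aliases can also be regenerated from code, which is handy after a pattern changes. The sketch below is only an illustration: it assumes Drupal 9/10 with Pathauto installed, uses its pathauto.generator service, and the node ID is a placeholder.

```php
<?php

// Sketch: regenerate the URL alias for a single node after its Pathauto
// pattern has changed. Node 231 is a placeholder ID.
use Drupal\node\Entity\Node;

$node = Node::load(231);
if ($node) {
  // 'update' is the operation name Pathauto checks against its alias-update
  // settings. If the Redirect module's auto-redirect option is enabled, the
  // old alias should end up redirecting to the new one instead of 404ing.
  \Drupal::service('pathauto.generator')->updateEntityAlias($node, 'update');
}
```

For regenerating aliases across many nodes at once, Pathauto's own Bulk generate tab is usually the simpler route.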

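For those manual fixes, redirects can also be created programmatically through the Redirect module's entity API rather than the admin form. This is a rough sketch assuming Drupal 9/10 with the module installed; both paths are made up for illustration.

```php
<?php

// Sketch: create a 301 redirect with the Redirect module's entity API.
// Both paths below are illustrative.
use Drupal\redirect\Entity\Redirect;

Redirect::create([
  // The source path is stored without a leading slash.
  'redirect_source' => 'old-blog/my-post',
  // The destination is a URI; 'internal:' points at a path on this site.
  'redirect_redirect' => 'internal:/blog/my-post',
  'status_code' => 301,
  'language' => 'und',
])->save();
```
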
How to Monitor Crawl Errors

While these modules will help to prevent crawl errors from occurring, they are not a guarantee against crawl errors. We recommend regularly monitoring your crawl errors using the following techniques.

At least once a month, check Drupal's built-in reports for the "Top 'page not found' errors" and "Top 'access denied' errors" (both provided by the core Database Logging module). To find those reports, use the following URLs:

  • http://[EnterYourWebsiteURL.com]/admin/reports/access-denied
  • http://[EnterYourWebsiteURL.com]/admin/reports/page-not-found
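
Those report pages read from the watchdog table maintained by the core Database Logging (dblog) module, so the same data can be queried directly, for example to export it or feed it into a dashboard. The snippet below is a rough sketch of such a query; it assumes dblog is enabled and roughly mirrors what the built-in report shows.

```php
<?php

// Sketch: list the most frequent "page not found" log entries from the
// watchdog table behind the built-in report. Change the type condition to
// 'access denied' to get the data behind the 403 report instead.
$query = \Drupal::database()->select('watchdog', 'w');
$query->addExpression('COUNT(w.wid)', 'hits');
$query->fields('w', ['message', 'variables'])
  ->condition('w.type', 'page not found')
  ->groupBy('w.message')
  ->groupBy('w.variables')
  ->orderBy('hits', 'DESC')
  ->range(0, 25);

foreach ($query->execute() as $row) {
  // The requested path lives in the serialized variables column.
  $variables = unserialize($row->variables, ['allowed_classes' => FALSE]);
  printf("%d hits: %s\n", $row->hits, print_r($variables, TRUE));
}
```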

The Redirect 404 module, a submodule that ships with the Redirect module, allows you to find and sort all 404 errors that have occurred on your website and provides an easy way to create redirects for those URLs.

Although not specific to Drupal, Google Search Console is one of the easiest ways of finding your 404, 403, 500 and other errors. In addition to finding your crawl errors, you should submit your XML sitemaps to Google using this tool to increase the number of pages indexed and identify possible indexing issues.

[Image: Google Search Console Index > Pages report]

One final tool for monitoring errors is the Screaming Frog SEO Spider. The SEO Spider crawls your site much the way a search engine crawler does, and running a crawl will often surface hidden crawl errors and other issues.

Conclusion

There are a lot of moving parts involved in maintaining your website's search engine optimization, which is why I always recommend using a calendar to keep yourself organized. Set up daily, weekly, monthly, quarterly, and annual reminders so that you can address crawl errors and make sure your website stays at the top of the search engine rankings. Which techniques do you use to prevent crawl errors?
