Nov 16 2006
Nov 16

Until the mid 90s, spam was a non-issue. It was exciting to get email. The web was also virtually spam-free. Netizens respected one another and everything was very pleasant. Spam Those days are long gone. Fortunately, there are some pretty amazing tools out there for fighting email spam. I use a combination of SpamAssassin on the server side and Thunderbird (with its wonderful built in junkmail filters) on the desktop. I am sent thousands of spam messages a day that I never see thanks to these tools.

But approximately five years ago, a new type of spam emerged which exploited not email but the web. Among this new wave of abuse, my personal favorite, comment spam.

I love getting comments on my blog. I also like reading comments on other blogs. However, it's not practical to simply allow anyone who wants to leave a comment, as within a very short period of time, blog comments will be overrun with spam generated by scripts that exploit sites with permissive comment privileges. To prevent this, most sites require that you log in to post a comment. But this may be too much to ask of someone who just wants to post a quick comment as they pass through. I often come across blog postings which I would like to contribute to, but I simply don't bother because the site requires me to create an account (which I'd likely only use once) before posting a comment. Not worth it. Another common practice is the use of "captchas" which require a user enter some bit of information to prove they are human and not a script. This works fairly well, however, it is still a hurdle that must be jumped before a user can post a comment. And as I've personally learned, captchas, particularly those that are image based, are prone to problems which may leave users unable to post a comment at all.

As email spam grew, there were various efforts to implement similar types of protection, requiring by the sender to somehow verify he was not a spammer (typically by resending the email with some special text in the subject line). None of these solutions are around anymore because they were just plain annoying. SpamAssassin and other similar tools are now used on most mail servers. Savvy email users will typically have some sort of junkmail filter built into their email client or perhaps as part of an anti-virus package. And spam is much less a nuisance as a result.

What we need for comment spam is a similar solution. One that works without getting in the way of the commenter or causing a lot of work for the blog owner. Turn it on, and it works. I've recently come across just such a solution for blogs which also happens to have a very nice Drupal module so you can quickly and easily put this solution to work on your own Drupal site.

Enter Akismet

It's called Akismet, and it works similarly to junkmail filters. After a comment (or virtually any piece of content) has been submitted, the Akismet module passes it to a server where it is analyzed. Content labeled as potential spam is then saved for review by the site admin and not posted to the blog.

Pricing

Akismet follows my absolute favorite pricing model. It's free for workaday Joes like me and costs money only if you're a large company that will be pumping lots of bits through the service. They realize that most small bloggers are not making any money on their sites, and they price their service accordingly. Very cool.

Installation

In order to use Akismet, you need to obtain a Wordpress API key. I'm not entirely sure why, but it is free and having a collection of API keys is fun. So get one if you have not already.

The Akismet Drupal module is appropriately named Akismet. It's not currently hosted on Drupal.org, but hopefully the author will eventually host it there as that is where most people find their Drupal modules. Instead, you will need to download the Akismet module from the author's own site. The installation process is standard. Unzip the contents into your site's modules directory, go to your admin/modules page and enable it. There is no need for additional Akismet code as all the spam checking is done on Akismet's servers.

Configuration

After installing Akismet, I was immediately impressed at how professional the module is. There were absolutely no problems after installation. Configuration options are powerful and very well explained. The spam queue is very nice and lets you quickly mark content as "ham" (ie not spam) and delete actual spam. As you build up a level of trust with the spam detection, you can configure the module to automatically delete spam after a period of time.

Spam filtering can be enabled on a per node type basis, allowing you to turn off filtering for node types submitted by trusted users (such as bloggers) and on for others (eg forums users). Comment filtering is configured separately.

Another sweet feature is the ability to customize responses to detected spammers. In addition to being able to delay response time by a configureable number of seconds, you can also configure an alternate HTTP response to the client, such as 503 (service unavailable) or 403 (access denied). Nice touch.

One small problem

I've only been working with Akismet for several days now. And I'd previously been using captcha, which I imagine got me out of the spammers sights for a while (spammers seem to spend most of their efforts on sites where their scripts can post content successfully). So far, Akismet has detected 12 spams, 2 of which were not actually spam. These were very short comments, and I imagine Akismet takes the length of the content into consideration. I assume that as the Akismet server processes more and more pieces of content, it will become more accurate in picking out spam versus legitimate content. Each time a piece of flagged content is marked as "ham", it is sent to Akismet where it can help refine their rule sets and make the service more accurate.

Perhaps Akismet could provide an additional option that allows users to increase or decrease tolerance for spam. I would prefer to err on the side of caution and let comments through.

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web