Oct 16 2014

Earlier today, the Drupal Security Team announced SA-CORE-2014-005 - Drupal core - SQL injection, a 'Highly Critical' bug in Drupal 7 core that could result in SQL injection, leading to a whole host of other problems.

While not a regular occurrence, this kind of vulnerability is disclosed from time to time—if not in Drupal core, in some popular contributed module, or in some package you have running on your Internet-connected servers. What's the best way to update your entire infrastructure (all your sites and servers) against a vulnerability like this, and fast? High-profile sites could be quickly targeted by criminals, and need to be able to deploy a fix ASAP... and though lower-profile sites may not be immediately targeted, you can bet a malicious bot will eventually come scanning for vulnerable sites, so they still need to apply the fix in a timely manner.

In this blog post, I'll show how I patched all of Midwestern Mac's Drupal 7 sites in less than 5 minutes.

Hotfixing Drupal core - many options

Before we begin, let me start off by saying there are many ways you can apply a security patch, and some are simpler than others. As many have pointed out (e.g., Lullabot), you can simply download the one-line patch and apply it to your Drupal codebase using patch -p1.
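
For illustration, here's roughly what that manual approach looks like on a single server (the docroot path below is hypothetical; adjust it to your own site):

# Download the patch, then apply it from the top of the Drupal docroot.
cd /var/www/drupal
curl -O https://www.drupal.org/files/issues/SA-CORE-2014-005-D7.patch
patch -p1 < SA-CORE-2014-005-D7.patch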

You could also use Drush to do a Drupal core update (drush up drupal), but you'll still need to do this, manually, on every Drupal installation you manage.
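
Multiply that by every site you host and it gets tedious fast. As a rough sketch (the docroot paths here are made up), the drush route ends up looking something like this:

for docroot in /var/www/site1 /var/www/site2 /var/www/site3; do
  (cd "$docroot" && drush up drupal -y)
done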

If you have multiple webservers with Drupal (or multiple instances of Drupal 7 on a single server, or spread across multiple servers), then there are simpler ways of either deploying the hotfix, or upgrading Drupal core via drush and/or version control (you are using Git or some other VCS, right?).

Enter Ansible, the Swiss Army Knife for infrastructure

Ansible is a powerful infrastructure management tool. It does Configuration Management (CM), just like Puppet or Chef, but it goes much, much further. One great feature of Ansible is the ability to run ad-hoc commands against a bunch of servers at once.

After installing Ansible, you need to create a hosts file at /etc/ansible/hosts, and tell Ansible about your servers (this is an 'inventory' of servers). Here's a simplified overview of my file:

[mm]
midwesternmac.com drupal_docroot=/path/to/drupal

[servercheck-drupal]
servercheck.in drupal_docroot=/path/to/drupal

[hostedsolr-drupal]
hostedapachesolr.com drupal_docroot=/path/to/drupal

[drupal7:children]
mm
servercheck-drupal
hostedsolr-drupal

There are a couple quick things to note: the inventory file follows an ini-style format, so you define groups of servers with [groupname] (then list the servers one by one after the group name, with optional variables in key=value format after the server name), then define groups of groups with [groupname:children] (then list the groups you want to include in this group). We defined a group for each site (currently each group just has one Drupal web server), then defined a drupal7 group to contain all the Drupal 7 servers.

As long as you can connect to the servers using SSH, you're golden. No additional configuration, no software to install on the servers, nada.

Let's go ahead and quickly check if we can connect to all our servers with the ansible command:

$ ansible drupal7 -m ping
hostedapachesolr.com | success >> {
    "changed": false,
    "ping": "pong"
}
[...]

All the servers have responded with a 'pong', so we know we're connected. Yay!

For a simple fix, we could add a variable to our inventory file for each server defining the Drupal document root(s) on the server, then use that variable to apply the hotfix like so:

$ ansible drupal7 -m shell -a "curl https://www.drupal.org/files/issues/SA-CORE-2014-005-D7.patch | patch -p1 chdir={{ drupal_docroot }}"

This would quickly apply the hotfix on all your servers, using Ansible's shell module (which, conveniently, runs shell commands verbatim, and tells you the output).

Fixing core, and much more

Instead of running one command via ansible, let's make a really simple, short Ansible playbook to fix and verify the vulnerability. I created a file named drupal-fix.yml (that's right, Ansible uses plain old YAML files, just like Drupal 8!), and put in the following contents:

---
- hosts: drupal7
  tasks:
    - name: Download drupal core patch.
      get_url:
        url: https://www.drupal.org/files/issues/SA-CORE-2014-005-D7.patch
        dest: /tmp/SA-CORE-2014-005-D7.patch

    - name: Apply the patch from the drupal docroot.
      shell: "patch -p1 < /tmp/SA-CORE-2014-005-D7.patch chdir={{ drupal_docroot }}"

    - name: Restart apache (or nginx, and/or php-fpm, etc.) to rebuild opcode cache.
      service: name=httpd state=restarted

    - name: Clear Drupal caches (because it's always a good idea).
      command: "drush cc all chdir={{ drupal_docroot }}"

    - name: Ensure we're not vulnerable anymore.
      [redacted]

Now, there are again many, many different ways I could've done this. (And to the eagle-eyed, you'll note I haven't included my test for the vulnerability... I'd rather not share how to test for the vulnerability until people have had a chance to update all their sites).

I chose to do the hotfix first, and quickly, since I didn't necessarily have time to update all my Drupal project codebases to Drupal 7.32, then push the updated code to all my repositories. I did do this later in the day, however, and used a playbook similar to the above, replacing the first two tasks with:

- name: Pull down the latest code changes.
  git:
    repo: "git://[mm-git-host]/{{ inventory_hostname }}.git"
    dest: "{{ drupal_docroot }}"
    version: master

Using Ansible's git module, I can tell Ansible to make sure the given directory (dest) has the latest commit to the master branch in the given repo. I could've also used a command and run git pull from the drupal_docroot directory, but I like using Ansible's git module, which provides great reporting and error handling.
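
For reference, the command-based alternative I mentioned would look something like the following as an ad-hoc command, mirroring the earlier shell example (this assumes each docroot is already a Git clone with an 'origin' remote configured):

$ ansible drupal7 -m command -a "git pull origin master chdir={{ drupal_docroot }}"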

Summary

This post basically followed my train of thought after hearing about the vulnerability, and while there are a dozen other ways to patch the vulnerability on multiple sites/servers, this was the way I did it. Though I patched just 9 servers in about 5 minutes (from the time I started writing the playbook (drupal-fix.yml) to the time it was deployed everywhere), I could just as easily have deployed the fix to dozens or hundreds of Drupal servers in the same amount of time; Ansible is fast and uses simple, secure SSH connections to manage servers.

If you want to see much, much more about what Ansible can do for your infrastructure, please check out my book, Ansible for DevOps, and also check out my session from DrupalCon Austin earlier this year: DevOps for Humans: Ansible for Drupal Deployment Victory!.

Oct 12 2014

Now that Drupal 8.0.0-beta1 is out, and the headless Drupal craze is in full-swing, the Drupal St. Louis meetup this month will focus on using Drupal 8 with AngularJS to build a demo pizza ordering app. (The meetup is on Thurs. Oct. 23, starting at 6:30 p.m.; see even more info in this Zero to Drupal post).

We'll be hacking away and seeing how far we can get, and hopefully we'll be able to leave with at least an MVP-quality product! I'll be at the event, mostly helping people get a Drupal 8 development environment up and running. For some, this alone will hopefully be a huge help, and maybe motivation to adopt Drupal 8 more quickly!

If you're in or around the St. Louis area, consider joining us; especially if you would like to learn something about either Drupal 8 or AngularJS!

P.S. To those who have been emailing: the rest of the Apache Solr search series is coming, it's just been postponed while I've started a new job at Acquia, and had a new baby!

Aug 21 2014


Drupal has included basic site search functionality since its first public release. Search administration was added in Drupal 2.0.0 in 2001, and search quality, relevance, and customization was improved dramatically throughout the Drupal 4.x series, especially in Drupal 4.7.0. Drupal's built-in search provides decent database-backed search, but offers a minimal set of features, and slows down dramatically as the size of a Drupal site grows beyond thousands of nodes.

In the mid-2000s, when most custom search solutions were relatively niche products, and the Google Search Appliance dominated the field of large-scale custom search, Yonik Seeley started working on Solr for CNet Networks. Solr was designed to work with Lucene, and offered fast indexing, extremely fast search, and as time went on, other helpful features like distributed search and geospatial search. Once the project was open-sourced and released under the Apache Software Foundation's umbrella in 2006, the search engine became one of the most popular engines for customized and more performant site search.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Timeline of Apache Solr and Drupal Solr Integration


A brief history of Apache Solr Search and Search API Solr

Only two years after Apache Solr was released, the Apache Solr Search module was created. Originally, the module was written for Drupal 5.x, but it has been actively maintained for many years and was ported to Drupal 6 and 7, with some major rewrites and modifications to keep the module up to date, easy to use, and integrated with all of Apache Solr's new features over time. As Solr gained popularity, many Drupal sites started switching from using core search or heavily customized Views to using Apache Solr.

Seeing this trend, hosted solutions for Solr search were built specifically for Drupal sites. Some of these solutions included Acquia's Acquia Search (2008), Midwestern Mac's Hosted Apache Solr (2011), and Pantheon's Apache Solr service. Acquia, seeing the need for more stability and development in Drupal's Solr integration module, began sponsoring the development of the Apache Solr Search module in April of 2009 (wayback machine).

Search API came on the scene after Drupal 7 was released in 2011. Search API promised to be a rethink of search in Drupal. Instead of being tied to a particular search technology, a search framework (with modular plugins for indexing, searching, facets, etc.) was written to plug into the Drupal database, Apache Solr, or whatever other systems a Drupal site could integrate with. The Search API Solr module was released shortly thereafter, and both Search API and Search API Solr were written exclusively for Drupal 7.

Both Solr integration solutions—Apache Solr Search and Search API Solr—have been actively developed, and both modules offer a very similar set of features. This led to a few issues during the reign of Drupal 7 (still the current version as of this writing):

  • Many site builders wonder: Which module should I use?
  • Switching site search between the two modules (for example, if you find a feature in one that is not in the other) can be troublesome.
  • Does corporate sponsorship of one module over the other cause any issues in enterprise adoption, new feature development, or community involvement?

These problems have persisted over the past few years, and cause much confusion. Some add-on modules, like Facet API (which allows you to build facets for your search results), have been abstracted and generalized enough to be used with either search solution, but there are dozens of modules, and hundreds of blog posts, tutorials, and documentation pages written specifically for one module or the other. For Drupal 6 users, there is only one choice (since Search API Solr is only available for Drupal 7), but for Drupal 7 users, this has been a major issue.

Hosted Apache Solr's solr module usage statistics reveal the community's split over the two modules: 46% of the sites using Hosted Apache Solr use the Apache Solr Search module, while 54% of the sites use Search API Solr.

So, is Drupal's Solr community destined to be divided for eternity? Luckily, no! There are many positive trends in the current Solr module development cycle, and some great news regarding Drupal 8.

Uniting Forces

Already for Drupal 7, the pain of switching between the two modules (or supporting both, as Hosted Apache Solr does) is greatly reduced by the fact that both modules started using a unified set of Apache Solr configuration files (like schema.xml, solrconfig.xml, etc.) as of mid-2012 (see the Apache Solr Common Configurations sandbox project).

Additionally, development of add-on modules like Facet API and the like has been generalized so the features can be used today with either search solution with minimal effort.

A Brighter Future

There's still the problem of two separate modules, two separate sets of APIs, and a divided community effort between the two modules. When Drupal 8 rolls around, that division will be no more! In a 2013 blog post, Nick Veenhof announced that the maintainers of Search API and Apache Solr Search would be working together on a new version of Search API for Drupal 8.

The effort is already underway, as the first Drupal 8 Search API code sprint was held this past June in Belgium, after a successful funding campaign on Drupalfund.us.

The future of Solr and Drupal is bright! Even as other search engines like Elasticsearch are beginning to see more adoption, Apache Solr (which has seen hundreds of new features and greater stability throughout its 4.x release series) continues to gain momentum as one of the best text search solutions for Drupal sites.

Aug 11 2014


It's common knowledge in the Drupal community that Apache Solr (and other text-optimized search engines like Elasticsearch) blow database-backed search out of the water in terms of speed, relevance, and functionality. But most developers don't really know why, or just how much an engine like Solr can help them.

I'm going to be writing a series of blog posts on Apache Solr and Drupal, and while some parts of the series will be very Drupal-centric, I hope I'll be able to illuminate why Solr itself (and other search engines like it) are so effective, and why you should be using them instead of simple database-backed search (like Drupal core's Search module uses by default), even for small sites where search isn't a primary feature.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Why not Database?

Apache Solr's wiki leads off its Why Use Solr page with the following:

If your use case requires a person to type words into a search box, you want a text search engine like Solr.

At a basic level, databases are optimized for storing and retrieving bits of data, usually either a record at a time, or in batches. And relational databases like MySQL, MariaDB, PostgreSQL, and SQLite are set up in such a way that data is stored in various tables and fields, rather than in one large bucket per record.

In Drupal, a typical node entity will have a title in the node table, a body in the field_data_body table, maybe an image with a description in another table, an author whose name is in the users table, etc. Usually, you want to allow users of your site to enter a keyword in a search box and search through all the data stored across all those fields.

Drupal's Search module avoids making ugly and slow search queries by building an index of all the search terms on the site, and storing that index inside a separate database table, which is then used to map keywords to entities that match those keywords. Drupal's venerable Views module will even enable you to bypass the search indexing and search directly in multiple tables for a certain keyword.

So what's the downside to database-backed search? Mainly, performance. Databases are built to be efficient query engines—provide a specific set of parameters, and the database returns a specific set of data. Most databases are not optimized for arbitrary string-based search. Queries where you use LIKE '%keyword%' are not that well optimized, and will be slow—especially if the query is being used across multiple JOINed tables! And even if you use the Search module or some other method of pre-indexing all the keyword data, relational databases will still be less efficient (and require much more work on a developer's part) for arbitrary text searches.
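
To make that concrete, here's a hypothetical example (the database name and credentials are made up, but the node and field_data_body tables are the ones mentioned above) of the kind of query a database-backed keyword search ends up running. Both LIKE conditions force the database to scan every row, so this only gets slower as the site grows:

mysql -u drupal -p drupal -e "
  SELECT n.nid, n.title
  FROM node n
  LEFT JOIN field_data_body b ON b.entity_id = n.nid AND b.entity_type = 'node'
  WHERE n.title LIKE '%keyword%'
     OR b.body_value LIKE '%keyword%';"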

If you're simply building lists of data based on very specific parameters (especially where the conditions for your query all utilize speedy indexes in the database), a relational database like MySQL will be highly effective. But usually, for search, you don't just have a couple options and maybe a custom sort—you have a keyword field (primarily), and end users have high expectations that they'll find what they're looking for by simply entering a few keywords and clicking 'Search'.

Why Solr?

What makes Solr different? Well, Solr is optimized specifically for text-based search. The Lucene text search engine that runs behind Apache Solr is built to be incredibly efficient and also offers some other really useful tools for searching. Apache Solr adds some cool features on top of Lucene, like:

  • Efficient and fast search indexing.
  • Simple search sorting on any field.
  • Search ranking based on some simple rules (over which you have complete control).
  • Multiple-index searching.
  • Features like facets, text highlighting, grouping, and document indexing (PDF, Word, etc.).
  • Geospatial search (searching based on location).

Some of these things may seem a little obtuse, and it's likely that you don't need every one of these features on your site, but it's nice to know that Solr is flexible enough to allow you to do almost anything you want with your site search.

These general ideas are great, but in order to really understand what benefits Solr offers, let's look at what happens with a basic search in Apache Solr.

Simple Explanation of how Solr performs a search

This is a very basic overview, leaving out many technical details, but I hope it will help you understand what's going on behind the scenes at a basic level.

When searching with a database-backed search, the database says, "give me a few keywords, and I'll find exact matches for those words," and it only covers a few very specific bits of data (like title, body, and author). Searching with Solr is more nuanced, flexible, and powerful.

Step 1 - Indexing search data

First, when Solr builds an index of all the content on your site, it gathers all the content's data—each entity's title, body, tags, and any other textual information related to the entity. While reading through all this textual information, Solr does some neat things, like:

  • Stemming: taking a word like "baseballs" and adding in 'word stems' like "baseball".
  • Stop Word filtering: Removing words with little search relevance like "a", "the", "of", etc.
  • Normalization: Converting special characters to simpler forms (like ü to u and ê to e so search can work more intuitively).
  • Synonym expansion: Adding synonyms to words, so the words "doctor" and "practitioner" could be equivalent in a search, even if only one word appears in the content.

These functions are collectively known as tokenization, and are actually performed by Lucene, the engine running under Solr. You don't need to know what all this means right now, but basically, if your content has the word "baseball" in it, and a user searches for "baseballs" or "stickball", the "baseball" result will be returned.
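
If you want to watch this analysis happen, Solr ships with a field analysis request handler you can hit directly. Here's a minimal sketch, assuming the stock /analysis/field handler is enabled in your core's solrconfig.xml (it is in the example configuration) and that your schema defines a 'text' field type; adjust the host, port, and core name to match your own setup:

$ curl -G "http://localhost:8983/solr/collection1/analysis/field" \
    --data-urlencode "analysis.fieldtype=text" \
    --data-urlencode "analysis.fieldvalue=The baseballs" \
    --data-urlencode "wt=json" \
    --data-urlencode "indent=true"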

Step 2 - Searching with keywords

Second, when someone enters keywords to perform a search, Solr does a few things before it starts the actual search. We'll take the example below and run through what happens:

Baseball hall of fame

The first thing Solr does is split the search into groupings: first the entire string, then all but one word in every combination, then all but two words in every combination, and so on, until it gets to individual words. Just like with indexing, Solr will even take individual words like "hall" and split that word out into "halls", "hall", etc. (basically any kind of related term/plural/singular/etc.).

So now, at this point, your above search looks kind of like you actually searched for:

"baseball hall of fame"
"baseball hall"
"baseball fame"
"baseballs"
"halls"
...
"baseball"

I've skipped many derivatives for clarity, but basically Solr does a little work on the entered keywords to make sure you're going to get results that are relevant for the terms you entered.

Step 3 - Executing the search

Finally, the search engine takes every one of the parsed keywords, and scores them against every piece of content in the index. Each piece of content then gets a score (higher for the number of possible matches, zero if no terms were matched). Then your search result shows all those results, ranked by how relevant they are to the current search.

If you had an entity with the title "Baseball Hall of Fame", it's likely that would be the top result. But some other content may match on parts or combinations of the keywords, so they'll also show up in the search.

If you know better than the search engine, and only want results that exactly match your search, you can enclose your keywords in quotes, so you would only get results with the exact string baseball hall of fame, and nothing that mentions 'hall of fame' or 'baseball' independently.
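
As a quick hedged example (the host, port, and core name here are assumptions; adjust them for your own Solr instance), the only difference is whether the keywords are sent as a quoted phrase:

# Phrase query: only documents containing the exact string are returned.
$ curl -G "http://localhost:8983/solr/collection1/select" \
    --data-urlencode 'q="baseball hall of fame"' \
    --data-urlencode "wt=json"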

Solr also adds in a few nifty features when it returns the search results (or lack thereof); it will give back spelling suggestions, which are based on whether any words in the search index are very close matches to the words or phrase you entered in the keywords, and it will also highlight the matched words or word parts in the actual search result.

Summary

In a nutshell, this post explained how Apache Solr works by indexing, tokenizing, and searching your content. If you read through the entire post, you even have a basic understanding of Levenshtein distance, approximate string matching, and concept search, and can get started building your own Google :)

I'll be diving much more deeply into Apache Solr as time allows, highlighting especially the past, present, and future of Apache Solr and Drupal, as well as ways you can make Apache Solr integrate more seamlessly and effectively with your site, perform better, and do exactly what you want it to do.

Jul 29 2014

I wanted to post this here, since this is more of my sounding board for the Drupal community, but the details are on my personal blog: starting October 6, I will be working for Acquia as a Technical Architect in their Professional Services group!

What does this mean for this site/blog, Hosted Apache Solr, and Server Check.in? Not much, actually—they will continue on, likely at the same pace of development they've been for the past year or so (I'll work on them when I get an odd hour or two...). I am still working on completing Ansible for DevOps, and will actually be accelerating my writing schedule prior to starting the new job, since I'll have a little wedge of free time (a.k.a. unemployment!) between Mercy (my current full-time employer) and Acquia.

I'm excited to start working for Acquia, and am also excited to be able to continue doing what I love—working on Drupal, working on Solr, working on Ansible/infrastructure, and working in their respective communities.

Jan 22 2014

If you're a Drupal or PHP developer used to debugging or troubleshooting some code by adding a print $variable; or dpm($object); to your PHP, and then refreshing the page to see the debug message (or using XDebug, or using watchdog logging...), debugging Varnish's VCL language can be intimidating.

VCL uses C-like syntax, and is compiled when varnish starts, so you can't just modify a .vcl file and refresh to see changes or debug something. And there are only a few places where you can simply stick a debug statement. So, I'll explain four different ways I use to debug VCLs in this post (note: don't do this on a production server!):

Simple Error statements (like print in PHP)

Sometimes, all you need to do is see the output of a variable, like req.http.Cookie, inside vcl_recv(). In these cases, you can just add an error statement to throw an error in Varnish and output the contents of a string, like the Cookie:

error 503 req.http.Cookie;

Save the VCL and restart Varnish, and when you try accessing a page, you'll get Varnish's error page, and below the error message, the contents of the cookie, like so:

Varnish error debug message

This debugging process works within vcl_recv() and vcl_fetch(), but causes problems (or crashes Varnish) in any other functions.

Monitoring Varnish with varnishlog

varnishlog is another simple command-line utility that dumps out every bit of information varnish processes and returns during the course of its processing, including backend pings, request headers, response headers, cache information, hash information, etc.

If you just enter varnishlog and watch (or dump the info into a file), be prepared to scroll for eons or grep like crazy to find the information you're looking for. Luckily, varnishlog also lets you filter the information it prints to screen with a few options, like -m to define a regular expression filter. For example:

# Display all Hashes.
varnishlog -c -i Hash

# Display User-Agent strings.
varnishlog -c -i RxHeader -I User-Agent

There are more examples available on the Varnish cache Wiki.

Monitoring Varnish with varnishtop

varnishtop is a simple command-line utility that displays varnish log entries with a rolling count (ranking logged entries by frequency within the past minute). What this means in layman's terms is that you can easily display things like how many times a particular URL is hit, along with different bits of information about requests (like Hashes, headers, etc.).

I like to think of varnishtop as a simple way to display the incredibly deep stats from varnishlog in real time, with better filtering.

Some example commands I've used when debugging scripts:

# Display request cookies.
varnishtop -i RxHeader -I Cookie

# Display varnish hash data ('searchtext' is text to filter within hash).
varnishtop -i "Hash" -I searchtext

# Display 404s.
varnishlog -b -m "RxStatus:404"

You can change the length of time being monitored from the default of 60 seconds by specifying -p period (note that this setting only works for Varnish > 3.0).

There are a few other common monitoring commands in this StackOverflow answer.

Dry-run Compiling a VCL

Sometimes you may simply have a syntax error in your .vcl file. In these cases, you can see exactly what's wrong by using the command varnishd -Cf /path/to/default.vcl, where default.vcl is the base VCL file you've configured for use with Varnish (on CentOS/RHEL systems, this file is usually /etc/varnish/default.vcl).

The output of this command will either be a successfully-compiled VCL, or an error message telling you on exactly what line the error occurs.

Other debugging techniques

Are there any other simple debugging techniques you use that I didn't cover here? Please let me know in the comments. I wanted to compile these techniques, and a few examples, because I've never really seen a good/concise primer on debugging Varnish configuration anywhere—just bits and pieces.

Jan 01 2014

2014 is going to be a big year for Drupal. I spent a lot of 2013 sprucing up services like Hosted Apache Solr and Server Check.in (both running on Drupal 7 currently), and porting some of my Drupal projects to Drupal 8.

So far I've made great progress on Honeypot and Wysiwyg Linebreaks, which I started migrating a while back. Both modules work and pass all tests on Drupal's current dev/alpha release, and I plan on following through with the D8CX pledges I made months ago.

Some of the other modules I maintain, like Gallery Archive, Login as Other, Simple Mail, and themes like MM - Minimalist Theme, are due to be ported sooner rather than later. I'm really excited to start working with Twig in Drupal 8 (finally, a for-real front-end templating engine!), so I'll probably start working on themes in early 2014.

Drupal in 2014


2013 was an interesting year for Drupal, with some major growing pains. Drupal 8 is architecturally more complex (yet simpler in some ways) than Drupal 7 (which was more complex than Drupal 6, etc.), and the degree of difference caused some developer angst, even leading to a fork, Backdrop. Backdrop is developing organically under the guidance of Nate Haug, but it remains to be seen what effect it will have on the wider CMS scene, and on Drupal specifically.

One very positive outcome of the fork is that some of the major Drupal 8 DX crises (mostly caused by switching gears to an almost entirely-OOP architecture) are being resolved earlier in the development cycle. As with any Drupal release cycle, the constant changes can sometimes frustrate developers (like me!) who decide to start migrating modules well before code/API freeze. But if you've been a Drupal developer long enough, you know that the drop is always moving, and the end result will be much better for it.

Drupal 8 is shaping up to be another major contender in the CMS market, as it includes so many timely and necessary features in core (Views, config management, web services, better blocks, Twig, Wysiwyg, responsive design everywhere, great language support, etc.). I argue it's hard to beat Drupal 8 core, much less core + contrib, with any other solution available right now, for any but the simplest of sites.

One remaining concern I have with Drupal 8 is performance; even though you can cover some performance problems with caching layers, the core, uncached Drupal experience is historically pretty slow, even without a bevy of contrib modules thrown in the mix. Drupal's new foundation (Symfony) will help in some aspects (probably more so in more complicated environments—sometimes Symfony is downright slow), and there are issues open to try to fix some known regressions, but being a performance nut, I like it when I can shave tens to hundreds of ms per request, even on a simple LAMP server!

Unlike Drupal 7's sluggish adoption—it was months before most people considered migrating, mostly because Views, and to a lesser extent, the Migrate module, were not ready for some time after release—I think some larger sites will begin migrating to 8 with the first release candidate (there are already some personal sites and blogs using alpha builds). For example, when I migrate Server Check.in, I can substantially reduce the lines of custom code I maintain, and build a more flexible core, simply because Drupal 8 offers more flexible and reliable solutions in core, most especially with Views and Services.

Drupal 8 is shaping up to be the most exciting Drupal release to date—what are your thoughts as we enter this new year? Oh, and Happy New Year!

Oct 01 2013

For a recent project, I needed to migrate anything inside <script> and <style> tags that were embedded with other content inside the body field of Drupal 6 nodes into separate Code per Node-provided fields for Javascript and CSS. (Code per Node is a handy module that lets content authors easily manage CSS/JS per node/block, and saves the styles and scripts to the filesystem for inclusion when the node is rendered—read more about CPN goodness here).

The key is to get all the styles and scripts into a string (separately), then pass that data into an array in the format:

<?php
$node->cpn = array(
  'css' => '<string of CSS without <style> tags goes here>',
  'js' => '<string of Javascript without <script> tags goes here>',
);
?>

Then you can save your node with node_save(), and the CSS/JS will be stored via Code per Node.

For a migration using the Migrate module, the easiest way to do this (in my opinion) is to implement the prepare() method, and put the JS/CSS into your node's cpn variable through a helper function, like so:

First, implement the prepare() method in your migration class:

<?php
  /**
   * Make changes to the entity immediately before it is saved.
   */
  public function prepare($entity, $row) {
    // Process the body and move <script> and <style> tags to Code per Node.
    if (isset($entity->body[$entity->language][0])) {
      $processed_info = custom_process_body_for_cpn($entity->body[$entity->language][0]['value']);
      $entity->body[$entity->language][0]['value'] = $processed_info['body'];
      $entity->cpn = $processed_info['cpn'];
    }
  }
?>

Then, add a helper function like the following in your migrate module's .module file (assuming your migrate module is named 'custom'):

<?php
/**
* Break out style and script tags in body content into a Code per Node array.
*
* This function uses regular expressions to grab the content inside <script>
* and <style> tags inside the given body HTML, then put them into separate keys
* in an array that can be set as $node->cpn for a node before saving, which
* will store the scripts and styles in the appropriate fields for the Code per
* Node module.
*
* Why regex? I originally tried using PHP's DOMDocument to process the HTML,
* but besides being overly verbose with error messages on all but the most
* pristine markup, DOMDocument processed tags poorly; if there were multiple
* script tags, or cases where script tags were inside certain other tags, only
* one or two of the matches would work. Yuck.
*
* Regex is evil, but in this case necessary.
*
* @param string $body
*   HTML string that could potentially contain script and style tags.
*
* @return array
*   Array with the following elements:
*     cpn: array with 'js' and 'css' keys containing corresponding strings.
*     body: same as the body passed in, but without any script or style tags.
*/
function custom_process_body_for_cpn($body) {
  $cpn = array('js' => '', 'css' => '');

  // Search for script and style tags.
  $tags = array(
    'script' => 'js',
    'style' => 'css',
  );
  foreach ($tags as $tag => $type) {
    // Use a regular expression to match the tag and grab the text inside.
    preg_match_all("/<$tag.*?>(.*?)<\/$tag>/is", $body, $matches, PREG_SET_ORDER);
    if (!empty($matches)) {
      foreach ($matches as $match_set) {
        // Remove the first item in the set (it still has the matched tags).
        unset($match_set[0]);
        // Loop through the matches.
        foreach ($match_set as $match) {
          $match = trim($match);
          // Some tags, like script tags for embedded videos, are empty, and
          // shouldn't be removed, so check to make sure there's a value.
          if (!empty($match)) {
            // Remove the text from the body.
            $body = preg_replace("/<$tag.*?>(.*?)<\/$tag>/is", '', $body);
            // Add the tag contents to the cpn array.
            $cpn[$type] .= $match . "\r\n\r\n";
          }
        }
      }
    }
  }

  // Return the updated body and CPN array.
  return array(
    'cpn' => $cpn,
    'body' => $body,
  );
}
?>

If you were using another solution like the CSS module in Drupal 6, and need to migrate to Code per Node, your processing will be a little different, and you might need to do some work in your migration class' prepareRow() method instead. The main thing is to get the CSS/Javascript into the $node->cpn array, then save the node. The Code per Node module will do the rest.

Sep 30 2013

There are some simple Drupal modules that help with login redirection (especially Login Destination), but I often need more advanced conditions applied to redirects, so I like being able to do the redirect inside a custom module. You can also do something similar with Rules, but if the site you're working on doesn't have Rules enabled, all you need to do is:

  1. Implement hook_user_login().
  2. Override $_GET['destination'].

The following example shows how to redirect a user logging in from the 'example' page to the home page (Drupal uses <front> to signify the home page):

<?php
/**
* Implements hook_user_login().
*/
function mymodule_user_login(&$edit, $account) {
  $current_path = drupal_get_path_alias($_GET['q']);

  // If the user is logging in from the 'example' page, redirect to front.
  if ($current_path == 'example') {
    $_GET['destination'] = '<front>';
  }
}
?>

Editing $edit['redirect'] or using drupal_goto() inside hook_user_login() doesn't seem to do anything, and setting a Location header using PHP is not best practice. Drupal uses the destination parameter to do custom redirects, so setting it anywhere during the login process will work correctly with Drupal's built in redirection mechanisms.

Sep 25 2013

CI: Deployments and Code Quality

tl;dr: Get the Vagrant profile for Drupal/PHP Continuous Integration Server from GitHub, and create a new VM (see the README on the GitHub project page). You now have a full-fledged Jenkins/Phing/SonarQube server for PHP/Drupal CI.

In this post, I'm going to explain how Jenkins, Phing and SonarQube can help you with your Drupal (or any PHP-based project) deployments and code quality, and walk you through installing and configuring them to work with your codebase. Bear with me... it's a long post!

Code Deployment

If you manage more than one environment (say, a development server, a testing/staging server, and a live production server), you've probably had to deal with the frustration of deploying changes to your code to these servers.

In the old days, people used FTP and manually copied files from environment to environment. Then FTP clients became smarter, and allowed somewhat-intelligent file synchronization. Then, when version control software became the norm, people would use CVS, SVN, or more recently Git, to push or check out code to different servers.

All the aforementioned deployment methods involved a lot of manual labor, usually involving an FTP client or an SSH session. Modern server management tools like Ansible can help when there are more complicated environments, but wouldn't everything be much simpler if there were an easy way to deploy code to specific environments, especially if these deployments could be automated to either run on a schedule or whenever someone commits something to a particular branch?


Enter Jenkins. Jenkins is your deployment assistant on steroids. Jenkins works with a wide variety of tools, programming languages and systems, and allows the automation (or radical simplification) of tasks surrounding code changes and deployments.

In my particular case, I use a dedicated Jenkins server to monitor a specific repository, and when there are commits to a development branch, Jenkins checks out that branch from Git, runs some PHP code analysis tools on the codebase using Phing, archives the code and other assets in a .tar.gz file, then deploys the code to a development server and runs some drush commands to complete the deployment.

Static Code Analysis / Code Review

If you're a solo developer, and you're the only one planning on ever touching the code you write, you can use whatever coding standards you want—spacing, variable naming, file structure, class layout, etc. don't really matter.

But if you ever plan on sharing your code with others (as a contributed theme or module), or if you need to work on a shared codebase, or if there's ever a possibility you will pass on your code to a new developer, it's a good idea to follow coding standards and write good code that doesn't contain too many WTFs/min.


The easiest way to do this is to use static code analysis tools like PHP Mess Detector, PHP CodeSniffer (with Drupal's Coder module), and the PHP Copy/Paste Detector.
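
If you just want a taste of what these tools report before automating anything, you can run them by hand. Here's a rough sketch, assuming the tools are installed (installation is covered later in this post) and using a hypothetical module path:

# Check Drupal coding standards.
phpcs --standard=Drupal sites/all/modules/custom/mymodule

# Look for overly complex or unused code.
phpmd sites/all/modules/custom/mymodule text codesize,unusedcode

# Find duplicated code.
phpcpd sites/all/modules/custom/mymodule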

It's great to be able to use any of these tools individually, but let's face it—unless they're set up to run and give you reports in some automated fashion, there's little chance you're going to take time out of your busy development schedule to run these helpful code review tools, especially if the boring plain text reports they generate are long.

Jenkins and Phing together will do the heavy lifting of grabbing code from your repository and running it through all these analysis tools (as well as PHPUnit for automated unit testing, or SimpleTest). But we're going to take this to the next level; instead of just leaving you with a long text file to decipher, we're going to use another awesome tool, SonarQube, to generate (automatically) graphs, charts, and custom dashboards showing statistics like lines of code and commented lines of code over time, method/function complexity, coding standards violations, etc.

SonarQube helps highlight areas of your codebase where you can get the most ROI for your cleanup efforts; it's easy to spot that one module or template where a bunch of quickly-written messy code might be lurking, waiting to destroy a week of development time because it's lacking documentation, poorly-written, or an incredibly complicated mess!

Putting It All Together


Now, into the nitty gritty. We're going to build ourselves a virtual machine that has everything configured to do all the things I mentioned above, and will be flexible enough to allow us to add more code quality analysis (like Drupal SimpleTest integration) and deployment options over time.

We'll build this VM using Vagrant with VirtualBox, which means the VM can be built and rebuilt on any Mac, Windows, or Linux PC. The configuration can also be split up to create separate servers for all the different components—a Jenkins server with Phing and SonarQube Runner to do the deployments and code analysis, and a SonarQube server for the pretty graphs and overview of our code quality.

The complete VM is available on GitHub (Vagrant profile for Drupal/PHP Continuous Integration Server), but I'll go through the configuration step by step. This guide assumes you're running CentOS or some other RHEL-flavored Linux, but it should be easily adaptable to other environments that use apt or another package manager instead of yum. Additionally, I am working on rebuilding this Vagrant profile using Ansible instead of shell scripts, but for now, shell scripts will have to do :-)

Installing Jenkins

Note: I will be using the hostname 'jenkins-sandbox' for this server, and Jenkins will run on port 8080.

To install Jenkins, you need to be running Java (in my situation, 1.6.0 is the latest version offered by the standard CentOS repos):

yum install --quiet -y java-1.6.0-openjdk

After Java is installed and configured (check the version with java -version), install Jenkins from the Jenkins RPM:

wget --quiet -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
rpm --quiet --import http://pkg.jenkins-ci.org/redhat/jenkins-ci.org.key
yum install --quiet -y jenkins

Configure Jenkins to start automatically after system boot:

service jenkins start
chkconfig jenkins on

Force Jenkins to update its plugin directory (you can do this by visiting Jenkins' update center in your browser, but we'll do it via CLI since it's faster and can be part of the automated server build):

curl -s -L http://updates.jenkins-ci.org/update-center.json | sed '1d;$d' | curl -s -X POST -H 'Accept: application/json' -d @- http://jenkins-sandbox:8080/updateCenter/byId/default/postBack

Install Jenkins' CLI tool so you can run later commands via the command line instead of having to click through the interface:

wget --quiet http://jenkins-sandbox:8080/jnlpJars/jenkins-cli.jar

Install the Jenkins phing and sonar plugins:

java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ install-plugin phing
java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ install-plugin sonar

You can import and export jobs in Jenkins using XML files if you have the Jenkins CLI installed, using the following syntax:

java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ get-job MyJenkinsJob > /path/to/exported/MyJenkinsJob.xml
java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ create-job MyJenkinsJob < /path/to/exported/MyJenkinsJob.xml

Restart Jenkins so everything works correctly:

service jenkins restart

Now that Jenkins is installed, you could visit http://jenkins-sandbox:8080/ in your browser and start playing around in the UI... but we're going to keep moving along, getting the rest of our PHP CI system up and running.

Installing PHP and Phing

Since we're going to be building PHP projects in Jenkins, and using a variety of PHP code analysis tools to inspect and test our code, we need to install PHP, PEAR, Phing, and some other plugins.

First, let's install PHP, PEAR, and some other basic dependencies:

yum install --quiet -y php php-devel php-xml php-pear ImageMagick
pear channel-update pear.php.net
pear config-set auto_discover 1

Then, install PHPUnit if you'd like to run unit tests against your code:

pear channel-discover pear.phpunit.de
pear channel-discover pear.symfony.com
pear install phpunit/PHPUnit

Install the PHP Copy/Paste Detector (this will check for duplicate code that could be merged to reduce the lines of code you need to maintain):

pear install pear.phpunit.de/phpcpd

Install the PHP Mess Detector (this will check for poor code quality, overly-complicated code, and code that will introduce lots of technical debt).

pear channel-discover pear.phpmd.org
pear channel-discover pear.pdepend.org
pear install phpmd/PHP_PMD

Install PHP CodeSniffer (this will 'sniff' your code for bad formatting and coding standards violations):

pear install PHP_CodeSniffer

Install XDebug (useful for debugging PHP code, and used by some other tools):

pecl install xdebug

Install Phing (which will be used to coordinate the running of all the other tools we just installed against your code):

pear channel-discover pear.phing.info
pear install phing/phing

Download the Drupal Coder module, copy the Drupal Coding Standards out of the coder_sniffer submodule into PHP CodeSniffer's standards directory, then delete the downloaded module:

wget --quiet http://ftp.drupal.org/files/projects/coder-7.x-2.x-dev.tar.gz
tar -zxvf coder-7.x-2.x-dev.tar.gz
mv coder/coder_sniffer/Drupal $(pear config-get php_dir)/PHP/CodeSniffer/Standards/Drupal
rm coder-7.x-2.x-dev.tar.gz
rm -rf coder

At this point, you should have a working PHP installation that has all (or at least most) of the tools you need to find potential issues with your code, and deploy your code using Jenkins and Phing.

Installing MySQL

SonarQube requires a database to function, and it's pretty simple to get MySQL set up and configured to be able to handle anything SonarQube can throw at it. Let's install mysql and mysql server:

yum install --quiet -y mysql-server mysql

Start MySQL and set it to start up at system boot.

service mysqld start
chkconfig mysqld on

You could run the MySQL setup assistant at this point, but we'll just run a few scriptable commands to do the same things as the mysql_secure_installation script would do (configure the root password (we'll use 'root' for simplicity's sake), delete the anonymous user, and delete the test database):

/usr/bin/mysqladmin -u root password root
/usr/bin/mysqladmin -u root -h jenkins-sandbox password root
echo "DELETE FROM mysql.user WHERE User='';" | mysql -u root -proot
echo "FLUSH PRIVILEGES;" | mysql -u root -proot
echo "DROP DATABASE test;" | mysql -u root -proot

Now we just need to create a database and user for SonarQube:

echo "CREATE DATABASE sonar CHARACTER SET utf8 COLLATE utf8_general_ci;" | mysql -u root -proot
echo "CREATE USER 'sonar' IDENTIFIED BY 'sonar';" | mysql -u root -proot
echo "GRANT ALL ON sonar.* TO 'sonar'@'%' IDENTIFIED BY 'sonar';" | mysql -u root -proot
echo "GRANT ALL ON sonar.* TO 'sonar'@'localhost' IDENTIFIED BY 'sonar';" | mysql -u root -proot
echo "FLUSH PRIVILEGES;" | mysql -u root -proot

MySQL is ready to go!

Installing SonarQube Server

SonarQube is a very nice code analysis and code review visualization and tracking tool, and it needs to be installed on a server with Java (which we already have set up for Jenkins) and a database (which we just set up above). First, we'll install Sonar:

wget --quiet http://dist.sonar.codehaus.org/sonar-3.7.1.zip
unzip -q sonar-3.7.1.zip
rm -f sonar-3.7.1.zip
mv sonar-3.7.1 /usr/local/sonar

Next, edit the sonar.properties file so Sonar knows how to connect to the MySQL database we created earlier (the file is located at /usr/local/sonar/conf/sonar.properties). Edit the following configuration options to match:

sonar.jdbc.username: sonar
sonar.jdbc.password: sonar
sonar.jdbc.url: jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8&rewriteBatchedStatements=true

Install the PHP plugin for Sonar, so our PHP projects can be analyzed without an ugly error message (you can also install the plugin through Sonar's interface, but this method is faster and easy to include in a script):

wget --quiet http://repository.codehaus.org/org/codehaus/sonar-plugins/php/sonar-php-plugin/1.2/sonar-php-plugin-1.2.jar
mv sonar-php-plugin-1.2.jar /usr/local/sonar/extensions/plugins/

To make sonar easier to manage from the command line, we'll add an init script so you can start/restart/stop it with service and use chkconfig. Create a file at /etc/init.d/sonar with the following contents:

#!/bin/sh
#
# rc file for SonarQube
#
# chkconfig: 345 96 10
# description: SonarQube system (www.sonarsource.org)
#
### BEGIN INIT INFO
# Provides: sonar
# Required-Start: $network
# Required-Stop: $network
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: SonarQube system (www.sonarsource.org)
# Description: SonarQube system (www.sonarsource.org)
### END INIT INFO
/usr/bin/sonar $*

Next, we'll symlink the appropriate sonar executable into /usr/bin, set the correct permissions, and enable sonar at system boot:

ln -s /usr/local/sonar/bin/linux-x86-64/sonar.sh /usr/bin/sonar
chmod 755 /etc/init.d/sonar
chkconfig --add sonar

Finally, we're ready to start up sonar for the first time:

service sonar start

It will probably take 45 seconds to a minute to start up the first time, as Sonar will generate its database and configure itself. Once it's started, you can access Sonar at http://jenkins-sandbox:9000/.

Installing SonarQube Runner

There are two parts to SonarQube: the server itself, and the Runner, which is helpful if you're using a language that doesn't need to be compiled, but needs to have code analysis generated and dumped into an active SonarQube installation (basically anything that doesn't use Maven). Here we'll install and configure SonarQube Runner, so we can push the code analysis done on our Drupal site into our SonarQube server.

First, we need to download sonar-runner and place it in the appropriate directory (note: this guide was written for version 2.3... in the future, you may need to update the version number/URL):

wget --quiet http://repo1.maven.org/maven2/org/codehaus/sonar/runner/sonar-runner-dist/2.3/sonar-runner-dist-2.3.zip
unzip -q sonar-runner-dist-2.3.zip
rm -f sonar-runner-dist-2.3.zip
mv sonar-runner-2.3 /usr/local/sonar-runner

Now, configure your sonar-runner instance to point to the SonarQube server we set up earlier by editing the sonar-runner.properties file (located at /usr/local/sonar-runner/conf/sonar-runner.properties). The file should contain something like the following (at least):

# SonarQube Host URL.
sonar.host.url=http://jenkins-sandbox:9000
# MySQL connection.
sonar.jdbc.url=jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8
# MySQL credentials.
sonar.jdbc.username=sonar
sonar.jdbc.password=sonar

Finally, to allow sonar-runner to work correctly (so you can just cd to a directory containing a sonar properties file for a project and enter sonar-runner to analyze the code), you need to set the environment variable SONAR_RUNNER_HOME and add the sonar-runner bin directory to your PATH. The simplest way to do this is to add the file /etc/profile.d/sonar.sh with the following inside:

# Sonar settings for terminal sessions.
export SONAR_RUNNER_HOME=/usr/local/sonar-runner
export PATH=$PATH:/usr/local/sonar-runner/bin

You can also have Jenkins install SonarQube Runner via the UI, but that spoils the fun of using the shell commands, and also isn't able to be wrapped up in a Vagrant profile :-).

Let's Do This Thing!

Okay, now that we've completed the marathon of installation and configuration (or just finished a cup of coffee if you used the Vagrant profile and vagrant up), it's time to jump into Jenkins, run a deployment, and see our results in Jenkins and SonarQube!

The Jenkins Dashboard

Fire up your web browser and visit http://jenkins-sandbox:8080/ to get to the Jenkins dashboard. We'll create a new project to test everything out:

  1. Click on 'New Job'.
  2. Put in a Job name (like 'Drupal 7') and choose 'Build a free-style software project', then click OK.
  3. Under Source Code Management, choose 'Git' and enter the following:
    • Branch Specifier: 7.x
    • (In 'Advanced...') Local subdirectory for repo: drupal
  4. Under Build, click 'Add build step' and choose 'Invoke Phing targets', then enter the following:
    • Targets: build
    • Phing Build File: /vagrant/config/jenkins/drupal-7-example/build.xml
    • Properties: project.builddir=${WORKSPACE}
  5. Under Build, click 'Add build step' and choose 'Invoke Standalone Sonar Analysis', then enter the following:
    • Path to project properties: /vagrant/config/jenkins/drupal-7-example/sonar-project.properties
  6. Click Save at the bottom of the page.

(Note that the build.xml and sonar-project.properties files are in the location they would be if you use the Vagrant profile linked at the top of this post—if you're building the server manually, update the paths to your Phing and Sonar properties files accordingly).

If everything is configured correctly, you can now click 'Build Now', and prepare to be dazzled! After a few minutes (depending on the speed of your connection), Jenkins will clone the Drupal git repository, run some analysis on the code through Phing, archive the codebase, and send the analysis results off to SonarQube.

Once everything is complete (and, hopefully, you get a happy blue ball indicating build success!), you can click the Sonar link from the Project's main page to view the SonarQube analysis.

Conclusion

You now have a Continuous Integration server set up that will enable more automated deployments and make code review a more visual and simple process. Plus, as you improve your codebase, you'll be able to see pretty SonarQube graphs and charts showing you how much the code has improved!

Phing and Jenkins offer many more features—I've barely scratched the surface! Go forward and explore the many things you can now do, like automatically generate API documentation for your custom code or email developers directly when their commits break tests.

And, for Heaven's sake, instead of following the 100+ manual steps above to configure a Continuous Integration server, use the Vagrant profile for Drupal/PHP Continuous Integration Server, and let Vagrant + VirtualBox do the heavy lifting of configuring your server!

Security caveat: The steps above and the Vagrant profile are meant for local testing only—if you build a production/web-accessible CI server, make sure to lock down access with better passwords, authentication, and firewall rules.

Sep 16 2013

Midwestern Mac has been offering Apache Solr hosting for Drupal websites for the past three years, but this service has never been given too much attention or made easy to sign up for and use—until now!

Today we're announcing the re-launch of our service with a new website: Hosted Apache Solr.

Hosted Apache Solr home page - Drupal 7

The website was built on Drupal 7, and uses a custom base theme shared with Server Check.in (our server monitoring service built with Drupal and Node.js). We built a small payment integration module for PayPal subscriptions (though we're considering using Drupal Commerce, so we can use different payment processors more easily), and have built a very simple to use front-end for managing Solr core subscriptions.

If you don't know much about what Apache Solr can do for your site's search and listings, here's a one-sentence summary: Solr enables highly customizable content indexing, faceted and advanced search filtering, and raw speed—indexing and searching are many times faster than database-backed search (like Drupal's default search or basic Views filtering).

There are a few other companies that offer hosted instances of Apache Solr, notably Acquia, but most other solutions require more expensive contracts or are not tailored specifically towards Drupal sites. We hope you like our offering, and would love to hear your feedback—what can we do to help you choose Hosted Apache Solr as your Drupal search solution?

Check out Hosted Apache Solr, and sign up to improve your search experience!

Midwestern Mac will soon be posting more stories about Hosted Apache Solr, Apache Solr itself, and Drupal/Solr integrations, so please consider subscribing to our RSS feed to stay informed!

Apache Solr is a trademark of the Apache Software Foundation. Drupal is a registered trademark of Dries Buytaert.

Aug 30 2013
Aug 30

It seems most developers I know have a story of running some sort of batch operation on a local Drupal site that triggers hundreds (or thousands!) of emails that are sent to the site's users, causing much frustration and ill will towards the site the developer is working on. One time, I accidentally re-sent over 9,000 private message emails during a test user migration because of an email being sent via a hook that was invoked during each message save. Filling a user's inbox is not a great way to make that user happy!

With Drupal, it's relatively easy to make sure emails are either rerouted or directed to temp files from local development environments (and any other environment where actual emails shouldn't be sent to end users). Drupal.org has a very thorough page, Managing Mail Handling for Development or Testing, which outlines many different ways you can handle email in non-production environments.

However, for most cases, I like to simply redirect all site emails to my own address, or route them to a figurative black hole.

Rerouting Emails to an Arbitrary Email Address

There's a simple module, Reroute Email, which allows you to have all emails sent through Drupal to be rerouted to a configured email address. This module is simple enough, but for even more simplicity, if you have a custom module, you can just invoke hook_mail_alter() to force all messages to a given email address. Example (assuming your module's name is 'custom' and you want to send emails to the configured 'site_mail' address):

<?php
/**
 * Implements hook_mail_alter().
 */
function custom_mail_alter(&$message) {
  // Re-route emails to the admin when the override_email variable is set.
  if (variable_get('override_email', 0)) {
    $message['to'] = variable_get('site_mail');
  }
}
?>

Now you can just add $conf['override_email'] = 1; to settings.php for any environment where you want all emails to be sent to the 'site_mail' configured email address. Pretty simple!

Directing emails to text files in /tmp

Another simple option, if you still don't want emails to be sent to the end user, but still want to see them in some form (in this case, a text file), is to enable the Devel module, then set your site's mail system to 'DevelMailLog' (like the following inside settings.php):

<?php
$conf['mail_system'] = array('default-system' => 'DevelMailLog');
?>

Devel will now re-route all emails to .txt files inside your server's /tmp folder.

Aug 21 2013
Aug 21

There are many times when a custom module provides functionality that requires a tweaked or radically altered template file, either for a node, a field, a view, or something else.

While it's often a better idea to use a preprocess or alter function to accomplish what you're doing, there are many times where you need to change the markup/structure of the HTML, and modifying a template directly is the only way to do it. In these cases, if you're writing a generic custom module that needs to be shared among different sites with different themes, you can't just throw the modified template into each theme, because you'd have to make sure each of the sites' themes has the same file, and updating it would be a tough proposition.

I like to keep module-based functionality inside modules themselves, so I put all templates that do specific things relating to that module into a 'templates' subdirectory.

In my example, I'd like to override field-collection-item.tpl.php, which is included with the Field collection module. To do so, I copy the default template into my custom module's 'templates' folder, and modify it how I like. Then I implement hook_theme_registry_alter() to tell Drupal where my template exists:

<?php
/**
 * Implements hook_theme_registry_alter().
 */
function custom_theme_registry_alter(&$theme_registry) {
  // Override the default field-collection-item.tpl.php with our own.
  if (isset($theme_registry['field_collection_item'])) {
    $module_path = drupal_get_path('module', 'custom');
    $theme_registry['field_collection_item']['theme path'] = $module_path;
    $theme_registry['field_collection_item']['template'] = $module_path . '/templates/field-collection-item';
  }
}
?>

This presumes my module's machine name is 'custom'. Make sure you clear all caches after adding this hook, so Drupal will pick up the hook and your new template!

Note that there are sometimes other/better ways of overriding templates in your module—for example, the views module lets you set a template directory path in hook_views_api(), and will automatically pick up templates from your module. And note again that preprocess and alter hooks are often a better way to go to accomplish small tweaks to content and markup for nodes, fields, views, etc.
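
For example, here's a minimal sketch (assuming a module named 'custom') of how a module can point Views at its own templates directory via hook_views_api():

<?php
/**
 * Implements hook_views_api().
 */
function custom_views_api() {
  return array(
    'api' => 3,
    // Views will also scan this directory for template files, so overridden
    // views templates can live inside the module itself.
    'template path' => drupal_get_path('module', 'custom') . '/templates',
  );
}
?>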

Aug 17 2013
Aug 17

I have been at the Midwest Drupal Summit for the past few days, focusing on #D8CX and reducing Drupal 8's technical debt (at least, a tiny bit of it!).

Wysiwyg Linebreaks

My main goal at the conference was to port the Wysiwyg Linebreaks module to Drupal 8. I originally built the module for Drupal 6 while helping the Archdiocese of St. Louis migrate almost 50 separate Joomla-based websites into one organic-groups-driven Drupal site. Their legacy content used linebreaks (rather than markup like <p> and <br /> tags) for paragraphs of text, and when we originally enabled Wysiwyg with TinyMCE, the editor ran all the text together in one big paragraph. The Wysiwyg Linebreaks module fixes that problem by running some JavaScript that adds the required tags when an editor is attached to a textarea, and (optionally) removes the tags when the editor is detached.

The Drupal 6 and Drupal 7 versions of the module depended on the Wysiwyg module, and worked with many different editors—however, the way the linebreaks plugin was added was slightly clunky, and required a little bit of a hack to work well (see Let cross-editor plugins be button-less (aka 'extensions')).

For Drupal 8, the module simply defines an editor plugin without a button (no hacks!), and integrates with CKEditor's API (See change notice: CKEditor module added: WYSIWYG in core!).

This is the second contrib module I've ported (the first being Honeypot), and the process is relatively straightforward. The nicest thing about Drupal 8's refined architecture is that, for modules like Wysiwyg Linebreaks, you don't need to have much, if any, procedural code inside .module and .inc files. For Wysiwyg Linebreaks, there's just the JavaScript plugin code inside /js/linebreaks/linebreaks.js, and a CKEditor plugin definition inside /lib/Drupal/wysiwyg_linebreaks/Plugin/CKEditorPlugin/Linebreaks.php. Very clean!

To anyone else working on a CKEditor plugin or integration with the new Drupal 8 Editor module: The API for dealing with editors, or with CKEditor in particular, is very simple but powerful—see the 'API' section on this change notice for the Editor module, and the 'Provide additional CKEditor plugins' section on this change notice for CKEditor.

One more note: I was made aware of the issue How do we want to facilitate enabling of CKEditor for sites upgraded from Drupal 7 to Drupal 8? just after I finished committing the last fixes for the D8 version of Wysiwyg Linebreaks. This module solves the problem of legacy content that uses the autop filter ("Convert line breaks into HTML (i.e. <br> and <p>)") quite nicely—enable it, and content will look/function as it always has, with or without CKEditor enabled.

MWDS at Palantir's HQ

Bacon Donuts
Bacon Donuts at #MWDS – Palantir, you know us too well...

This was the first year I've attended the Midwest Drupal Summit at Palantir's HQ in Chicago, IL, and it was a great experience! Besides working on porting Wysiwyg Linebreaks and cleaning up Honeypot to work with Drupal 8 head, I worked on a handful of other core and contrib issues.

I was also able to meet and talk to some really awesome Drupal developers—many from Chicago and the surrounding areas, but also a bunch of people who I've met at past DrupalCons and was happy to say hello to again. Palantir provided a great atmosphere, and some amazing food (bacon donuts, good pizza, tasty sandwiches, schwarma, etc.), and even some fun games (though I was unable to stay long enough to enjoy them during the summit).

I learned a lot about Drupal 8's architecture—plugins, controllers and routes especially—and I'm excited about the things this new architecture will afford when building and migrating Drupal modules and sites (like easier/faster testing and more code portability!). While there have been legitimate gripes about the release timeline and API changes for Drupal 8, developers have a tendency to focus too much on what's missing and broken (negatives) during the current core development phase (remember D7's release cycle?), and not on the more positive meta-level view—Drupal 8 has a vastly-improved admin UI, responsive design throughout, first-class HTML5 support, a great template system, a very flexible plugin system, more sane APIs for dealing with entities and fields, etc.

We made good progress in moving Drupal 8 forward during the summit, but there's still a ways to go... And you can help! See: Technical debt in Drupal 8 (or, when will it be ready?) and help push out the first beta release!

Aug 13 2013
Aug 13

The Drupal Way

I've worked with a wide variety of developers, designers, content managers, and other Drupal users in the past few years, and I'm pretty sure I have a handle on most of the reasons people think Drupal is a horrible platform. But before I get to that, I have to set up the rest of this post with the following quote:

There are not a hundred people in America who hate the Catholic Church. There are millions of people who hate what they wrongly believe to be the Catholic Church — which is, of course, quite a different thing.

Forgive me for diverging slightly into my faith, but this quote is from the late Fulton J. Sheen, and is in reference to the fact that so many people pour hatred on the Catholic Church not because of what the Church actually teaches, but because of what they think the Catholic Church teaches. Once someone comes to understand the actual teaching, they are free to agree or disagree with it—but there are comparatively few people who disagree with teachings they actually understand.

Similarly, the problems most people have with Drupal—and with systems like it—are problems not with Drupal, but with their perception of Drupal.

Java Jane: One-off vs. Flexible Design

A Java developer (let's call her Jane) is used to creating a bunch of base object classes and a schema for a database by hand, then deploying an application and managing the database through her own wrapper code. Jane is assigned to a Drupal project, takes one look at the database, and decides that no sane person would ever design a schema with hundreds of tables named field_data_* and field_revision_* for every single data point in the application!

Why does Drupal have So Many Database Tables?

In reality, Drupal is doing this because The Drupal Way dictates that things like field data should be: flexible (able to be used by different kinds of entities (content)), able to be translated, able to be revised with a trackable history, and able to be stored in different storage backends (e.g. MySQL, MariaDB, MongoDB, SQLite, etc.). If the fields were all stored in a per-entity table as separate columns, these different traits would be much more difficult to implement.

Thus, The Drupal Way is actually quite beneficial—if you want a flexible content management system.

I think a lot of developers hate Drupal because they know they could build a more efficient web application that only has the minimal required features they need by simply writing everything from scratch (or using a barebones framework). But what about the next 72 times you have to build the exact same thing, except slightly different each time, with a feature that's different here, translation abilities there, integration with Active Directory for user login here, integration with a dozen APIs there, etc.?

There's a maxim that goes something like this: every seasoned web developer started with plain HTML and CSS (or some hosted platform), then discovered a dynamic scripting language and built his own CMS-like system. Then, after growing that CMS into a small system much like many others (but hopelessly insecure and unmaintainable), the developer realized that thousands of other people had gone through the same progression and ultimately worked together on systems like Drupal. Then said developer starts using Drupal, and the rest is history.

I know you could build a small system that beats the pants off Drupal performance-wise, and handles the three features you need done now. But why spend hours on a login form (that probably has security holes), session handling (ditto), password storage (ditto), forms in general (ditto), content CRUD interfaces, a translation system, a theme layer, etc., when you can have all that out of the box, and just spend a little time making it look and behave like you want? The shoulders of giants and all that...

.Net Neil: Letting Contrib/Bespoke Code Let You Down

A .Net developer (let's call him Neil) joins a Drupal project team after having worked on a small custom .Net application for a few years. Not only does he not know PHP (so he's learning by reading the code already in use), he is also used to a tightly-controlled application code structure, which he knows and owns end-to-end.

After taking a peek inside the custom theme, and a couple of the Drupal modules that the team has built in the past year, .Net Neil feels like he needs to take a shower! He sees raw SQL strings mixed in with user-provided data, he sees hundreds of lines of business logic in two dozen theme template files, and he can't find a line of documentation anywhere!

Why don't you use PDO for Database queries?

Who would blame Neil for washing his hands of Drupal entirely?

However, Neil shouldn't throw out the baby with the bathwater. Unfortunately, due to PHP's (and, by extension, Drupal's) popularity, many non-programmers or junior level programmers work on Drupal sites, and know just enough PHP to be incredibly dangerous.

Now, it doesn't help that Drupal allows PHP inside template files—something that will be corrected in Drupal 8—and it doesn't help that PHP is a quirky language full of inconsistencies and security holes—something that's vastly improved in PHP 5.3+ (especially 5.4+). But while some decide that PHP is a fractal of bad design, or that they simply hate PHP (mostly because of code they've seen that's from either designers or new programmers with a lot to learn... or they have a lot of baggage from pre-PHP 5 days), I think it's best to understand that bad code is bad code regardless of the language. Using Ruby, Django, Go, Node.js, etc. does not automatically make you a better programmer. Just like writing in French doesn't make you a great author. It's just a different language that's useful for different purposes.

One more note here: in all the Drupal code I've seen, there are three levels of quality:

  • Code in Drupal Core: Drupal core is extremely well-documented, has low cyclomatic complexity, has almost full automated test coverage, and has a very high bar for code acceptance. Drupal core is not only a great example of good code in PHP-land, but across languages—especially the latest version (which is on the tail end of some pretty major refactoring).
  • Code in Contrib Modules: Contributed modules can be pretty hit-or-miss. Even with a more rigorous review process in place, many contrib modules have hacked-together code that has some subtle and not-so-subtle security and performance flaws. However, the modules used by a vast array of Drupal installations, and included with popular Distributions (like Views, Panels, Colorbox, etc.) are usually very well constructed and adhere to the Drupal coding standards. (Another good way of knowing a module is good: if Drupal.org uses it).
  • Custom code: Welcome to the wild west. I've seen some of the craziest code in custom templates, hacked installations of Drupal, hacked contrib modules, and strange custom modules that I'm amazed even compile.

When people say Drupal has a terrible security track record, they often point to lists of all Drupal-related security flaws (like this one). Unfortunately for this argument, it holds little water; a quick scan usually finds that well over half the affected modules are used by a very small share of Drupal sites, and a flaw that affects Drupal core is very rare indeed (see how rare on Drupal's security track record page).

The Drupal Way™

Jane and Neil would both come to appreciate Drupal much better if they understood why Drupal does certain things in certain ways. They would also likely appreciate the strictness and thoroughness of Drupal's Coding Standards and security guidelines, and the fact that patches for consideration in Drupal core undergo strict reviews and must pass a full suite of automated tests.

They'd probably also learn to accept some of Drupal's quirks once they realize that the people who built and are making Drupal better range from a mother-of-five-turned-hobbyist-programmer to the world's largest government organizations. Drupal can't be everything to everyone—but it's one of the most flexible web content management systems available.

I'm going to go through some of the main areas where I've seen people get derailed in their understanding of Drupal.

Too Many Modules

A lot of first-time Drupal users decide they need twenty or thirty modules to add things like share buttons, fancy blogging features, forum tweaks, etc. Eventually, many fresh Drupal sites end up with over 100 enabled modules (of varying quality), and the site takes seconds to load a single page.

This problem is the open buffet syndrome, outlined in detail here. In addition to adding way too much functionality to a site (usually making the site harder to use anyways), adding a ton of extraneous modules makes it harder to track down problems when they occur, and usually makes for slower performance and a very large memory footprint on a server.

How do you combat the open buffet? Be frugal with modules. Only enable modules you really need to help your site run. Instead of adding a module for something, create a new View for a blog page or for a special block that lists a certain type of content. For larger and more customized sites, having a custom module that performs one or two small hook_alters to change a couple things is better than enabling a beefy module that does what you need and a thousand more things besides.
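
As a rough illustration (assuming a module named 'custom' and an 'article' content type), a one-off tweak like removing the Preview button from a node form only needs a few lines in a custom module:

<?php
/**
 * Implements hook_form_FORM_ID_alter().
 */
function custom_form_article_node_form_alter(&$form, &$form_state, $form_id) {
  // Remove the Preview button from the article node form, instead of
  // installing another module just to tweak this one element.
  if (isset($form['actions']['preview'])) {
    $form['actions']['preview']['#access'] = FALSE;
  }
}
?>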

Don't be a module glutton!

One more tip: Whenever you consider using a contributed module, check out its ranking on the Project usage overview page, and check how many sites are currently using the module (under the 'Project Information' heading on the project home page). If the module is only used by a few hundred sites, that could be a sign that it's not going to be updated in a timely fashion, or thoroughly vetted for performance and security issues. I'd always recommend stepping through a module's code yourself if it's not a very popular module—if it's a tangled mess of spaghetti, steer clear, or submit patches to clean it up!

Configuration and Code

Drupal's philosophy when it comes to configuration and settings is that everything, or nearly everything, should be manageable through a user interface. Many developers who work outside of web applications are used to storing a lot of configuration in code, and don't see much value to making sure everything can be configured by administrators on-the-fly. In fact, many developers scoff at such an idea, since they lose some control over the final application/site.

However, this is one of the traits of Drupal that makes it so powerful, and so beloved by site builders and people who actually use the sites developers build for them.

This presents a problem, though—if things are configurable by end-users, how do we version-control settings? How do we deal with different environments, like moving a feature from a development server to a test server, then to the live server? With Drupal <6, this was very challenging indeed, and usually required a lot of manual SQL work in update hooks. However, in Drupal 6 and 7, the situation has improved quite a bit, and in Drupal 8 and beyond, configuration management will likely be a standout feature (see: Configuration management architecture).

The Features module lets developers take things like content types, image styles, site settings, and even content itself (with the help of something like Universally Unique IDentifier), and export them to code. Then, that code can be version controlled and deployed to different environments with some simple drush commands or the click of a button in the UI. As long as the modules you're using use normal Drupal variables, or use CTools Exportables (most of the top modules do), you can use Features to keep things in sync.

Another thing that irks non-Drupal developers (especially those used to 'cowboy coding'—not using any kind of framework or system when they build sites) is the fact that the database is abstracted away. In Drupal, it should be fairly rare that a developer needs to write database queries. Almost everything within Drupal is wrapped in an API, allowing Drupal to work across a variety of platforms and backends. Instead of writing variables to the {variable} database table (and dealing with serialization and unserialization), you use variable_get() and variable_set()—these functions even take care of static caching for performance, and respect configuration overrides in settings.php. Instead of querying twenty different tables to find a list of entities that match your conditions, you use EntityFieldQuery. It may seem inefficient at first, but it's actually quite freeing—if you do things The Drupal Way, you'll spend less time worrying about databases and schemas, and more time solving interesting problems.
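
A quick sketch of what this looks like in practice (the variable name, bundle, and page size are made up for illustration):

<?php
// Store and retrieve a setting without touching the {variable} table directly.
variable_set('custom_items_per_page', 10);
$per_page = variable_get('custom_items_per_page', 25);

// Find the newest published articles without hand-writing a single JOIN.
$query = new EntityFieldQuery();
$query->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'article')
  ->propertyCondition('status', NODE_PUBLISHED)
  ->propertyOrderBy('created', 'DESC')
  ->range(0, $per_page);
$result = $query->execute();
$nids = isset($result['node']) ? array_keys($result['node']) : array();
?>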

One more tip: If you ever see the PHP filter module enabled on a site, or something like Views PHP filter, that likely indicates someone getting lazy and not doing things The Drupal Way™. Putting PHP code into the database (as part of content, the body of a node, or as part of a view) is like pouring Mentos into Diet Coke—it's a recipe for disaster! There's always a way to do what you need to do via a .module file or your theme. Even if it's hacked together, that's a million times better than enabling the insecure, developer-brain-draining module that is the PHP filter.

Themes and the .tpl.phps of DOOM!

Drupal has had a long and rocky relationship with themers and designers—and at times in Drupal's history, the very responsibility of a 'theme' has been unclear. One principle has always been clear, however: themes should deal with HTML markup, CSS styling, some JavaScript for the user interface, and maybe a tiny bit of PHP to help sort data into certain templates.

That last bit, however—the 'tiny bit of PHP'—has been abused very often due to the fact that Drupal has been using a custom theme engine called PHPTemplate, which allowed the use of any PHP code inside any template (.tpl.php or sometimes referred to as 'tipple fip') file.

Many themers, designers, and new Drupal developers have mangled templates and thrown all kinds of code into template files which simply doesn't belong. The idea that HTML markup and PHP code can be mixed and mashed together comes out of a 'scripting' approach that was predominant in very old versions of PHP, custom-coded PHP websites, and an old-school PHP <4 mentality. Nowadays, there should be a distinct separation between markup and styling (a theme's responsibility), and the business logic that generates data to be put into markup and styled (a module's responsibility—or, rarely, inside a theme's template.php).

I've seen sites where there were 30+ copies of the theme's page.tpl.php file, all just to change one variable on different pages on a site. What the developer should've done is use one page.tpl.php, and implemented template_preprocess_page() (which can be invoked in either template.php, or in a module as hook_preprocess_page()). Inside that function, the developer can set the variable depending on which page is being viewed. If the developer were to continue to duplicate page templates, he'd be in a very sorry situation the first time he had to change the page markup sitewide—instead of changing it in one page template, he'd have to change it in 30+ copies, and make sure he didn't miss anything!
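
Here's roughly what that looks like, as a sketch assuming a theme named 'mytheme', a content type named 'landing_page', and a made-up $show_sidebar variable that the single page template can check:

<?php
/**
 * Implements template_preprocess_page().
 */
function mytheme_preprocess_page(&$variables) {
  // One page.tpl.php, one variable: set it per page instead of duplicating
  // the template for every section of the site.
  $variables['show_sidebar'] = TRUE;
  $node = menu_get_object();
  if ($node && $node->type == 'landing_page') {
    $variables['show_sidebar'] = FALSE;
  }
}
?>

Then page.tpl.php only needs a simple <?php if ($show_sidebar): ?> check in the one place the markup differs.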

Don't Repeat Yourself - DRY

The DRY principle (Don't Repeat Yourself) applies very strongly to themes and templates—instead of making a bunch of duplicate templates and changing little things in each one, use hook_preprocess_hook() functions in either your theme or custom modules.

One other important note: If you're coming from Wordpress or another PHP-based CMS that often mixes together HTML markup and PHP files throughout modules, plugins, themes, etc., please try to get that concept out of your head; in Drupal, you should have one, and only one opening <?php tag inside any PHP code file, and templates (.tpl.php files) should only include the most basic PHP and Drupal theming constructs, like if, else, print(), hide() and render(). If you have any more than that in a template, that's a sign of code smell.
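
To give a feel for how little PHP belongs in a template, here's a trimmed-down, node.tpl.php-style sketch (not copied from core) that stays within those basic constructs:

<div class="node <?php print $classes; ?>">
  <?php if (!$page): ?>
    <h2><a href="<?php print $node_url; ?>"><?php print $title; ?></a></h2>
  <?php endif; ?>
  <?php
    // Hide pieces that are rendered elsewhere, then render the rest.
    hide($content['comments']);
    hide($content['links']);
    print render($content);
  ?>
</div>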

Thankfully, Drupal 8 will use Twig instead of PHPTemplate as the default template engine. Twig is a true templating language, and doesn't allow PHP. It's also more designer-friendly, and doesn't require a rudimentary knowledge of PHP to use—or an advanced knowledge of PHP to use well.

Code Quality

Spaces versus tabs. Putting curly braces on the same line as the if statement, or on the next line. These are the things that will be argued ad infinitum, and these are the things that don't really matter to a compiler. But they matter greatly to a community of developers. The larger and more diverse the community, the more important they are!

Drupal developers come from around the world, from many different cultures. It's important that we have a common way of communicating, and it helps quite a bit if we all use certain standards when we share code.

Since the mid-2000s, the Drupal community has banded together to make and enforce some very thorough coding standards for PHP, JavaScript, CSS, and other code used in Drupal core and contributed projects. The community is in ongoing discussions about code quality and review processes, and continues to adapt to modern software development best practices, and does a great job of teaching these practices to thousands of new developers every release.

Since early in the Drupal 7 development cycle, the Drupal community has written automated tests to cover almost all of Drupal core and many large contributed projects, and has built testing infrastructure to ensure all patches and bugfixes are thoroughly tested before being accepted.

Since early in the Drupal 8 development cycle, the Drupal community has used the concept of core gates and issue count thresholds, as well as divided responsibilities in different core initiatives, to ensure that development didn't get too scattered or start making Drupal core unstable and incoherent. Drupal 8, though in alpha stages, is already very stable, and is looking to be the most bug-free and coherent release yet.

Drupal's strict coding standards already match up pretty well with the suggested PSR standards from the PHP Framework Interop Group, and Drupal 8 and beyond will be taking future PSRs into account as well. This will help the Drupal community integrate more easily into the larger PHP world. By following standards and best practices, less time is spent trying to get individual PHP classes, methods, and configurations to work together, and more time is spent creating amazing websites, applications, and other products.

One tip: The Coder module will help you to review how well your own code (PHP, JS and CSS) follows the Drupal Coding standards. It also helps you make sure you're using best practices when it comes to writing secure code (though automated tools are never a perfect substitute for knowing and writing secure code manually!).

Even further: Many developers who work with PHP-based systems seem to have followed the progression of designer -> themer -> site builder -> developer, and thus don't have a strong background in software architecture or actual 'hard' programming (thus many ridicule the PHP community as being a bunch of amateur programmers... and they're often right!). I'd suggest trying to work on some small apps in other languages as well (might I suggest Node.js, Go, Java, or Ruby), to get a feel for different architectures, and learn what is meant by terms like SOLID, DRY, TDD, BDD, Loose coupling, YAGNI, etc.

Hacking Core and Contrib modules

Every time you hack core, God kills a kitten. Please, consider the kittens.

The above image comes from an idea originally presented at DrupalCon Szeged 2008 by Greg Dunlap. It goes like this: Every line of code you change in Drupal core or one of the contributed modules you're using will add many man-hours spent tracking the 'hack' over time, make upgrading your site more difficult, and introduce unforeseen security holes and performance regressions.

The times when actually modifying a line of code anywhere outside your custom module or theme's folder is a good idea are extremely rare.

If you find you are unable to do something with Drupal core or a contributed module to make it work the way you want, either you haven't yet learned how to do it the right way, or you found a bug. Drupal is extremely flexible with all its core hooks, alter hooks, preprocess functions, overrides, etc., and chances are, there's a more Drupalish way of doing what you're trying to do.

On the rare occasion where you do have a problem that can only be fixed by patching core or a contrib module, you should do the following:

  1. Search the project's issue queues to see if someone else had the same problem (chances are you're not the first!).
  2. If you found an issue describing the same problem, see if the issue is resolved or still open:
    • If the issue is resolved, you might need to download a later -dev release to fix the problem.
    • If the issue is not resolved, see if there's a patch you can use to fix the problem, test the patch, and post back whether the patch resolves your problem, so the patch progresses towards being accepted.
    • If the issue is not resolved and there is no patch to fix the problem, work on a patch and submit it to the issue queue.

The key takeaway here is the idea of investing in patches. If you find an actual bug or would like to see some improvement to either Drupal core or a contributed project, you should either test and push forward existing patches, or contribute a patch to get your problem resolved.

When you do things this way, you no longer operate on an island, and you'll benefit from community feedback and improvements to your patch. In addition, by only using patches that are tracked on a drupal.org issue, you can track your patches more easily. On the rare occasion when I need to use a patch, I put the patch file (named [issue_number]-[comment_number].patch) into a 'sites/all/core-patches' directory, and then add an entry in a 'Patches' file along with a link to the issue, a description of the patch, and why it is necessary.

Participating in the Drupal Community

In the previous section, I mentioned the idea of not being an island when developing with Drupal. How true this is! You're using software that's built by thousands of developers, and used by millions. There are people working on Drupal from every continent, and this diverse community is one of the most positive aspects of Drupal.

On Drupal.org's front page, the first line of text reads:

Come for the software, stay for the community.

With so many people using and building Drupal, chances are you aren't the first person to encounter a particular problem, or build a certain piece of functionality. And if you can't find a module or a simple built-in way to do something you need to do, there are plenty of places to go for help: the forums and issue queues on Drupal.org, Drupal Answers on Stack Exchange, IRC channels, and local user groups and meetups.

And these are just a few of the places where you can discover community and get help!

As I said before: don't be an island. With proprietary, closed-source software, you don't have anywhere to go except official (and expensive) vendor support. With Drupal, you get the code, you get to talk to the people who wrote the code, and you can even help make the code better!

Global state / Assuming too much

Not every request for a Drupal resource (most often a path defined in hook_menu()) comes from a web browser, and many variables and things you assume are always available are not. A lot of developers forget this, and write code that assumes a lot of global state that will be missing at certain times—if drush (or the command line in general) is in use, if data is being retrieved via AJAX, or if data is being retrieved by some other service.

Always use Drupal's API functionality instead of things like $GLOBALS and $_GET. To get the current URL path of the page being viewed, use current_path(). For dynamic URL paths, use wildcard menu paths with the arg() function or Drupal's built-in menu router, instead of adding a bunch of query parameters.

Additionally, use Drupal's menu router system and Form API to the fullest extent. When you define a menu item in hook_menu(), you can pass an access callback which integrates with Drupal's menu access system and lets you determine whether a given user has access (return TRUE) or not (return FALSE). Drupal takes care of outputting the proper headers and access denied page for you. When building forms, use the built-in validation and submit callback functionality, along with helper functions like form_set_error(). Using APIs that are already built into Drupal saves you time and code, and usually ensures your forms, content, etc. are more secure and more performant.
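
Here's a minimal sketch of both ideas (assuming a module named 'custom'; the page callback, the 'access custom report' permission, and the form's 'threshold' field would be defined elsewhere):

<?php
/**
 * Implements hook_menu().
 */
function custom_menu() {
  $items['reports/custom'] = array(
    'title' => 'Custom report',
    'page callback' => 'custom_report_page',
    // Drupal outputs the proper headers and 403 page when this returns FALSE.
    'access callback' => 'custom_report_access',
    'type' => MENU_CALLBACK,
  );
  return $items;
}

/**
 * Access callback for the custom report page.
 */
function custom_report_access() {
  return user_access('access custom report');
}

/**
 * Form validation handler: let the Form API surface the error.
 */
function custom_report_form_validate($form, &$form_state) {
  if ($form_state['values']['threshold'] < 0) {
    form_set_error('threshold', t('The threshold must be zero or greater.'));
  }
}
?>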

Finally, always enable logging (typically via syslog on production servers, or logging errors to the screen in development environments) and check your logs over time to make sure you're not generating a bunch of errors in your custom code.

Drupal 8 will be dropping some bits of global state that are often abused in Drupal 7 and below—the use of the global $user object is discouraged, and $_GET['q'] won't be available at all! Use the API, Luke, and the force will be with you.

The Drop is Always Moving

Though this post is one of the longest I've written on this blog, it barely scratches the surface of a full understanding of The Drupal Way™. The only way to start wrapping your head around how to do things properly with Drupal is to build a site with Drupal. And another site, and another, etc. Then build some modules, and some themes. Build an installation profile or two. Learn drush. Contribute to Drupal core.

Every day, learn something new about Drupal. You'll find that Drupal is a constantly-evolving (and improving!) ecosystem. The best practice today may be slightly different tomorrow—and with Drupal 8 just around the corner, there are many exciting opportunities to learn!

Discuss this post on Hacker News, Reddit, or below...

Jun 27 2013
Jun 27

I'm a huge fan of Boost for Drupal; the module generates static HTML pages for nodes and other pages on your Drupal site so Apache can serve anonymous visitors the static pages without touching PHP or Drupal, thus allowing a normal web server (especially on cheaper shared hosting) to serve thousands instead of tens of visitors per second (or worse!).

For Drupal 7, though, Boost was rewritten and substantially simplified. This was great in that it made Boost more stable, faster, and easier to configure, but it also meant that the integrated cache expiration functionality was dumbed down and didn't really exist at all for a long time. I wrote the Boost Expire module to make it easy for sites using Boost to have the static HTML cache cleared when someone created, updated, or deleted a node or comment, among other things.

However, the Cache Expiration module has finally gotten solid Boost module integration (through hook_cache_expire()) in version 7.x-2.x, and the time has come for all users of Boost Expire to switch to the more robust and flexible Cache Expiration module (see issue). Here's how to do it:

  1. Disable and uninstall the Boost Expire module (then delete it, if you wish).
  2. Download and enable the Cache Expiration module (make sure Boost is still enabled).
  3. Visit the Cache Expiration configuration page (admin/config/development/performance/expire), and set the following options:
    • Module status: select 'External expiration' to enable cache expiration for the Boost module.
    • Node expiration: check all three checkboxes under Node actions, and make sure the 'Node page' checkbox is checked below.
    • Comment expiration: check all five checkboxes under Comment actions, and make sure the 'Comment page' and 'Comment's node page' checkboxes are checked below.

For the visually inclined, see the screenshots in this comment.

I'd like to thank the 750+ users of Boost Expire for helping me make it a great and robust stopgap solution until Cache Expiration 'cached' up (heh) with Boost in D7, and the author of and contributors to both Boost and Cache Expiration for making some great and powerful tools to make Drupal sites fly!

If you're interested in some other ways to make your Drupal site faster, check out the article Drupal Performance White Paper (still in development) on my personal website.

Jun 25 2013
Jun 25

[Update: And, as quickly as I finished writing this post, I thought to myself, "surely, this would be a good thing to have drush do out-of-the-box. And... it already does, making my work on this shell script null and void. I present to you: drush sql-drop! Oh, well.]

When I'm creating or updating an installation profile/distribution for Drupal, I need to reinstall Drupal over and over again. Doing this requires a few simple steps: drop/recreate the database (or drop all db tables), then drush site-install (shortcut: si) with the proper arguments to install the site again.

In the past, I've often had Sequel Pro running in the background on my Mac, and I'd select all the database tables, right-click, choose 'Delete Tables', then have to click again on a button to confirm the deletion. This took maybe 10-20 seconds, depending on whether I already had Sequel Pro running, and how good my mouse muscles were working.

I created a simple shell script that works with MAMP/MAMP Pro on the Mac (but can easily be modified to work in other environments by changing a few variables), which simply drops all tables for a given database:

#!/bin/bash
#
# Drop all tables from a given database.
#

# Some variables.
MYSQL=/Applications/MAMP/Library/bin/mysql
AWK=$(which awk)
GREP=$(which grep)
USER="valid-username-here"
PASSWORD="your-password-here"

# Database (argument provided by user).
DATABASE="$1"

# Require the database argument.
[ $# -eq 0 ] && {
  echo "Please specify a valid MySQL database: $0 [database_goes_here]" ;
  exit 1;
}

# Drop all tables from the given database using mysql on the command line.
TABLES=$($MYSQL -u $USER -p$PASSWORD $DATABASE -e 'show tables' | $AWK '{ print $1}' | $GREP -v '^Tables')
for TABLE in $TABLES
do
  # echo "Deleting $TABLE table from $DATABASE..."
  $MYSQL -u $USER -p$PASSWORD $DATABASE -e "DROP TABLE $TABLE"
done

I named the script wipe-db.sh, and you can call it like so: $ /path/to/wipe-db.sh database-name. I added a symlink to the script inside my /usr/local/bin folder so I can just type in 'wipe-db' in the Terminal instead of entering the full path. To add the symlink:

$ ln -s /path/to/wipe-db.sh /usr/local/bin/wipe-db

Now I can wipe the database tables within a couple seconds, since I always have Terminal running, and I never have to reach for the mouse!

Apr 11 2013
Apr 11

Edit: There's a module for that™ now: Pingdom RUM. The information below is for historical context only. Use the module instead, since it makes this a heck of a lot simpler.

Pingdom just announced that their Real User Monitoring service is now available for all Pingdom accounts—including monitoring on one site for free accounts!

This is a great opportunity for you to start making page-specific measurements of page load performance on your Drupal site.

To get started, log into your Pingdom account (or create one, if you don't have one already), then click on the "RUM" tab. Add a site for Real User Monitoring, and then Pingdom will give you a <script> tag, which you then need to insert into the markup on your Drupal site's pages.

The easiest way to do this is to use drupal_add_html_head() within the page_alter hook (in your theme's template.php, or in a custom module):

<?php
/**
 * Implements hook_page_alter().
 */
function THEMENAME_page_alter(&$page) {
  // Add Pingdom RUM code.
  $pingdom_rum = array(
    '#type' => 'html_tag',
    '#tag' => 'script',
    '#attributes' => array(
      'type' => 'application/javascript',
    ),
    '#value' => '[SCRIPT TAG CONTENT HERE]',
  );
  drupal_add_html_head($pingdom_rum, 'pingdom_rum');
}
?>

Replace THEMENAME with your module or theme name, and [SCRIPT TAG CONTENT HERE] with the content of the pingdom script tag (excluding the two <script> tags at the beginning and end).

Once you've done this, go back to Pingdom, and you can view page load times in real-time:

Pingdom RUM monitoring graph

Pretty nifty!

Note: If you're looking for a great website monitoring service that's a bit simpler and cheaper than something like Pingdom (which we love!), check out Server Check.in :)

Feb 28 2013
Feb 28

Druplicon at DrupalCon - balloon

DrupalCon Portland is only a couple months away (early bird registration ends soon, so get your tickets if you haven't already!), and I'll be headed out that way. If this will be your first time attending a DrupalCon, be sure to read my First Timer's Guide to DrupalCon from last year.

At this year's DrupalCon, I'm excited to hear about everything going on with Drupal 8, as we're nearing the end of the development cycle, and a release candidate is on the not-too-distant horizon.

After having a baby and shying away from much Drupal contrib/core work, I finally had some time in the past few weeks to get up to speed on many of the Drupal changes that have been committed in the past month or so—and boy are they amazing (CKEditor in core, new node edit form, new responsive layouts, new admin toolbar, config, views, etc.)!

In addition, since the feature freeze deadline has passed, I decided to try porting Honeypot (a popular spam bot-fighting module) to Drupal 8. So far, most everything works, but I'm still working on making sure new configuration changes are accounted for.

I'd love to talk about everything I've learned while developing Honeypot and running some small—and large—community websites (juicy targets for human and non-human spammers!). To that end, I've submitted a session proposal, and would love to hear (in the session's comments) anything specific you'd like to learn more about. Spam is a difficult problem, but there are many weapons you have to fight it! I'll go through all that and more during the session, if it's accepted.

Also, if you're a daring soul, and would like to help me get Honeypot up and running well in Drupal 8, download Drupal 8 and Honeypot 8.x-dev, and give it a whirl! Hopefully more module and theme maintainers will start porting their projects to Drupal 8 under the banner of #D8CX, now that core is feature frozen!

Feb 14 2013
Feb 14

PHP's command line interface doesn't respect the max_execution_time limit within your php.ini settings. This can be both a blessing and a curse (but more often the latter). There are some drush scripts that I run concurrently for batch operations that I want to make sure don't run away from me, because they perform database operations and network calls, and can sometimes slow down and block other operations.

Memory usage - PHP and MySQL locked from runaway threads
Can you tell when the batch got backlogged? CPU usage spiked to 20, and threads went from 100 to 400.

I found that some large batch operations (where there are hundreds of thousands of items to work on) would hold the server hostage and cause a major slowdown, and when I went to the command line and ran:

$ drush @site-alias ev "print ini_get('max_execution_time');"

I found that the execution time was set to 0. Looking in PHP's Docs for max_execution_time, I found that this is by design:

This sets the maximum time in seconds a script is allowed to run before it is terminated by the parser. This helps prevent poorly written scripts from tying up the server. The default setting is 30. When running PHP from the command line the default setting is 0.

I couldn't set a max_execution_time for the CLI in php.ini, unfortunately, so I simply added the following to my site's settings.php:

<?php
// Set max execution time explicitly.
ini_set('max_execution_time', 180);
?>

This sets the execution time explicitly whenever drush bootstraps Drupal. And now, in my drush-powered function, I can check max_execution_time and use it as a baseline to decide whether to continue processing the batch or stop. I need to do this since I have drush run a bunch of concurrent threads for this particular batch process (and it continues all day, every day).
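
The general idea looks something like this (a simplified sketch: custom_get_next_item() and custom_process_item() are hypothetical helpers, and the 30-second buffer is arbitrary):

<?php
/**
 * Process queued items until we get close to PHP's execution time limit.
 */
function custom_process_batch() {
  $start = time();
  $limit = (int) ini_get('max_execution_time');
  // Leave a buffer so the script finishes cleanly before being terminated.
  $cutoff = ($limit > 0) ? $limit - 30 : 150;

  while ($item = custom_get_next_item()) {
    custom_process_item($item);
    // Stop early; the next scheduled drush run picks up where this left off.
    if ((time() - $start) > $cutoff) {
      break;
    }
  }
}
?>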

Now the server is much happier, since I don't get hundreds of threads that end up locking the MySQL server during huge operations. I can set drush to only run every 3 minutes, and it will only create a few threads that die around the next time another drush operation is called.

Feb 12 2013
Feb 12

...there's a site for that.

Simply Test.me Screenshot

I just found out about SimplyTest.me today, and it allows you to, well, simply test any Drupal.org-hosted module, theme, or distribution in seconds.

No longer do you need to spin up (or maintain) a live website locally (which usually takes an extra minute or two—at least) just to check out a module or make sure a theme or distribution fits your needs before using it on a live or development site.

Instead of simply getting a screenshot or trying a theme on a demo site, you get a full Drupal website set up and configured with the module/theme/distro (as well as its dependencies), so you can play with it to your heart's content (for 30 minutes if you don't have an account on the site, an hour if you do).

According to the site's Q&A page, Drupal 6, 7, and 8 are all supported—even with sandbox projects! You can read more about the architecture and service implementation on the simplytest.me project page on Drupal.org.

Check it out, and thank Patrick Drotleff and all the sponsors (who help provide the hosting) for the hard work on this awesome tool!

[Update: There's also a great post on the Comm Press Blog about how you can test patches quickly and easily using Simply Test.me: Everyone can test patches. Really! simplytest.me to the rescue.]

Jan 04 2013
Jan 04

Some random bits of news from Midwestern Mac, LLC:

St. Louis-area Drupal Group

After taking a hiatus for the month of December, the St. Louis area Drupal Group will be meeting up (hopefully) on the third Thursday of the month as normal. We're hoping to have more structure to our meetups, and there are already some great ideas for meeting topics in 2013.

If you live in or around St. Louis and use or contribute to Drupal, please make an effort to join us and build up the Drupal community here in St. Louis!

As an aside, we still have a separate website for the St. Louis Drupal group—if anyone has ideas for how we can use that to spread the Drupal love in the center of the U.S., please let us know!

Server Check.in Launched

A couple weeks ago, we (Midwestern Mac, LLC) announced our newest service, Server Check.in, a website and server monitoring service that checks on your sites and servers every 10 minutes, notifying you of any problems. The service runs on Drupal, and integrates with services like Twilio and Stripe to handle SMS messaging and payments, respectively.

I (geerlingguy) wrote up a case study for Server Check.in and posted it to the Community showcase on drupal.org. This is the first application-type service built by Midwestern Mac on Drupal, and we've already been hard at work improving the service.

If you have any questions about Server Check.in, or how it was built, please ask away; I had a great discussion with some other developers in this thread on Hacker News.

Hosted Apache Solr Search updated to 3.6.x

At the request of many people who wanted to do some neat new things with Solr on their Drupal sites, we've finally followed Acquia's lead and updated some of our Solr search servers to 3.6.x, meaning things like Location-based searching are now possible. And our servers are happier :)

Nov 06 2012
Nov 06

I was recently browsing a very popular review website, when I noticed the following warnings popping up:

Angie's List website errors

From simply loading their web page and seeing these error messages, I could conclude:

  1. The website is using Drupal.
  2. The website is using memcached.
  3. The website is running on Acquia's managed hosting cloud.
  4. The website has error reporting set to print all errors to the screen.

If I were trying to break into this review site, or cause them a bad day, the information presented in this simple error message would help me quickly tailor my attacks to become much more potent than if I started from a blank slate.

Security through obscurity

I will quickly point out that security through obscurity—thinking you're more secure simply because certain information about your website is kept secret—is no security at all. However, that doesn't mean that obscurity is not an important part of your site's security.

Simply because the site above doesn't disable on-screen error display on the live website, I was able to learn quite a bit about the site. I could've probably found more 'helpful' error messages had I spent a little more time investigating.

At least the site's server-status page is protected! (Many sites leave the Apache server-status page open, exposing a ton of potentially dangerous details).

Keeping certain things secret, like errors that occur on your site, the version of a particular CMS, plugin, module, or theme of your website, or status reporting information, does improve your site's security. It won't prevent a dedicated intruder, but it will definitely slow him down, and will likely deter less-dedicated intruders.

To contribute to the overall security of your website, you should do the following:

  • Make sure server and configuration status pages are secure from outside access. If you need to expose phpinfo() or server-status, make sure only you have access.
  • Turn off error message printing on the screen on your publicly-accessible sites. Only turn on this feature on development or testing sites. (You should still log error messages, but do this behind the scenes, using syslog or some other logging facility; see the example after this list.)
  • Protect your server configuration, error, and log files from prying eyes; even backups of these files can be a security hole.
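
For the second point, on a Drupal 7 site the on-screen display can be switched off per environment in settings.php (a small sketch; the same setting is available in the UI at admin/config/development/logging):

<?php
// Production settings.php: keep logging errors to watchdog/syslog, but never
// print them to the screen ('None' on admin/config/development/logging).
$conf['error_level'] = 0;
?>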

Hardening your defenses

Of course, as I mentioned above, security through obscurity is no security at all. Even if someone were to know every detail about your server configuration and setup, your site should still be secure. The following are some essential steps to ensuring the security of your website:

  • Apply patches and updates routinely. Most systems have automatic update systems, or at least notify you when an update is available.
  • Have an outside consultant evaluate the security of any custom code or interfaces you provide (especially for custom forms, API interaction, and file handling).
  • Use automated tools like Fail2Ban on your servers to make sure repeated attempts to access your servers are blocked.
  • Know your options when it comes to spam filters and flood controls; Drupal, as an example, has a plethora of excellent modules and configuration settings to prevent certain security holes from being opened. There's even a nice Security Review module that looks at common site configuration problems and warns you if they're incorrect.
Oct 01 2012
Oct 01

Most people who have grown up on the web, and have used Wysiwyg utilities online or newer text editors and word processing applications, are used to having a simple 'return' create a new paragraph, with (on average) one extra line of empty space between the new paragraph and the one before it.

However, a lot of people like having the 'return' key just go down one line. There are a few ways this is possible in most Wysiwygs:

  • You can change the block style from 'Paragraph' (which creates <p> tags around new lines of text) to 'div' (which creates <div> tags around new lines of text).
  • You can press Shift + Return when you want to just go down one line (using a <br /> tag instead of a <p> tag).

I use the second method when I'm using a Wysiwyg, as I like using paragraphs (which are semantic for text, and which allow for better CSS styling than a monolithic block of text with linebreaks). I also rarely use a Wysiwyg editor, so it's not really an issue for me anyways ;-)

But, some people ask me if they can set up TinyMCE to use line breaks instead of paragraph returns by default, so they don't have to hit Shift + Return all the time (instead, they hit 'Enter Enter'... more keystrokes, but whatever floats their boat!).

Well, as it turns out, TinyMCE does have a setting for this, called forced_root_block. And Drupal's Wysiwyg module allows you to pass along this setting to TinyMCE when TinyMCE is loaded on a page, using hook_wysiwyg_editor_settings_alter() like so (in a custom module):

<?php
/**
 * Implements hook_wysiwyg_editor_settings_alter().
 *
 * Sets defaults for the TinyMCE editor on startup.
 */
function custom_wysiwyg_editor_settings_alter(&$settings, $context) {
  if ($context['profile']->editor == 'tinymce') {
    // Force linebreaks instead of paragraph returns.
    $settings['forced_root_block'] = FALSE;
  }
}
?>

Sep 25 2012
Sep 25

I just sent a new note to the Flocknote Development list about making Flocknote speedier. Flocknote is a very complex web application, and at the beginning of this summer, I noticed that some pages were taking more than a second to generate on our server (that's before the page would be sent to the end user!).

Investigating the performance problems using MySQL's EXPLAIN, the PHP profiler XHProf, and Drupal's Devel module, I found the culprits to be some inefficient and memory-hungry caches and some inefficient database queries. Applying a couple patches that are in development for Drupal, and adding a couple indexes on different tables more than halved average page load time.

I also am actively trying to get these patches accepted into Drupal core and the Views module. Once the patches are incorporated, millions of other Drupal websites and applications will be able to conserve memory and clock cycles as well. You could easily substitute 'Wordpress', 'Joomla', 'DotNetNuke', or any other CMS or platform for 'Drupal' here.

When we shave milliseconds off page load times, or optimize CSS and JavaScript to conserve CPU time on an end user's computer or mobile device, we are not only making end users happier, we're effectively:

  • Conserving battery life, and thus recharging time—reducing power demands altogether.
  • Making end users enjoy (and thus continue) using our websites and products.
  • Allowing for more free memory and CPU time on our servers, which in turn increases capacity.

These are very real benefits of pursuing better performance. Do you performance test your code when you add new features? Do you run something like Google PageSpeed to make sure your fancy new scripted widget doesn't kill performance on older Android devices, iPhones, and PCs?

Just like with the rampant misuse of Adobe Flash in the early part of this millennium, many people seem to be adding features, effects and widgets willy-nilly to their sites and platforms with little regard for their frying servers or those using the sites. Do you really need a 3D tag cloud on your site, when it costs tons more time to generate on the backend, and tons of time to render in a browser?

Consider learning about improved performance techniques and incorporating performance testing in all the development you do—no matter what kind of software platform or website you're building. And if you can help large web platforms like Drupal, Wordpress and Joomla work faster using less memory, that's a win for everyone.

Sep 05 2012
Sep 05

One Drupal site I manage has seen MySQL data throughput numbers rising constantly for the past year or so, and the site's page generation times have become progressively slower. After profiling the code with XHProf and monitoring query times on a staging server using Devel's query log, I found that there were a few queries running on pretty much every page load, grabbing data from cache tables that had 5-10 MB of data in certain rows.

The two main culprits were cache_views and cache_field. These two tables alone contained more than 16MB of data, which was queried on almost every page request. There's an issue on drupal.org (_field_info_collate_fields() memory usage) to address the poor performance of field info caching for sites with more than a few fields, but I haven't found anything about better views caching strategies.

Knowing that these two tables, along with the system cache table, were queried on almost every page request, I decided I needed a way to cache the data so MySQL didn't have to spend so much time passing the cached data back to Drupal. Can you guess, in the following graph, when I started caching these things?

MySQL Throughput graph - munin

APC, Memcached, MySQL Query Cache?

If this site were running on multiple servers, or had a bit more infrastructure behind it, I would consider using memcached, which is a great caching system to run in front of MySQL, especially if you want to cache a ton of things and have a scalable caching solution (read this story for more). Running on one server, though, memcached doesn't offer a huge benefit compared to just using MySQL's query cache and tuning the innodb_buffer_pool_size so more queries come directly from memory. Memcached incurs a slight overhead due to the fact that data is transferred over a TCP socket (even if it's running on localhost).

MySQL's query cache is nice, but the speed benefit it offers isn't huge compared to the extra memory it needs to store a lot of queries.

I've often used APC (an opcode cache for PHP) to cache all a site's compiled PHP files in memory so they don't need to be re-read and compiled from disk on every page request. For most Drupal sites, if you're not already using APC for this purpose, you should be; even with fast SSDs or a super-fast RAID array, APC will probably give a 20-50% gain in page load times.

However, I've never used APC's 'user cache' before, since I normally let APC run and don't want to worry about fragmentation or purging.

APC User Cache

There's a handy Drupal module, APC, which lets you configure Drupal to store certain caches in APC instead of in the database, meaning Drupal can read certain caches directly from RAM, in a highly-optimized key-value cache. APC caching is suited best for caches that don't change frequently (otherwise, you could slow things down due to frequent purging and fragmentation).

Some good candidates I've found include:

  • cache (includes entity_info, filter_formats, image_styles, and the theme_registry, many of which are queried every page load).
  • cache_bootstrap (includes system_list and variables, queried every page load).
  • cache_field (queried whenever field data is needed, grows proportionally to how many fields + instances you have).
  • cache_views (queried whenever a view is loaded—even if your views are all stored in code).

You may find some other caches that are suitable for APC, but when you've decided which caches you'd like in APC, count up the data sizes of all the tables after the cache is warm, and then double that value. This is how many MB you should add to your existing apc.shm_size variable (usually in apc.ini somewhere on your server) to give a good overhead for user cache objects.
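
Routing those bins to APC happens in settings.php, using the cache backend the APC module provides. Here's a minimal sketch, assuming the module lives at sites/all/modules/apc; double-check the file and class names against the README of the version you install:

<?php
// In settings.php: register the APC module's cache backend (the path is an
// assumption; adjust it to wherever the module is installed).
$conf['cache_backends'][] = 'sites/all/modules/apc/drupal_apc_cache.inc';

// Route only the bins listed above to APC; every other bin stays in MySQL.
$conf['cache_class_cache'] = 'DrupalAPCCache';
$conf['cache_class_cache_bootstrap'] = 'DrupalAPCCache';
$conf['cache_class_cache_field'] = 'DrupalAPCCache';
$conf['cache_class_cache_views'] = 'DrupalAPCCache';
?>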

Monitor the APC cache size and usage (especially the free space and fragmentation amounts) using either the apc.php file included with APC (instructions), or using something like munin-php-apc along with munin monitoring. Make sure you have a good ratio of available vs. fragmented memory (more blue than orange, in the graph below):

Munin - APC Memory Usage Graph

When NOT to Use APC

APC is awesome for single-server setups, especially if you have a site with relatively steady traffic that's growing organically. APC is NOT helpful when you know you're going to need to scale quickly and will be adding servers (APC only benefits the server on which it's running). For a site that will exceed its current capacity quickly, you'll probably want to consider first splitting your web server (Apache/PHP) from your MySQL server (but put them both in the same datacenter and connect via a private network), then consider adding a memcached server between the web and database server. From there, you can start adding more memcached servers and database slave servers as needed.

APC is also not very helpful if you don't have enough RAM on your server to store the cached objects (opcode + user cache objects) with at least 20-40% overhead (free space). In almost every situation, the default 32M apc.shm_size won't cut it, and in some cases, you'll need to push 128M or 256M before the server can run swiftly with a normal amount of fragmentation and purges.

Conclusion

It's always important to benchmark and profile everything. It's no use caching things in APC if you have a database query that takes 2 seconds to run, or an external web service call that takes 5! Once you've done things like tune database queries, check for obvious front-end performance flaws, and have your page load down to a couple seconds or less, start working on your caching strategy. APC isn't a good fit for everyone, but in this case, page generation times were cut at least 30% across the board and MySQL data throughput was cut by more than half!

A few important notes if you choose this route:

  • Drush/CLI operations will effectively rebuild the APC cache for the command line every time they run, due to the way APC works (if apc.enable_cli is turned on). However, it seems to have no effect on the separate APC cache for non-cli PHP.
  • Make SURE you monitor your APC memory usage, fragmentation, and purges. If you don't have about twice the required RAM allocated to APC, fragmentation and frequent purging might very well negate any significant performance benefit from using APC.
  • Read through this Stack Overflow question for some more good notes on APC settings: Best APC settings to reduce page execution time.
Aug 11 2012
Aug 11

Drupal 7 uses InnoDB tables. InnoDB provides many benefits, but can cause some unexpected headaches. One headache for me is that, by default, MySQL tells InnoDB to create one file on your system, called ibdata1, to hold ALL the data from EVERY InnoDB table you have on your MySQL server. This file never shrinks in size; it only expands to contain new data. If you delete something from MySQL or drop a table, the space that table was using is reallocated for other new data. This isn't a bad thing, especially for those who have a lot of drive space and don't have many databases that are altered or dropped frequently.

I develop a lot of sites on my little MacBook Air (with a 128GB SSD), so I often download database snapshots from live and testing environments, empty out the tables on my local environment, then import the database dumps. Can you spot the problem here?

Using Daisy Disk I just noticed that my ibdata1 file had grown to more than 10 GB, and my Air's drive only had about 5 GB free space!

So, after reading through MySQL's InnoDB Engine documentation and this answer on Stack Overflow, I found that it's not too hard to change MySQL to keep data tables in their own files, and delete the files after the tables are deleted (thus saving me a ton of space). It just takes a little time and annoyance.

Here's how to do it, roughly:

  1. Export/dump all your databases. (In my case, I didn't do this, since I could just grab them all from production or development servers.) If you have a good backup and restoration system in place, you shouldn't need to fret too much about this part, but if you don't, you'll probably need to spend a bit of time dumping each database or writing a script to do this for you.
  2. Drop (delete) all databases, except for the mysql database, and information_schema, if it exists.
  3. Shut down MySQL.
  4. Delete the ibdata1 file and any ib_logfile log files (I just had ib_logfile0 and ib_logfile1).
  5. Add innodb_file_per_table under the [mysqld] heading in your my.cnf file.
  6. Start MySQL.
  7. Import all your databases.

After doing this, my 'mysql' directory with all my databases only took up about 3 GB (there are a few large databases I regularly work with... but +/-3 GB is a lot less painful than 10+ GB!).

I also took this opportunity to flush out some other testing databases that I had on my local computer for Drupal 4.7 (really!), 5, 6, 7 and 8 testing. It's easy enough to create a new database when the need arises, and with drush, it's easier than ever to create and sync databases and files for my Drupal sites.

On most of the production servers I manage, I don't worry about setting innodb_file_per_table, because there are often only one, two or three databases, and they aren't constantly changing like on my local computer—they only grow over time, so the ever-increasing size of the ibdata1 file isn't concerning to me.

Jun 04 2012
Jun 04

For the past couple years, discussions about 'PSR-0', PHP standards, and some sort of framework standardizations have been popping up here and there. It wasn't until a bunch of 'PSR-0 Interoperability' patches started popping up in the Drupal core issue queues that I decided to take a closer look at PSR. (The latest? PSR-1 (Basic Coding Standard) and PSR-2 (Coding Style Guide) have been accepted).

There's a great FAQ that was just posted by Paul M. Jones explaining the PHP-FIG (PHP Frameworks Interoperability Group), which will give a little backstory to the group and its purpose. Drupal is a member of this group, with Crell (Larry Garfield) representing Drupal's vote for standards guidelines. You can see group members and discussions in the PHP Standards Working Group Google Group, and you can follow along with proposed and ratified group standards in the php-fig GitHub repository.

A lot of the larger PHP frameworks, CMSes and developer communities are represented, but—importantly—this group does not intend to represent PHP as a whole (that's probably the main reason it's now called the 'Framework Interoperability Group' instead of the 'PHP Standards Working Group'). Rather, it represents the mainstream PHP developer, and countless professional PHP developers working with and for the projects in the group. The main premise is that there are many large development groups working with PHP, and it would be helpful if these large groups could use a common set of coding standards, naming standards, and the like when developing their projects so things like the fruitful relationship between Symfony and Drupal can flourish (we're already seeing positive results here in the Drupal community!).

Having set standards that many organizations follow (such as PSR-0, PSR-1, etc.) also helps unify PHP development and bring it to a higher level; many others have (often rightfully) criticized the PHP language and developers for being fragmented, inconsistent and amateurish. I'm going to adopt PSR standards in my own PHP side projects (heck, most of my code already conforms, so it's not a big deal to me), and I'm glad many organizations are working towards adopting the standards as well. It will let our community spend more time making better end results and useful classes, rather than arguing over whitespace, bracket placement, and control structure formatting (to name a few things...).
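
If you've never actually read the standards documents, here's a tiny, hedged illustration of the sort of thing they prescribe; the vendor namespace and class below are made up for the example:

<?php
// PSR-0: the fully-qualified class \MidwesternMac\Video\PlayButton maps to the
// file MidwesternMac/Video/PlayButton.php on the autoloader's include path.
namespace MidwesternMac\Video;

// PSR-1: class names in StudlyCaps, method names in camelCase.
// PSR-2: four-space indents, opening braces for classes and methods on their
// own line, opening braces for control structures on the same line.
class PlayButton
{
    public function render($width = 70)
    {
        if ($width <= 0) {
            throw new \InvalidArgumentException('Width must be positive.');
        }
        return sprintf('<div class="play-button" style="width: %dpx;"></div>', $width);
    }
}
// (PSR-2 also says to omit the closing ?> tag in files that contain only PHP.)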

May 15 2012
May 15

[Update: As of Views 7.x-3.4, you can now use the new "Global: Combine fields filter" to combine fields for an exposed search. Just add the fields you want to search to the view's Fields section, then add a 'Global: Combine fields filter' and select all the fields you want to search. Simple as that!]

A common need I run into with a ton of Drupal sites and Views is searching/filtering content based on multiple fields. For example, a lot of people would like to search for content using either the Title or the Body for a particular content type.

There are two primary solutions offered for this situation, but they both have downsides or are overly complex, in my opinion:

  • Use the Computed Field module to create yet another field stored in the database, combining the two (or more) fields you want to search, then expose a filter for that field instead of both of the individual fields. (I don't like this because it duplicates content/storage, and involves an extra module to do so).
  • Use the Views Filters Populate module to invisibly populate a second field that you've added to a views OR group (using Views OR in Views 2.x, or the built-in AND/OR functionality in Views 3.x). (This module is slightly limited in that you can only work with strings, and again, it involves an extra module).

Instead of using an extra module, I simply do the following to achieve a multi-field search:

  1. Add an and/or group to the filters in Views 3.x (next to 'Filter criteria', click the 'Add' drop down and choose 'and/or, rearrange').
  2. Put the main field you'd like to search into the new filter group (in my case, the Title field), and set the new group to OR.
  3. Implement hook_views_query_alter() in a custom module. In the query alter, you'll simply get the keyword parameter, and add a join and where clause (if you want to join to another table, like the 'body' data table). The code I'm using in this particular instance is below:

<?php
/**
 * Implements hook_views_query_alter().
 *
 * Allow users to search in the 'help' view by title OR body.
 */
function custom_views_query_alter(&$view, &$query) {
  // Only do anything when using the 'help' view.
  if ($view->name == 'help') {
    // Get the keyword used for the search.
    $keyword = isset($_GET['title']) ? $_GET['title'] : '';

    // Add a new LEFT JOIN and WHERE clause for the help node body.
    $join = new views_join();
    $join->construct('field_data_body', 'node', 'nid', 'entity_id');
    $query->table_queue['node__field_data_body'] = array(
      'table' => 'field_data_body',
      'num' => 1,
      'alias' => 'node__field_data_body',
      'join' => $join,
      'relationship' => 'node',
    );

    // The first parameter selects the 'AND/OR' group this WHERE will be added to.
    // In this case, we add it to the second group (the first one is an AND group for
    // 'status = published' and 'type = help').
    $query->add_where(2, 'node__field_data_body.body_value', '%' . $keyword . '%', 'LIKE');
  }
}
?>

The documentation for views_join() and add_where() is somewhat vague, but basically, the code above runs only on the 'help' view; it gets the keyword from the URL parameters (works with or without AJAX-enabled views), then adds a join from the node table (where the 'Title' is) to the field_data_body table (where the content is), and adds a 'where' clause to the new 'OR' group we created in steps 1-2 above.

If you want to dig deeper into the query, just use the Devel module's dpm() function to show the $query object (dpm($query);).

(Note: This illustrates a pretty simple two-field search. I've used the same technique to search on more fields, just adding more where clauses, and making sure there are joins to all the tables where I'm searching... in one example, I searched a list of users by their username, real name (fields), phone number (a field), or email address).
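
If you want to see what that looks like in practice, here's a rough sketch of adding one more field to the same OR group, assuming a hypothetical plain-text field named field_phone; it would go inside the same if ($view->name == 'help') block as the code above:

<?php
    // Hypothetical second field: join the phone field's data table to the node table.
    $phone_join = new views_join();
    $phone_join->construct('field_data_field_phone', 'node', 'nid', 'entity_id');
    $query->table_queue['node__field_data_field_phone'] = array(
      'table' => 'field_data_field_phone',
      'num' => 1,
      'alias' => 'node__field_data_field_phone',
      'join' => $phone_join,
      'relationship' => 'node',
    );
    // Add the phone WHERE clause to the same OR group (group 2) used above.
    $query->add_where(2, 'node__field_data_field_phone.field_phone_value', '%' . $keyword . '%', 'LIKE');
?>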

Mar 19 2012
Mar 19

I was inspired today to get XHProf working on my Mac, using MAMP PRO 2.0.5/PHP 5.3.6, after reading @Dave Reid's tweet. Since I'm not leaving for DrupalCon until tomorrow, what else could I do today? There's an excellent article on Lullabot that will help you get 85% of the way towards having XHProf up and running on your Mac, working with your Drupal sites, but there are a few missing pieces and little tips that will help you get XHProf fully-armed and operational.

XHProf Callgraph example
Ooh, pretty visualizations!

First, after you've installed and configured XHProf on your Mac (and restarted MAMP/Apache so the configuration takes effect), you need to do a few things to get it working well with Drupal. For starters, if you have the Devel module installed, head over to its configuration page (at admin/config/development/devel), and check the box that says "Enable profiling of all page views and drush requests."

Now, enter the following values in the two fields that appear (note: these paths could be different depending on where you installed xhprof, and how you have your Sites folder/localhost set up. For simplicity, I kept the xhprof stuff in MAMP's htdocs folder):

  • xhprof directory: /Applications/MAMP/htdocs/xhprof
  • xhprof url: http://localhost/xhprof/xhprof_html

Save the configuration, and refresh the page. Scroll down to the bottom of the page, and click on the newly-added link in the Developer information section titled 'XHProf output'. You should see a huge table with large, menacing numbers. Don't worry about interpreting them just yet. (If you got an error, or something other than a table of a bunch of functions and numbers, then XHProf is not configured correctly).

Now, click on the [View Full Callgraph] link towards the top of the page. You'll probably get an error like:

Error: either we can not find profile data for run_id [ID-HERE] or the threshold 0.01 is too small or you do not have 'dot' image generation utility installed.

This is because GraphViz (which provides the 'dot' utility) is not installed on your computer, or it's not in your $PATH. So, go ahead and download the OS X-compiled version of GraphViz appropriate to your computer (I downloaded the Intel version 2.14.1), and install it (it uses the normal Mac installer, and puts the files in /usr/local/graphviz-2.14).

The final step to get dot working correctly is to make a symlink to the dot binary using ln -s (in my case, /usr/local/bin is in my $PATH, as defined in ~/.bash_profile):

$ sudo ln -s /usr/local/graphviz-2.14/bin/dot /usr/local/bin/dot

NOW, go ahead and jump back over to your fancy XHProf data table page, and click the Callgraph link. Wait a minute, and you'll be rewarded with a beautiful graphical representation of where Drupal/PHP spends all its time, with colors, arrows, lines, and numbers to your heart's content!

The last step for getting XHProf would be to install the XHProf module on your site and get the data displaying inside Drupal—but I haven't been able to install it yet on my own site (there was an installation error), and the standard interface that I get (provided by XHProf itself) is good enough for me.
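
If you ever want to profile just one chunk of code, rather than whole page loads through Devel, you can call XHProf's own functions directly. Here's a minimal sketch; the xhprof_lib paths are assumptions based on keeping XHProf in MAMP's htdocs folder, as above:

<?php
// Start profiling, collecting CPU and memory data along with wall time.
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

// ... run the code you want to profile here ...

// Stop profiling and grab the collected data.
$xhprof_data = xhprof_disable();

// Save the run so it shows up in the xhprof_html interface.
require_once '/Applications/MAMP/htdocs/xhprof/xhprof_lib/utils/xhprof_lib.php';
require_once '/Applications/MAMP/htdocs/xhprof/xhprof_lib/utils/xhprof_runs.php';
$xhprof_runs = new XHProfRuns_Default();
$run_id = $xhprof_runs->save_run($xhprof_data, 'manual_test');

// View the run at http://localhost/xhprof/xhprof_html/index.php?run=[run_id]&source=manual_test
print $run_id;
?>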

(Remember to clean out the directory where you're saving your XHProf runs every now and then (this directory is configured in php.ini as the xhprof.output_dir variable); each run will be 100-200KB, and that adds up as you load and reload tons of pages!).

Mar 18 2012
Mar 18

Flocknote is a large web application that lets churches easily manage communications with their members via email, text message, and phone calls. Many of the core features of email marketing services like MailChimp and Constant Contact are implemented in flocknote similarly, such as list management and mass emailing (and many features like shared list/member information management, text messaging, etc. are unique to flocknote).

Until recently, few groups using flocknote had subscription lists big enough to hit our relatively high PHP max_execution_time setting when importing and exporting subscriber data. Since we're getting bigger, though, I've started implementing Batch API all over the place so user-facing bulk operations could not only complete without resulting in a half-finished operation, but could also show the end user exactly how much has been done, and how much is left:

Exporting List Subscribers - Batch API CSV Export

I've seen many tutorials, blog posts, and examples for using Drupal's Batch API for importing tons of data, but very few (actually, none) for exporting tons of data—and specifically, in my case, building a CSV file with tons of data for download. The closest thing I've seen is a feature request in the Webform issue queue: Use BatchAPI to Export very large data sets to CSV/Excel.

Before I get started, I want to mention that, for many people, something like Views Data Export (for getting a ton of data out of a View) or Node Export (specifically for exporting nodes) might be exactly what you need, and save you a few hours' time working with Batch API. However, since my particular circumstance ruled out Views, and since I was exporting a bit more customized data than just nodes or users, I needed to write my own batch export functionality.

Quick Introduction to the Batch API

I'm sure most of you have encountered Drupal's awesome Batch API at some point or another. It lets your site perform a task (say, updating a few thousand nodes) while the user can see a progress bar (which is always nice for UX) without running into the dreaded PHP timeout. Sometimes increasing PHP's max_execution_time can help, but if you want things to scale, and if you want to keep your PHP configuration sane, you should instead split the large operation up into smaller chunks of work—which is what Batch API does.

In my case, I wanted to integrate the batch operation with a form that allowed users to select certain parameters for their export. There's an excellent example in the Examples for Developers module here: batch_example.module. I'd suggest you read through that file to get the basics of how Batch API works along with a form submission.

Basically, when you submit the form, you set a batch with batch_set(), then kick off the batch processing using batch_process().
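
As a rough sketch (the form, path, and option names here are made up for illustration; only the batch and finished callbacks match the functions shown later in this post), the submit handler ends up looking something like this:

<?php
/**
 * Submit handler for the (hypothetical) subscriber export form.
 */
function MYMODULE_export_list_subscribers_form_submit($form, &$form_state) {
  $list_id = $form_state['values']['list_id'];
  $option = $form_state['values']['export_option'];

  $batch = array(
    'title' => t('Exporting subscribers...'),
    'operations' => array(
      // Each operation is a callback plus its arguments; Batch API appends
      // &$context automatically on every pass.
      array('MYMODULE_export_list_subscribers_batch', array($list_id, $option)),
    ),
    'finished' => 'MYMODULE_export_list_subscribers_finished',
  );
  batch_set($batch);

  // Kick off processing; the path passed here is where the user lands when
  // the batch completes (used later for the CSV download page).
  batch_process('hypothetical/csv-download-interim/' . $list_id);
}
?>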

Processing the Batch: Building a CSV file

For my huge CSV file, I needed to do a couple things to make sure I (a) didn't overwrite the file each time my batch process was called, and (b) had all the right data in all the right places.

In the function MYMODULE_export_list_subscribers_batch() (defined as the only operation in the $batch passed to batch_set()), I process 50 subscribers at a time, getting their profile data using some helper functions. I also check (in the first part) to see if the file has been created yet (basically, if this is the first or a later pass at the process function), and if it has not, I create the file, get the column headers for this particular list using a helper function, and store the filepath in the $batch's context.

I create a temporary file, which will automatically get cleaned out by the system after a while, because I just need the file to persist for a short while, until the user has downloaded the generated file (to get the filepath, I use file_directory_temp() and create a file with a list-specific file name).

On each pass of the batch process operation, I add in 50 more subscribers to the file (using fopen(), with the 'a' flag, so it adds on to the end of the file), and then store the current location of the file in the batch's context.

Code speaks louder than words, though, so here's the main batch operation function in all its glory (a few details are missing, but the relevant parts are all there):

<?php
/**
 * Batch operation to export list subscribers.
 */
function MYMODULE_export_list_subscribers_batch($list_id, $option, &$context) {
  // Start working on a set of results.
  $limit = 50;
  $context['finished'] = 0;

  // Create the CSV file with the appropriate column headers for this
  // list/network if it hasn't been created yet, and store the file path and
  // field data in the $context for later retrieval.
  if (!isset($context['sandbox']['file'])) {
    $list = node_load($list_id);

    // Get field names for this list/network. (I use a helper function here).
    $field_labels = array(
      'Member ID',
      'First Name',
      'Last Name',
      'Email Address',
      'Phone Number',
      'Created',
      'Last Updated',
      'etc...',
    );

    // Create the file and print the labels in the header row.
    $filename = 'list_' . $list_id . '_subscriber_export.csv';
    $file_path = file_directory_temp() . '/' . $filename;
    $handle = fopen($file_path, 'w'); // Create the file.
    fputcsv($handle, $field_labels); // Write the labels to the header row.
    fclose($handle);

    // Store file path, fields, subscribers, and network in $context.
    $context['sandbox']['file'] = $file_path;
    $context['sandbox']['fields'] = $fields;
    $context['sandbox']['subscribers'] = MYMODULE_retrieve_list_subscribers($list->nid, TRUE);
    $context['sandbox']['subscribers_total'] = count($context['sandbox']['subscribers']) - 1;

    // Store some values in the results array for processing when finished.
    $context['results']['filename'] = $filename;
    $context['results']['file'] = $file_path;
    $context['results']['list_id'] = $list_id;
  }

  // Accounting.
  if (!isset($context['results']['count'])) {
    $context['results']['count'] = 0;
  }

  // Open the file for writing ('a' puts pointer at end of file).
  $handle = fopen($context['sandbox']['file'], 'a');

  // Loop until we hit the batch limit.
  for ($i = 0; $i < $limit; $i++) {
    $number_remaining = count($context['sandbox']['subscribers']) - 1;
    if ($number_remaining) {
      $uid = $context['sandbox']['subscribers'][$context['results']['count']];
      // I use a helper function to get the data for each subscriber.
      $subscriber_data = MYMODULE_retrieve_account_data_for_export($uid, $context['sandbox']['fields'], $context['sandbox']['network']);
      fputcsv($handle, $subscriber_data);

      // Remove the uid from $context.
      unset($context['sandbox']['subscribers'][$context['results']['count']]);

      // Increment the counter.
      $context['results']['count']++;
      $context['finished'] = $context['results']['count'] / $context['sandbox']['subscribers_total'];
    }
    // If there are no subscribers remaining, we're finished.
    else {
      $context['finished'] = 1;
      break;
    }
  }

  // Close the file.
  fclose($handle);

  // Show message updating user on how many subscribers have been exported.
  $context['message'] = t('Exported @count of @total subscribers.', array(
    '@count' => $context['results']['count'],
    '@total' => $context['sandbox']['subscribers_total'],
  ));
}
?>

There are a few things I can do to further optimize this, if need be; for example, I could probably run through the subscriber list in a better way, besides storing the whole thing (a bunch of integers) in an array, which doesn't scale infinitely. But those are micro-optimizations that I'll worry about if/when they become a problem.

Finishing the Batch: Delivering the CSV file

Because I want to deliver a .csv file download to the end user, and not just display a simple message like 'Congratulations! We built your CSV file... but you have to click here to download it!', I decided to have the batch operation set the CSV file download path in the user's session data, and then redirect to my own page at the end of the batch operation (to do this, I pass the final path to batch_process() when I call it in the form submit function).

Here's the 'finished' function for the batch, where I simply set a message, and set a couple session variables that will be used later:

<?php
/**
 * Finish the export.
 */
function MYMODULE_export_list_subscribers_finished($success, $results, $operations) {
  // The 'success' parameter means no fatal PHP errors were detected. All
  // other error management should be handled using 'results'.
  if ($success) {
    $message = format_plural($results['count'], 'One subscriber exported.', '@count subscribers exported.');
  }
  else {
    $message = t('There were errors during the export of this list.');
  }
  drupal_set_message($message, 'warning');

  // Set some session variables for the redirect to the file download page.
  $_SESSION['csv_download_file'] = $results['file'];
  $_SESSION['csv_download_filename'] = $results['filename'];
}
?>

Here's the page building function for the path that I have the user go to at the end of the batch operation (after the _finished function is called above)—this page's path redirect is set by passing it into batch_process() as a simple string, way back in the form submit function:

<?php
/**
 * Interim download step for downloading CSV file.
 */
function MYMODULE_download_csv_file_interim($list_id) {
  global $base_url;

  if (empty($_SESSION['csv_download_filename']) || empty($_SESSION['csv_download_file'])) {
    return t('Please visit your list subscribers page to begin a list download.');
  }

  $list = node_load($list_id);

  // Redirect to the download file.
  $redirect = base_path() . 'path/to/download/csv/' . $list_id;
  drupal_add_js('setTimeout(function() { window.location.href = "' . $redirect . '"; }, 2000);', 'inline');

  $download_link = l(t('click here to download the file'), 'path/to/download/csv/' . $list_id);
  $output = '<p>' . t('Your subscriber list is now ready for download. The download should begin automatically. If it does not begin downloading within 5 seconds, please !download_link.', array('!download_link' => $download_link)) . '</p>';
  $output .= '<p>' . l(t("&#8592; Back to %list subscribers", array('%list' => $list->title)), 'node/' . $list_id . '/subscribers', array('html' => TRUE)) . '</p>';
  return $output;
}
?>

I used JavaScript/setTimeout() on this page, and redirected to another path that actually delivers the CSV file to the end user, because otherwise, most browsers will block the download (without user intervention), or go to the downloaded file and show a blank white page. Here's the code that's used to deliver the actual CSV file at the redirect path defined above:

<?php
/**
 * Download a list subscriber CSV file.
 */
function MYMODULE_download_csv_file($list_id) {
  // For added security, make sure the beginning of the path is the same as that
  // returned by file_directory_temp() (to prevent users from gaining access to
  // arbitrary files on the server).
  if (strpos($_SESSION['csv_download_file'], file_directory_temp()) !== 0) {
    return 'Access denied.';
  }

  // Add HTTP headers for CSV file download.
  drupal_add_http_header('Content-Type', 'text/csv; charset=utf-8');
  drupal_add_http_header('Content-Disposition', 'attachment; filename=' . $_SESSION['csv_download_filename'], TRUE);

  // Allow caching, otherwise IE users can't dl over SSL (see issue #294).
  drupal_add_http_header('Cache-Control', 'max-age=300; must-revalidate');

  // Read the file to the output buffer and exit.
  readfile($_SESSION['csv_download_file']);
  exit;
}
?>

There are other ways to deliver a CSV file, but this seems to work the best for the widest variety of browsers. Setting the Cache-Control header is necessary to allow IE users to download files over SSL (due to caching settings and file path persistence in Windows/IE). Chrome, Firefox and Safari work fine without it...

Conclusion

I hope this example has helped you figure out how to use Batch API for more than just importing; it's a little more involved to build a file or something else using Batch API than to just do something that doesn't require extra steps afterwards. But with this example, hopefully you can start flexing Batch API's muscles to do a bit more for you!

If possible, I would always try using Views Data Export, as it's so much simpler to integrate with my custom data sets, and Views is really fast and easy to implement. But in this case, I had to pull in access-controlled data from user profile fields, from Profile2 profile fields specific to each list, and from some other data sources, all into one CSV file, and this just wasn't going to happen with Views.

I've tested this Batch processing with up to 50,000 users, and it takes a few minutes to generate the resulting ~5MB file. It's much nicer to see that the file is being built over time (the way it is now) than to have to wait while the page is loading (with no feedback), and then get a WSOD because the page timed out after about 10,000 subscribers.

Mar 11 2012
Mar 11

Preparing for your first DrupalCon? Even if this isn't your first, here are a few tips and tidbits I've learned from my first DrupalCon last year, and would like to pass on to you. (I'm posting this now so you have time to order the things you need to make your conference experience better and get it shipped!).

Keep things you need handy

I expected to have some downtime every now and then to run back to my hotel room and grab something I needed for later in the day (like a power cord), but quickly realized that I wouldn't have downtime. Instead, I ended up attending many awesomesauce presentations, BoFs (Birds of a Feather gatherings), core conversations, and informal meetings continuously, from the time I got into the convention floors until about 8 p.m. (and later!).

Bring a bag large enough to hold your laptop or iPad, a charger, a few snacks (granola bars are great!), and any other little devices or chargers you'll need during the day.

Power to the People!

Monster Outlets to Go

Hotels and convention centers have a very low AC outlet / conference attendee ratio. Usually something like 1:100. Most laptops' batteries last 3-5 hours. You're going to have your laptop on and with you all day, and the battery will die if you don't charge up every now and then.

One of the best things you can do, especially if you want people to not hate you for hogging an entire outlet for one laptop charger, is buy a travel power strip, like the one I bought for this year's DrupalCon—Monster's Outlets to Go Powerstrip*. There are a few other options out there, but I like this one the most due to its compactness. Some adapters even include one or two USB plugs (though not all are created equal—check to make sure the USB plugs provide enough power to charge your device!).

Instead of hogging a wall jack all to yourself, you can now power one or two of your own devices, and let one or two other people charge their devices.

For non-US residents, be sure you have the proper power adapters for your devices!

Don't only go to sessions

I made the mistake of trying to attend every session that piqued my interest last year. It wasn't until the last day of the conference that I hopped out of a session that had lost my interest and found that I was missing some of the best parts of DrupalCon:

  • Birds of a Feather gatherings (people basically come together and talk about/work through things they have in common, like newspaper websites, Church sites, or a passion for DevOps!).
  • Core Conversations (people who want to make Drupal and Drupal.org better come together and, well, make Drupal and Drupal.org better).
  • The Expo area (talking to some of the people in Drupal consultancies, or people from hosting providers, or anyone else on the expo floor, is pretty enriching).
  • The community (getting to meet people I converse with every week on drupal.org, in IRC, etc. is awesome).

Don't get me wrong; the sessions are awesome, but there's so much more to DrupalCon. Don't miss out!

Are you presenting? Don't forget these things!

Apple power extension adapter

If you're presenting, don't presume that everything will be ready for you. Even the best planned events sometimes go a little awry—there's no power outlet, the projector only has HDMI when you only have a DVI adapter, etc.

A couple things that I never forget when traveling and presenting:

  • My extended power cord for my MacBook Air laptop—without it, I only get about 5' between my laptop and an outlet. With it, I get almost 10'. I never present without the laptop plugged in (see: Murphy's Law).
  • Every Mini Display Port-to-anything adapter I have. VGA, DVI, HDMI. Bring adapters for your own laptop... though you can usually borrow one if you need it.
  • A presenter's remote (if you want to move about during your presentation). My favorite is the Kensington Wireless Presenter.

Even if the presenter's manual says you'll be provided with something (power, cables, a microphone...), be prepared for the worst.

Bring an Ethernet cable (or two!)

Even an incredibly-well-planned conference like DrupalCon is a WiFi network administrator's nightmare. With a few thousand attendees, you're talking about 5,000+ wireless devices (everyone seems to have a laptop, tablet, and smartphone). At times, even cell service can be spotty (especially if you use AT&T or Verizon, since a couple thousand other attendees use the same cell as you!).

No matter the planning and number of access points, a wired connection will almost always beat out wireless. And there are usually a few areas you can find someone that has a hub set up to tie into the wired network. If you need to do some things that require a stable connection, having an ethernet cable (and, if you're like me, the proper USB adapter for your MacBook Air) can be a godsend!

Come for the Community

Whatever you do, talk to people! Drupal is awesome because it's a great platform, but it's even more amazing because of the people who use it, develop it, promote it, etc. Talk to other attendees, meet people you only know through their Drupal.org profiles, and have a fun time!

Anything I missed? Share it in the comments.

* Whenever I link to Amazon products, I use my affiliate links. You can just search for the items if you don't want to use the affiliate links, but it helps me get a few cents if you buy something through my affiliate links :) It has also been pointed out to me that Monster may not be a very nice company. YMMV :-/

Mar 09 2012
Mar 09

I had a rather interesting feature to implement on flocknote lately (after doing a pretty vast redesign of the UX/UI on the site over the past month... it was refreshing to dig into PHP again!):

We want to allow insertion of YouTube and Vimeo (and potentially other) videos into 'Notes' on the site, and there are a few moving parts in this equation:

  • I had to create a text format filter similar to the 'Embedded media inline' module in Drupal 6 so people could simply put a 'merge tag' in their Note (like [video=URL]) where they want the video to appear.
  • When a user views the embedded video on the site, the video should show at a uniform width/height and be playable (basically, the merge tag the user enters should be converted to the proper embed code for the provider; in this case, an <iframe> with the proper formatting).
  • When a user sees the video in the note email, the video can't actually play since very few email clients support any kind of video embedded in an email. So, instead, the video shows as a frame with a play button on top (this is the trickiest part), and links to the video on YouTube, Vimeo, etc.

Creating my own Image Effect for a Video Play Button

What I wanted to end up with was an image that had a custom-made iOS-style play button (play icon in a circle with a translucent grey background) right in the middle (I like the simple look of videos on my iPad...):

Video Play Button Example

So, I decided to work with Drupal's Image Effect API and expose a new image effect, aptly named 'Video Play Button', to Drupal's simple set of 'Resize, Scale, etc.' image effects. This is a pretty simple process:

  1. Implement hook_image_effect_info() to tell Drupal about the new effect.
  2. Process the image (in $image->resource) in the 'effect callback' that you defined in hook_image_effect_info().

In my case, I calculated the center of the image to be processed, then subtracted half the play button's width and height (respectively) from the center dimensions, and used those dimensions, along with the image handle ($image->resource) and the play button image (I used drupal_get_path() to get the path to my custom module directory, and put the image in 'images/play-button.png') to build the final graphic using PHP GD library's imagecopy() function.

Here's the image effect info hook implementation and callback I wrote to put the play button on top of the image:

<?php
/**
 * Implements hook_image_effect_info().
 */
function mymodule_image_effect_info() {
  return array(
    'mymodule_video_play_button' => array(
      'label' => t('Video Play Button'),
      'help' => t('Adds a video play button in the middle of a given image.'),
      'effect callback' => 'mymodule_video_play_button_callback',
      'dimensions passthrough' => TRUE,
    ),
  );
}

/**
 * Video Play Button image callback.
 *
 * Adds a video play button on top of a given image.
 *
 * @param $image
 *   An image object returned by image_load().
 *
 * @return
 *   TRUE on success. FALSE on failure to add the play button.
 */
function mymodule_video_play_button_callback(&$image) {
  // Make sure the imagecopymerge() function exists (in GD image library).
  if (!function_exists('imagecopymerge')) {
    watchdog('image', 'The image %file could not be processed because the imagecopymerge() function is not available in this PHP installation.', array('%file' => $image->source));
    return FALSE;
  }

  // Verify that Drupal is using the PHP GD library for image manipulations
  // since this effect depends on functions in the GD library.
  if ($image->toolkit != 'gd') {
    watchdog('image', 'Image processing failed on %path. Using non GD toolkit.', array('%path' => $image->source), WATCHDOG_ERROR);
    return FALSE;
  }

  // Calculate the proper coordinates for placing the play button in the middle.
  $destination_x = ($image->info['width'] / 2) - 35;
  $destination_y = ($image->info['height'] / 2) - 35;

  // Load the play button image.
  $play_button_image = imagecreatefrompng(drupal_get_path('module', 'mymodule') . '/images/play-button.png');
  imagealphablending($play_button_image, TRUE); // Preserve transparency.
  imagealphablending($image->resource, TRUE); // Preserve transparency.

  // Use imagecopy() to place the play button over the image.
  imagecopy(
    $image->resource, // Destination image.
    $play_button_image, // Source image.
    $destination_x, // Destination x coordinate.
    $destination_y, // Destination y coordinate.
    0, // Source x coordinate.
    0, // Source y coordinate.
    70, // Source width.
    70 // Source height.
  );

  return TRUE;
}
?>

...and a PSD of the play button is attached, in case someone else wants to save themselves 10 minutes' drawing in Photoshop :)

If you want to look at more examples, there's another great image effect example in the Examples for Developers module's image_example.module.

imagecopy() vs. imagecopymerge()

...and Photoshop Save for Web vs. PNGOut optimization...

I spent almost an hour working on a couple different problems I encountered caused partly by the fact that I was using a compressed/optimized PNG file, and partly by the fact that I was misreading the PHP.net documentation for two GD library image copy functions, imagecopy() and imagecopymerge().

First of all, instead of spending a ton of time struggling with weird file dimension issues, transparency issues, etc., and thinking your code is causing the problem (even though it may be), try different image files, or try exporting the image file you're manipulating in a different way. In my case, the image I was using had been run through PNGout to remove any extraneous data, but apparently too much data was removed for PHP's GD library to understand the file correctly: the file's dimensions were distorted, the alpha transparency was not respected, and the image had lines of interpolation... all because I had tried to use an optimized PNG instead of the direct 'Save for Web...' image from Photoshop.

With regard to GD image functions, imagecopy() allows you to put one image on top of another one, hopefully preserving alpha transparency, etc., while imagecopymerge() puts an image on top of the other without preserving alpha transparency, but while allowing you to set the opacity of the source image manually (from 0-100%). I was originally trying to get imagecopymerge() to put a circle 'play' button (iOS-style) on top of the video image, but I found that the function was putting a square frame with a grey background instead of the nice transparent area around the circle. Switching to imagecopy() seemed to preserve the 24-bit PNG alpha transparency better.

This bug report on php.net was especially enlightening when I was researching why imagecopymerge() wasn't working for me.

Conclusion

There are a few other moving parts to this equation, like retrieving the YouTube or Vimeo video frames, building the proper markup for different displays (on-site, email, mobile, etc.), etc., that I haven't gone into here, but I figured I'd share my experience creating a custom image effect here in case someone else wants to do something similar (like put watermarks on images for a photo site, or something like that).

Jan 04 2012
Jan 04

Most of the time, Drupal's convention of printing comments and the comment form inside the node template (node.tpl.php) is desirable, and doesn't cause any headaches.

However, I've had a few cases where I wanted to put comments and the comment form in another place on the page, and in the most recent case, I asked around to see what people recommended for moving comments out of the normal rendering method. I found a few mentions of using Panels, and also noticed the Commentsblock module that does something like this using Views.

However, I just wanted to grab the normal comment information, and stick it directly into a block, and put that block somewhere else. I didn't want Views' overhead, or to have to re-theme and tweak things in Views, since I already have a firm grasp of comment rendering and form theming with the core comment display.

So, I set out to do something similar to this comment on drupal.org (which was also suggested by Jimajamma on Drupal Answers).

First, I had to hide the comments from the normal rendering pipeline in node.tpl.php, which involved using template_preprocess_node() to set 'comment' to 0, and a check in node.tpl.php to make sure $content['comments'] would only be rendered if $comment evaluated to TRUE:

<?php
function THEMENAME_preprocess_node(&$variables) {
  // For note nodes, disable comments in the node template.
  if ($variables['type'] == 'note') {
    $variables['comment'] = 0;
  }
}
?>

Then, I simply built a block in my custom module, and used the magic of comment_node_page_additions() to render the comments and comment form, just as they would render under the node, except in my own, spiffy comment block:

<?php
/**
 * Implements hook_node_view().
 */
function MODULENAME_node_view($node, $view_mode) {
  global $node_comments;

  // Store node comments in global variable so we can put them in a block.
  if ($node->type == 'note' && isset($node->content['comments'])) {
    $node_comments = $node->content['comments'];
  }
}

/**
 * Implements hook_block_info().
 */
function MODULENAME_block_info() {
  $blocks['note_comments'] = array(
    'info' => t('Note Comments'),
    'cache' => DRUPAL_NO_CACHE,
  );
  return $blocks;
}

/**
 * Implements hook_block_view().
 */
function MODULENAME_block_view($delta = '') {
  global $user;
  global $node_comments;

  $block = array();
  if ($delta == 'note_comments') {
    // Get the active menu object.
    if ($node = menu_get_object()) {
      // Make sure user is viewing a note.
      if ($node->type == 'note') {
        $block['content'] = '';
        // Set the title of the block.
        $block['subject'] = NULL;
        // Render the comments and comment form (access checks, etc. are done
        // by comment_node_page_additions()).
        $block['content'] .= drupal_render($node_comments);
      }
    }
  }
  return $block;
}
?>

Then, after a quick trip to the Configure > Blocks page, where I assigned my block to a region, I had a slick comments block that I could render anywhere!

Dec 24 2011
Dec 24

apachebench is an excellent performance and load-testing tool for any website, and Drupal-based sites are no exception. A lot of Drupal sites, though, need to be measured not only under heavy anonymous traffic load (users who aren't logged in), but also under heavy authenticated-user load.

Drupal.org has some good tips for ab testing, but the details for using ab's '-C' option (notice the capital C... C is for Cookie) are lacking. Basically, if you pass the -C option with a valid session ID/cookie, Drupal will send ab the page as if ab were authenticated.

Instead of constantly going into the database and looking up session IDs and such nonsense, I have a simple script (heavily revised from the 2008-era 2bits script that worked with Drupal 5) which will give you the proper ab commands for stress-testing your Drupal site under authenticated user load. Simply copy the attached script (source pasted below) to your site's docroot, and run the command from the command line as follows:

# [PATH_TO_SCRIPT] [HTTP_HOST] [URL_TO_TEST] [#_SESSIONS] [#_REQUESTS]
$ /path/to/drupal/root/ab-testing-cli.php www.example.com http://www.example.com/node/1 2 10

You'll get back the command to paste into the CLI in order to test the URL you provided as an authenticated user. (Note: The sessions table needs to be populated, so someone (or a few someones) will need to have logged in during the past few hours/days for this to work correctly.)

Here's the full code (file attached to bottom of post):

<?php
/**
 * @file
 *
 * Script to generate ab tests for logged in users using sessions from database.
 * This script is based on an older script by 2bits for load testing Drupal 5,
 * located at: http://goo.gl/4pfku
 *
 * Place this script into the webroot of your Drupal site.
 *
 * Usage (from command line):
 *   # [PATH_TO_SCRIPT] [HTTP_HOST] [URL_TO_TEST] [#_SESSIONS] [#_REQUESTS]
 *   $ php /path/to/drupal/root/ab-testing-cli.php example.com http://www.example.com/ 2 200
 *
 * After the script runs, it will output a list of commands for you to use to
 * test your website as a logged-in user.
 */

// Set the variable below to your Drupal root (on the server).
$drupal_root = '/path/to/drupal/root/';

// If arguments not supplied properly, warn user.
if ($argc != 5) {
  $prog = basename($argv[0]);
  print "Usage: $prog host url concurrency num_requests\n";
  exit(1);
}

// Get the arguments for ab.
$url = $argv[2];
$number_concurrency = $argv[3];
$number_requests = $argv[4];

// Set this directory to your drupal root directory.
chdir($drupal_root);

// Set up required variables to help Drupal bootstrap the correct site.
$_SERVER['HTTP_HOST'] = $argv[1];
$_SERVER['PHP_SELF'] = basename(__FILE__);
$_SERVER['REMOTE_ADDR'] = '127.0.0.1';
define('DRUPAL_ROOT', getcwd());

// Bootstrap Drupal.
require_once('./includes/bootstrap.inc');
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Get as many sessions as the user calls for.
$results = db_query_range("SELECT sid FROM {sessions} WHERE uid > 1", 0, $number_concurrency)->fetchAll();

// Loop through the results and print the proper ab command for each session.
foreach ($results as $result) {
  $cookie = session_name() . '=' . $result->sid;
  print "ab -c 1 -n $number_requests -C $cookie $url\n";
}
?>

Dec 14 2011
Dec 14

You can do a lot of great things with field display in Drupal 7's 'manage display' tab for a content type. You can control the order and label position of each field attached to a node type in that tab for Full node displays, Teasers, and RSS displays (or other displays you set up).

However, there's no way to change certain aspects of a node's display inside an RSS Feed, such as the 'creator' tag, the 'link' tag, or the 'title' tag. For a news aggregation site I run, I wanted to modify the <link> tag when displaying 'story' nodes, and make the link tag give an absolute URL to the original source instead of to my drupal site (so, instead of http://www.mysite.com/node/12, it would go to http://www.example.com/original-story-url).

A lot of blogs also use this kind of format for reposted blog items (such as Daring Fireball), so users go straight to the source when they click on the title of an item in their RSS reader of choice. My method below can be modified to conditionally change a link if a field has a value (say, a 'RSS absolute URL' field or something like that).

For Drupal 6, some people had suggested using Views RSS for this purpose (it would let me manage a Views-provided feed display with fields instead of using Drupal's built-in node/teaser display), but this module doesn't have a stable D7 release, and it won't help me change things for Drupal's built in feeds.

For Drupal 7, all you need to do is implement hook_node_view() in a custom module, and change the $node->link value to whatever you want:

<?php
/**
 * Implements hook_node_view().
 *
 * For story nodes in RSS feeds, use field_story_url for link element.
 */
function custom_node_view($node, $view_mode, $langcode) {
  if ($node->type == 'story' && $view_mode == 'rss') {
    $node->link = $node->field_story_url[$node->language][0]['url'];
  }
}
?>

Easy peasy. If you want to conditionally change the feed item <link> (say, only change the link value if $field_story_url has a value), change the line to:

<?php
    $node->link = (empty($node->field_story_url)) ? $node->link : $node->field_story_url[$node->language][0]['url'];
?>

You can also change things like $node->title to change what's in the RSS feed's <title> tag.

Dec 07 2011
Dec 07

After reading A successful Git branching model [nvie.com], which I consider one of the best graphical/textual depictions of the ideal Git model for development teams (and most large projects), I simply wanted to adapt a similar (but way less complex) model for some of my smaller sites and multisite Drupal installs.

Since I'm (almost always) the only developer, and I develop locally, I don't want the complexity of working on many branches at once (master, hotfixes, develop, release, staging, etc...), but I do want to have a clean separation between what I'm working on and the actual live master branch that I deploy to the server.

So, I've adopted a simple 'feature branch model' for my smaller projects:

  • master - the live/production code. Only touch when merging in a feature or simply fixing little bugs or really pressing problems.
  • [issue-number]-feature-branches - Where I work on stuff.

Graphically:

Feature branch model

Any time I work on something more complicated than a simple styling tweak, or a fix for a WSOD or something like that, I simply create a feature branch (usually with an issue number that matches up to my internal tracking system). Something like 374-add-node-wizard:

# create (-b) and checkout the 374-add-node-wizard branch.
$ git checkout -b 374-add-node-wizard

While I'm working on the node wizard (which could take a week or two), I might make a couple little fixes on the master branch. After I make the fixes on master (switch to it using $ git checkout master), I switch back to my feature branch and rebase my feature branch:

$ git checkout 374-add-node-wizard # switch back to the feature branch
$ git rebase master # pull in all the latest code from the master branch

I can also create simple .patch files off a branch to pass my work to another server or a friend if I want (I like using patches instead of pushing around branches, simply because patch files are easier for people to grok than more complicated git maneuvers):

# create a diff/patch file from the checked out branch.
$ git diff master..374-add-node-wizard > 374-add-node-wizard-patch.patch

When I finish my work on the feature branch, I switch back to master, merge in the branch, and delete the branch. All done!

$ git checkout master # switch back to master
$ git merge --no-ff 374-add-node-wizard # merge feature branch back into master
$ git branch -d 374-add-node-wizard # delete the feature branch

Finally, I test everything to make sure it's working fine in master, and then push the code changes up to the server.

Since I'm developing alone, this is a lot easier than a more complicated branching setup, and it allows me to work on as many features as I want, without fear of messing things up on master, or having merge conflicts (I rebase early and often).

(Note: I usually work in the command line, because I'm more comfortable knowing what git is doing that way... but I often open up Tower (imo, the best application for visual Git) to inspect branches, commits, and merges/rebases... some people would probably rather just use Tower for everything).

(Note 2: When creating patches to send to someone that include binary files (like a png or a gif, jpeg, whatever), make sure you use $ git diff --full-index --binary [old]..[new] > patchfile.patch so git doesn't barf when you try applying the patch on someone else's end...).
