Apr 09 2015

Posts in this series:

In earlier Solr for Drupal Developers posts, you learned about Apache Solr and its history in and integration with Drupal. In this post, I'm going to walk you through a quick guide to getting Apache Solr running on your local workstation so you can test it out with a Drupal site you're working on.

The guide below is for those using Mac or Linux workstations, but if you're using Windows (or even if you run Mac or Linux), you can use Drupal VM instead, which optionally installs Apache Solr alongside Drupal.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Installing Apache Solr in a Virtual Machine

Apache Solr can be run directly from any computer that has Java 1.7 or later, so technically you could run it on any modern Mac, Windows, or Linux workstation natively. But to keep your local workstation cleaner, and to save time and hassle (especially if you don't want to kludge your computer with a Java runtime!), this guide will show you how to set up an Apache Solr virtual machine using Vagrant, VirtualBox, and Ansible.

Let's get started:

  1. Clone the ansible-vagrant-examples project from GitHub (you can also download ansible-vagrant-examples directly).
  2. Change directory in Terminal to the /solr subdirectory, and follow the instructions in the Solr example's README for installing Vagrant, VirtualBox, and Ansible, then follow the rest of the instructions for building that example (e.g. vagrant up).
  3. At this point, if you visit http://192.168.33.44:8983/solr in your browser, you should see the Apache Solr admin interface:
    Apache Solr Administration Dashboard - 4.10
  4. The next step is to point your local Drupal installation (assuming you have a Drupal site running locally) at this Solr instance and make sure it can connect. We're using the Apache Solr Search module in this example, but Search API Solr Search setup is similar.
    1. Visit /admin/config/search/apachesolr/settings, and click 'Add search environment'.
    2. Enter http://192.168.33.44:8983/solr/collection1 (this is the default search core that Apache Solr includes out of the box) for 'Solr server URL', check the checkbox to make this the default environment, add a description (e.g. 'Local Solr server'), and click 'Save':
      Drupal Apache Solr module search environment configuration form
    3. After saving the new environment, the next page should show the environment with a green-colored background. That means your Drupal site can connect to the Solr server.
  5. After Drupal is able to connect, you need to add the Drupal module's Solr configuration files to the search core you'll be using. This takes a few steps, but will ensure all your Drupal content is indexed by Solr correctly.
    1. Change directory in Terminal to the /solr directory (where you ran vagrant up earlier), and run vagrant ssh to log into the Solr VM.
    2. While logged into the VM, enter the following commands:
      1. curl http://ftp.drupal.org/files/projects/apachesolr-7.x-1.x-dev.tar.gz | tar -xz (download the Apache Solr module into the current directory).
      2. sudo cp -r apachesolr/solr-conf/solr-4.x/* /var/solr/collection1/conf/ (copy the Apache Solr module configuration into the default Solr core).
      3. sudo chown -R solr:solr /var/solr/collection1/conf/* (fix permissions for the copied files).
      4. sudo service solr restart (restart Apache Solr so the configuration is updated).
    3. Once this is complete, go back to the Apache Solr search settings page (/admin/config/search/apachesolr/settings), and click on the 'Index' configuration in your local Solr server's row. You should see something like drupal-4.3-solr-4.x for the 'Schema', meaning the Drupal module's schema.xml has been picked up successfully.

At this point, you should be able to index your site content into Apache Solr (scroll down and check some content types you want to index), and start playing around with Apache Solr search!

The best first steps are to look around in all the Apache Solr configuration pages, test indexing your entire site, then work on setting up search pages and maybe even install the Facet API module to configure some search facets. In very little time, you should be able to make your site search as user-friendly and speedy as Amazon, Newegg, etc.

Further Reading

Apr 02 2015

Drupal.org has an excellent resource page to help you create a static archive of a Drupal site. The page references tools and techniques to take your dynamically-generated Drupal site and turn it into a static HTML site with all the right resources so you can put the site on mothballs.

From time to time, one of Midwestern Mac's hosted sites is no longer updated (e.g. LOLSaints.com), or the event for which the site was created has long since passed (e.g. the 2014 DrupalCamp STL site).

I thought I'd document my own workflow for converting typical Drupal 6 and 7 sites to static HTML that can be served from a simple Apache or Nginx web server without PHP, MySQL, or any other special software, since I do a few special things to preserve the original URL alias structure, keep CSS, JS, and images in order, and make sure redirections still work properly.

1 - Disable forms and any non-static-friendly modules

The Drupal.org page above has some good guidelines, but basically, you need to make sure all the 'dynamic' aspects of the site are disabled: turn off all forms, turn off modules that use AJAX requests (like Fivestar voting), turn off search (whether it's using Solr or Drupal's built-in search), and make sure AJAX and exposed filters are disabled in all views on the site. A fully static site doesn't support this kind of functionality, and if you leave it in place, there will be a lot of broken functionality.
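
If you use Drush, a quick pass might look something like this (the module names here are illustrative; check your own site's module list for anything that generates dynamic output):

# Disable modules that rely on dynamic requests before crawling (illustrative module list):
drush dis -y fivestar webform search
# Review what's still enabled before you start the crawl:
drush pm-list --type=module --status=enabled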

2 - Download a verbatim copy of the site with SiteSucker

CLI utilities like HTTrack and wget can be used to download a site, using a specific set of parameters to make sure the download is executed correctly, but since I only convert one or two sites per year, I like the easier interface provided by SiteSucker.
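
If you'd rather script the download, a roughly equivalent wget invocation looks something like this (a sketch only; the flags are standard wget options, and the domain is a placeholder):

# Mirror the site verbatim, including page requisites like images, CSS, and JS,
# and ignore robots.txt (the equivalent of SiteSucker's 'Ignore Robot Exclusions'):
wget --mirror --page-requisites --no-parent -e robots=off http://www.example.com/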

SiteSucker lets you set options for a download (you can save your custom presets if you like), and then it gives a good overview of the entire download process:

SiteSucker Drupal Download Site

I change the following settings from the defaults to make the download go faster and result in a mostly-unmodified download of the site:

  • General
    • Ignore Robot Exclusions
      (If you have a slower or shared server and hundreds or thousands of pages on the site, you might not want to check this box—ignoring the exclusions and the crawler delay can greatly increase the load on a slow or misconfigured webserver when crawling a Drupal site).
    • Always Download HTML and CSS
    • File Modification: None
    • Path Constraint: Host
  • Webpage
    • Include Supporting Files

After the download completes, I zip up the archive for the site, transfer it to my static Apache server, and set up the virtualhost for the site like any other virtualhost. To test things out, I point the domain for my site to the new server in my local /etc/hosts file, and visit the site.
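
The /etc/hosts step is a single line pointing the domain at the static server (the IP address here is a placeholder):

# Temporarily resolve the site's domain to the new static server for testing:
echo "203.0.113.10  www.example.com" | sudo tee -a /etc/hosts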

3 - Make Drupal paths work using Apache rewrites

Once you're finished getting all the files downloaded, there are some additional things you need to configure on the webserver level—in this case, Apache—to make sure that file paths and directories work properly on your now-static site.

A couple neat tricks:

  • You can preserve Drupal pager functionality without having to modify the actual links in HTML files by setting DirectorySlash Off (otherwise Apache will inject an extra / in the URL and cause weird side effects), then setting up a specialized rewrite using mod_rewrite rules.
  • You can redirect links to /node (or whatever was configured as the 'front page' in Drupal) to / with another mod_rewrite rule.
  • You can preserve links to pages that are now also directories in the static download using another mod_rewrite rule (e.g. if you have a page at /archive that should load archive.html, and there are also pages accessible at /archive/xyz, then you need a rule to make sure a request to /archive loads the HTML file, and doesn't try loading a directory index!).
  • Since the site is now static, and presumably won't be seeing much change, you can set far future expires headers for all resources so browsers can cache them for a long period of time (see the mod_expires section in the example below).

Here's the base set of rules I put into a .htaccess file in the document root of a static site created from a Drupal site (served by Apache):

<IfModule mod_dir.c>
  # Without this directive, directory access rewrites and pagers don't work
  # correctly. See 'Rewrite directory accesses' rule below.
  DirectorySlash Off
</IfModule>

<IfModule mod_rewrite.c>
  RewriteEngine On

  # Fix /node pagers (e.g. '/node?page=1').
  RewriteCond %{REQUEST_URI} ^/node$
  RewriteCond %{QUERY_STRING} ^page=(.+$)
  RewriteRule ^([^\.]+)$ index-page=%1.html [NC,L]

  # Fix other pagers (e.g. '/archive?page=1').
  RewriteCond %{REQUEST_URI} !^/node$
  RewriteCond %{QUERY_STRING} ^page=(.+$)
  RewriteRule ^([^\.]+)$ $1-page=%1.html [NC,L]

  # Redirect /node to home.
  RewriteCond %{QUERY_STRING} !^page=.+$
  RewriteRule ^node$ / [L,R=301]

  # Rewrite directory accesses to 'directory.html'.
  RewriteCond %{REQUEST_FILENAME} -d
  RewriteCond %{QUERY_STRING} !^page=.+$
  RewriteRule ^(.+[^/])/$ $1.html [NC,L]

  # If no extension included with the request URL, invisibly rewrite to .html.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteRule ^([^\.]+)$ $1.html [NC,L]

  # Redirect non-www to www.
  RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
  RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
</IfModule>

<IfModule mod_expires.c>
  ExpiresActive On
  <FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
    ExpiresDefault "access plus 1 year"
  </FilesMatch>
</IfModule>

Alternative method using a localized copy of the site

Another, more time-consuming, method is to download a localized copy of the site, where links are transformed to be relative and point directly at .html files instead of the normal Drupal paths (e.g. /archive.html instead of /archive). To do this, download the site using SiteSucker as outlined above, but select 'Localize' for the 'File Modification' option in the General settings.

There are some regex-based replacements that can clean up this localized copy, depending on how you want to use it. If you use Sublime Text, you can use these for project-wide find and replace, and use the 'Save All' and 'Close All Files' options after each find/replace operation.

I'm adding these regexes to this post in case you find them useful—sometimes I've needed one or more of them, other times none:

Convert links to index.html to links to /:

  • Find: (<a href=")[\.\./]+?index\.html(")
  • Replace: \1/\2

Remove .html in internal links:

  • Find: (<a href="[^http].+)\.html(")
  • Replace: \1\2

Fix one-off link problems (e.g. Feedburner links detected as internal links):

  • Find: (href=").+(feeds2?.feedburner)
  • Replace: \1http://\2

Fix other home page links that were missed earlier:

  • Find: href="index"
  • Replace: href="/"

Fix relative links like ../../page:

  • Find: ((href|src)=")[\.\./]+(.+?")
  • Replace: \1/\3

Fix relative links in top-level files:

  • Find: ((href|src)=")([^/][^http].+?")
  • Replace: \1/\3

This secondary method can sometimes make for a static site that's easier to test locally or distribute offline, but I've only ever localized the site like this once or twice, since the other method is generally easier to get going and doesn't require a ton of regex-based manipulation.

Mar 21 2015

Earlier today, I gave a presentation on Ansible and Drupal 8 at MidCamp in Chicago. In the presentation, I introduced Ansible, then deployed and updated a Drupal 8 site on a cluster of 6 Raspberry Pi computers, nicknamed the Dramble.

Video from the presentation is below (sadly, slides/voice only—you can't see the actual cluster of Raspberry Pis... for that, come see me in person sometime!):

[embedded content]

My slides from the presentation are also embedded below.

Ansible + Drupal: A Fortuitous DevOps Match from geerlingguy

Mar 21 2015

MidCamp Camp Organizers sign

On March 21, 2015, there was a fairly well-attended Camp Organizers BoF at MidCamp in Chicago. I took notes during the BoF and am simply publishing them here for the benefit of camp organizers in the Drupal Community. They're fairly raw, but hopefully they'll be helpful for you!

Camps Represented

  • DrupalCorn (Iowa)
  • MidCamp (Chicago)
  • RADCamp (potential future camp)
  • BADCamp (San Francisco)
  • DrupalCamp STL (St. Louis)
  • DrupalNorth (Toronto)
  • DrupalCamp Costa Rica - July 29-31

Ideas

  • Add your camp dates to:
  • Pooling resources for things like signage, printed booklets, etc.
  • Video recording equipment at MidCamp is in 3rd revision; next step is to make things 'less unwieldy'
  • Pre-communication: especially with speakers
  • Have some people who don't have particular responsibilities (mainly), but just 'do things that need to be done'
    • Some camps have 'runners' (this role, day of and/or before hand)
  • Camp doesn't run itself—but if you can put together a good team, you can make amazing things happen.
  • Best part of this BoF / sharing in camps: cross-pollination for ideas, layout, etc.
  • Bluespark + RedHat thinking about starting a 'RADCamp' :)
  • WiFi: Made sure there were additional access points here, helped give bandwidth/reliable connectivity
  • Some things work great at one camp, disastrous at another (e.g. video recording at Fox Valley vs. BADCamp vs. MidCamp)
  • Good to get a list together of 'things you could do at a DrupalCamp' (e.g. board game night)
  • Resource guide like Rails Girls
    • A kit of information for DrupalCamps
    • Documentation is on GitHub; allow improvements/contributions via GitHub PRs
  • Date selection:
    • Not just a North American problem; in Europe, camp dates run into each other pretty frequently as well.
    • Mailing list: not necessarily the most effective way to organize dates; needs a high signal-to-noise ratio.
    • Drupical, coordinate in #drupalcamp, etc. (not much consensus here)

Current resources

Video recording

  • Need sets of video recording equipment for session recording
  • Working on getting equipment into small/neat 'packages', and have them transportable in a pelican case or something
  • MidCamp PVR kit
    • About 3 lbs, costs about $425
    • Currently in 'beta 3' (Fox Valley, BADCamp, MidCamp)
    • Uses HDMI, requires some dongles for VGA and other formats
    • Records audio, but also has a backup Zoom voice recorder
    • Doesn't work great with older PC laptops
    • Goal is for DA and/or Camps to ship around the equipment
    • More information: Blue Drop Shop
  • Pain points:
    • Training the speakers (make sure they do things in the right order)
    • Packaging the parts so they're simpler to set up and use

Dates

  • DrupalCon moving around all the time upsets all the DrupalCamp apple carts!
  • Drupal Association should hopefully be able to help coordinate dates a little better.

Accessibility

  • MidCamp had a few unique accessibility features, e.g. blue tape lines throughout venue for accessible paths, pre-camp online walkthrough
  • Pre-walkthrough:
    • Included pictures for how to get through the actual venue
    • Give plenty of detail/guides/signage for physical location
    • Go around the venue, take lots of pictures, make sure you take lots of notes
  • Wanted to do three more things:
    1. Pay for a 'sprint room' so we have one every day that we do sprints
    2. Pay for captioning of talks while they're happening
    3. Pay for ASL for talks while they're happening
  • ASL Interpreter / Closed Captioning
    • about $125/hour per interpreter... but if it's more than 1 hour, 2 required
    • 2 hour minimum
    • Another idea: Skype the sessions to a remote transcriber, and that captioning can come back in real time to the screen
    • Issues:
      • Might need a second screen, and video post-production gets more complicated
      • Need a low-latency connection to get a skype-based transcription service working

Volunteers / Organizers

  • MidCamp
    • 10 regular volunteers
    • 6 people who came and went
    • Question: How many organizers do you need, how much commitment do you need?
    • Some people came in and did small things, then stopped (e.g. putting tape on the floor). That's fine!
  • Need to help people figure out how many organizers / volunteers are needed.
  • Session on recruiting and retaining dedicated volunteers

Session selection

  • MidCamp had anonymous session selection.
    • Problems:
      • Quality: One group said sessions could have presenters who are terrible, making for a terrible camp
      • Homogeneity: Worry that entire group of speakers would be a bunch of white guys (not very diverse).
    • Answers:
      • Quality:
        • Past performance doesn't necessarily indicate future success
        • "Take really good care of the presenters"
        • Send out reminders for speakers, help train them, improve them, etc.
        • Communication to the speakers was amazing; lots of great feedback
      • Homogeneity:
        • We have unconscious biases when we know about things. Anonymous selection helps mitigate this.
        • Non-anonymous solicitation helps with this (intentionally email/solicit a more diverse group)
        • MidCamp has about 20% female speakers.
    • Solicitations:
      • YesCT had a huge list of topics that were good at camps; seeded the list to dozens of people via Twitter, email, etc.
    • Process:
      • Dumped everything to spreadsheet, removing all identifying information (e.g. pronouns, names, business, etc.)
      • Can't be 100% anonymous (e.g. 'everyone knows what Larry talks about')
      • Put UIDs on everything to track
      • 25% of people submitted 2 sessions—but picked only one session per presenter
      • Deduplicated all the sessions, made sure groupings were good
      • Read through the sessions before the selection meeting (individuals made some notes)
      • 6 or so people went through the list and voted "Yes/No" (marathon)
      • After selection, selection group was dispersed, but schedule needed to be made
    • "It's a lot of work"

Taking care of yourselves

  • Need to eat well, relax, use good posture, etc.
  • Services like massage therapy, aroma therapy, etc.
    • BADCamp: "The Hippie Tent"
    • If no space for an entire tent, maybe at least a table for helping people 'be well'

Feb 26 2015

Dramble - 6 Raspberry Pi 2 model Bs running Drupal 8 on a cluster
Version 0.9.3 of the Dramble—running Drupal 8 on 6 Raspberry Pis

I've been tinkering with computers since I was a kid, but in the past ten or so years, mainstream computing has become more and more locked down, enclosed, lightweight, and, well, polished. I even wrote a blog post about how, nowadays, most computers are amazing. Long gone are the days when I had to worry about line voltage, IRQ settings, diagnosing bad capacitors, and replacing 40-pin cables that went bad!

But I'm always tempted back into my earlier years of more hardware-oriented hacking when I pull out one of my Raspberry Pi B+/A+ or Arduino Unos. These devices are about as raw as modern computers get—requiring you to actually touch the silicon chips and pins to be able to even use the devices. I've been building a temperature monitoring network that's based around a Node.js/Express app using Pis and Arduinos placed around my house. I've also been working a lot lately on a project that incorporates three of my current favorite technologies: The Raspberry Pi 2 model B (just announced earlier this month), Ansible, and Drupal!

In short, I'm building a cluster of Raspberry Pis, and designating it a 'Dramble'—a 'bramble' of Raspberry Pis running Drupal 8.

Motivation


This LED will light up that wonderful Drupal Blue, #0678BE

I've been giving a number of presentations on managing infrastructure with Ansible in the past couple years. And in the course of writing Ansible for DevOps (available on LeanPub!), I've done a lot of testing on VMs both locally and in the cloud.

But doing this testing on a 'local datacenter'—especially one that fits in the palm of my hand—is great for two reasons:

  • All networking is local; conferences don't always have the most stable networking, so I can do all my infrastructure testing on my own 'local cloud'.
  • It's pretty awesome to be able to hold a cluster of physical servers and a Gigabit network in my hand!

Lessons Learned (so far!)


Drool... I own these!

Building out the Pi-based infrastructure has taught me a lot about small-scale computing, efficient use of resources, benchmarking, and also how Drupal 8 differs (spoiler: it's way better) from Drupal 7 in terms of multi-server deployment and high-availability/high-performance configurations.

I've also learned:

Benchmarking


Wiring up the mini Cat5e network cables.

I've been benchmarking the heck out of this infrastructure, and besides finding that the major limiting factor with a bunch of low-cost computers is almost always slow I/O, I've found that:

  • On-the-fly gzip actually harms performance (in general) when your CPU isn't that fast.
  • Redis caching gives an immediate 15% speedup for Drupal 8.
  • Different microSD cards deliver order-of-magnitude speedups. As an example, one card took 20 minutes to import a 6MB database; another card? 9 seconds.
  • Drupal 8 is kinda slow (but I don't need to tell you that).
  • Still to come: Nginx vs. Apache with php-fpm, Nginx vs. Varnish for load balancing, Redis vs. Memcached for caching, MySQL vs. MariaDB for the database, and more!

Since I have this nice little cluster of Raspberry Pis humming along using half the power of a standard light bulb, the sky is the limit! And the fact that the servers are slower and have different performance considerations than typical modern cloud-based infrastructure actually helps to expose certain performance-related flaws that I wouldn't notice otherwise!

Finally, it helps me stay creative in finding ways to eke out another 50 KB/sec of bandwidth here, or 100 iops there :)

See the Dramble in person!

So why am I mentioning all this? Because I want to bring the Dramble with me to some Drupal events, and I'd love to share it with you, explain everything in more detail, and most importantly: demonstrate modern and easy Drupal 8 deployment with Ansible on it.

I'll be bringing it to #MidCamp in Chicago on Saturday, March 21, and I've also submitted a session for DrupalCon LA: Deploying Drupal 8 to Bare Metal with Ansible - Live!

I hope the session is selected and I can bring the Dramble with me to LA in a couple months :)

Also, if you haven't submitted your own session for DrupalCon LA, the deadline is Friday; go submit it now!

For more on the Dramble itself, check out the Raspberry Pi Dramble project on GitHub, and see what I'm working on over in the Dramble issue queue.

Dec 15 2014

I just posted a large excerpt from Ansible for DevOps over on the Server Check.in blog: Highly-Available Infrastructure Provisioning and Configuration with Ansible. In it, I describe a simple set of playbooks that configures a highly-available infrastructure primarily for PHP-based websites and web applications, using Varnish, Apache, Memcached, and MySQL, each configured in a way optimal for high-traffic and highly-available sites.

Here's a diagram of the ultimate infrastructure being built:

Highly Available Infrastructure

The configuration is similar to what many larger Drupal sites would use, and with the exception of the varnish default.vcl and the actual PHP script being deployed (in the example, it's just a PHP file that tests the rest of the infrastructure and outputs success/fail statuses), you could drop a Drupal site on the Apache servers and immediately start scaling up your traffic!

The example highlights the powerful simplicity of Ansible as a tool for not only configuration management (like Puppet, Chef, etc.), but also for provisioning and managing servers in different cloud providers. With under a hundred lines of YAML configuration, I can spin up the exact same infrastructure locally with Vagrant and VirtualBox, on DigitalOcean droplets, or on AWS EC2 instances!

Nov 24 2014

It's been a well-known fact that using native VirtualBox or VMWare shared folders is a terrible idea if you're developing a Drupal site (or some other site that uses thousands of files in hundreds of folders). The most common recommendation is to switch to NFS for shared folders.

NFS shared folders are a decent solution, and using NFS does indeed speed up performance quite a bit (usually on the order of 20-50x for a file-heavy framework like Drupal!). However, it has its downsides: it requires extra effort to get running on Windows, requires NFS support inside the VM (not all Vagrant base boxes provide support by default), and is not actually all that fast—in comparison to native filesystem performance.

I was developing a relatively large Drupal site lately, with over 200 modules enabled, meaning there were literally thousands of files and hundreds of directories that Drupal would end up scanning/including on every page request. For some reason, even simple pages like admin forms would take 2+ seconds to load, and digging into the situation with XHProf, I found a likely culprit:

is_dir xhprof Drupal

There are a few ways to make this less painful when using NFS (since NFS incurs a slight overhead for every directory/file scan):

  • Use APC and set apc.stat = 0 to prevent file lookups (this is a non-starter, since that would mean every time I save a file in development, I would need to restart Apache or manually flush the PHP APC cache).
  • Increase PHP's realpath_cache_size ini variable, which defaults to '16K' (this has a small but noticeable impact on performance; see the snippet after this list).
  • Micro-optimize the NFS mounts by basically setting them up on your own outside of Vagrant's shared folder configuration (another non-starter... and the performance gains would be almost negligible).
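
Bumping the realpath cache is a one-line php.ini change if you want to try the second option (the php.ini path varies by distro and PHP version; the path below is Ubuntu's default for mod_php):

# Check the current value (PHP's default is 16K):
php -r 'echo ini_get("realpath_cache_size"), "\n";'
# Raise it and restart Apache so the new value takes effect:
echo "realpath_cache_size = 1024K" | sudo tee -a /etc/php5/apache2/php.ini
sudo service apache2 restart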

I wanted to benchmark NFS against rsync shared folders (which I've discussed elsewhere), to see how much of a difference using VirtualBox's native filesystem can make.

For testing, I used a Drupal site with about 200 modules, and used XHProf to measure the combined Excl. Wall Time for calls to is_dir, readdir, opendir, and file_scan_directory. Here are my results after 8 test runs on each:

NFS shared folder:

  • 1.5s* (realpath_cache_size = 16K - PHP default)
  • 1.0s (realpath_cache_size = 1024K)
  • Average page load time: 1710ms (realpath_cache_size = 1024K, used admin/config/development/devel)

*Note: I had two outliers on this test, where the time would go to as much as 6s, so I discarded those two results. But realize that, even though this NFS share is on a local/internal network, every file access goes through the full TCP stack of the guest VM, so networking issues can make NFS performance unstable.

Native filesystem (using rsync shared folder):

  • 0.15s (realpath_cache_size = 16K - PHP default)
  • 0.1s (realpath_cache_size = 1024K)
  • Average page load time: 900ms (realpath_cache_size = 1024K, used admin/config/development/devel)

Tuning PHP's realpath_cache_size makes a meaningful difference (though not too great), since the default 16K cache doesn't handle a large Drupal site very well.

As you can see, there's really no contest—just as NFS is an order of magnitude faster than standard VirtualBox shared folders, native filesystem performance is an order of magnitude faster than NFS. Overall site page load times for the Drupal site I was testing went from 5-10s to 1-3s by switching from NFS to rsync!

I've updated my Drupal Development VM and Acquia Cloud VM to use rsync shares by default (though you can still configure NFS or any other supported share type), and to use a realpath_cache_size of 1024K. Hopefully Drupal developers everywhere will save a few minutes a day from these changes :)

Note that other causes for abysmal filesystem performance and many calls to is_dir, opendir, etc. may include things like a missing module or major networking issues. Generally, when fixing performance issues, it's best to eliminate the obvious, and only start digging deeper (like this post) when you don't find an obvious problem.

Notes on using rsync shared folders

Besides the comprehensive rsync shared folder documentation in Vagrant's official docs, here are a few tips to help you get up and running with rsync shared folders:

  • Use rsync__args to pass CLI options to rsync. The defaults are ["--verbose", "--archive", "--delete", "-z"], but if you want to preserve files created within the shared folder on the guest, you can set this option yourself and leave out --delete.
  • Use rsync__exclude to exclude directories like .git and any other files that aren't needed for running your application within the VM. While not incredibly impactful, it could shave a couple seconds off the rsync process.

Not all is perfect; there are a few weaknesses in the rsync model as it is currently implemented out-of-the-box:

  1. You have to either manually run vagrant rsync when you make a change (or have your IDE/editor run the command every time you save a file), or have vagrant rsync-auto running in the background while you work (see the short example after this list).
  2. rsync is currently one-way only (though there's an issue to add two-way sync support).
  3. Permissions can still be an issue, since permissions inside the VM sometimes require some trickery; read up on the rsync__chown option in the docs, and consider passing additional options to the rsync__args to manually configure permissions as you'd like.
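
For reference, here's what that day-to-day workflow looks like (these are standard Vagrant commands, assuming the synced folder is configured with type: "rsync"):

# One-time sync of local changes into the VM:
vagrant rsync
# Or leave a watcher running that re-syncs whenever a file changes:
vagrant rsync-auto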

Nov 14 2014

Drupal 8's expanded and broadly-used Entity API extends even to Contact Forms, and recently I needed to create a contact form programmatically as part of Honeypot's test suite. Normally, you can export a contact form as part of your site configuration, then when it's imported in a different site/environment, it will be set up simply and easily.

However, if you need to create a contact form programmatically (in code, dynamically), it's a rather simple affair:

First, use Drupal's ContactForm class at the top of the file so you can use the class in your code later:

<?php
use Drupal\contact\Entity\ContactForm;
?>

Then, create() and save() a ContactForm entity using:

<?php
$feedback_form = ContactForm::create([
  'id' => 'help',
  'label' => 'Help',
  'recipients' => ['[email protected]'],
  'reply' => '',
  'weight' => 0,
]);
$feedback_form->save();
?>

If you also want to set your new form as the default sitewide contact form, you can do so by updating the global contact.settings:

<?php
$contact_settings = \Drupal::config('contact.settings');
$contact_settings->set('default_form', 'help')->save();
?>

One of the things I'm most excited about in Drupal 8 is how this entire process is the same (or almost exactly so) for every kind of entity—and almost everything's an entity! Need to create a content type? A configuration entity? A node? User? Almost everything follows this pattern now, and Drupal 8's APIs are much easier to learn as a side effect.

Nov 13 2014

In support of my mission to make local development easier and faster, I've released boxes for four of the most popular Linux distributions I use and see used for Drupal sites: CentOS 6/7 and Ubuntu 12.04/14.04.

Vagrant Boxes - Midwestern Mac, LLC

I've been using other base boxes in the past, but it's hard to find updated boxes (especially for newer OSes) from people or companies you can trust that are truly minimal base boxes (e.g. no extra configuration management tools or junk to kludge up my development environment!). These boxes are all minimal installs that let you bring your own configuration however you want; I typically use an Ansible playbook to build a LAMP server, or a Solr server, or an ELK server for monitoring all the other servers...

You can find all the info on the boxes (including links to the Packer/Ansible build configuration used to create the boxes) on files.midwesternmac.com, and the boxes are also available on Vagrant Cloud: geerlingguy's boxes.

You can quickly build a Linux VM using Vagrant and VirtualBox for local Drupal development with vagrant init geerlingguy/[boxname] (e.g. for Ubuntu 14.04, vagrant init geerlingguy/ubuntu1404). These boxes are also used as the base boxes for the Drupal Development VM (which is currently being reworked to be much more powerful/flexible) and Acquia Cloud VM (which simulates the Acquia Cloud environment locally).
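
For example, bringing up a bare Ubuntu 14.04 VM for local testing takes just a few commands (nothing here is Drupal-specific; provisioning is up to you):

# Create a Vagrantfile referencing the box, then boot and log into the VM:
vagrant init geerlingguy/ubuntu1404
vagrant up
vagrant ssh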

I'll be writing more about local development with these VMs as well as many other interesting DevOps-related tidbits in Ansible for DevOps, on this blog, and on the Server Check.in Blog.

Nov 06 2014

For all the sites I maintain, I have at least a local and production environment. Some projects warrant a dev, qa, etc. as well, but for the purposes of this post, let's just assume you often run drush commands on local or development environments during development, and eventually run a similar command on production during a deployment.

What happens if, at some point, you are churning through some Drush commands, using aliases (e.g. drush @site.local break-all-the-things to break things for testing), and you accidentally enter @site.prod instead of @site.local? Or what if you were doing something potentially disastrous, like deleting a database table locally so you can test a module install file, using drush sqlq to run a query?

$ drush @site.prod break-all-the-things -y
Everything is broken!                                    [sadpanda]

Most potentially-devastating drush commands will ask for confirmation (which could be overridden with a -y in the command), but I like having an extra layer of protection to make sure I don't do something dumb. If you use Bash for your shell session, you can put the following into your .profile or .bash_profile, and Bash will warn you whenever the string .prod is in one of your commands:

prod_command_trap () {
  if [[ $BASH_COMMAND == *.prod* ]]
  then
    read -p "Are you sure you want to run this command on prod [Y/n]? " -n 1 -r
    if [[ $REPLY =~ ^[Yy]$ ]]
    then
      echo -e "\nRunning command "$BASH_COMMAND" \n"
    else
      echo -e "\nCommand was not run.\n"
      return 1
    fi
  fi
}
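# With extdebug enabled, returning non-zero from the DEBUG trap prevents the matched command from running.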
shopt -s extdebug
trap prod_command_trap DEBUG

Now if I accidentally run a command on production I get a warning/confirmation before the command is run:

$ drush @site.prod break-all-the-things -y
Are you sure you want to run this command on prod [Y/n]?

This code, as well as other aliases and configuration I use to help my command-line usage more efficient, is also viewable in my Dotfiles repository on GitHub.

Oct 30 2014

I recently ran into an issue where drush vset was not setting a string variable (in this case, a time period that would be used in strtotime()) correctly:

# Didn't work:
$ drush vset custom_past_time '-1 day'
Unknown options: --0, --w, --e, --k.  See `drush help variable-set`      [error]
for available options. To suppress this error, add the option
--strict=0.

Using the --strict=0 option resulted in the variable being set to a value of "1".

After scratching my head a bit, trying different ways of escaping the string value, using single and double quotes, etc., I finally realized I could just use variable_set() with drush's php-eval command (shortcut ev):

# Success!
$ drush ev "variable_set('custom_past_time', '-1 day');"
$ drush vget custom_past_time
custom_past_time: '-1 day'

This worked perfectly and allowed me to go make sure my time was successfully set to one day in the past.

Oct 16 2014

Earlier today, the Drupal Security Team announced SA-CORE-2014-005 - Drupal core - SQL injection, a 'Highly Critical' bug in Drupal 7 core that could result in SQL injection, leading to a whole host of other problems.

While not a regular occurrence, this kind of vulnerability is disclosed from time to time—if not in Drupal core, in some popular contributed module, or in some package you have running on your Internet-connected servers. What's the best way to update your entire infrastructure (all your sites and servers) against a vulnerability like this, and fast? High-profile sites could be quickly targeted by criminals and need to be able to deploy a fix ASAP... and though lower-profile sites may not be immediately targeted, you can bet there will eventually be a malicious bot scanning for vulnerable sites, so they still need to apply the fix in a timely manner.

In this blog post, I'll show how I patched all of Midwestern Mac's Drupal 7 sites in less than 5 minutes.

Hotfixing Drupal core - many options

Before we begin, let me start off by saying there are many ways you can apply a security patch, and some are simpler than others. As many have pointed out (e.g. Lullabot), you can simply download the one-line patch and apply it to your Drupal codebase using patch -p1.

You could also use Drush to do a Drupal core update (drush up drupal), but you'll still need to do this manually on every Drupal installation you manage.

If you have multiple webservers with Drupal (or multiple instances of Drupal 7 on a single server, or spread across multiple servers), then there are simpler ways of either deploying the hotfix, or upgrading Drupal core via drush and/or version control (you are using Git or some other VCS, right?).

Enter Ansible, the Swiss Army Knife for infrastructure

Ansible is a powerful infrastructure management tool. It does Configuration Management (CM), just like Puppet or Chef, but it goes much, much further. One great feature of Ansible is the ability to run ad-hoc commands against a bunch of servers at once.

After installing Ansible, you need to create a hosts file at /etc/ansible/hosts, and tell Ansible about your servers (this is an 'inventory' of servers). Here's a simplified overview of my file:

[mm]
midwesternmac.com drupal_docroot=/path/to/drupal

[servercheck-drupal]
servercheck.in drupal_docroot=/path/to/drupal

[hostedsolr-drupal]
hostedapachesolr.com drupal_docroot=/path/to/drupal

[drupal7:children]
mm
servercheck-drupal
hostedsolr-drupal

There are a couple quick things to note: the inventory file follows an ini-style format, so you define groups of servers with [groupname] (then list the servers one by one after the group name, with optional variables in key=value format after the server name), then define groups of groups with [groupname:children] (then list the groups you want to include in this group). We defined a group for each site (currently each group just has one Drupal web server), then defined a drupal7 group to contain all the Drupal 7 servers.

As long as you can connect to the servers using SSH, you're golden. No additional configuration, no software to install on the servers, nada.

Let's go ahead and quickly check if we can connect to all our servers with the ansible command:

$ ansible drupal7 -m ping
hostedapachesolr.com | success >> {
    "changed": false,
    "ping": "pong"
}
[...]

All the servers have responded with a 'pong', so we know we're connected. Yay!

For a simple fix, we could add a variable to our inventory file for each server defining the Drupal document root(s) on the server, then use that variable to apply the hotfix like so:

$ ansible drupal7 -m shell -a "curl https://www.drupal.org/files/issues/SA-CORE-2014-005-D7.patch | patch -p1 chdir={{ drupal_docroot }}"

This would quickly apply the hotfix on all your servers, using Ansible's shell module (which, conveniently, runs shell commands verbatim, and tells you the output).

Fixing core, and much more

Instead of running one command via ansible, let's make a really simple, short Ansible playbook to fix and verify the vulnerability. I created a file named drupal-fix.yml (that's right, Ansible uses plain old YAML files, just like Drupal 8!), and put in the following contents:

---
- hosts: drupal7
  tasks:
    - name: Download drupal core patch.
      get_url:
        url: https://www.drupal.org/files/issues/SA-CORE-2014-005-D7.patch
        dest: /tmp/SA-CORE-2014-005-D7.patch

    - name: Apply the patch from the drupal docroot.
      shell: "patch -p1 < /tmp/SA-CORE-2014-005-D7.patch chdir={{ drupal_docroot }}"

    - name: Restart apache (or nginx, and/or php-fpm, etc.) to rebuild opcode cache.
      service: name=httpd state=restarted

    - name: Clear Drupal caches (because it's always a good idea).
      command: "drush cc all chdir={{ drupal_docroot }}"

    - name: Ensure we're not vulnerable anymore.
      [redacted]

Now, there are again many, many different ways I could've done this. (And to the eagle-eyed, you'll note I haven't included my test for the vulnerability... I'd rather not share how to test for the vulnerability until people have had a chance to update all their sites).

I chose to do the hotfix first, and quickly, since I didn't necessarily have time to update all my Drupal project codebases to Drupal 7.32, then push the updated code to all my repositories. I did do this later in the day, however, and used a playbook similar to the above, replacing the first two tasks with:

- name: Pull down the latest code changes.
  git:
    repo: "git://[mm-git-host]/{{ inventory_hostname }}.git"
    dest: "{{ drupal_docroot }}"
    version: master

Using Ansible's git module, I can tell Ansible to make sure the given directory (dest) has the latest commit to the master branch in the given repo. I could've also used a command and run git pull from the drupal_docroot directory, but I like using Ansible's git module, which provides great reporting and error handling.

Summary

This post basically followed my train of thought after hearing about the vulnerability, and while there are a dozen other ways to patch the vulnerability on multiple sites/servers, this was the way I did it. Though I patched just 9 servers in about 5 minutes (from the time I started writing the playbook (drupal-fix.yml) to the time it was deployed everywhere), I could just as easily have deployed the fix to dozens or hundreds of Drupal servers in the same amount of time; Ansible is fast and uses simple, secure SSH connections to manage servers.

If you want to see much, much more about what Ansible can do for your infrastructure, please check out my book, Ansible for DevOps, and also check out my session from DrupalCon Austin earlier this year: DevOps for Humans: Ansible for Drupal Deployment Victory!.

Oct 12 2014

Now that Drupal 8.0.0-beta1 is out, and the headless Drupal craze is in full-swing, the Drupal St. Louis meetup this month will focus on using Drupal 8 with AngularJS to build a demo pizza ordering app. (The meetup is on Thurs. Oct. 23, starting at 6:30 p.m.; see even more info in this Zero to Drupal post).

We'll be hacking away and seeing how far we can get, and hopefully we'll be able to leave with at least an MVP-quality product! I'll be at the event, mostly helping people get a Drupal 8 development environment up and running. For some, this alone will hopefully be a huge help, and maybe motivation to adopt Drupal 8 more quickly!

If you're in or around the St. Louis area, consider joining us; especially if you would like to learn something about either Drupal 8 or AngularJS!

P.S. To those who have been emailing: the rest of the Apache Solr search series is coming, it's just been postponed while I've started a new job at Acquia, and had a new baby!

Aug 21 2014

Posts in this series:

Drupal has included basic site search functionality since its first public release. Search administration was added in Drupal 2.0.0 in 2001, and search quality, relevance, and customization were improved dramatically throughout the Drupal 4.x series, especially in Drupal 4.7.0. Drupal's built-in search provides decent database-backed search, but offers a minimal set of features, and slows down dramatically as the size of a Drupal site grows beyond thousands of nodes.

In the mid-2000s, when most custom search solutions were relatively niche products, and the Google Search Appliance dominated the field of large-scale custom search, Yonik Seeley started working on Solr for CNet Networks. Solr was designed to work with Lucene, and offered fast indexing, extremely fast search, and as time went on, other helpful features like distributed search and geospatial search. Once the project was open-sourced and released under the Apache Software Foundation's umbrella in 2006, the search engine became one of the most popular engines for customized and more performant site search.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Timeline of Apache Solr and Drupal Solr Integration

If you can't view the timeline, please click through and read this article on Midwestern Mac's website directly.

A brief history of Apache Solr Search and Search API Solr

Only two years after Apache Solr was released, the Apache Solr Search module was created. Originally, the module was written for Drupal 5.x, but it has been actively maintained for many years and was ported to Drupal 6 and 7, with some major rewrites and modifications to keep the module up to date, easy to use, and integrated with all of Apache Solr's new features over time. As Solr gained popularity, many Drupal sites started switching from using core search or heavily customized Views to using Apache Solr.

Seeing this trend, hosted solutions for Solr search were built specifically for Drupal sites. Some of these solutions included Acquia's Acquia Search (2008), Midwestern Mac's Hosted Apache Solr (2011), and Pantheon's Apache Solr service. Acquia, seeing the need for more stability and development in Drupal's Solr integration module, began sponsoring the development of the Apache Solr Search module in April of 2009 (wayback machine).

Search API came on the scene after Drupal 7 was released in 2011. Search API promised to be a rethink of search in Drupal. Instead of tying itself to a particular search technology, a search framework (with modular plugins for indexing, searching, facets, etc.) was written to plug into the Drupal database, Apache Solr, or whatever other systems a Drupal site could integrate with. The Search API Solr module was released shortly thereafter, and both Search API and Search API Solr were written exclusively for Drupal 7.

Both Solr integration solutions—Apache Solr Search and Search API Solr—have been actively developed, and both modules offer a very similar set of features. This led to a few issues during the reign of Drupal 7 (still the current version as of this writing):

  • Many site builders wonder: Which module should I use?
  • Switching site search between the two modules (for example, if you find a feature in one that is not in the other) can be troublesome.
  • Does corporate sponsorship of one module over the other cause any issues in enterprise adoption, new feature development, or community involvement?

These problems have persisted over the past few years and caused much confusion. Some add-on modules, like Facet API (which allows you to build facets for your search results), have been abstracted and generalized enough to be used with either search solution, but there are dozens of modules, and hundreds of blog posts, tutorials, and documentation pages written specifically for one module or the other. For Drupal 6 users, there is only one choice (since Search API Solr is only available for Drupal 7), but for Drupal 7 users, this has been a major issue.

Hosted Apache Solr's solr module usage statistics reveal the community's split over the two modules: 46% of the sites using Hosted Apache Solr use the Apache Solr Search module, while 54% of the sites use Search API Solr.

So, is Drupal's Solr community destined to be divided for eternity? Luckily, no! There are many positive trends in the current Solr module development cycle, and some great news regarding Drupal 8.

Uniting Forces

Already for Drupal 7, the pain of switching between the two modules (or supporting both, as Hosted Apache Solr does) is greatly reduced by the fact that both modules started using a unified set of Apache Solr configuration files (like schema.xml, solrconfig.xml, etc.) as of mid-2012 (see the Apache Solr Common Configurations sandbox project).

Additionally, development of add-on modules like Facet API has been generalized so the features can be used today with either search solution with minimal effort.

A Brighter Future

There's still the problem of two separate modules, two separate sets of APIs, and a divided community effort between the two modules. When Drupal 8 rolls around, that division will be no more! In a 2013 blog post, Nick Veenhof announced that the maintainers of Search API and Apache Solr Search would be working together on a new version of Search API for Drupal 8.

The effort is already underway, as the first Drupal 8 Search API code sprint was held this past June in Belgium, after a successful funding campaign on Drupalfund.us.

The future of Solr and Drupal is bright! Even as other search engines like Elasticsearch are beginning to see more adoption, Apache Solr (which has seen hundreds of new features and greater stability throughout its 4.x release series) continues to gain momentum as one of the best text search solutions for Drupal sites.

Aug 11 2014

Posts in this series:

It's common knowledge in the Drupal community that Apache Solr (and other text-optimized search engines like Elasticsearch) blow database-backed search out of the water in terms of speed, relevance, and functionality. But most developers don't really know why, or just how much an engine like Solr can help them.

I'm going to be writing a series of blog posts on Apache Solr and Drupal, and while some parts of the series will be very Drupal-centric, I hope I'll be able to illuminate why Solr itself (and other search engines like it) are so effective, and why you should be using them instead of simple database-backed search (like Drupal core's Search module uses by default), even for small sites where search isn't a primary feature.

As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites that use Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.

Why not Database?

Apache Solr's wiki leads off its Why Use Solr page with the following:

If your use case requires a person to type words into a search box, you want a text search engine like Solr.

At a basic level, databases are optimized for storing and retrieving bits of data, usually either a record at a time, or in batches. And relational databases like MySQL, MariaDB, PostgreSQL, and SQLite are set up in such a way that data is stored in various tables and fields, rather than in one large bucket per record.

In Drupal, a typical node entity will have a title in the node table, a body in the field_data_body table, maybe an image with a description in another table, an author whose name is in the users table, etc. Usually, you want to allow users of your site to enter a keyword in a search box and search through all the data stored across all those fields.

Drupal's Search module avoids making ugly and slow search queries by building an index of all the search terms on the site, and storing that index inside a separate database table, which is then used to map keywords to entities that match those keywords. Drupal's venerable Views module will even enable you to bypass the search indexing and search directly in multiple tables for a certain keyword.

So what's the downside to database-backed search? Mainly, performance. Databases are built to be efficient query engines—provide a specific set of parameters, and the database returns a specific set of data. Most databases are not optimized for arbitrary string-based search. Queries where you use LIKE '%keyword%' are not that well optimized, and will be slow—especially if the query is being used across multiple JOINed tables! And even if you use the Search module or some other method of pre-indexing all the keyword data, relational databases will still be less efficient (and require much more work on a developer's part) for arbitrary text searches.
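
To make that concrete, here's roughly the kind of query a naive keyword search ends up running (a sketch using drush sqlq against Drupal 7's node and body tables; the unanchored LIKE conditions and the JOIN are what make it slow):

# Neither LIKE '%...%' condition can use an index, so the database scans both tables:
drush sqlq "SELECT n.nid, n.title FROM node n
  LEFT JOIN field_data_body b ON b.entity_id = n.nid
  WHERE n.title LIKE '%keyword%' OR b.body_value LIKE '%keyword%'"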

If you're simply building lists of data based on very specific parameters (especially where the conditions for your query all utilize speedy indexes in the database), a relational database like MySQL will be highly effective. But usually, for search, you don't just have a couple options and maybe a custom sort—you have a keyword field (primarily), and end users have high expectations that they'll find what they're looking for by simply entering a few keywords and clicking 'Search'.

Why Solr?

What makes Solr different? Well, Solr is optimized specifically for text-based search. The Lucene text search engine that runs behind Apache Solr is built to be incredibly efficient and also offers some other really useful tools for searching. Apache Solr adds some cool features on top of Lucene, like:

  • Efficient and fast search indexing.
  • Simple search sorting on any field.
  • Search ranking based on some simple rules (over which you have complete control).
  • Multiple-index searching.
  • Features like facets, text highlighting, grouping, and document indexing (PDF, Word, etc.).
  • Geospatial search (searching based on location).

Some of these things may seem a little obtuse, and it's likely that you don't need every one of these features on your site, but it's nice to know that Solr is flexible enough to allow you to do almost anything you want with your site search.

These general ideas are great, but in order to really understand what benefits Solr offers, let's look at what happens with a basic search in Apache Solr.

Simple Explanation of how Solr performs a search

This is a very basic overview, leaving out many technical details, but I hope it will help you understand what's going on behind the scenes at a basic level.

When searching with a database-backed search, the database says, "give me a few keywords, and I'll find exact matches for those words," and it only covers a few very specific bits of data (like title, body, and author). Searching with Solr is more nuanced, flexible, and powerful.

Step 1 - Indexing search data

First, when Solr builds an index of all the content on your site, it gathers all the content's data—each entity's title, body, tags, and any other textual information related to the entity. While reading through all this textual information, Solr does some neat things, like:

  • Stemming: taking a word like "baseballs" and adding in 'word stems' like "baseball".
  • Stop Word filtering: Removing words with little search relevance like "a", "the", "of", etc.
  • Normalization: Converting special characters to simpler forms (like ü to u and ê to e so search can work more intuitively).
  • Synonym expansion: Adding synonyms to words, so the words "doctor" and "practitioner" could be equivalent in a search, even if only one word appears in the content.

These functions are part of Lucene's text analysis (tokenizing and filtering the content), and are actually performed by Lucene, the engine running under Solr. You don't need to know what all this means right now, but basically, if your content has the word "baseball" in it, and a user searches for "baseballs" (or, with synonyms configured, "stickball"), the "baseball" result will be returned.

Step 2 - Searching with keywords

Second, when someone enters keywords to perform a search, Solr does a few things before it starts the actual search. We'll take the example below and run through what happens:

Baseball hall of fame

The first thing Solr does is splits the search into groupings: first the entire string, then all but one word in every combination, then all but two words in every combination, and so on, until it gets to individual words. Just like with indexing, Solr will even take individual words like "hall" and split that word out into "halls", "hall", etc. (basically any kind of related term/plural/singular/etc.).

So now, at this point, your above search looks kind of like you actually searched for:

"baseball hall of fame"
"baseball hall"
"baseball fame"
"baseballs"
"halls"
...
"baseball"

I've skipped many derivatives for clarity, but basically Solr does a little work on the entered keywords to make sure you're going to get results that are relevant for the terms you entered.

Step 3 - Executing the search

Finally, the search engine takes every one of the parsed keywords, and scores them against every piece of content in the index. Each piece of content then gets a score (higher for the number of possible matches, zero if no terms were matched). Then your search result shows all those results, ranked by how relevant they are to the current search.

If you had an entity with the title "Baseball Hall of Fame", it's likely that would be the top result. But some other content may match on parts or combinations of the keywords, so they'll also show up in the search.

If you know better than the search engine, and only want results that exactly match your search, you can enclose your keywords in quotes, so you would only get results with the exact string baseball hall of fame, and nothing that mentions 'hall of fame' or 'baseball' independently.

Solr also adds in a few nifty features when it returns the search results (or lack thereof); it will give back spelling suggestions, which are based on whether any words in the search index are very close matches to the words or phrase you entered in the keywords, and it will also highlight the matched words or word parts in the actual search result.

Summary

In a nutshell, this post explained how Apache Solr works by indexing, tokenizing, and searching your content. If you read through the entire post, you even have a basic understanding of Levenshtein distance, approximate string matching, and concept search, and can get started building your own Google :)

I'll be diving much more deeply into Apache Solr as time allows, highlighting especially the past, present, and future of Apache Solr and Drupal, as well as ways you can make Apache Solr integrate more seamlessly and effectively with your site, perform better, and do exactly what you want it to do.

Jul 29 2014
Jul 29

I wanted to post this here, since this is more of my sounding board for the Drupal community, but the details are on my personal blog: starting October 6, I will be working for Acquia as a Technical Architect in their Professional Services group!

What does this mean for this site/blog, Hosted Apache Solr, and Server Check.in? Not much, actually—they will continue on, likely at the same pace of development they've had for the past year or so (I'll work on them when I get an odd hour or two...). I am still working on completing Ansible for DevOps, and will actually be accelerating my writing schedule prior to starting the new job, since I'll have a little wedge of free time (a.k.a. unemployment!) between Mercy (my current full-time employer) and Acquia.

I'm excited to start working for Acquia, and am also excited to be able to continue doing what I love—working on Drupal, working on Solr, working on Ansible/infrastructure, and working in their respective communities.

Jan 22 2014
Jan 22

If you're a Drupal or PHP developer used to debugging or troubleshooting some code by adding a print $variable; or dpm($object); to your PHP, and then refreshing the page to see the debug message (or using XDebug, or using watchdog logging...), debugging Varnish's VCL language can be intimidating.

VCL uses C-like syntax, and is compiled when Varnish starts, so you can't just modify a .vcl file and refresh to see changes or debug something. And there are only a few places where you can simply stick a debug statement. So, in this post I'll explain four different ways I debug VCLs (note: don't do this on a production server!):

Simple Error statements (like print in PHP)

Sometimes, all you need to do is see the output of a variable, like req.http.Cookie, inside vcl_recv(). In these cases, you can just add an error statement to throw an error in Varnish and output the contents of a string, like the Cookie:

error 503 req.http.Cookie;

Save the VCL and restart Varnish, and when you try accessing a page, you'll get Varnish's error page, and below the error message, the contents of the cookie, like so:

Varnish error debug message

This debugging process works within vcl_recv() and vcl_fetch(), but causes problems (or crashes Varnish) in any other functions.
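For reference, here's a minimal sketch of where such a statement would live inside a VCL file (assuming Varnish 3.x syntax, with the rest of your existing vcl_recv() logic left in place):

sub vcl_recv {
  # ... your existing request-handling logic ...

  # Temporary debug statement: abort the request and dump the Cookie header.
  error 503 req.http.Cookie;
}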

Monitoring Varnish with varnishlog

varnishlog is another simple command-line utility that dumps out every bit of information Varnish processes and returns during the course of its processing, including backend pings, request headers, response headers, cache information, hash information, etc.

If you just enter varnishlog and watch (or dump the info into a file), be prepared to scroll for eons or grep like crazy to find the information you're looking for. Luckily, varnishlog also lets you filter the information it prints to screen with a few options, like -m to define a regular expression filter. For example:

# Display all Hashes.
varnishlog -c -i Hash

# Display User-Agent strings.
varnishlog -c -i RxHeader -I User-Agent

There are more examples available on the Varnish cache Wiki.

Monitoring Varnish with varnishtop

varnishtop is a simple command-line utility that displays varnish log entries with a rolling count (ranking logged entries by frequency within the past minute). In layman's terms, this means you can easily display things like how many times a particular URL is hit, or different bits of information about requests (like hashes, headers, etc.).

I like to think of varnishtop as a simple way to display the incredibly deep stats from varnishlog in realtime, with better filtering.

Some example commands I've used when debugging scripts:

# Display request cookies.
varnishtop -i RxHeader -I Cookie

# Display varnish hash data ('searchtext' is text to filter within hash).
varnishtop -i "Hash" -I searchtext

# Display 404s.
varnishlog -b -m "RxStatus:404"

You can change the length of time being monitored from the default of 60 seconds by specifying -p period (note that this setting only works for Varnish > 3.0).

There are a few other common monitoring commands in this StackOverflow answer.

Dry-run Compiling a VCL

Sometimes you may simply have a syntax error in your .vcl file. In these cases, you can see exactly what's wrong by using the command varnishd -Cf /path/to/default.vcl, where default.vcl is the base VCL file you've configured for use with Varnish (on CentOS/RHEL systems, this file is usually /etc/varnish/default.vcl).

The output of this command will either be a successfully-compiled VCL, or an error message telling you on exactly what line the error occurs.

Other debugging techniques

Are there any other simple debugging techniques you use that I didn't cover here? Please let me know in the comments. I wanted to compile these techniques, and a few examples, because I've never really seen a good/concise primer on debugging Varnish configuration anywhere—just bits and pieces.

Jan 01 2014
Jan 01

2014 is going to be a big year for Drupal. I spent a lot of 2013 sprucing up services like Hosted Apache Solr and Server Check.in (both running on Drupal 7 currently), and porting some of my Drupal projects to Drupal 8.

So far I've made great progress on Honeypot and Wysiwyg Linebreaks, which I started migrating a while back. Both modules work and pass all tests on Drupal's current dev/alpha release, and I plan on following through with the D8CX pledges I made months ago.

Some of the other modules I maintain, like Gallery Archive, Login as Other, Simple Mail, and themes like MM - Minimalist Theme, are due to be ported sooner rather than later. I'm really excited to start working with Twig in Drupal 8 (finally, a for-real front-end templating engine!), so I'll probably start working on themes in early 2014.

Drupal in 2014

Drupal 8 Logo

2013 was an interesting year for Drupal, with some major growing pains. Drupal 8 is architecturally more complex (yet simpler in some ways) than Drupal 7 (which was more complex than Drupal 6, etc.), and the degree of difference caused some developer angst, even leading to a fork, Backdrop. Backdrop is developing organically under the guidance of Nate Haug, but it remains to be seen what effect it will have on the wider CMS scene, and on Drupal specifically.

One very positive outcome of the fork is that some of the major Drupal 8 DX crises (mostly caused by switching gears to an almost entirely-OOP architecture) are being resolved earlier in the development cycle. As with any Drupal release cycle, the constant changes can sometimes frustrate developers (like me!) who decide to start migrating modules well before code/API freeze. But if you've been a Drupal developer long enough, you know that the drop is always moving, and the end result will be much better for it.

Drupal 8 is shaping up to be another major contender in the CMS market, as it includes so many timely and necessary features in core (Views, config management, web services, better blocks, Twig, Wysiwyg, responsive design everywhere, great language support, etc.). I argue it's hard to beat Drupal 8 core, much less core + contrib, with any other solution available right now, for any but the simplest of sites.

One remaining concern I have with Drupal 8 is performance; even though you can cover some performance problems with caching layers, the core, uncached Drupal experience is historically pretty slow, even without a bevy of contrib modules thrown in the mix. Drupal's new foundation (Symfony) will help in some aspects (probably more so in more complicated environments—sometimes Symfony is downright slow), and there are issues open to try to fix some known regressions, but being a performance nut, I like it when I can shave tens to hundreds of ms per request, even on a simple LAMP server!

Unlike Drupal 7's sluggish adoption—it was months before most people considered migrating, mostly because Views (and, to a lesser extent, the Migrate module) was not ready for some time after release—I think some larger sites will begin migrating to 8 with the first release candidate (there are already some personal sites and blogs using alpha builds). For example, when I migrate Server Check.in, I can substantially reduce the lines of custom code I maintain, and build a more flexible core, simply because Drupal 8 offers more flexible and reliable solutions in core, most especially with Views and Services.

Drupal 8 is shaping up to be the most exciting Drupal release to date—what are your thoughts as we enter this new year? Oh, and Happy New Year!

Oct 01 2013
Oct 01

For a recent project, I needed to migrate anything inside <script> and <style> tags that were embedded with other content inside the body field of Drupal 6 nodes into separate Code per Node-provided fields for Javascript and CSS. (Code per Node is a handy module that lets content authors easily manage CSS/JS per node/block, and saves the styles and scripts to the filesystem for inclusion when the node is rendered—read more about CPN goodness here).

The key is to get all the styles and scripts into a string (separately), then pass that data into an array in the format:

<?php
$node->cpn = array(
  'css' => '<string of CSS without <style> tags goes here>',
  'js' => '<string of Javascript without <script> tags goes here>',
);
?>

Then you can save your node with node_save(), and the CSS/JS will be stored via Code per Node.
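For example, outside of a migration you could attach the array to an existing node and save it; here's a minimal sketch (the node ID and the CSS/JS values are just placeholders):

<?php
// Load an existing node, attach CSS/JS via Code per Node, and save it.
$node = node_load(123);
$node->cpn = array(
  'css' => '.example-class { color: #333; }',
  'js' => 'console.log("Hello from Code per Node");',
);
node_save($node);
?>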

For a migration using the Migrate module, the easiest way to do this (in my opinion) is to implement the prepare() method, and put the JS/CSS into your node's cpn variable through a helper function, like so:

First, implement the prepare() method in your migration class:

<?php
  /**
   * Make changes to the entity immediately before it is saved.
   */
  public function prepare($entity, $row) {
    // Process the body and move <script> and <style> tags to Code per Node.
    if (isset($entity->body[$entity->language][0])) {
      $processed_info = custom_process_body_for_cpn($entity->body[$entity->language][0]['value']);
      $entity->body[$entity->language][0]['value'] = $processed_info['body'];
      $entity->cpn = $processed_info['cpn'];
    }
  }
?>

Then, add a helper function like the following in your migrate module's .module file (assuming your migrate module is named 'custom'):

<?php
/**
 * Break out style and script tags in body content into a Code per Node array.
 *
 * This function uses regular expressions to grab the content inside <script>
 * and <style> tags inside the given body HTML, then put them into separate keys
 * in an array that can be set as $node->cpn for a node before saving, which
 * will store the scripts and styles in the appropriate fields for the Code per
 * Node module.
 *
 * Why regex? I originally tried using PHP's DOMDocument to process the HTML,
 * but besides being overly verbose with error messages on all but the most
 * pristine markup, DOMDocument processed tags poorly; if there were multiple
 * script tags, or cases where script tags were inside certain other tags, only
 * one or two of the matches would work. Yuck.
 *
 * Regex is evil, but in this case necessary.
 *
 * @param string $body
 *   HTML string that could potentially contain script and style tags.
 *
 * @return array
 *   Array with the following elements:
 *     cpn: array with 'js' and 'css' keys containing corresponding strings.
 *     body: same as the body passed in, but without any script or style tags.
 */
function custom_process_body_for_cpn($body) {
  $cpn = array('js' => '', 'css' => '');

  // Search for script and style tags.
  $tags = array(
    'script' => 'js',
    'style' => 'css',
  );
  foreach ($tags as $tag => $type) {
    // Use a regular expression to match the tag and grab the text inside.
    preg_match_all("/<$tag.*?>(.*?)<\/$tag>/is", $body, $matches, PREG_SET_ORDER);
    if (!empty($matches)) {
      foreach ($matches as $match_set) {
        // Remove the first item in the set (it still has the matched tags).
        unset($match_set[0]);
        // Loop through the matches.
        foreach ($match_set as $match) {
          $match = trim($match);
          // Some tags, like script tags for embedded videos, are empty, and
          // shouldn't be removed, so check to make sure there's a value.
          if (!empty($match)) {
            // Remove the text from the body.
            $body = preg_replace("/<$tag.*?>(.*?)<\/$tag>/is", '', $body);
            // Add the tag contents to the cpn array.
            $cpn[$type] .= $match . "\r\n\r\n";
          }
        }
      }
    }
  }

  // Return the updated body and CPN array.
  return array(
    'cpn' => $cpn,
    'body' => $body,
  );
}
?>

If you were using another solution like the CSS module in Drupal 6, and need to migrate to Code per Node, your processing will be a little different, and you might need to do some work in your migration class' prepareRow() method instead. The main thing is to get the CSS/Javascript into the $node->cpn array, then save the node. The Code per Node module will do the rest.

Sep 30 2013
Sep 30

There are some simple Drupal modules that help with login redirection (especially Login Destination), but I often need more advanced conditions applied to redirects, so I like being able to do the redirect inside a custom module. You can also do something similar with Rules, but if the site you're working on doesn't have Rules enabled, all you need to do is:

  1. Implement hook_user_login().
  2. Override $_GET['destination'].

The following example shows how to redirect a user logging in from the 'example' page to the home page (Drupal uses <front> to signify the home page):

<?php
/**
 * Implements hook_user_login().
 */
function mymodule_user_login(&$edit, $account) {
  $current_path = drupal_get_path_alias($_GET['q']);

  // If the user is logging in from the 'example' page, redirect to front.
  if ($current_path == 'example') {
    $_GET['destination'] = '<front>';
  }
}
?>

Editing $edit['redirect'] or using drupal_goto() inside hook_user_login() doesn't seem to do anything, and setting a Location header using PHP is not best practice. Drupal uses the destination parameter to do custom redirects, so setting it anywhere during the login process will work correctly with Drupal's built in redirection mechanisms.
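As an example of a more advanced condition, here's a sketch that redirects users based on their role (the 'editor' role and 'dashboard' path are hypothetical values for illustration):

<?php
/**
 * Implements hook_user_login().
 */
function mymodule_user_login(&$edit, $account) {
  // Send editors to a dashboard page; everyone else goes to the front page.
  if (in_array('editor', $account->roles)) {
    $_GET['destination'] = 'dashboard';
  }
  else {
    $_GET['destination'] = '<front>';
  }
}
?>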

Sep 25 2013
Sep 25

CI: Deployments and Code Quality

tl;dr: Get the Vagrant profile for Drupal/PHP Continuous Integration Server from GitHub, and create a new VM (see the README on the GitHub project page). You now have a full-fledged Jenkins/Phing/SonarQube server for PHP/Drupal CI.

In this post, I'm going to explain how Jenkins, Phing and SonarQube can help you with your Drupal (or any PHP-based project) deployments and code quality, and walk you through installing and configuring them to work with your codebase. Bear with me... it's a long post!

Code Deployment

If you manage more than one environment (say, a development server, a testing/staging server, and a live production server), you've probably had to deal with the frustration of deploying changes to your code to these servers.

In the old days, people used FTP and manually copied files from environment to environment. Then FTP clients became smarter, and allowed somewhat-intelligent file synchronization. Then, when version control software became the norm, people would use CVS, SVN, or more recently Git, to push or check out code to different servers.

All the aforementioned deployment methods involved a lot of manual labor, usually involving an FTP client or an SSH session. Modern server management tools like Ansible can help when there are more complicated environments, but wouldn't everything be much simpler if there were an easy way to deploy code to specific environments, especially if these deployments could be automated to either run on a schedule or whenever someone commits something to a particular branch?

Jenkins Logo

Enter Jenkins. Jenkins is your deployment assistant on steroids. Jenkins works with a wide variety of tools, programming languages and systems, and allows the automation (or radical simplification) of tasks surrounding code changes and deployments.

In my particular case, I use a dedicated Jenkins server to monitor a specific repository, and when there are commits to a development branch, Jenkins checks out that branch from Git, runs some PHP code analysis tools on the codebase using Phing, archives the code and other assets in a .tar.gz file, then deploys the code to a development server and runs some drush commands to complete the deployment.
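As a rough illustration, the post-deployment drush step might look something like the following (the @dev site alias is hypothetical; your aliases and commands will vary):

# Run database updates, revert features, and clear caches on the dev site.
drush @dev updatedb -y
drush @dev features-revert-all -y
drush @dev cache-clear all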

Static Code Analysis / Code Review

If you're a solo developer, and you're the only one planning on ever touching the code you write, you can use whatever coding standards you want—spacing, variable naming, file structure, class layout, etc. don't really matter.

But if you ever plan on sharing your code with others (as a contributed theme or module), or if you need to work on a shared codebase, or if there's ever a possibility you will pass on your code to a new developer, it's a good idea to follow coding standards and write good code that doesn't contain too many WTFs/min.

SonarQube Logo

The easiest way to do this is to use static code analysis tools like PHP Mess Detector, PHP CodeSniffer (with Drupal's Coder module), and the PHP Copy/Paste Detector.

It's great to be able to use any of these tools individually, but let's face it—unless they're set up to run and give you reports in some automated fashion, there's little chance you're going to take time out of your busy development schedule to run these helpful code review tools, especially if the boring plain text reports they generate are long.

Jenkins and Phing together will do the heavy lifting of grabbing code from your repository and running it through all these analysis tools (as well as PHPUnit for automated unit testing, or SimpleTest). But we're going to take this to the next level; instead of just leaving you with a long text file to decipher, we're going to use another awesome tool, SonarQube, to generate (automatically) graphs, charts, and custom dashboards showing statistics like lines of code and commented lines of code over time, method/function complexity, coding standards violations, etc.

SonarQube helps highlight areas of your codebase where you can get the most ROI for your cleanup efforts; it's easy to spot that one module or template where a bunch of quickly-written messy code might be lurking, waiting to destroy a week of development time because it's lacking documentation, poorly-written, or an incredibly complicated mess!

Putting It All Together

Vagrant Logo          VirtualBox Logo

Now, into the nitty gritty. We're going to build ourselves a virtual machine that has everything configured to do all the things I mentioned above, and will be flexible enough to allow us to add more code quality analysis (like Drupal SimpleTest integration) and deployment options over time.

We'll build this VM using Vagrant with VirtualBox, which means the VM can be built and rebuilt on any Mac, Windows, or Linux PC. The configuration can also be split up to create separate servers for all the different components—a Jenkins server with Phing and SonarQube Runner to do the deployments and code analysis, and a SonarQube server for the pretty graphs and overview of our code quality.

The complete VM is available on GitHub (Vagrant profile for Drupal/PHP Continuous Integration Server), but I'll go through the configuration step by step. This guide assumes you're running CentOS or some other RHEL-flavored Linux, but it should be easily adaptable to other environments that use apt or another package manager instead of yum. Additionally, I am working on rebuilding this Vagrant profile using Ansible instead of shell scripts, but for now, shell scripts will have to do :-)

Installing Jenkins

Note: I will be using the hostname 'jenkins-sandbox' for this server, and Jenkins will run on port 8080.

To install Jenkins, you need to be running Java (in my situation, 1.6.0 is the latest version offered by the standard CentOS repos):

yum install --quiet -y java-1.6.0-openjdk

After Java is installed and configured (check the version with java -version), install Jenkins from the Jenkins RPM:

wget --quiet -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
rpm --quiet --import http://pkg.jenkins-ci.org/redhat/jenkins-ci.org.key
yum install --quiet -y jenkins

Configure Jenkins to start automatically after system boot:

service jenkins start
chkconfig jenkins on

Force Jenkins to update its plugin directory (you can do this by visiting Jenkins' update center in your browser, but we'll do it via CLI since it's faster and can be part of the automated server build):

curl -s -L http://updates.jenkins-ci.org/update-center.json | sed '1d;$d' | curl -s -X POST -H 'Accept: application/json' -d @- http://jenkins-sandbox:8080/updateCenter/byId/default/postBack

Install Jenkins' CLI tool so you can run later commands via the command line instead of having to click through the interface:

wget --quiet http://jenkins-sandbox:8080/jnlpJars/jenkins-cli.jar

Install the Jenkins phing and sonar plugins:

java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ install-plugin phing
java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ install-plugin sonar

You can import and export jobs in Jenkins using XML files if you have the Jenkins CLI installed, using the following syntax:

java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ get-job MyJenkinsJob > /path/to/exported/MyJenkinsJob.xml
java -jar jenkins-cli.jar -s http://jenkins-sandbox:8080/ create-job MyJenkinsJob < /path/to/exported/MyJenkinsJob.xml

Restart Jenkins so everything works correctly:

service jenkins restart

Now that Jenkins is installed, you could visit http://jenkins-sandbox:8080/ in your browser and start playing around in the UI... but we're going to keep moving along, getting the rest of our PHP CI system up and running.

Installing PHP and Phing

Since we're going to be building PHP projects in Jenkins, and using a variety of PHP code analysis tools to inspect and test our code, we need to install PHP, PEAR, Phing, and some other plugins.

First, let's install PHP, PEAR, and some other basic dependencies:

yum install --quiet -y php php-devel php-xml php-pear ImageMagick
pear channel-update pear.php.net
pear config-set auto_discover 1

Then, install PHPUnit if you'd like to run unit tests against your code:

pear channel-discover pear.phpunit.de
pear channel-discover pear.symfony.com
pear install phpunit/PHPUnit

Install the PHP Copy/Paste Detector (this will check for duplicate code that could be merged to reduce the lines of code you need to maintain):

pear install pear.phpunit.de/phpcpd

Install the PHP Mess Detector (this will check for poor code quality, overly-complicated code, and code that will introduce lots of technical debt):

pear channel-discover pear.phpmd.org
pear channel-discover pear.pdepend.org
pear install phpmd/PHP_PMD

Install PHP CodeSniffer (this will 'sniff' your code for bad formatting and coding standards violations):

pear install PHP_CodeSniffer

Install XDebug (useful for debugging PHP code, and used by some other tools):

pecl install xdebug

Install Phing (which will be used to coordinate the running of all the other tools we just installed against your code):

pear channel-discover pear.phing.info
pear install phing/phing

Download the Drupal Coder module, copy the Drupal Coding Standards out of the coder_sniffer submodule into PHP CodeSniffer's standards directory, then delete the downloaded module:

wget --quiet http://ftp.drupal.org/files/projects/coder-7.x-2.x-dev.tar.gz
tar -zxvf coder-7.x-2.x-dev.tar.gz
mv coder/coder_sniffer/Drupal $(pear config-get php_dir)/PHP/CodeSniffer/Standards/Drupal
rm coder-7.x-2.x-dev.tar.gz
rm -rf coder

At this point, you should have a working PHP installation that has all (or at least most) of the tools you need to find potential issues with your code, and deploy your code using Jenkins and Phing.
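Before wiring these tools into Phing and Jenkins, it's worth sanity-checking them from the command line; for example (assuming a module checked out at /path/to/custom_module):

# Check for Drupal coding standards violations.
phpcs --standard=Drupal /path/to/custom_module

# Check for messy or overly-complex code.
phpmd /path/to/custom_module text codesize,unusedcode

# Check for duplicated code.
phpcpd /path/to/custom_module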

Installing MySQL

SonarQube requires a database to function, and it's pretty simple to get MySQL set up and configured to be able to handle anything SonarQube can throw at it. Let's install mysql and mysql server:

yum install --quiet -y mysql-server mysql

Start MySQL and set it to start up at system boot.

service mysqld start
chkconfig mysqld on

You could run the MySQL setup assistant at this point, but we'll just run a few scriptable commands to do the same things the mysql_secure_installation script would do: set the root password (we'll use 'root' for simplicity's sake), delete the anonymous user, and drop the test database:

/usr/bin/mysqladmin -u root password root
/usr/bin/mysqladmin -u root -h jenkins-sandbox password root
echo "DELETE FROM mysql.user WHERE User='';" | mysql -u root -proot
echo "FLUSH PRIVILEGES;" | mysql -u root -proot
echo "DROP DATABASE test;" | mysql -u root -proot

Now we just need to create a database and user for SonarQube:

echo "CREATE DATABASE sonar CHARACTER SET utf8 COLLATE utf8_general_ci;" | mysql -u root -proot
echo "CREATE USER 'sonar' IDENTIFIED BY 'sonar';" | mysql -u root -proot
echo "GRANT ALL ON sonar.* TO 'sonar'@'%' IDENTIFIED BY 'sonar';" | mysql -u root -proot
echo "GRANT ALL ON sonar.* TO 'sonar'@'localhost' IDENTIFIED BY 'sonar';" | mysql -u root -proot
echo "FLUSH PRIVILEGES;" | mysql -u root -proot

MySQL is ready to go!

Installing SonarQube Server

SonarQube is a very nice code analysis and code review visualization and tracking tool, and it needs to be installed on a server with Java (which we already have set up for Jenkins) and a database (which we just set up above). First, we'll install Sonar:

wget --quiet http://dist.sonar.codehaus.org/sonar-3.7.1.zip
unzip -q sonar-3.7.1.zip
rm -f sonar-3.7.1.zip
mv sonar-3.7.1 /usr/local/sonar

Next, edit the sonar.properties file so Sonar knows how to connect to the MySQL database we created earlier (the file is located at /usr/local/sonar/conf/sonar.properties). Edit the following configuration options to match:

sonar.jdbc.username: sonar
sonar.jdbc.password: sonar
sonar.jdbc.url: jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8&rewriteBatchedStatements=true

Install the PHP plugin for Sonar, so our PHP projects can be analyzed without an ugly error message (you can also install the plugin through Sonar's interface, but this method is faster and easy to include in a script):

wget --quiet http://repository.codehaus.org/org/codehaus/sonar-plugins/php/sonar-php-plugin/1.2/sonar-php-plugin-1.2.jar
mv sonar-php-plugin-1.2.jar /usr/local/sonar/extensions/plugins/

To make Sonar easier to manage from the command line, we'll add an init script so you can start/restart/stop it with service and use chkconfig. Create a file at /etc/init.d/sonar with the following contents:

#!/bin/sh
#
# rc file for SonarQube
#
# chkconfig: 345 96 10
# description: SonarQube system (www.sonarsource.org)
#
### BEGIN INIT INFO
# Provides: sonar
# Required-Start: $network
# Required-Stop: $network
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: SonarQube system (www.sonarsource.org)
# Description: SonarQube system (www.sonarsource.org)
### END INIT INFO
/usr/bin/sonar $*

Next, we'll symlink the appropriate sonar executable into /usr/bin, set the correct permissions, and enable sonar at system boot:

ln -s /usr/local/sonar/bin/linux-x86-64/sonar.sh /usr/bin/sonar
chmod 755 /etc/init.d/sonar
chkconfig --add sonar

Finally, we're ready to start up sonar for the first time:

service sonar start

It will probably take 45 seconds to a minute to start up the first time, as Sonar will generate its database and configure itself. Once it's started, you can access Sonar at http://jenkins-sandbox:9000/.

Installing SonarQube Runner

There are two parts to SonarQube: the server itself, and the Runner, which is helpful if you're using a language that doesn't need to be compiled, but needs to have code analysis generated and dumped into an active SonarQube installation (basically anything that doesn't use Maven). Here we'll install and configure SonarQube Runner, so we can push the code analysis done on our Drupal site into our SonarQube server.

First, we need to download sonar-runner and place it in the appropriate directory (note: this guide was written for version 2.3... in the future, you may need to update the version number/URL):

wget --quiet http://repo1.maven.org/maven2/org/codehaus/sonar/runner/sonar-runner-dist/2.3/sonar-runner-dist-2.3.zip
unzip -q sonar-runner-dist-2.3.zip
rm -f sonar-runner-dist-2.3.zip
mv sonar-runner-2.3 /usr/local/sonar-runner

Now, configure your sonar-runner instance to point to the SonarQube server we set up earlier by editing the sonar-runner.properties file (located at /usr/local/sonar-runner/conf/sonar-runner.properties). The file should contain at least the following:

# SonarQube Host URL.
sonar.host.url=http://jenkins-sandbox:9000
# MySQL connection.
sonar.jdbc.url=jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8
# MySQL credentials.
sonar.jdbc.username=sonar
sonar.jdbc.password=sonar

Finally, to allow sonar-runner to work correctly (so you can just cd to a directory containing a sonar properties file for a project and enter sonar-runner to analyze the code), you need to set the environment variable SONAR_RUNNER_HOME and add the sonar-runner bin directory to your PATH. The simplest way to do this is to add the file /etc/profile.d/sonar.sh with the following inside:

# Sonar settings for terminal sessions.
export SONAR_RUNNER_HOME=/usr/local/sonar-runner
export PATH=$PATH:/usr/local/sonar-runner/bin

You can also have Jenkins install SonarQube Runner via the UI, but that spoils the fun of using the shell commands, and also isn't able to be wrapped up in a Vagrant profile :-).
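For reference, the per-project properties file that sonar-runner looks for (sonar-project.properties, in the project root) can be as simple as the following sketch (the project key, name, and version are just example values):

# Unique project identifier and display name in SonarQube.
sonar.projectKey=drupal-7-example
sonar.projectName=Drupal 7 Example
sonar.projectVersion=1.0

# Paths to analyze (relative to this file) and the source language.
sonar.sources=.
sonar.language=php

# Source file encoding.
sonar.sourceEncoding=UTF-8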

Let's Do This Thing!

Okay, now that we've completed the marathon of installation and configuration (or just finished a cup of coffee if you used the Vagrant profile and vagrant up), it's time to jump into Jenkins, run a deployment, and see our results in Jenkins and SonarQube!

Jenkins Dashboard
The Jenkins Dashboard

Fire up your web browser and visit http://jenkins-sandbox:8080/ to get to the Jenkins dashboard. We'll create a new project to test everything out:

  1. Click on 'New Job'.
  2. Put in a Job name (like 'Drupal 7') and choose 'Build a free-style software project', then click OK.
  3. Under Source Code Management, choose 'Git' and enter the following:
    • Branch Specifier: 7.x
    • (In 'Advanced...') Local subdirectory for repo: drupal
  4. Under Build, click 'Add build step' and choose 'Invoke Phing targets', then enter the following:
    • Targets: build
    • Phing Build File: /vagrant/config/jenkins/drupal-7-example/build.xml
    • Properties: project.builddir=${WORKSPACE}
  5. Under Build, click 'Add build step' and choose 'Invoke Standalone Sonar Analysis', then enter the following:
    • Path to project properties: /vagrant/config/jenkins/drupal-7-example/sonar-project.properties
  6. Click Save at the bottom of the page.

(Note that the build.xml and sonar-project.properties files are in the location they would be if you use the Vagrant profile linked at the top of this post—if you're building the server manually, update the paths to your Phing and Sonar properties files accordingly).

If everything is configured correctly, you can now click 'Build Now', and prepare to be dazzled! After a few minutes (depending on the speed of your connection), Jenkins will clone the Drupal git repository, run some analysis on the code through Phing, archive the codebase, and send the analysis results off to SonarQube.

Once everything is complete (and, hopefully, you get a happy blue ball indicating build success!), you can click the Sonar link from the Project's main page to view the SonarQube analysis.

Conclusion

You now have a Continuous Integration server set up that will enable more automated deployments and make code review a more visual and simple process. Plus, as you improve your codebase, you'll be able to see pretty SonarQube graphs and charts showing you how much the code has improved!

Phing and Jenkins offer many more features—I've barely scratched the surface! Go forward and explore the many things you can now do, like automatically generate API documentation for your custom code or email developers directly when their commits break tests.

And, for Heaven's sake, instead of following the 100+ manual steps above to configure a Continuous Integration server, use the Vagrant profile for Drupal/PHP Continuous Integration Server, and let Vagrant + VirtualBox do the heavy lifting of configuring your server!

Security caveat: The steps above and the Vagrant profile are meant for local testing only—if you build a production/web-accessible CI server, make sure to lock down access with better passwords, authentication, and firewall rules.

Sep 16 2013
Sep 16

Midwestern Mac has been offering Apache Solr hosting for Drupal websites for the past three years, but this service has never been given too much attention or made easy to sign up for and use—until now!

Today we're announcing the re-launch of our service with a new website: Hosted Apache Solr.

Hosted Apache Solr home page - Drupal 7

The website was built on Drupal 7, and uses a custom base theme shared with Server Check.in (our server monitoring service built with Drupal and Node.js). We built a small payment integration module for PayPal subscriptions (though we're considering using Drupal Commerce, so we can use different payment processors more easily), and have built a very simple to use front-end for managing Solr core subscriptions.

If you don't know much about what Apache Solr can do for your site's search and listings, here's a one-sentence summary: Solr enables highly customizable and speedy content indexing, faceted and advanced search filtering abilities, and raw speed—indexing and searching are many times faster than database-backed search (like Drupal's default search or basic Views filtering).

There are a few other companies that offer hosted instances of Apache Solr, notably Acquia, but most other solutions require more expensive contracts or are not tailored specifically towards Drupal sites. We hope you like our offering, and would love to hear your feedback—what can we do to help you choose Hosted Apache Solr as your Drupal search solution?

Check out Hosted Apache Solr, and sign up to improve your search experience!

Midwestern Mac will soon be posting more stories about Hosted Apache Solr, Apache Solr itself, and Drupal/Solr integrations, so please consider subscribing to our RSS feed to stay informed!

Apache Solr is a trademark of the Apache Software Foundation. Drupal is a registered trademark of Dries Buytaert.

Aug 30 2013
Aug 30

It seems most developers I know have a story of running some sort of batch operation on a local Drupal site that triggers hundreds (or thousands!) of emails that are sent to the site's users, causing much frustration and ill will towards the site the developer is working on. One time, I accidentally re-sent over 9,000 private message emails during a test user migration because of an email being sent via a hook that was invoked during each message save. Filling a user's inbox is not a great way to make that user happy!

With Drupal, it's relatively easy to make sure emails are either rerouted or directed to temp files from local development environments (and any other environment where actual emails shouldn't be sent to end users). Drupal.org has a very thorough page, Managing Mail Handling for Development or Testing, which outlines many different ways you can handle email in non-production environments.

However, for most cases, I like to simply redirect all site emails to my own address, or route them to a figurative black hole.

Rerouting Emails to an Arbitrary Email Address

There's a simple module, Reroute Email, which allows you to have all emails sent through Drupal rerouted to a configured email address. This module is simple enough, but for even more simplicity, if you have a custom module, you can just implement hook_mail_alter() to force all messages to a given email address. Example (assuming your module's name is 'custom' and you want to send emails to the configured 'site_mail' address):

<?php
/**
 * Implements hook_mail_alter().
 */
function custom_mail_alter(&$message) {
  // Re-route emails to admin when the override_email variable is set.
  if (variable_get('override_email', 0)) {
    $message['to'] = variable_get('site_mail');
  }
}
?>

Now you can just add $conf['override_email'] = 1; to settings.php for any environment where you want all emails to be sent to the 'site_mail' configured email address. Pretty simple!

Directing emails to text files in /tmp

Another simple option, if you still don't want emails to be sent to the end user, but still want to see them in some form (in this case, a text file), is to enable the Devel module, then set your site's mail system to 'DevelMailLog' (like the following inside settings.php):

<?php
$conf['mail_system'] = array('default-system' => 'DevelMailLog');
?>

Devel will now re-route all emails to .txt files inside your server's /tmp folder.

Aug 21 2013
Aug 21

There are many times when a custom module provides functionality that requires a tweaked or radically altered template file, either for a node, a field, a view, or something else.

While it's often a better idea to use a preprocess or alter function to accomplish what you're doing, there are many times where you need to change the markup/structure of the HTML, and modifying a template directly is the only way to do it. In these cases, if you're writing a generic custom module that needs to be shared among different sites with different themes, you can't just throw the modified template into each theme, because you'd have to make sure each of the sites' themes has the same file, and updating it would be a tough proposition.

I like to keep module-based functionality inside modules themselves, so I put all templates that do specific things relating to that module into a 'templates' subdirectory.

In my example, I'd like to override field-collection-item.tpl.php, which is included with the Field collection module. To do so, I copy the default template into my custom module's 'templates' folder, and modify it how I like. Then I implement hook_theme_registry_alter() to tell Drupal where my template exists:

<?php
/**
 * Implements hook_theme_registry_alter().
 */
function custom_theme_registry_alter(&$theme_registry) {
  // Override the default field-collection-item.tpl.php with our own.
  if (isset($theme_registry['field_collection_item'])) {
    $module_path = drupal_get_path('module', 'custom');
    $theme_registry['field_collection_item']['theme path'] = $module_path;
    $theme_registry['field_collection_item']['template'] = $module_path . '/templates/field-collection-item';
  }
}
?>

This presumes my module's machine name is 'custom'. Make sure you clear all caches after adding this hook, so Drupal will pick up the hook and your new template!

Note that there are sometimes other/better ways of overriding templates in your module—for example, the views module lets you set a template directory path in hook_views_api(), and will automatically pick up templates from your module. And note again that preprocess and alter hooks are often a better way to go to accomplish small tweaks to content and markup for nodes, fields, views, etc.
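As a sketch of the Views approach mentioned above (again assuming a module named 'custom' with a 'templates' subdirectory):

<?php
/**
 * Implements hook_views_api().
 */
function custom_views_api() {
  return array(
    'api' => 3,
    // Views will look for template overrides in this directory.
    'template path' => drupal_get_path('module', 'custom') . '/templates',
  );
}
?>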

Aug 17 2013
Aug 17

I have been at the Midwest Drupal Summit for the past few days, focusing on #D8CX and reducing Drupal 8's technical debt (at least, a tiny bit of it!).

Wysiwyg Linebreaks

My main goal at the conference was to port the Wysiwyg Linebreaks module to Drupal 8. I originally built the module for Drupal 6 while helping the Archdiocese of St. Louis migrate almost 50 separate Joomla-based websites into one organic-groups-driven Drupal site. Their legacy content used linebreaks (rather than markup like <p> and <br /> tags) for paragraphs of text, and when we originally enabled Wysiwyg with TinyMCE, the editor ran all the text together in one big paragraph. The Wysiwyg Linebreaks module fixes that problem by running some JavaScript that adds the required tags when an editor is attached to a textarea, and (optionally) removes the tags when the editor is detached.

The Drupal 6 and Drupal 7 versions of the module depended on the Wysiwyg module, and worked with many different editors—however, the way the linebreaks plugin was added was slightly clunky, and required a little bit of a hack to work well (see Let cross-editor plugins be button-less (aka 'extensions')).

For Drupal 8, the module simply defines an editor plugin without a button (no hacks!), and integrates with CKEditor's API (See change notice: CKEditor module added: WYSIWYG in core!).

This is the second contrib module I've ported (the first being Honeypot), and the process is relatively straightforward. The nicest thing about Drupal 8's refined architecture is that, for modules like Wysiwyg Linebreaks, you don't need to have much, if any, procedural code inside .module and .inc files. For Wysiwyg Linebreaks, there's just the JavaScript plugin code inside /js/linebreaks/linebreaks.js, and a CKEditor plugin definition inside /lib/Drupal/wysiwyg_linebreaks/Plugin/CKEditorPlugin/Linebreaks.php. Very clean!

To anyone else working on a CKEditor plugin or integration with the new Drupal 8 Editor module: The API for dealing with editors, or with CKEditor in particular, is very simple but powerful—see the 'API' section on this change notice for the Editor module, and the 'Provide additional CKEditor plugins' section on this change notice for CKEditor.

One more note: I was made aware of the issue How do we want to facilitate enabling of CKEditor for sites upgraded from Drupal 7 to Drupal 8? just after I finished committing the last fixes for the D8 version of Wysiwyg Linebreaks. This module solves the problem of legacy content that uses the autop filter ("Convert line breaks into HTML (i.e. <br> and <p>)") quite nicely—enable it, and content will look/function as it always has, with or without CKEditor enabled.

MWDS at Palantir's HQ

Bacon Donuts
Bacon Donuts at #MWDS – Palantir, you know us too well...

This was the first year I've attended the Midwest Drupal Summit at Palantir's HQ in Chicago, IL, and it was a great experience! Besides working on porting Wysiwyg Linebreaks and cleaning up Honeypot to work with Drupal 8 head, I worked on:

I was also able to meet and talk to some really awesome Drupal developers—many from Chicago and the surrounding areas, but also a bunch of people who I've met at past DrupalCons and was happy to say hello to again. Palantir provided a great atmosphere, and some amazing food (bacon donuts, good pizza, tasty sandwiches, schwarma, etc.), and even some fun games (though I was unable to stay long enough to enjoy them during the summit).

I learned a lot about Drupal 8's architecture—plugins, controllers and routes especially—and I'm excited about the things this new architecture will afford when building and migrating Drupal modules and sites (like easier/faster testing and more code portability!). While there have been legitimate gripes about the release timeline and API changes for Drupal 8, developers have a tendency to focus too much on what's missing and broken (negatives) during the current core development phase (remember D7's release cycle?), and not on the more positive meta-level view—Drupal 8 has a vastly-improved admin UI, responsive design throughout, first-class HTML5 support, a great template system, a very flexible plugin system, more sane APIs for dealing with entities and fields, etc.

We made good progress in moving Drupal 8 forward during the summit, but there's still a ways to go... And you can help! See: Technical debt in Drupal 8 (or, when will it be ready?) and help push out the first beta release!

Aug 13 2013
Aug 13

The Drupal Way

I've worked with a wide variety of developers, designers, content managers, and the other Drupal users in the past few years, and I'm pretty sure I have a handle on most of the reasons people think Drupal is a horrible platform. But before I get to that, I have to set up the rest of this post with the following quote:

There are not a hundred people in America who hate the Catholic Church. There are millions of people who hate what they wrongly believe to be the Catholic Church — which is, of course, quite a different thing.

Forgive me for diverging slightly into my faith, but this quote is from the late Fulton J. Sheen, and is in reference to the fact that so many people pour hatred on the Catholic Church not because of what the Church actually teaches, but because of what they think the Catholic Church teaches. Once someone comes to understand the actual teaching, they are free to agree or disagree with it—but there are comparatively few people who disagree with teachings they actually understand.

Similarly, the problems most people have with Drupal—and with systems like it—are problems not with Drupal, but with their perception of Drupal.

Java Jane: One-off vs. Flexible Design

A Java developer (let's call her Jane) is used to creating a bunch of base object classes and a schema for a database by hand, then deploying an application and managing the database through her own wrapper code. Jane is assigned to a Drupal project, takes one look at the database, and decides that no sane person would ever design a schema with hundreds of tables named field_data_* and field_revision_* for every single data point in the application!

Why does Drupal have So Many Database Tables?

In reality, Drupal is doing this because The Drupal Way dictates that things like field data should be: flexible (able to be used by different kinds of entities (content)), able to be translated, able to be revised with a trackable history, and able to be stored in different storage backends (e.g. MySQL, MariaDB, MongoDB, SQLite, etc.). If the fields were all stored in a per-entity table as separate columns, these different traits would be much more difficult to implement.

Thus, The Drupal Way is actually quite beneficial—if you want a flexible content management system.

I think a lot of developers hate Drupal because they know they could build a more efficient web application that only has the minimal required features they need by simply writing everything from scratch (or using a barebones framework). But what about the next 72 times you have to build the exact same thing, except slightly different each time, with a feature that's different here, translation abilities there, integration with Active Directory for user login here, integration with a dozen APIs there, etc.?

There's a maxim that goes something like this: every seasoned web developer started with plain HTML and CSS (or some hosted platform), then discovered a dynamic scripting language and built his own CMS-like system. Then, after growing that CMS into a small system much like many others (but hopelessly insecure and unmaintainable), the developer realized that thousands of other people went through the same progression and ultimately worked together on systems like Drupal. Then said developer starts using Drupal, and the rest is history.

I know you could build a small system that beats the pants off Drupal performance-wise, and handles the three features you need done now. But why spend hours on a login form (that probably has security holes), session handling (ditto), password storage (ditto), forms in general (ditto), content CRUD interfaces, a translation system, a theme layer, etc., when you can have all that out of the box, and just spend a little time making it look and behave like you want? The shoulders of giants and all that...

.Net Neil: Letting Contrib/Bespoke Code Let You Down

A .Net developer (let's call him Neil) joins a Drupal project team after having worked on a small custom .Net application for a few years. Not only does he not know PHP (so he's learning by reading the code already in use), he is also used to a tightly-controlled application code structure, which he knows and owns end-to-end.

After taking a peek inside the custom theme, and a couple of the Drupal modules that the team has built in the past year, .Net Neil feels like he needs to take a shower! He sees raw SQL strings mixed in with user-provided data, he sees hundreds of lines of business logic in two dozen theme template files, and he can't find a line of documentation anywhere!

Why don't you use PDO for Database queries?

Who would blame Neil for washing his hands of Drupal entirely?

However, Neil shouldn't throw out the baby with the bathwater. Unfortunately, due to PHP's (and, by extension, Drupal's) popularity, many non-programmers or junior level programmers work on Drupal sites, and know just enough PHP to be incredibly dangerous.

Now, it doesn't help that Drupal allows PHP inside template files—something that will be corrected in Drupal 8—and it doesn't help that PHP is a quirky language full of inconsistencies and security holes—something that's vastly improved in PHP 5.3+ (especially 5.4+). But while some decide that PHP is a fractal of bad design, or that they simply hate PHP (mostly because of code they've seen from either designers or new programmers with a lot to learn... or because they have a lot of baggage from pre-PHP 5 days), I think it's best to understand that bad code is bad code regardless of the language. Using Ruby, Django, Go, Node.js, etc. does not automatically make you a better programmer, just like writing in French doesn't make you a great author. It's just a different language that's useful for different purposes.

One more note here: in all the Drupal code I've seen, there are three levels of quality:

  • Code in Drupal Core: Drupal core is extremely well-documented, has low cyclomatic complexity, has almost full automated test coverage, and has a very high bar for code acceptance. Drupal core is not only a great example of good code in PHP-land, but across languages—especially the latest version (which is on the tail end of some pretty major refactoring).
  • Code in Contrib Modules: Contributed modules can be pretty hit-or-miss. Even with a more rigorous review process in place, many contrib modules have hacked-together code with some subtle and not-so-subtle security and performance flaws. However, the modules used by a vast array of Drupal installations and included with popular distributions (like Views, Panels, Colorbox, etc.) are usually very well constructed and adhere to the Drupal coding standards. (Another good way of knowing a module is good: if Drupal.org uses it.)
  • Custom code: Welcome to the wild west. I've seen some of the craziest code in custom templates, hacked installations of Drupal, hacked contrib modules, and strange custom modules that I'm amazed even compile.

When people say Drupal has a terrible security track record, they often point to lists of all Drupal-related security flaws (like this one). Unfortunately for this argument, it holds little water; a quick scan usually finds that well over half the affected modules are used by a very small share of Drupal sites, and a flaw that affects Drupal core is very rare indeed (see how rare on Drupal's security track record page).

The Drupal Way™

Jane and Neil would both come to appreciate Drupal much better if they understood why Drupal does certain things in certain ways. They would also likely appreciate the strictness and thoroughness of Drupal's Coding Standards and security guidelines, and the fact that patches for consideration in Drupal core undergo strict reviews and must pass a full suite of automated tests.

They'd probably also learn to accept some of Drupal's quirks once they realize that the people who built and are making Drupal better range from a mother-of-five-turned-hobbyist-programmer to the world's largest government organizations. Drupal can't be everything to everyone—but it's one of the most flexible web content management systems available.

I'm going to go through some of the main areas where I've seen people get derailed in their understanding of Drupal.

A lot of first-time Drupal users decide they need twenty or thirty modules to add things like share buttons, fancy blogging features, forum tweaks, etc. Eventually, many fresh Drupal sites end up with over 100 enabled modules (of varying quality), and the site takes seconds to load a single page.

This problem is the open buffet syndrome, outlined in detail here. In addition to adding way too much functionality to a site (usually making the site harder to use anyways), adding a ton of extraneous modules makes it harder to track down problems when they occur, and usually makes for slower performance and a very large memory footprint on a server.

How do you combat the open buffet? Be frugal with modules. Only enable modules you really need to help your site run. Instead of adding a module for something, create a new View for a blog page or for a special block that lists a certain type of content. For larger and more customized sites, having a custom module that performs one or two small hook_alters to change a couple things is better than enabling a beefy module that does what you need and a thousand more things besides.

Don't be a module glutton!

One more tip: Whenever you consider using a contributed module, check out its ranking on the Project usage overview page, and check how many sites are currently using the module (under the 'Project Information' heading on the project home page). If the module is only used by a few hundred sites, that could be a sign that it's not going to be updated in a timely fashion, or thoroughly vetted for performance and security issues. I'd always recommend stepping through a module's code yourself if it's not a very popular module—if it's a tangled mess of spaghetti, steer clear, or submit patches to clean it up!

Configuration and Code

Drupal's philosophy when it comes to configuration and settings is that everything, or nearly everything, should be manageable through a user interface. Many developers who work outside of web applications are used to storing a lot of configuration in code, and don't see much value in making sure everything can be configured by administrators on the fly. In fact, many developers scoff at the idea, since they lose some control over the final application or site.

However, this is one of the traits of Drupal that makes it so powerful, and so beloved by site builders and people who actually use the sites developers build for them.

This presents a problem, though—if things are configurable by end-users, how do we version-control settings? How do we deal with different environments, like moving a feature from a development server to a test server, then to the live server? With Drupal <6, this was very challenging indeed, and usually required a lot of manual SQL work in update hooks. However, in Drupal 6 and 7, the situation has improved quite a bit, and in Drupal 8 and beyond, configuration management will likely be a standout feature (see: Configuration management architecture).

The Features module lets developers take things like content types, image styles, site settings, and even content itself (with the help of something like Universally Unique IDentifier), and export them to code. Then, that code can be version controlled and deployed to different environments with some simple drush commands or the click of a button in the UI. As long as the modules you're using use normal Drupal variables, or use CTools Exportables (most of the top modules do), you can use Features to keep things in sync.

Another thing that irks non-Drupal developers (especially those used to 'cowboy coding'—not using any kind of framework or system when they build sites) is the fact that the database is abstracted away. In Drupal, it should be fairly rare that a developer needs to write database queries. Almost everything within Drupal is wrapped in an API, allowing Drupal to work across a variety of platforms and backends. Instead of writing variables to the {variable} database table (and dealing with serialization and unserialization), you use variable_get() and variable_set()—these functions even take care of static caching for performance, and respect overrides set in settings.php. Instead of querying twenty different tables to find a list of entities that match your conditions, you use EntityFieldQuery. It may seem inefficient at first, but it's actually quite freeing—if you do things The Drupal Way, you'll spend less time worrying about databases and schemas, and more time solving interesting problems.
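Here's a minimal sketch of both ideas in Drupal 7 (the variable name and content type are made up for illustration):

<?php
// Store and read a setting without touching the {variable} table directly.
variable_set('mysite_items_per_page', 25);
$per_page = variable_get('mysite_items_per_page', 10);

// Find published article nodes without writing a single line of SQL.
$query = new EntityFieldQuery();
$result = $query
  ->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'article')
  ->propertyCondition('status', NODE_PUBLISHED)
  ->range(0, $per_page)
  ->execute();
$nids = !empty($result['node']) ? array_keys($result['node']) : array();
?>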

One more tip: If you ever see the PHP filter module enabled on a site, or something like Views PHP filter, that likely indicates someone getting lazy and not doing things The Drupal Way™. Putting PHP code into the database (as part of content, the body of a node, or as part of a view) is like pouring Mentos into Diet Coke—it's a recipe for disaster! There's always a way to do what you need to do via a .module file or your theme. Even if it's hacked together, that's a million times better than enabling the insecure, developer-brain-draining module that is the PHP filter.
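For example, a PHP-filter block can almost always be replaced by a tiny custom module that provides the same block, with the logic living safely in code (the module name, block delta, and content here are hypothetical):

<?php
/**
 * Implements hook_block_info().
 */
function mysite_blocks_block_info() {
  $blocks['recent_promotions'] = array(
    'info' => t('Recent promotions'),
    'cache' => DRUPAL_CACHE_GLOBAL,
  );
  return $blocks;
}

/**
 * Implements hook_block_view().
 */
function mysite_blocks_block_view($delta = '') {
  $block = array();
  if ($delta == 'recent_promotions') {
    $block['subject'] = t('Recent promotions');
    // Build the block content in code instead of eval'ing PHP from the database.
    $block['content'] = t('Promotional content generated by a module, not by the PHP filter.');
  }
  return $block;
}
?>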

Themes and the .tpl.phps of DOOM!

Drupal has had a long and rocky relationship with themers and designers—and at times in Drupal's history, even the basic responsibilities of a 'theme' have been unclear. One principle has always been clear, however: themes should deal with HTML markup, CSS styling, some JavaScript for the user interface, and maybe a tiny bit of PHP to help sort data into certain templates.

That last bit, however—the 'tiny bit of PHP'—has been abused all too often, because Drupal has long used a custom theme engine called PHPTemplate, which allows the use of any PHP code inside any template file (.tpl.php, sometimes pronounced 'tipple fip').

Many themers, designers, and new Drupal developers have mangled templates and thrown all kinds of code into template files where it simply doesn't belong. The idea that HTML markup and PHP code can be mixed and mashed together comes out of a 'scripting' mentality that was predominant in very old versions of PHP, custom-coded PHP websites, and an old-school PHP <4 mindset. Nowadays, there should be a distinct separation between markup and styling (a theme's responsibility), and the business logic that generates the data to be marked up and styled (a module's responsibility—or, rarely, a theme's template.php).

I've seen sites with 30+ copies of the theme's page.tpl.php file, all just to change one variable on different pages of the site. What the developer should've done is use one page.tpl.php and implement hook_preprocess_page() (in the theme's template.php, or in a custom module). Inside that function, the developer can set the variable depending on which page is being viewed. If the developer keeps duplicating page templates, he'll be in a very sorry situation the first time he has to change the page markup sitewide—instead of changing it in one page template, he'll have to change it in 30+ copies, and hope he didn't miss anything!

Don't Repeat Yourself - DRY

The DRY principle (Don't Repeat Yourself) applies very strongly to themes and templates—instead of making a bunch of duplicate templates and changing little things in each one, use hook_preprocess_HOOK() implementations in either your theme or a custom module.
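For the page.tpl.php story above, a single preprocess implementation is all it takes. A minimal sketch (THEMENAME and the variable are placeholders):

<?php
/**
 * Implements hook_preprocess_page().
 */
function THEMENAME_preprocess_page(&$variables) {
  // Set a variable for page.tpl.php instead of duplicating the template.
  $variables['show_sidebar_promo'] = drupal_is_front_page();
}
?>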

One other important note: If you're coming from WordPress or another PHP-based CMS that mixes HTML markup and PHP throughout modules, plugins, themes, etc., please try to get that concept out of your head. In Drupal, you should have one, and only one, opening <?php tag inside any PHP code file, and templates (.tpl.php files) should only include the most basic PHP and Drupal theming constructs, like if, else, print(), hide(), and render(). Anything more than that in a template is a sign of code smell.
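For reference, a healthy template stays close to something like this trimmed-down sketch of a node template (not a drop-in replacement for a real node.tpl.php):

<div class="node <?php print $classes; ?>">
  <?php if (!$page): ?>
    <h2><a href="<?php print $node_url; ?>"><?php print $title; ?></a></h2>
  <?php endif; ?>
  <?php
    // Hide the links here; they're rendered separately below the body.
    hide($content['links']);
    print render($content);
  ?>
  <?php print render($content['links']); ?>
</div>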

Thankfully, Drupal 8 will use Twig instead of PHPTemplate as the default template engine. Twig is a true templating language, and doesn't allow PHP. It's also more designer-friendly, and doesn't require a rudimentary knowledge of PHP to use—or an advanced knowledge of PHP to use well.

Code Quality

Spaces versus tabs. Putting curly braces on the same line as the if statement or on the next line. These are the things that get argued ad infinitum, and these are the things that don't really matter to a compiler. But they matter greatly to a community of developers. The larger and more diverse the community, the more important they are!

Drupal developers come from around the world, from many different cultures. It's important that we have a common way of communicating, and it helps quite a bit if we all use certain standards when we share code.

Since the mid-2000s, the Drupal community has banded together to make and enforce some very thorough coding standards for PHP, JavaScript, CSS, and other code used in Drupal core and contributed projects. The community is in ongoing discussions about code quality and review processes, and continues to adapt to modern software development best practices, and does a great job of teaching these practices to thousands of new developers every release.

Since early in the Drupal 7 development cycle, the Drupal community has written automated tests to cover almost all of Drupal core and many large contributed projects, and has built testing infrastructure to ensure all patches and bugfixes are thoroughly tested before being accepted.

Since early in the Drupal 8 development cycle, the Drupal community has used the concept of core gates and issue count thresholds, as well as divided responsibilities in different core initiatives, to ensure that development didn't get too scattered or start making Drupal core unstable and incoherent. Drupal 8, though in alpha stages, is already very stable, and is looking to be the most bug-free and coherent release yet.

Drupal's strict coding standards already match up pretty well with the suggested PSR standards from the PHP Framework Interop Group, and Drupal 8 and beyond will be taking future PSRs into account as well. This will help the Drupal community integrate more easily into the larger PHP world. By following standards and best practices, less time is spent trying to get individual PHP classes, methods, and configurations to work together, and more time is spent creating amazing websites, applications, and other products.

One tip: The Coder module will help you review how well your own code (PHP, JS, and CSS) follows the Drupal coding standards. It also helps you make sure you're using best practices when it comes to writing secure code (though automated tools are never a perfect substitute for knowing and writing secure code yourself!).

Even further: Many developers who work with PHP-based systems seem to have followed the progression of designer -> themer -> site builder -> developer, and thus don't have a strong background in software architecture or actual 'hard' programming (thus many ridicule the PHP community as being a bunch of amateur programmers... and they're often right!). I'd suggest trying to work on some small apps in other languages as well (might I suggest Node.js, Go, Java, or Ruby), to get a feel for different architectures, and learn what is meant by terms like SOLID, DRY, TDD, BDD, Loose coupling, YAGNI, etc.

Hacking Core and Contrib modules

Every time you hack core, God kills a kitten. Please, consider the kittens.

The above image comes from an idea originally presented at DrupalCon Szeged 2008 by Greg Dunlap. It goes like this: Every line of code you change in Drupal core or one of the contributed modules you're using will add many man-hours spent tracking the 'hack' over time, make upgrading your site more difficult, and introduce unforeseen security holes and performance regressions.

The times when actually modifying a line of code anywhere outside your custom module or theme's folder is a good idea are extremely rare.

If you find you are unable to make Drupal core or a contributed module work the way you want, either you haven't yet learned how to do it the right way, or you've found a bug. Drupal is extremely flexible with all its core hooks, alter hooks, preprocess functions, overrides, etc., and chances are, there's a more Drupalish way of doing what you're trying to do.

On the rare occasion where you do have a problem that can only be fixed by patching core or a contrib module, you should do the following:

  1. Search the project's issue queues to see if someone else had the same problem (chances are you're not the first!).
  2. If you found an issue describing the same problem, see if the issue is resolved or still open:
    • If the issue is resolved, you might need to download a later -dev release to fix the problem.
    • If the issue is not resolved, see if there's a patch you can use to fix the problem, test the patch, and post back whether the patch resolves your problem, so the patch progresses towards being accepted.
    • If the issue is not resolved and there is no patch to fix the problem, work on a patch and submit it to the issue queue.

The key takeaway here is the idea of investing in patches. If you find an actual bug or would like to see some improvement to either Drupal core or a contributed project, you should either test and push forward existing patches, or contribute a patch to get your problem resolved.

When you do things this way, you no longer operate on an island, and you'll benefit from community feedback and improvements to your patch. In addition, by only using patches that are tracked in a drupal.org issue, you can keep tabs on them more easily. On the rare occasion when I need to use a patch, I put the patch file (named [issue_number]-[comment_number].patch) into a 'sites/all/core-patches' directory, and then add an entry to a 'Patches' file with a link to the issue, a description of the patch, and why it is necessary.

Participating in the Drupal Community

In the previous section, I mentioned the idea of not being an island when developing with Drupal. How true this is! You're using software that's built by thousands of developers, and used by millions. There are people working on Drupal from every continent, and this diverse community is one of the most positive aspects of Drupal.

On Drupal.org's front page, the first line of text reads:

Come for the software, stay for the community.

With so many people using and building Drupal, chances are you aren't the first person to encounter a particular problem, or to build a certain piece of functionality. And if you can't find a module or a simple built-in way to do something you need to do, there are plenty of places to go for help: the Drupal.org forums and issue queues, IRC, Drupal Answers, and local user groups.

And these are just a few of the places where you can discover community and get help!

As I said before: don't be an island. With proprietary, closed-source software, you don't have anywhere to go except official (and expensive) vendor support. With Drupal, you get the code, you get to talk to the people who wrote the code, and you can even help make the code better!

Global state / Assuming too much

Not every request for a Drupal resource (most often a path defined in hook_menu()) comes from a web browser, and many variables and objects you assume are always available sometimes aren't. A lot of developers forget this, and write code that assumes a lot of global state that will be missing at certain times—when drush (or the command line in general) is in use, when data is being retrieved via AJAX, or when data is being retrieved by some other service.

Always use Drupal's API functions instead of things like $GLOBALS and $_GET. To get the current URL path of the page being viewed, use current_path(). For dynamic URL paths, use path components with the arg() function or Drupal's built-in menu router instead of tacking on a bunch of query parameters.
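For example, assuming a path like 'user/123/orders' (the path itself is hypothetical):

<?php
// These work the same whether the request comes from a browser, drush, or AJAX.
$path = current_path(); // 'user/123/orders'
$uid = arg(1);          // '123'
$section = arg(2);      // 'orders'
?>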

Additionally, use Drupal's menu router system and Form API to the fullest extent. When you define a menu item in hook_menu(), you can pass an access callback which integrates with Drupal's menu access system and lets you determine whether a given user has access (return TRUE) or not (return FALSE); Drupal takes care of outputting the proper headers and the access denied page for you. When building forms, use the built-in validation and submit callback functionality, along with helper functions like form_set_error(). Using the APIs already built into Drupal saves you time and code, and usually ensures your forms, content, etc. are more secure and more performant.
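Here's a rough sketch of both ideas together; the module name, path, permission, and callbacks are illustrative only:

<?php
/**
 * Implements hook_menu().
 */
function mysite_menu() {
  $items['reports/weekly'] = array(
    'title' => 'Weekly report',
    'page callback' => 'drupal_get_form',
    'page arguments' => array('mysite_report_form'),
    'access callback' => 'mysite_report_access',
  );
  return $items;
}

/**
 * Access callback: Drupal handles the 403 page if this returns FALSE.
 */
function mysite_report_access() {
  return user_access('view weekly reports');
}

/**
 * Form builder for the (hypothetical) weekly report request form.
 */
function mysite_report_form($form, &$form_state) {
  $form['email'] = array(
    '#type' => 'textfield',
    '#title' => t('Send report to'),
    '#required' => TRUE,
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Request report'),
  );
  return $form;
}

/**
 * Validation handler for mysite_report_form().
 */
function mysite_report_form_validate($form, &$form_state) {
  if (!valid_email_address($form_state['values']['email'])) {
    form_set_error('email', t('Please enter a valid e-mail address.'));
  }
}
?>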

Finally, always enable logging (typically via syslog on production servers, or logging errors to the screen in development environments) and check your logs over time to make sure you're not generating a bunch of errors in your custom code.
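In custom code, that mostly means routing messages through watchdog() so they end up wherever logging is pointed (syslog, the database log, etc.); for example (the type and message here are made up):

<?php
// Log a recoverable problem without ever printing it to the screen.
watchdog('mysite', 'Could not fetch remote data from @url.', array('@url' => 'http://example.com/feed'), WATCHDOG_WARNING);
?>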

Drupal 8 will be dropping some bits of global state that are often abused in Drupal 7 and below—the use of the global $user object is discouraged, and $_GET['q'] won't be available at all! Use the API, Luke, and the force will be with you.

The Drop is Always Moving

Though this post is one of the longest I've written on this blog, it barely scratches the surface of a full understanding of The Drupal Way™. The only way to start wrapping your head around how to do things properly with Drupal is to build a site with Drupal. And another site, and another, etc. Then build some modules, and some themes. Build an installation profile or two. Learn drush. Contribute to Drupal core.

Every day, learn something new about Drupal. You'll find that Drupal is a constantly-evolving (and improving!) ecosystem. The best practice today may be slightly different tomorrow—and with Drupal 8 just around the corner, there are many exciting opportunities to learn!

Discuss this post on Hacker News, Reddit, or below...

Jun 27 2013
Jun 27

I'm a huge fan of Boost for Drupal; the module generates static HTML pages for nodes and other pages on your Drupal site so Apache can serve anonymous visitors static pages without touching PHP or Drupal, thus allowing a normal web server (especially on cheaper shared hosting) to serve thousands instead of tens of visitors per second (or worse!).

For Drupal 7, though, Boost was rewritten and substantially simplified. This was great in that it made Boost more stable, faster, and easier to configure, but it also meant that the integrated cache expiration functionality was dumbed down and didn't really exist at all for a long time. I wrote the Boost Expire module to make it easy for sites using Boost to have the static HTML cache cleared when someone created, updated, or deleted a node or comment, among other things.

However, the Cache Expiration module has finally gotten solid Boost module integration (through hook_cache_expire()) in version 7.x-2.x, and the time has come for all users of Boost Expire to switch to the more robust and flexible Cache Expiration module (see issue). Here's how to do it:

  1. Disable and uninstall the Boost Expire module (then delete it, if you wish).
  2. Download and enable the Cache Expiration module (make sure Boost is still enabled).
  3. Visit the Cache Expiration configuration page (admin/config/development/performance/expire), and set the following options:
    • Module status: select 'External expiration' to enable cache expiration for the Boost module.
    • Node expiration: check all three checkboxes under Node actions, and make sure the 'Node page' checkbox is checked below.
    • Comment expiration: check all five checkboxes under Comment actions, and make sure the 'Comment page' and 'Comment's node page' checkboxes are checked below.

For the visually inclined, see the screenshots in this comment.

I'd like to thank the 750+ users of Boost Expire for helping me make it a great and robust stopgap solution until Cache Expiration 'cached' up (heh) with Boost in D7, and the authors of and contributors to both Boost and Cache Expiration for making some great and powerful tools that make Drupal sites fly!

If you're interested in some other ways to make your Drupal site faster, check out the article Drupal Performance White Paper (still in development) on my personal website.

Jun 25 2013
Jun 25

[Update: And, as quickly as I finished writing this post, I thought to myself, "surely, this would be a good thing to have drush do out-of-the-box. And... it already does, making my work on this shell script null and void. I present to you: drush sql-drop! Oh, well.]

When I'm creating or updating an installation profile/distribution for Drupal, I need to reinstall Drupal over and over again. Doing this requires a few simple steps: drop/recreate the database (or drop all db tables), then drush site-install (shortcut: si) with the proper arguments to install the site again.

In the past, I've often had Sequel Pro running in the background on my Mac, and I'd select all the database tables, right-click, choose 'Delete Tables', then have to click again on a button to confirm the deletion. This took maybe 10-20 seconds, depending on whether I already had Sequel Pro running, and how good my mouse muscles were working.

I created a simple shell script that works with MAMP/MAMP Pro on the Mac (but can easily be modified to work in other environments by changing a few variables), which simply drops all tables for a given database:

#!/bin/bash
#
# Drop all tables from a given database.
#

# Some variables.
MYSQL=/Applications/MAMP/Library/bin/mysql
AWK=$(which awk)
GREP=$(which grep)
USER="valid-username-here"
PASSWORD="your-password-here"

# Database (argument provided by user).
DATABASE="$1"

# Require the database argument.
[ $# -eq 0 ] && {
  echo "Please specify a valid MySQL database: $0 [database_goes_here]" ;
  exit 1;
}

# Drop all tables from the given database using mysql on the command line.
TABLES=$($MYSQL -u $USER -p$PASSWORD $DATABASE -e 'show tables' | $AWK '{ print $1}' | $GREP -v '^Tables')
for TABLE in $TABLES
do
  # echo "Deleting $TABLE table from $DATABASE..."
  $MYSQL -u $USER -p$PASSWORD $DATABASE -e "DROP TABLE $TABLE"
done

I named the script wipe-db.sh, and you can call it like so: $ /path/to/wipe-db.sh database-name. I added a symlink to the script inside my /usr/local/bin folder so I can just type in 'wipe-db' in the Terminal instead of entering the full path. To add the symlink:

$ ln -s /path/to/wipe-db.sh /usr/local/bin/wipe-db

Now I can wipe the database tables within a couple seconds, since I always have Terminal running, and I never have to reach for the mouse!

Apr 11 2013
Apr 11

Edit: There's a module for that™ now: Pingdom RUM. The information below is for historical context only. Use the module instead, since it makes this a heck of a lot simpler.

Pingdom just announced that their Real User Monitoring service is now available for all Pingdom accounts—including monitoring on one site for free accounts!

This is a great opportunity for you to start making page-specific measurements of page load performance on your Drupal site.

To get started, log into your Pingdom account (or create one, if you don't have one already), then click on the "RUM" tab. Add a site for Real User Monitoring, and then Pingdom will give you a <script> tag, which you then need to insert into the markup on your Drupal site's pages.

The easiest way to do this is to call drupal_add_html_head() from within hook_page_alter() (in your theme's template.php, or in a custom module):

<?php
/**
 * Implements hook_page_alter().
 */
function THEMENAME_page_alter(&$page) {
  // Add Pingdom RUM code.
  $pingdom_rum = array(
    '#type' => 'html_tag',
    '#tag' => 'script',
    '#attributes' => array(
      'type' => 'application/javascript',
    ),
    '#value' => '[SCRIPT TAG CONTENT HERE]',
  );
  drupal_add_html_head($pingdom_rum, 'pingdom_rum');
}
?>

Replace THEMENAME with your module or theme name, and [SCRIPT TAG CONTENT HERE] with the content of the Pingdom script tag (excluding the opening and closing <script> tags).

Once you've done this, go back to Pingdom, and you can view page load times in real-time:

Pingdom RUM monitoring graph

Pretty nifty!

Note: If you're looking for a great website monitoring service that's a bit simpler and cheaper than something like Pingdom (which we love!), check out Server Check.in :)

Feb 28 2013
Feb 28

Druplicon at DrupalCon - balloon

DrupalCon Portland is only a couple months away (early bird registration ends soon, so get your tickets if you haven't already!), and I'll be headed out that way. If this will be your first time attending a DrupalCon, be sure to read my First Timer's Guide to DrupalCon from last year.

At this year's DrupalCon, I'm excited to hear about everything going on with Drupal 8, as we're nearing the end of the development cycle, and a release candidate is on the not-too-distant horizon.

After having a baby and shying away from much Drupal contrib/core work, I finally had some time in the past few weeks to get up to speed on many of the Drupal changes that have been committed in the past month or so—and boy are they amazing (CKEditor in core, new node edit form, new responsive layouts, new admin toolbar, config, views, etc.)!

In addition, since the feature freeze deadline has passed, I decided to try porting Honeypot (a popular spam bot-fighting module) to Drupal 8. So far, most everything works, but I'm still working on making sure new configuration changes are accounted for.

I'd love to talk about everything I've learned while developing Honeypot and running some small—and large—community websites (juicy targets for human and non-human spammers!). To that end, I've submitted a session, and would love to hear (in the session's comments) anything specific you'd like to learn more about. Spam is a difficult problem, but there are many weapons you can use to fight it! I'll go through all that and more during the session, if it's accepted.

Also, if you're a daring soul, and would like to help me get Honeypot up and running well in Drupal 8, download Drupal 8 and Honeypot 8.x-dev, and give it a whirl! Hopefully more module and theme maintainers will start porting their projects to Drupal 8 under the banner of #D8CX, now that core is feature frozen!

Feb 14 2013
Feb 14

PHP's command line interface doesn't respect the max_execution_time limit within your php.ini settings. This can be both a blessing and a curse (but more often the latter). There are some drush scripts that I run concurrently for batch operations that I want to make sure don't run away from me, because they perform database operations and network calls, and can sometimes slow down and block other operations.

Memory usage - PHP and MySQL locked from runaway threads
Can you tell when the batch got backlogged? CPU usage spiked to 20, and threads went from 100 to 400.

I found that some large batch operations (where there are hundreds of thousands of items to work on) would hold the server hostage and cause a major slowdown, and when I went to the command line and ran:

$ drush @site-alias ev "print ini_get('max_execution_time');"

I found that the execution time was set to 0. Looking at PHP's documentation for max_execution_time, I found that this is by design:

This sets the maximum time in seconds a script is allowed to run before it is terminated by the parser. This helps prevent poorly written scripts from tying up the server. The default setting is 30. When running PHP from the command line the default setting is 0.

I couldn't set a max_execution_time for the CLI in php.ini, unfortunately, so I simply added the following to my site's settings.php:

<?php
// Set max execution time explicitly.
ini_set('max_execution_time', 180);
?>

This sets the execution time explicitly whenever drush bootstraps Drupal. Now, in my drush-powered function, I can check max_execution_time and use it as a baseline to decide whether to keep processing the batch or stop. I need to do this because I have drush run a bunch of concurrent threads for this particular batch process (and it continues all day, every day).
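The check itself can be as simple as comparing elapsed time against that limit inside the processing loop. A hypothetical sketch (the mysite_* helpers don't exist; they stand in for whatever claims and processes your queue items):

<?php
function mysite_process_queue() {
  $start = time();
  // Leave a safety margin below the limit set in settings.php.
  $limit = (int) ini_get('max_execution_time') - 30;

  // mysite_claim_next_item() and mysite_process_item() are hypothetical helpers.
  while ($item = mysite_claim_next_item()) {
    mysite_process_item($item);
    // Stop before hitting the execution time limit; the next drush run
    // (a few minutes later) picks up where this one left off.
    if ($limit > 0 && (time() - $start) >= $limit) {
      break;
    }
  }
}
?>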

Now the server is much happier, since I don't get hundreds of threads that end up locking the MySQL server during huge operations. I can set drush to only run every 3 minutes, and it will only create a few threads that die around the next time another drush operation is called.

Feb 12 2013
Feb 12

...there's a site for that.

Simply Test.me Screenshot

I just found out about SimplyTest.me today, and it allows you to, well, simply test any Drupal.org-hosted module, theme, or distribution in seconds.

No longer do you need to spin up (or maintain) a Drupal site locally (which usually takes an extra minute or two—at least) just to check out a module, or to make sure a theme or distribution fits your needs before using it on a live or development site.

Instead of simply looking at a screenshot or trying a theme on a demo site, you get a full Drupal website set up and configured with the module/theme/distro (as well as its dependencies), so you can play with it to your heart's content (for 30 minutes if you don't have an account on the site, an hour if you do).

According to the site's Q&A page, Drupal 6, 7, and 8 are all supported—even with sandbox projects! You can read more about the architecture and service implementation on the simplytest.me project page on Drupal.org.

Check it out, and thank Patrick Drotleff and all the sponsors (who help provide the hosting) for the hard work on this awesome tool!

[Update: There's also a great post on the Comm Press Blog about how you can test patches quickly and easily using Simply Test.me: Everyone can test patches. Really! simplytest.me to the rescue.]

Jan 04 2013
Jan 04

Some random bits of news from Midwestern Mac, LLC:

St. Louis-area Drupal Group

After taking a hiatus for the month of December, the St. Louis area Drupal Group will be meeting up (hopefully) on the third Thursday of the month as normal. We're hoping to have more structure to our meetups, and there are already some great ideas for meeting topics in 2013.

If you live in or around St. Louis and use or contribute to Drupal, please make an effort to join us and build up the Drupal community here in St. Louis!

As an aside, we still have a separate website for the St. Louis Drupal group—if anyone has ideas for how we can use that to spread the Drupal love in the center of the U.S., please let us know!

Server Check.in Launched

A couple weeks ago, we (Midwestern Mac, LLC) announced our newest service, Server Check.in, a website and server monitoring service that checks on your sites and servers every 10 minutes, notifying you of any problems. The service runs on Drupal, and integrates with services like Twilio and Stripe to handle SMS messaging and payments, respectively.

I (geerlingguy) wrote up a case study for Server Check.in and posted it to the Community showcase on drupal.org. This is the first application-type service Midwestern Mac has built on Drupal, and we've already been hard at work improving the service.

If you have any questions about Server Check.in, or how it was built, please ask away; I had a great discussion with some other developers in this thread on Hacker News.

Hosted Apache Solr Search updated to 3.6.x

At the request of many people who wanted to do some neat new things with Solr on their Drupal sites, we've finally followed Acquia's lead and updated some of our Solr search servers to 3.6.x, meaning things like Location-based searching are now possible. And our servers are happier :)

Nov 06 2012
Nov 06

I was recently browsing a very popular review website, when I noticed the following warnings popping up:

Angie's List website errors

From simply loading their web page and seeing these error messages, I could conclude:

  1. The website is using Drupal.
  2. The website is using memcached.
  3. The website is running on Acquia's managed hosting cloud.
  4. The website has error reporting set to print all errors to the screen.

If I were trying to break into this review site, or cause them a bad day, the information presented in this simple error message would help me quickly tailor my attacks to become much more potent than if I started from a blank slate.

Security through obscurity

I will quickly point out that security through obscurity—thinking you're more secure simply because certain information about your website is kept secret—is no security at all. However, that doesn't mean that obscurity is not an important part of your site's security.

Simply because the site above doesn't have error display turned off on the live website, I was able to learn quite a bit about the site. I probably could've found more 'helpful' error messages had I spent a little more time investigating.

At least the site's server-status page is protected! (Many sites leave the Apache server-status page open, exposing a ton of potentially dangerous details).

Keeping certain things secret, like errors that occur on your site, the version of a particular CMS, plugin, module, or theme of your website, or status reporting information, does improve your site's security. It won't prevent a dedicated intruder, but it will definitely slow him down, and will likely deter less-dedicated intruders.

To contribute to the overall security of your website, you should do the following:

  • Make sure server and configuration status pages are secure from outside access. If you need to expose phpinfo() or server-status, make sure only you have access.
  • Turn off error message printing on the screen on your publicly-accessible sites. Only turn on this feature on development or testing sites. (You should still log error messages, but do this behind the scenes, using syslog or some other logging facility).
  • Protect your server configuration, error, and log files from prying eyes; even backups of these files can be a security hole.

Hardening your defenses

Of course, as I mentioned above, security through obscurity is no security at all. Even if someone were to know every detail about your server configuration and setup, your site should still be secure. The following are some essential steps to ensuring the security of your website:

  • Apply patches and updates routinely. Most systems have automatic update systems, or at least notify you when an update is available.
  • Have an outside consultant evaluate the security of any custom code or interfaces you provide (especially for custom forms, API interaction, and file handling).
  • Use automated tools like Fail2Ban on your servers to make sure repeated attempts to access your servers are blocked.
  • Know your options when it comes to spam filters and flood controls; Drupal, as an example, has a plethora of excellent modules and configuration settings to prevent certain security holes from being opened. There's even a nice Security Review module that looks at common site configuration problems and warns you if they're incorrect.
Oct 01 2012
Oct 01

Most people who have grown up on the web and have used Wysiwyg editors online, or newer text editors and word processing applications, are used to a simple 'return' creating a new paragraph, with (on average) one extra line of empty space between the new paragraph and the one before it.

However, a lot of people like having the 'return' key just go down one line. There are a few ways this is possible in most Wysiwygs:

  • You can change the block style from 'Paragraph' (which creates <p> tags around new lines of text) to 'div' (which creates <div> tags around new lines of text).
  • You can press Shift + Return when you want to just go down one line (using a <br /> tag instead of a <p> tag).

I use the second method when I'm using a Wysiwyg, as I like using paragraphs (which are semantic for text, and which allow for better CSS styling than a monolithic block of text with linebreaks). I also rarely use a Wysiwyg editor, so it's not really an issue for me anyways ;-)

But, some people ask me if they can set up TinyMCE to use line breaks instead of paragraph returns by default, so they don't have to hit Shift + Return all the time (instead, they hit 'Enter Enter'... more keystrokes, but whatever floats their boat!).

Well, as it turns out, TinyMCE does have a setting for this, called forced_root_block. And Drupal's Wysiwyg module allows you to pass along this setting to TinyMCE when TinyMCE is loaded on a page, using hook_wysiwyg_editor_settings_alter() like so (in a custom module):

<?php
/**
 * Implements hook_wysiwyg_editor_settings_alter().
 *
 * Sets defaults for the TinyMCE editor on startup.
 */
function custom_wysiwyg_editor_settings_alter(&$settings, $context) {
  if ($context['profile']->editor == 'tinymce') {
    // Force linebreaks instead of paragraph returns.
    $settings['forced_root_block'] = FALSE;
  }
}
?>

Sep 25 2012
Sep 25

I just sent a new note to the Flocknote Development list about making Flocknote speedier. Flocknote is a very complex web application, and at the beginning of this summer, I noticed that some pages were taking more than a second to generate on our server (that's before the page would be sent to the end user!).

Investigating the performance problems using MySQL's EXPLAIN, the PHP profiler XHProf, and Drupal's Devel module, I found the culprits to be some inefficient and memory-hungry caches and some inefficient database queries. Applying a couple patches that are in development for Drupal, and adding a couple indexes on different tables more than halved average page load time.

I'm also actively trying to get these patches accepted into Drupal core and the Views module. Once the patches are incorporated, millions of other Drupal websites and applications will be able to conserve memory and clock cycles as well. (You could easily substitute 'WordPress', 'Joomla', 'DotNetNuke', or any other CMS or platform for 'Drupal' here.)

When we shave milliseconds off page load times, or optimize CSS and JavaScript to conserve CPU time on an end user's computer or mobile device, we are not only making end users happier, we're effectively:

  • Conserving battery life, and thus recharging time—reducing power demands altogether.
  • Making end users enjoy (and thus continue) using our websites and products.
  • Allowing for more free memory and CPU time on our servers, which in turn increases capacity.

These are very real benefits of pursuing better performance. Do you performance test your code when you add new features? Do you run something like Google PageSpeed to make sure your fancy new scripted widget doesn't kill performance on older Android devices, iPhones, and PCs?

Just like the rampant misuse of Adobe Flash in the early part of this millennium, many people seem to be adding features, effects, and widgets willy-nilly to their sites and platforms with little regard for their frying servers or the people using the sites. Do you really need a 3D tag cloud on your site, when it costs tons more time to generate on the backend, and tons of time to render in a browser?

Consider learning about improved performance techniques and incorporating performance testing into all the development you do—no matter what kind of software platform or website you're building. And if you can help large web platforms like Drupal, WordPress, and Joomla work faster while using less memory, that's a win for everyone.

Sep 05 2012
Sep 05

One Drupal site I manage has seen MySQL data throughput numbers rising constantly for the past year or so, and the site's page generation times have become progressively slower. After profiling the code with XHProf and monitoring query times on a staging server using Devel's query log, I found that there were a few queries that were running on pretty much every page load, grabbing data from cache tables with 5-10 MB in certain rows.

The two main culprits were cache_views and cache_field. These two tables alone contained more than 16MB of data, which was queried on almost every page request. There's an issue on drupal.org (_field_info_collate_fields() memory usage) to address the poor performance of field info caching for sites with more than a few fields, but I haven't found anything about better views caching strategies.

Knowing that these two tables, along with the system cache table, were queried on almost every page request, I decided I needed a way to cache the data so MySQL didn't have to spend so much time passing the cached data back to Drupal. Can you guess, in the following graph, when I started caching these things?

MySQL Throughput graph - munin

APC, Memcached, MySQL Query Cache?

If this site were running on multiple servers, or had a bit more infrastructure behind it, I would consider using memcached, which is a great caching system to run in front of MySQL, especially if you want to cache a ton of things and have a scalable caching solution (read this story for more). Running on one server, though, memcached doesn't offer a huge benefit compared to just using MySQL's query cache and tuning the innodb_buffer_pool_size so more queries come directly from memory. Memcached incurs a slight overhead due to the fact that data is transferred over a TCP socket (even if it's running on localhost).

MySQL's query cache is nice, but doesn't offer a huge speed benefit compared to how much more memory it needs to store a lot of queries.

I've often used APC (an opcode cache for PHP) to cache all of a site's compiled PHP files in memory so they don't need to be re-read and compiled from disk on every page request. For most Drupal sites, if you're not already using APC for this purpose, you should be; even if you're using fast SSDs or a super-fast RAID array, APC will probably give a 20-50% gain in page load times.

However, I'd never used APC's 'user cache' before, since I normally let APC run and don't want to worry about fragmentation or purging.

APC User Cache

There's a handy Drupal module, APC, which lets you configure Drupal to store certain caches in APC instead of in the database, meaning Drupal can read certain caches directly from RAM, in a highly-optimized key-value cache. APC caching is suited best for caches that don't change frequently (otherwise, you could slow things down due to frequent purging and fragmentation).

Some good candidates I've found include:

  • cache (includes entity_info, filter_formats, image_styles, and the theme_registry, many of which are queried every page load).
  • cache_bootstrap (includes system_list and variables, queried every page load).
  • cache_field (queried whenever field data is needed, grows proportionally to how many fields + instances you have).
  • cache_views (queried whenever a view is loaded—even if your views are all stored in code).

You may find some other caches that are suitable for APC, but when you've decided which caches you'd like in APC, count up the data sizes of all the tables after the cache is warm, and then double that value. This is how many MB you should add to your existing apc.shm_size variable (usually in apc.ini somewhere on your server) to give a good overhead for user cache objects.
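For reference, pointing cache bins at APC happens in settings.php once the APC module is in place. Something along these lines should do it (double-check the include path and class name against the module's README for the version you're running):

<?php
// Register the APC cache backend provided by the APC module.
$conf['cache_backends'][] = 'sites/all/modules/apc/drupal_apc_cache.inc';

// Send stable, frequently-read bins to APC; everything else stays in the database.
$conf['cache_class_cache'] = 'DrupalAPCCache';
$conf['cache_class_cache_bootstrap'] = 'DrupalAPCCache';
$conf['cache_class_cache_field'] = 'DrupalAPCCache';
$conf['cache_class_cache_views'] = 'DrupalAPCCache';
?>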

Monitor the APC cache size and usage (especially the free space and fragmentation amounts) using either the apc.php file included with APC (instructions), or using something like munin-php-apc along with munin monitoring. Make sure you have a good ratio of available vs. fragmented memory (more blue than orange, in the graph below):

Munin - APC Memory Usage Graph

When NOT to Use APC

APC is awesome for single-server setups, especially for a site with relatively steady traffic that's growing organically. APC is NOT helpful when you know you're going to need to scale quickly and will be adding servers (APC only benefits the server on which it's running). For a site that will exceed its current capacity quickly, you'll probably want to first split your web server (Apache/PHP) from your MySQL server (but put them both in the same datacenter and connect them via a private network), then consider adding a memcached server between the web and database servers. From there, you can start adding more memcached servers and database slave servers as needed.

APC is also not very helpful if you don't have enough RAM on your server to store the cached objects (opcode + user cache objects) with at least 20-40% overhead (free space). In almost every situation, the default 32M apc.shm_size won't cut it, and in some cases, you'll need to push 128M or 256M before the server can run swiftly with a normal amount of fragmentation and purges.

Conclusion

It's always important to benchmark and profile everything. It's no use caching things in APC if you have a database query that takes 2 seconds to run, or an external web service call that takes 5! Once you've done things like tune database queries, check for obvious front-end performance flaws, and have your page load down to a couple seconds or less, start working on your caching strategy. APC isn't a good fit for everyone, but in this case, page generation times were cut at least 30% across the board and MySQL data throughput was cut by more than half!

A few important notes if you choose this route:

  • Drush/CLI operations will effectively rebuild the APC cache for the command line every time they run, due to the way APC works (if apc.enable_cli is turned on). However, it seems to have no effect on the separate APC cache for non-cli PHP.
  • Make SURE you monitor your APC memory usage, fragmentation, and purges. If you don't have about twice the required RAM allocated to APC, fragmentation and frequent purging might very well negate any significant performance benefit from using APC.
  • Read through this Stack Overflow question for some more good notes on APC settings: Best APC settings to reduce page execution time.
