With or without PHP, we're going to scale this site!

One of the mantras for scalability on the web is "divide and conquer". For Drupal and other PHP-based sites, it's common to eschew mod_php and run PHP as a CGI process managed by mod_fcgid. This allows Apache to keep its processes lightweight and nimble so that they can efficiently handle static requests for images, CSS, and JavaScript files. This concept takes on many forms, including replacing Apache with an alternative web server such as nginx, or preventing those static requests from reaching the web server at all by placing a proxy cache such as Squid or Varnish on the front lines.

These implementations vary but the philosophy is the same: not all requests are created equal, and when scalability is the name of the game you should handle each type of request with the tool best suited to it. Beyond static vs. dynamic, there's a significant benefit to tailoring your installation based on the nature of your PHP requests.

We use the Ad module for many of our ad-supported Drupal sites. It's a better fit than a third-party advertising service because site administrators can control ads and content in one place. It's easier to honor your privacy policy when your visitors' data isn't being farmed out to parties unknown. You can control the appearance of ads based on specific and fine-grained content relationships, and you can enable features like selling an ad directly from your site.

But serving these ads can put some serious load on your server. In order to support ad counters for tracking statistics, "something smart" must be in control of delivering an ad to the user and tracking the request. Based on the current architecture of the Ad module, "something smart" means passing each ad through a PHP script. This means that each visit to every page of the site causes several, sometimes dozens, of concurrent PHP requests to your site. Because they're all PHP, none of these requests can be intercepted by Apache, nginx, Varnish, or Squid.
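To get a rough sense of how much of a site's traffic is ad serving, one option is to count serve.php hits in the access log. The sketch below assumes Apache's common or combined log format, where the request path is the seventh whitespace-separated field. The function name and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: report how many logged requests were for the ad module's serve.php.
# Assumes common/combined log format lines on stdin (request path in field 7).

count_ad_requests() {
    awk '
        { total++ }                      # every logged request
        $7 ~ /serve\.php/ { ads++ }      # requests whose path mentions serve.php
        END { printf "%d of %d requests hit serve.php\n", ads, total }
    '
}

# Usage (hypothetical log location):
#   count_ad_requests < /var/log/apache2/access.log
```

A high ratio suggests that most PHP processes spend their time on ad requests rather than on Drupal pages.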

One site in particular is the Twin Cities Daily Planet, which is a growing news network for the Minneapolis and St. Paul Metro area. Each day, the Daily Planet publishes original articles and blog entries and also republishes several articles and blog entries from our 100+ community media partners. Their traffic has steadily doubled year by year, and they continue to expand their reach by implementing new features. We try to keep a lean and mean PHP installation, but new features often mean enabling PHP extensions for json, xml, image handling, hashing and file manipulation, which add resource consumption to every PHP process. There's also a whole lot of Drupal going on: APC caching loaded modules and data, plus whatever data is left behind after bootstrapping Drupal and handling its request.

Now, the ad counters aren't served up by Drupal directly. That is, they don't go through index.php, but rather through the serve.php file that is included in the ad module. serve.php serves and tracks an ad without necessitating the call to drupal_bootstrap() that takes on the resource impact of loading up Drupal.

That's great and all, but we're still burdened by the same problem that a mod_fcgid configuration is designed to address: lots of bloated processes are hanging around in the same pool, waiting to serve up a mixture of lightweight and resource-hungry requests. We're spawning hundreds of PHP processes, and each process can reach 100MB of resident memory during its lifetime. Every page request is a rallying cry for a dozen of these processes to spin up and start lumbering towards those ad requests.

My first thought for this was to set up a separate VirtualHost for ad serving, but that sounded like a huge pain for an already-running site. And then it dawned on me what the design goals for mod_fcgid actually are: it's not about separating PHP from static requests, it's about designating different process groups for different applications. That's exactly what I want!

My original configuration has the following, per mod_fcgid's documentation:


<VirtualHost *:80>
...
FCgidWrapper /www/tcdailyplanet.net/bin/php .php
</VirtualHost>

And the wrapper script at /www/tcdailyplanet.net/bin/php looks something like this, also per the documentation:


#!/bin/sh
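# Location of the php.ini that this pool of PHP processes should use
CONF="path/to/php.ini"
# Replace the shell with php-cgi so mod_fcgid manages the PHP process directly
exec /usr/local/bin/php-cgi -c "$CONF"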

This has us running every request for anything that uses PHP through the same configuration. Instead, we want to run ad requests through a separate PHP wrapper. So we add this to the end of our Apache configuration file:


<Files ~ "serve\.php$">
FCgidWrapper /www/tcdailyplanet.net/ad/bin/php .php
</Files>

Now, any request for the file serve.php uses the alternative PHP wrapper script that lives in the 'ad' folder instead of the default wrapper. This alone reduces overhead, because the PHP processes for serve.php remain unencumbered by the baggage of handling a lot of Drupal page requests.

I can go even further: because we're using a different wrapper script, I can also specify a separate php.ini file. In the php.ini file for serve.php, I have omitted any PHP extensions that aren't pertinent to counting ads. Gone are json, geoip, xml, zip, and anything else that's not applicable to incrementing an integer.
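The second wrapper isn't shown in the post, but it presumably mirrors the first one with a different php.ini. Here is a minimal sketch of creating it, assuming the same php-cgi binary as the original wrapper. The helper name install_wrapper and the ini path in the usage comment are hypothetical placeholders, not the site's real values.

```shell
#!/bin/sh
# Sketch: create a second mod_fcgid wrapper that points at its own php.ini.
# The php-cgi path matches the wrapper shown earlier; the ini path passed in
# is a placeholder chosen by the caller.

install_wrapper() {
    wrapper=$1   # e.g. /www/tcdailyplanet.net/ad/bin/php
    ini=$2       # hypothetical: a trimmed php.ini used only for serve.php

    mkdir -p "$(dirname "$wrapper")" || return 1

    # $ini is expanded while writing, so the generated script
    # contains the literal path.
    cat > "$wrapper" <<EOF
#!/bin/sh
exec /usr/local/bin/php-cgi -c "$ini"
EOF

    # mod_fcgid runs the wrapper directly, so it must be executable.
    chmod 755 "$wrapper"
}

# Usage (hypothetical ini location):
#   install_wrapper /www/tcdailyplanet.net/ad/bin/php path/to/ad/php.ini
```

Keeping the two wrappers identical apart from the -c argument makes it easy to see that the only difference between the pools is the PHP configuration.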

As a result, each PHP process in the serve.php pool consumes only 3-5MB of resident RAM. Occasionally, when it's time for the ad process to dump its statistics to the database, it executes a drupal_bootstrap(). This results in one process increasing to about 24MB of RAM until it eventually dies off. A small price to pay!
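Figures like these can be checked by summarising the resident set size that ps reports for the PHP processes. The sketch below is one way to do that, not necessarily how the numbers above were gathered. The function name rss_summary is hypothetical, and the usage line assumes a Linux procps ps, where -C selects processes by command name.

```shell
#!/bin/sh
# Sketch: summarise resident memory from lines of "RSS_IN_KB COMMAND..." on stdin.

rss_summary() {
    awk '
        { total += $1; n++; if ($1 > max) max = $1 }
        END {
            if (n)
                printf "%d procs, avg %.1f MB, max %.1f MB\n", n, total / n / 1024, max / 1024
            else
                print "0 procs"
        }
    '
}

# Usage (Linux procps; separate -o options keep the headers suppressed):
#   ps -C php-cgi -o rss= -o args= | rss_summary
```

To compare the two pools, filter the ps output by wrapper path or php.ini argument, if that shows up in the command line, before piping it in.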

This radically changed the way the site performs. We freed approximately 1.5GB of RAM overall, and our anonymous page generation times went from a 2-second average down to about 50 milliseconds. As for load average? The included graph shows a pretty clear indication of "before" and "after".

Partitioning our requests in this manner required no new software and only a few lines of Apache configuration, which is great for growing organizations that can't just throw hardware at a problem.
