Jul 16 2016

Sounds like a bad design? When I first found this out, I thought we should have avoided it at the design stage. But that is not what we are talking about today. After we figured out a way to fix the performance, it turned out to be quite a powerful way to handle the business logic.

How do you accommodate a request that a node hold thousands of multiple-value items in one field? When it comes to the editor experience, we have something to share. A multiple-value field collection field is a common setup for a content type. When there are only a couple dozen values, everything is fine: the default field collection embed widget with a multi-value field works well.

As the number of items goes up, the editing page becomes heavier. In our case, we have a field collection containing five subfields: an entity reference field pointing to nodes, two text fields, a taxonomy term reference field, and a number field. Some nodes have over 300 such field collection items. The edit pages for those nodes take forever to load, and updating the node becomes more and more difficult.

For such a node, the edit form has thousands of form elements. It is like hauling an adult elephant with a small pickup truck: anything can slow down the page, from web server performance to network bandwidth to the capability of the local browser. So we needed to find a better way to handle it. We want the multiple-value field to be truly unlimited, capable of holding thousands of field collection items in a single node.

After doing some research, we came up with a pretty good solution. Here is what we did to deal with it.

We used the Embedded Views Field module to build a block for the field collection items. We paginated it, breaking the 300 items into 12 pages, and then inserted the views block into the node edit page. Because the form no longer loads all of those elements, the page loads much faster right away. Displaying the field collection items in a views block is not enough, though; we also need to edit them. I tried using VBO to handle editing and deleting, but it did not work, so we built some custom Ajax functions for editing and deleting, using the CTools modal window as the front-end interface to edit, delete, and add new items. That works well. With the modal window and Ajax, we can keep the main node edit page untouched; there is no need to refresh the page every time an editor changes a field collection item. Thanks to the pagination of the views block, we can now add as many items as we want to the field collection multi-value field. We also added views sorting to the embedded views field.
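
To give a flavor of the modal editing piece, here is a minimal sketch of the pattern we followed, assuming a hypothetical module name, path, and form ID; it is not the exact code we shipped.

/**
 * Implements hook_menu().
 */
function MYMODULE_menu() {
  $items['mymodule/%ctools_js/item/%field_collection_item/edit'] = array(
    'page callback' => 'mymodule_item_edit_callback',
    'page arguments' => array(1, 3),
    'access arguments' => array('access content'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

/**
 * Page callback: edit one field collection item, in a CTools modal when
 * JavaScript is available, or as a plain page otherwise.
 */
function mymodule_item_edit_callback($js, $item) {
  if (!$js) {
    // No JavaScript: degrade to a regular page with the edit form.
    return drupal_get_form('mymodule_item_edit_form', $item);
  }
  ctools_include('modal');
  ctools_include('ajax');
  $form_state = array(
    'title' => t('Edit item'),
    'ajax' => TRUE,
    'build_info' => array('args' => array($item)),
  );
  $commands = ctools_modal_form_wrapper('mymodule_item_edit_form', $form_state);
  if (!empty($form_state['executed'])) {
    // The form was submitted: close the modal. Our own JS then refreshes
    // the paginated views block that lists the items.
    $commands = array(ctools_modal_command_dismiss());
  }
  print ajax_render($commands);
  exit();
}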

Sounds pretty robust, but wait, something was missing. We ran into a problem soon after we implemented it: what about the form for creating a new node? On the node add page, the embedded views field block does not work, because a new node does not yet have a node ID. We fixed it by falling back to the default widget, just for the node add page, using the following function to switch the field widget.

/**
 * Implements hook_field_widget_properties_alter().
 *
 * On the node add form, fall back to the default field collection embed
 * widget, since the embedded views block needs an existing node ID.
 */
function MODULENAME_field_widget_properties_alter(&$widget, $context) {
  if ($context['entity_type'] == 'node') {
    // A node that has not been saved yet has no nid.
    if (!isset($context['entity']->nid)) {
      if ($context['field']['field_name'] == 'FIELD_MACHINE_NAME') {
        if ($widget['type'] == 'field_collection_hidden') {
          $widget['type'] = 'field_collection_embed';
        }
      }
    }
  }
}

Apr 03 2015

Fields have been part of Drupal core since version 7. The Field API extends Drupal's ability to build many different kinds of systems. Since fields are the basic units of every entity, they are one of the most important parts of this open source software. But when it comes to the efficiency of the SQL storage engine, fields could still do better. I sincerely believe we cannot afford to ignore this. Let's put field SQL storage under a microscope and have a close look.

Case study:

I built a patient scheduling system for a couple of clinic offices. The project itself is not complicated; I have attached the patient profile form to this article. We built a patient profile node type. It is not a complicated form, but there are over 40 fields, and it is not difficult to set up a nice patient profile node form. I also created an appointment node type that connects the patient profile and the doctor profile with entity reference fields, and used views with exposed filters for the various reports.

It was on this project that I found the issue. I was a little uncomfortable after taking a close look at the database: each field has two almost identical tables. I think fields take up too much unnecessary database space. I have dumped one field's database information to explain my concern.

1) Base table: field_data_field_initial

+----------------------+------------------+------+-----+---------+-------+
| Field                | Type             | Null | Key | Default | Extra |
+----------------------+------------------+------+-----+---------+-------+
| entity_type          | varchar(128)     | NO   | PRI |         |       |
| bundle               | varchar(128)     | NO   | MUL |         |       |
| deleted              | tinyint(4)       | NO   | PRI | 0       |       |
| entity_id            | int(10) unsigned | NO   | PRI | NULL    |       |
| revision_id          | int(10) unsigned | YES  | MUL | NULL    |       |
| language             | varchar(32)      | NO   | PRI |         |       |
| delta                | int(10) unsigned | NO   | PRI | NULL    |       |
| field_initial_value  | varchar(255)     | YES  |     | NULL    |       |
| field_initial_format | varchar(255)     | YES  | MUL | NULL    |       |
+----------------------+------------------+------+-----+---------+-------+

Base table SQL script:

CREATE TABLE `field_data_field_initial` (
`entity_type` varchar(128) NOT NULL DEFAULT '',
`bundle` varchar(128) NOT NULL DEFAULT '',
`deleted` tinyint(4) NOT NULL DEFAULT '0',
`entity_id` int(10) unsigned NOT NULL,
`revision_id` int(10) unsigned DEFAULT NULL,
`language` varchar(32) NOT NULL DEFAULT '',
`delta` int(10) unsigned NOT NULL,
`field_initial_value` varchar(255) DEFAULT NULL,
`field_initial_format` varchar(255) DEFAULT NULL,
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
KEY `entity_type` (`entity_type`),
KEY `bundle` (`bundle`),
KEY `deleted` (`deleted`),
KEY `entity_id` (`entity_id`),
KEY `revision_id` (`revision_id`),
KEY `language` (`language`),
KEY `field_initial_format` (`field_initial_format`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

2) Revision table: field_revision_field_initial

+----------------------+------------------+------+-----+---------+-------+
| Field                | Type             | Null | Key | Default | Extra |
+----------------------+------------------+------+-----+---------+-------+
| entity_type          | varchar(128)     | NO   | PRI |         |       |
| bundle               | varchar(128)     | NO   | MUL |         |       |
| deleted              | tinyint(4)       | NO   | PRI | 0       |       |
| entity_id            | int(10) unsigned | NO   | PRI | NULL    |       |
| revision_id          | int(10) unsigned | NO   | PRI | NULL    |       |
| language             | varchar(32)      | NO   | PRI |         |       |
| delta                | int(10) unsigned | NO   | PRI | NULL    |       |
| field_initial_value  | varchar(255)     | YES  |     | NULL    |       |
| field_initial_format | varchar(255)     | YES  | MUL | NULL    |       |
+----------------------+------------------+------+-----+---------+-------+

Revision table SQL script:

CREATE TABLE `field_revision_field_initial` (
  `entity_type` varchar(128) NOT NULL DEFAULT '',
  `bundle` varchar(128) NOT NULL DEFAULT '',
  `deleted` tinyint(4) NOT NULL DEFAULT '0',
  `entity_id` int(10) unsigned NOT NULL,
  `revision_id` int(10) unsigned NOT NULL,
  `language` varchar(32) NOT NULL DEFAULT '',
  `delta` int(10) unsigned NOT NULL,
  `field_initial_value` varchar(255) DEFAULT NULL,
  `field_initial_format` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`entity_type`,`entity_id`,`revision_id`,`deleted`,`delta`,`language`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `language` (`language`),
  KEY `field_initial_format` (`field_initial_format`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Here are my concerns.

1) Normalization.

Here is one of the field's data records.

+-------------+-----------------+---------+-----------+-------------+----------+-------+---------------------+----------------------+
| entity_type | bundle          | deleted | entity_id | revision_id | language | delta | field_initial_value | field_initial_format |
+-------------+-----------------+---------+-----------+-------------+----------+-------+---------------------+----------------------+
| node        | patient_profile |       0 |      1497 |        1497 | und      |     0 | w                   | plain_text           |
+-------------+-----------------+---------+-----------+-------------+----------+-------+---------------------+----------------------+

We have value "W" in the Initial field. One character took 51 bytes for storage that had not included index yet. It took another 51 byte in the revision table and more for index. In this case here, only less than two percents of space are used for real data the initial 'W', and over 98% of space is for other purposes.

For the sake of space, I think we should not use varchar for the entity_type, bundle, language, and field_format columns. A SMALLINT, TINYINT, or INT [1] would take only one to four bytes. The field is a basic unit of a Drupal website, and a medium-sized website can hold millions of field records, so every byte saved is worth multiple megabytes in a precious MySQL database.
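
As a rough illustration of the idea (not an actual core API), a leaner field data table could be described in Schema API terms like this; every name below is hypothetical, and the small integer columns are meant as foreign keys into lookup tables:

/**
 * Implements hook_schema().
 */
function mymodule_schema() {
  $schema['field_data_field_initial_lean'] = array(
    'description' => 'A hypothetical, normalized version of the field data table.',
    'fields' => array(
      'entity_type_id' => array('type' => 'int', 'size' => 'tiny', 'unsigned' => TRUE, 'not null' => TRUE),
      'bundle_id' => array('type' => 'int', 'size' => 'small', 'unsigned' => TRUE, 'not null' => TRUE),
      'entity_id' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
      'revision_id' => array('type' => 'int', 'unsigned' => TRUE),
      'language_id' => array('type' => 'int', 'size' => 'tiny', 'unsigned' => TRUE, 'not null' => TRUE),
      'delta' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
      'field_initial_value' => array('type' => 'varchar', 'length' => 255),
      'field_initial_format_id' => array('type' => 'int', 'size' => 'small', 'unsigned' => TRUE),
    ),
    'primary key' => array('entity_type_id', 'entity_id', 'delta', 'language_id'),
  );
  return $schema;
}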

2) An overly complicated primary key

Each field table has a complicated primary key. The base table uses `entity_type`, `entity_id`, `deleted`, `delta`, and `language` as its primary key. The revision table uses `entity_type`, `entity_id`, `revision_id`, `deleted`, `delta`, and `language`. "In InnoDB, having a long PRIMARY KEY wastes a lot of disk space because its value must be stored with every secondary index record." [2] It may be worthwhile to add an auto-incrementing integer as the primary key instead.

3) An unneeded bundle column

I found that the bundle column is not necessary; the system could run well without it. In my clinic project, I named the node type "patient profile". The machine name patient_profile appears in every field record's bundle column. Stored as a varchar, it uses 16 bytes in each table record. Let's do a quick calculation: if there are 100,000 nodes and each node has 40 fields, then 100,000 x 40 x 2 x 16 = 128,000,000 bytes, about 122 MB, are taken up by this column alone. At the very least, we could use a 2-byte SMALLINT, which would take only one-eighth of the space.

4) Just use revision table.

Remove one of each field's two data tables. It may take a little more query work to get field data, but it saves time when we insert, update, and delete field data. By doing so, we maintain one less table per field and edit content faster. That helps bring a better editor experience and saves database storage space.
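
As a rough sketch of what reading the current value directly from the revision table could look like (assuming we key it by the node's current revision ID), consider:

// The node object already carries its current revision ID ($node->vid),
// so the current value can be read from the revision table alone.
$value = db_query(
  'SELECT field_initial_value FROM {field_revision_field_initial}
   WHERE entity_type = :type AND entity_id = :id AND revision_id = :vid',
  array(':type' => 'node', ':id' => $node->nid, ':vid' => $node->vid)
)->fetchField();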

A contributed module, Field SQL storage lean [3], addresses some of the concerns here. It still needs a lot of work on its own, and more if we want other contributed modules to be compatible with it; after all, it changes the field table structure.

References:

1: MySQL integer types: http://dev.mysql.com/doc/refman/5.1/en/integer-types.html
2: MySQL InnoDB tuning: http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html
3: Field SQL storage lean solution
4: Patient profile form: medical form

Feb 27 2015

Design a Drupal website with a million nodes in mind from the beginning. We build a Drupal website and it runs well at first. Then one day the system has hundreds of thousands of nodes, and we find the site has become slow: we have to wait many seconds before a new page opens. Not only is it slow, but sometimes we also get errors such as memory exhaustion.

Most of the time the problem already exists at the design stage of the system. When designing a site, there are things we as developers have to take care of. We need to bear in mind that the site will grow and that more and more nodes will be added. Every time we create a function, we need to make sure it will still work when there are hundreds of thousands of nodes in the system. Otherwise, those functions may time out or exhaust all the memory as the number of nodes keeps increasing.

PHP has a maximum memory limit for each request. Sometimes it is 128 MB, sometimes 256 MB. Whatever the number, it is finite. There is no limit, however, on how many nodes can exist on our website. As our system grows larger with more nodes, we will hit the memory limit sooner or later if we did not take it into consideration at the beginning.

Here is a quick example. Drupal has a function, node_load_multiple(), that can load all the nodes in the database into memory. Here is some code from one of the contributed modules we use.

foreach (node_load_multiple(FALSE) as $node) {
  // Modify node objects to be consistent with Revisioning being
  // uninstalled, before updating the {taxonomy_index} table accordingly.
  unset($node->revision_moderation);
  revisioning_update_taxonomy_index($node, FALSE);
}

This code is in an implementation of hook_uninstall(). It runs into a problem when there are more than about 10,000 nodes in the system, and as a result, we cannot uninstall the module. Here is the error message:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 36 bytes) in ...

It used up all 256 MB of memory before it could load all the nodes. As a result, the module could never be uninstalled from the site.
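
A common workaround, sketched below under the assumption that the same helper function shown above is available, is to load the nodes in small batches and reset the entity cache between batches so memory stays flat:

// Collect the node IDs first, then load them 200 at a time.
$nids = db_query('SELECT nid FROM {node}')->fetchCol();
foreach (array_chunk($nids, 200) as $chunk) {
  foreach (node_load_multiple($chunk) as $node) {
    unset($node->revision_moderation);
    revisioning_update_taxonomy_index($node, FALSE);
  }
  // Release the static entity cache so memory does not keep growing.
  entity_get_controller('node')->resetCache();
}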

It is an extreme case, but as we troubleshoot existing sites, we notice similar cases here and there. I also noticed that we could do something about the field_sql_storage module to make Drupal run faster and keep the SQL database smaller.

Jan 28 2015

As the largest bicycling club in the country with more than 16,000 active members and a substantially larger community across the Puget Sound, Cascade Bicycle Club requires serious performance from its website. For most of the year, Cascade.org serves a modest number of web users as it furthers the organization’s mission of “improving lives through bicycling.”

But a few days each year, Cascade opens registration for its major sponsored rides, which results in a series of massive spikes in traffic. Cascade.org has in the past struggled to keep up with demand during these spikes. During the 2014 registration period for example, site traffic peaked at 1,022 concurrent users and >1,000 transactions processed within an hour. The site stayed up, but the single web server seriously struggled to stay on its feet.

In preparation for this year’s event registrations, we implemented horizontal scaling at the web server level as the next logical step forward in keeping pace with Cascade’s members. What is horizontal scaling, you might ask? Let me explain.

[Ed Note: This post gets very technical, very quickly.]

Overview

We had already set up hosting for the site in the Amazon cloud, so our job was to build out the new architecture there, including new Amazon Machine Images (AMIs) along with an Autoscale Group and Scaling Policies.

Here is a diagram of the architecture we ended up with. I’ll touch on most of these pieces below.

[Diagram: Cascade.org autoscaling architecture]

Web Servers as Cattle, Not Pets

I’m not the biggest fan of this metaphor, but it’s catchy: The fundamental mental shift when moving to automatic scaling is to stop thinking of the servers as named and coddled pets, but rather as identical and ephemeral cogs–a herd of cattle, if you will.

In our case, multiple web server instances are running at a given time, and more may be added or taken away automatically at any given time. We don’t know their IP addresses or hostnames without looking them up (which we can do either via the AWS console, or via AWS CLI — a very handy tool for managing AWS services from the command line).

The load balancer is configured to enable connection draining. When the autoscaling group triggers an instance removal, the load balancer will stop sending new traffic, but will finish serving any requests in progress before the instance is destroyed. This, coupled with sticky sessions, helps alleviate concerns about disrupting transactions in progress.

The AMI for the “cattle” web servers (3) is similar to our old single-server configuration, running Nginx and PHP tuned for Drupal. It’s actually a bit smaller of an instance size than the old server, though — since additional servers are automatically thrown into the application as needed based on load on the existing servers — and has some additional configuration that I’ll discuss below.

As you can see in the diagram, we still have many “pets” too. In addition to the surrounding infrastructure like our code repository (8) and continuous integration (7) servers, at AWS we have a “utility” server (9) used for hosting our development environment and some of our supporting scripts, as well as a single RDS instance (4) and a single EC2 instance used as a Memcache and Solr server (6). We also have an S3 instance for managing our static files (5) — more on that later.

Handling Mail

One potential whammy we caught late in the process was handling mail sent from the application. Since the IP of the given web server instance from which mail is sent will not match the SPF record for the domain (IP addresses authorized to send mail), the mail could be flagged as spam or mail from the domain could be blacklisted.

We were already running Mandrill for Drupal’s transactional mail, so to avoid this problem, we configured our web server AMI to have Postfix route all mail through the Mandrill service. Amazon Simple Email Service could also have been used for this purpose.

Static File Management

With our infrastructure in place, the main change at the application level is the way Drupal interacts with the file system. With multiple web servers, we can no longer read and write from the local file system for managing static files like images and other assets uploaded by site editors. A content delivery network or networked file system share lets us offload static files from the local file system to a centralized resource.

In our case, we used Drupal’s S3 File System module to manage our static files in an Amazon S3 bucket. S3FS adds a new “Amazon Simple Storage Service” file system option and stream wrapper. Core and contributed modules, as well as file fields, are configured to use this file system. The AWS CLI provided an easy way to initially transfer static files to the S3 bucket, and iteratively synch new files to the bucket as we tested and proceeded towards launch of the new system.

In addition to static files, special care has to be taken with aggregated CSS and Javascript files. Drupal’s core aggregation can’t be used, as it will write the aggregated files to the local file system. Options (which we’re still investigating) include a combination of contributed modules (Advanced CSS/JS Aggregation + CDN seems like it might do the trick), or Grunt tasks to do the aggregation outside of Drupal during application build (as described in Justin Slattery’s excellent write-up).

In the case of Cascade, we also had to deal with complications from CiviCRM, which stubbornly wants to write to the local file system. Thankfully, these are primarily cache files that Civi doesn’t mind duplicating across webservers.

Drush & Cron

We want a stable, centralized host from which to run cron jobs (which we obviously don’t want to execute on each server) and Drush commands, so one of our “pets” is a small EC2 instance that we maintain for this purpose, along with a few other administrative tasks.

Drush commands can be run against the application from anywhere via Drush aliases, which requires knowing the hostname of one of the running server instances. This can be achieved most easily by using AWS CLI. Something like the bash command below will return the running instances (where ‘webpool’ is an arbitrary tag assigned to our autoscaling group):

$ aws ec2 describe-instances --filters "Name=tag-key, Values=webpool" |grep ^INSTANCE |awk '{print $14}'|grep 'compute.amazonaws.com'

We wrote a simple bash script, update-alias.sh, to update the ‘remote-host’ value in our Drush alias file with the hostname of the last running server instance.

Our cron jobs execute update-alias.sh, and then the application (both Drupal and CiviCRM) cron jobs.
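
For reference, the kind of Drush alias entry that update-alias.sh rewrites looks roughly like the sketch below; the hostname, user, and paths are placeholders, not Cascade's actual values.

// aliases.drushrc.php (illustrative values only).
$aliases['cascade-prod'] = array(
  // update-alias.sh replaces this with a currently running instance.
  'remote-host' => 'ec2-XX-XXX-XX-XX.compute.amazonaws.com',
  'remote-user' => 'deploy',
  'root' => '/var/www/cascade/current',
  'uri' => 'http://cascade.org',
);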

Deployment and Scaling Workflows

Our webserver AMI includes a script, bootstrap.sh, that either builds the application from scratch — cloning the code repository, creating placeholder directories, symlinking to environment-specific settings files — or updates the application if it already exists — updating the code repository and doing some cleanup.

A separate script, deploy-to-autoscale.sh, collects all of the running instances similar to update-alias.sh as described above, and executes bootstrap.sh on each instance.

With those two utilities, our continuous integration/deployment process is straightforward. When code changes are pushed to our Git repository, we trigger a job on our Jenkins server that essentially just executes deploy-to-autoscale.sh. We run update-alias.sh to update our Drush alias, clear the application cache via Drush, tag our repository with the Jenkins build ID, and we’re done.

For the autoscaling itself, our current policy is to spin up two new server instances when CPU utilization across the pool of instances reaches 75% for 90 seconds or more. New server instances simply run bootstrap.sh to provision the application before they’re added to the webserver pool.

There’s a 300-second grace time between additional autoscale operations to prevent a stampede of new cattle. Machines are destroyed when CPU usage falls beneath 20% across the pool. They’re removed one at a time for a more gradual decrease in capacity than the swift ramp-up that fits the profile of traffic.

More Butts on Bikes

With this new architecture, we’ve taken a huge step toward one of Cascade’s overarching goals: getting “more butts on bikes”! We’re still tuning and tweaking a bit, but the application has handled this year’s registration period flawlessly so far, and Cascade is confident in its ability to handle the expected — and unexpected — traffic spikes in the future.

Our performant web application for Cascade Bicycle Club means an easier registration process, leaving them to focus on what really matters: improving lives through bicycling.


Jan 08 2015

When we talk about Drupal performance, the first thing that comes to my mind is caching. But today I found another way to make Drupal run a little bit faster. It is not a profound thing, but it may be overlooked by many. At work, I need to process 56,916 records regularly with an automated cron job. It took 13 minutes 30 seconds to process all those records. By adding a new database field index, I reduced the processing time to only 1 minute 33 seconds. That is more than eight times faster.

Here are the details. I have about fifty thousand records that are updated daily. For each record, I create a hash and store it in a field. Whenever I insert or update a record, I check whether that hash code already exists in the database. The project requires searching on the field revision table. Here is the code in my custom module.

$exist = db_query("SELECT EXISTS(Select entity_id from {field_revision_field_version_hash} where field_version_hash_value = :hash)", array(':hash' => $hash))->fetchField();
// Return when we had imported the schedule item before.
if ($exist) {
  return;
}

So checking the hash code in the database became one of the heaviest operations; it consumed a lot of system resources. Adding a single index to the field revision table made the process eight times faster. Here is the code I put in the module's install file.

// Add version-hash indexes.
if (!db_index_exists('field_revision_field_version_hash', 'version_hash')) {
  db_add_index('field_revision_field_version_hash', 'version_hash', array('field_version_hash_value'));
}
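
For completeness, the snippet above would typically sit inside an update hook in the module's .install file; a minimal sketch with a hypothetical module name and update number:

/**
 * Add an index on the version hash field.
 */
function mymodule_update_7101() {
  if (!db_index_exists('field_revision_field_version_hash', 'version_hash')) {
    db_add_index('field_revision_field_version_hash', 'version_hash', array('field_version_hash_value'));
  }
}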

When we build a Drupal website, we are not dealing with the database directly. But even though Drupal creates the tables for us, we can still alter them and make them better.

Apr 22 2012

We had a site for a client that was stable for close to two years, then suddenly started to experience switches from the master to the geographically separate slave server as frequently as twice a week.

The site is an entertainment news site, and its articles get to Google News on occasions.

The symptoms were increased load on the server and a sudden influx of traffic causing over 800 simultaneous connections, all in the ESTABLISHED state.

Normally, a well-tuned Drupal site can withstand this influx, with server optimization and proper caching. But for this previously stable site, we found that a combination of factors, some internal to the site and others external, combined to cause the site to switch.

The internal factor was the way the site was set up using Purl and other surrounding code. Links to a URL were changed to add a top-level section, which then redirected to the real URL. This caused around 30% of accesses to the URLs to result in a 302 redirect. Since redirects are not cached, they incurred more overhead than regularly served pages.

Investigating the root cause

We started checking whether there was a pattern, and went back to analyse the server logs as far back as a year.

We used the ever helpful Go Access tool to do most of the investigative work.

A week in April 2011 had 28% redirects, but we found an anomaly in the browser share over the months. For that same April week, the browser breakdown is 34% MSIE, 21% Safari and 21% Firefox.

For a week in Sep 2011, redirects are 30%, and the browsers are 26% Safari, 25% MSIE and 20% Firefox. This makes sense, as Safari gains market share and Microsoft loses it.

But checking a week in Feb 2012, redirects are 32%, but look at the browsers: 46% Firefox, 16% Safari, 14% Others and 12% MSIE.

It does not make sense for Firefox to jump by that much and gain market share from thin air.

A partial week in March 2012 shows that redirects are 32%, and again, the browsers are 52% Firefox, 14% Others, 13% Safari and 10% MSIE.

That MSIE dropped is something one can understand. But the jump in Firefox from Sep to Feb/March is unjustified, and tells us that perhaps there are crawlers, scrapers, leechers or something else masquerading as Firefox and hitting our content.

Digging deeper, we find that the top 2 Firefox versions are:

27,092 Firefox/10.0.2
180,420 Firefox/3.0.10

The first one is understandable: a current version of Firefox. The second one is a very old version from 2009, yet it has 6.6X the traffic of the current version!

The user agent signatures all look like this, with a 2009 build:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)

We went back and looked at a week in September (all hours of the day), with that browser signature, and lo and behold:

Unique visitors that suck lots of bandwidth:

  88      10.49%  24/Sep/2011  207.76 MB
  113     13.47%  23/Sep/2011  994.44 MB
  109     12.99%  22/Sep/2011    1.44 GB
  133     15.85%  21/Sep/2011    1.70 GB
  134     15.97%  20/Sep/2011    1.68 GB

There were only 335 different IP addresses!

But look at the same user agent in March for a week:

   94479  38.36%  15/Mar/2012   16.38 GB
  102037  41.43%  14/Mar/2012   17.13 GB
   38795  15.75%  13/Mar/2012   12.48 GB
   11003   4.47%  12/Mar/2012   10.90 GB

See the number of unique visitors compared to September?
And now there are 206,225 different IP addresses!

For a few days in March, Monday to Thursday, here are the figures for this user agent.

Total requests to pages (excluding static files): 1,122,229
Total requests that have an empty referer: 1,120,843
That is, 99.88% are from those botnets!

Verifying the hypothesis

Looking at the web server logs through awstats, we found that a year ago, in Feb 2011, the market share for Firefox overall was 24.7%, with 16,559,999 hits. At that time, Firefox 3.0.10 had only 44,436 hits.

That is 0.002 % of the total.

In Sep 2011 it had 0.2% with 241,869 hits.

Then in Feb 2012, that old version from 2009 had a 2.2% share of hits, with 4,409,396 hits.

So, from 0.002% to 2.2% of total, for an obsolete version of Firefox. This means growth by a factor of 1,100 X in one year.

Does not make sense.

Botnet hammering the site

So, what does this tell us?

Looking at a sample of the IP addresses, we found that they all belong to Cable or DSL companies, mainly in the USA.

This tells us that there is a massive botnet that has infected lots of PCs.

They were piloting the botnet in September and went full speed after that, and they are hitting the server hard.

The botnet's programs seem to have a bug that prevents them from coordinating with each other, so they all try to grab new content at the same time. This poor coding causes the sudden influx of traffic that brings the server to its knees, combined with the non-caching of 302 redirects.

Just to make sure, we quickly checked two other sites that we manage for the same symptoms. One entertainment site is showing similar signs; the other, a financial site, is not. Both have good caching because they have no redirects (97% to 98% of responses return code 200), and that is why that entertainment site can withstand the onslaught.

Solution: block the botnet's user agent

Since the botnet is coming from hundreds of thousands of IP addresses, it is not possible to block based on the IP address alone.

Therefore, the solution was to block requests coming with that browser signature from 2009 only, and only when there is no referer.

This solution, which goes into settings.php, prevents Drupal from fully booting when a bad browser signature is encountered and the referer is empty.

We intentionally sent the humorous, but still legitimate, 418 HTTP return code so we can filter by that when analysing logs.

// Block the botnet: the old Firefox 3.0.10 signature combined with an
// empty referer.
$botnet = 'Gecko/2009042316 Firefox/3.0.10';
if (empty($_SERVER['HTTP_REFERER'])) {
  if (FALSE !== strpos($_SERVER['HTTP_USER_AGENT'], $botnet)) {
    header("HTTP/1.0 418 I'm a teapot");
    exit();
  }
}

The above should work in most cases.

However, a better solution is to keep the changes at the Apache level and never bother with executing any PHP code if the conditions are met.

# Fix for botnet crawlers, by 2bits.com, Inc.
#
# Referer is empty
RewriteCond  %{HTTP_REFERER}    ^$
# User agent is bogus old browser
RewriteCond  %{HTTP_USER_AGENT} "Gecko/2009042316 Firefox/3.0.10"
# Forbid the request
RewriteRule  ^(.*)$ - [F,L]

The drawback is that we are using a 403 (access denied) instead of the 418 (I am a teapot), which can skew the statistics a bit in the web server logs.

Further reading

After investigating and solving this problem, I discussed the issue with a friend who manages several high traffic sites that are non-Drupal, and at the time, he did not see the same symptoms. However, a few weeks later he started seeing the same symptoms, and sent me the first two articles. Months later, I saw the third:

Apr 17 2012

A lot of very interesting things are happening to make Drupal's caching system a bit smarter. One of my favorite recent (albeit smaller) developments is a patch (http://drupal.org/node/1471200) for the Views module that allows for cached views to have no expiration date. This means that the view will remain in the cache until it is explicitly removed.

Before this patch landed, developers were forced to set an arbitrary time limit for how long Views would store the cached content. So even if your view's content only changed every six months, you had to choose a time limit from a list of those predefined by Views, the maximum of which was 6 days. Every six days, the view content would be flushed and regenerated, regardless of whether its contents had actually changed or not.

The functionality provided by this patch opens the door for some really powerful behavior. Say, for instance, that I have a fairly standard blog view. Since I publish blog posts somewhat infrequently, I would only like to clear this view's cache when a new blog post is created, updated, or deleted.

To set up the view to cache indefinitely, click on the "Caching" settings in your view and select "Time-based" from the pop-up.

Then, in the Caching settings form that follows, set the length of time to "Custom" and enter "0" in the "Seconds" field. You can do the same for the "Rendered output" settings if you'd like to also cache the rendered output of the view.

Once you save your view, you should be all set.

Next, we need to manually invalidate the cached view whenever its content changes. There are a couple different ways to do this depending on what sort of content is included in the view (including both of the modules linked to above). In this case, I'll keep it lightweight and act on hooks in a custom module:

/**
 * Implements hook_node_insert().
 */
function MY_MODULE_node_insert($node) {
  if ($node->type == 'blog') {
    // Wildcard-clear every cached display of the blog view.
    cache_clear_all('blog:', 'cache_views', TRUE);
  }
}

// ... same for hook_node_update() and hook_node_delete() ...

And just like that, my view is only regenerated when it needs to be, and should be blazing fast in between.

The patch was committed to the 7.x-3.x branch of Views on March 31, 2012, so for now you will have to manually apply the patch until it is released in the next point release.

Happy caching!

Mar 09 2012

The Gateway to 21st Century Skills (www.thegateway.org) is a semantic web enabled digital library that contains thousands of educational resources, and as one of the oldest digital libraries on the web, it serves educators in 178 countries. Since 1996, educational activities, lesson plans, online projects, and assessment items have been contributed and vetted by over 700 quality organizations.

Given their rich pedigree, the site serves over 100,000 resources each month to educators worldwide. Since 2005, the Gateway has been managed by JES & Co., a 501(c)(3) non-profit educational organization. The original site was built on Plone several years ago. In recent years the constraints of the old site proved too great for the quality and quantity of content, and the needs of its increasingly engaged readership. It was becoming difficult and expensive to manage and update in its current configuration.


JES & Co., as an organization with a history of embracing innovation, decided to move the Gateway onto Drupal and looked to 10jumps to make the transition happen. The site had to be reliable with very high up time. Moreover the site would have to be able to handle the millions of hits without batting an eyelid. And most importantly, the faceted search would have to work well with the semantically described records. Based on the requirements, Acquia’s managed cloud seemed like the best approach. It can help a site scale across multiple servers and Acquia provides high-availability with full fail-over support.
“If something does go down, we know that Acquia 24x7 support has our back” - 10jumps


How they did it


There were several hosting options, but very few that met the requirements for the Gateway. And definitely none that made the development-testing-production migration seamless and easy. Usually there are too many manual steps raising the chances of error.

After a few rounds of technology and support evaluation calls, Acquia was retained to provide hosting and site support. A good support package, combined with the expertise of Acquia's support team, was a compelling reason to make the move. The technical team at 10jumps was also fairly confident that the move would be a good choice for their customer, the Gateway, freeing them to focus on the site development. With Acquia's Managed Cloud and the self-service model, code, local files, and the database can be migrated between development, testing, and production systems literally with mouse clicks. With the seamless migration, the development cycle became shorter, and with Git in place, collaboration between developers became easier. Moreover, caching for anonymous content was provided out of the box, and the 10jumps developers did not have to navigate tricky cache settings. Moving the developers to the new platform was the first step, and soon the team was on an agile development track, able to develop and roll out features quickly.

The result

After the new site went live, we were certain that TheGateway.org would not be affected by traffic spikes, nor would the site go down because of a data center outage. More importantly, the semantically described data could be searched more efficiently because of the integration with Apache Solr search that comes with being in the Acquia cloud.

The development life cycle has gone from being clunky and broken to being smooth and agile. The redesigned site makes it simpler for end users to navigate through large amounts of data, and the more powerful search returns better results, improving the overall user experience.

Feb 11 2012
Beer and developer conferences go hand in hand.

A few weeks ago I presented “CDNs made simple fast and cheap” at the Drupal Downunder conference in Melbourne Australia.

The talk covered:

  • the importance of good client side performance,
  • how A CDN works,
  • recommended CDN providers (from an Australian’s perspective),
  • a demonstration of how to set up a CDN and
  • a summary of the results (better YSlow score and page download times).


Setting up a CDN is very easy to do and cost effective. If you want your users to have the best online experience then there is nothing stopping you!

The CDN presentation is available as PDF slides and a video.

Thanks to my employer PreviousNext who kindly sponsored my trip to Melbourne. Hats off to Wim Leers for contributing the CDN module.

Dec 23 2011

About thegateway.org:

The Gateway has been serving teachers continuously since 1996, which makes it one of the oldest publicly accessible U.S. repositories of education resources on the Web. The Gateway contains a variety of educational resource types, from activities and lesson plans to online projects to assessment items.

The older version of the website was built on Plone, and the team hired us to migrate it to Drupal. It was absolutely the right choice to make, given the many benefits that come with Drupal.

We redesigned the existing website, giving it a new look on Drupal. Then we hosted it on Acquia Managed Cloud to boost its performance and scalability. The new look is more compact, organized, and easier to use.

It was a very interesting project for us and our team is proud to be a part of such a great educational organization serving the nation.

Looking forward to a grand success of the new launch!

thegateway.org BEFORE: [screenshot]

thegateway.org NOW: [screenshot]

Sep 20 2011

Drupal has a presence problem when it comes to front end performance: for the most part, Drupal has ignored it. According to a study by Strangeloop, 97% of the time it takes a mobile page to render is spent in the front end. For desktop browsers, the front end makes up 85% of the time. These numbers may feel high, but when pages take 500ms to render in Drupal yet 6 seconds to display in an end user's browser, you can see where they come from.

The presence problem for Drupal can be seen in several places:

  1. At the past few DrupalCons, how many sessions have touched on front end performance? I can only recall one, while there have been many covering memcache, APC, and other server side technologies.
  2. Take a look at the documentation pages on profiling Drupal, or search for documentation pages on performance. You'll find discussions about Apache Bench, learn about Varnish, etc. You won't learn about front end performance.
  3. Drupal doesn't provide minified JavaScript. For production environments this is considered a standard practice.
  4. The Drupal 8 development "gates" RFC gives only 1 of 6 performance items to front end performance. The other 5 are detailed tips/gates for back end issues we've commonly run into; the front end one is a basic one-liner.

Front end performance is a big deal. This is even more true as we enter the era of mobile dominance, where mobile devices are low powered and sit on high latency networks.

Pointing out problems is no good without solutions. The problem is the amount of face time front end performance gets. So, let's get it some face time.

  • At DrupalCamps, let's start presenting on it.
  • Drupal companies could benefit from having someone knowledgeable in house. Come up with ways to add it to your expertise. Maybe hold a book club and discuss a Steve Souders book.
  • When we learn about useful tools like ImageOptim or Sprite Cow, let's share them.
  • If you see a contrib module serving up JavaScript that has not been minified, file a patch. You can use UglifyJS easily through the web; UglifyJS is what jQuery uses.

Front end performance is a big deal. It's the largest part of the performance equation an end user experiences. Companies have done studies showing the financial and usage impact of end user performance. Let's elevate front end performance to the place it needs to be in the Drupal community.


Aug 20 2010

In this article we will talk through setting up a simple load testing scenario for Drupal applications using Amazon's Elastic Compute Cloud (EC2). Amazon EC2 will enable you to set up testing scenarios easily and for a relatively low cost; e.g. you can find out what effect adding an additional database server will have without actually buying one! JMeter will allow us to create complex test plans to measure the effect of our optimisations; we'll set up a remote JMeter load generator on EC2 that we'll control from our desktop.

Improving Drupal's performance is beyond the scope of this article, but we'll talk more about that in future. If you need some suggestions now then check out the resources section for links to good Drupal optimisation articles.

Setting up your test site on EC2

If you don’t already have an account then you’ll need to sign up for Amazon Web Services. It’s all rather space-age and if you haven’t been through the process before then it can be a bit confusing. We want to set up a working copy of our site to test on EC2, so once you have your AWS account, the process goes something like this:

  • Select an AMI (Amazon Machine Image) that matches your production environment - we use alestic.com as a good source of Debian and Ubuntu AMIs.

  • Create a high-CPU instance running your AMI. Small-CPU instances only have one virtual CPU, which can be stolen by other VMs running on the same physical hardware and can seriously skew your results when running a test. There is always going to be a certain amount of variance in the actual CPU time available to your AMI, since it's always going to be sharing the physical hardware, but we find that high-CPU instances tend to reduce the CPU contention issues to a reasonable level.

  • Give your instance an elastic IP, which is Amazon's term for a static IP that you can use to connect to it.

  • SSH into the machine. You'll need to make sure that ports 22 and 80 are open in the security group, and set up a keypair. Download the private key and use it when connecting; the simplest way is:

ssh -i /path/to/your/private/key.pem user@your-elastic-ip

  • Install the LAMP server packages you require, trying to mirror the production environment as closely as possible. A typical LAMP server can be installed on Debian/Ubuntu by running:

apt-get install apache2 php5 php5-mysql php5-gd mysql-server php5-curl

  • Now you need to set up a copy of the site you want to test on your new server. EC2 instances give you a certain amount of ephemeral storage, which will be destroyed when the AMI is terminated, but will persist between reboots - this can be found at /mnt. If you want to terminate your AMI but may need the test sites that you are going to create again, it's a good idea to back up /mnt to Amazon S3.

  • We will create two copies of the site, one called “control” and another called “optimised”. Give them each their own virtual host definition and make sure that they each point to their own copy of the database. “Control” should be left alone, we’ll use this version to get baseline statistics for each test plan. We’ll tweak and tune “optimised” to improve the performance, and compare our results with “control”. Give each of the sites an obvious subdomain so that we can connect to them easily without getting confused. You should end up with two copies of your site set up on /mnt, with separate domains and dbs, something like this:

http://foo-control.bar.com   -> /mnt/sites/foo/control/   -> DB = foo_control
http://foo-optimised.bar.com -> /mnt/sites/foo/optimised/ -> DB = foo_optimised

Setting up JMeter to generate load

We don't want fluctuating network bandwidth to affect our results, so it's best to run a JMeterEngine on a separate EC2 instance and control that from JMeter running on our local machine. First we'll get JMeter generating load from our local machine, then we'll set up a remote JMeterEngine on EC2.

  • First download JMeter; you'll also need a recent Java JVM installed. On OS X, I moved the downloaded JMeter package to Applications, and then ran it by executing bin/jmeter.

  • If you're new to JMeter, you can download some sample JMeter test plans for stress testing a Drupal site from the nice guys at Pantheon, or just create your own simple plan and point it at your test server on EC2.

  • Now we have a basic test plan in place, we should spin up another EC2 instance that we'll use to generate the load on our test server. This will provide more reliable results as we're removing our local network bandwidth from the equation. We'll still use our local JMeter to control the remote load generator. We used a prebuilt AMI that comes with Ubuntu and JMeter already installed. JMeter has some good documentation on how to tell your local JMeter to connect to the remote machine, in essence you need to add the remote machine's IP address to your local jmeter.properties file.

  • You'll need to open a port on EC2 for JMeter to talk to the remote engine, add 9090 TCP to your security group that the AMI is running under.

  • We found that JMeter started to hang when we increased the amount of data being transferred in our test plans. Tailing jmeter.log told us that we were running out of memory; increasing the available heap size solved this.

  • Test, test, and test again. It's important to repeat your tests to make sure you're getting reliable results. It's also important to know that your tests are representative of average user behaviour; it's possible to set up JMeter as a proxy that will capture your browsing activity and replay it as a test. It's also possible to replay Apache logs as test plans.

Resources

May 10 2010

In environments where there are many databases running on the same machine (e.g. shared hosting), or in high traffic environments (e.g. enterprise sites), it is a common problem that unterminated connections to the database linger around indefinitely until MySQL starts spitting out the "Too many connections" error. The fix for this is to decrease wait_timeout from the default 8 hours to something more in the range of 1-3 minutes. Make this change in your my.cnf file. This means that the MySQL server will terminate any connection that has been sitting around doing nothing for 1-3 minutes.

But this can lead to problems on the other side, where MySQL now terminates connections that are merely idle but will be called on to do something. This results in the "MySQL has gone away" error. The problem is unlikely to happen with stock Drupal, but is much more frequent with CiviCRM (or any other setup where you connect to more than one database). The issue is that sometimes an intensive process happens in one database, then action needs to return to the other database, but oops, MySQL has terminated that connection. This is most likely to happen during anything that takes a long time, like cron, contact imports, deduping, etc.

There's a little-known trick of changing wait_timeout on the fly. You can do this because wait_timeout is both a global and a per-session variable, meaning each connection starts with the global wait_timeout value, but it can be changed at any time, affecting only the current connection. You can use this little function to do it:

<?php
/**
 * Increase the MySQL wait_timeout.
 *
 * Use this if you are running into "MySQL has gone away" errors.  These can happen especially
 * during cron and anything else that takes more than 90 seconds.
 */
function my_module_wait_timeout() {
  global $db_type, $db_url;

  watchdog('my_module', 'Increasing MySQL wait timeout.', array(), WATCHDOG_INFO);

  if (is_array($db_url)) {
    $current_db = db_set_active();
    foreach ($db_url as $db => $connection_string) {
      db_set_active($db);
      db_query('SET SESSION wait_timeout = 900');
    }
    if ($current_db) {
      db_set_active($current_db);
    }
  }
  else {
    db_query('SET SESSION wait_timeout = 900');
  }

  if (module_exists('civicrm')) {
    civicrm_initialize();
    require_once('CRM/Core/DAO.php');
    CRM_Core_DAO::executeQuery('SET SESSION wait_timeout = 900', CRM_Core_DAO::$_nullArray);
  }
}
?>

Then call this function before anything that might take a long time begins.
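
For example, a minimal sketch of calling it at the start of a long cron run:

/**
 * Implements hook_cron().
 */
function my_module_cron() {
  // Raise the session wait_timeout before any long-running work below.
  my_module_wait_timeout();
  // ... long imports, deduping, and other heavy processing ...
}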

There's also an issue in the CiviCRM issue queue to make CiviCRM do this before any of its long internal operations.

Feb 12 2010

Caching is one of the most common ways of improving the performance of a website. It aims to reduce the number of trips made to the database by storing a snapshot of the results in a location (such as the database, the file system, or memory) from which it can be retrieved faster the next time. Caching works best for information that does not change often, is frequently consumed, and/or is expensive to produce. Periodic maintenance needs to be done on the cached information so that website users only get the latest information and not 'stale' information. During development, one of the most common frustrations is not seeing the latest changes you have made, because the page is retrieved from a cache that has old information.

Drupal's file-based cache: Drupal provides a way to consolidate all of the CSS and JavaScript files into fewer files. This is very useful for pages that need many JavaScript and CSS files to render. Instead of making many round trips to download each of those files, a consolidated file reduces the round trips and can decrease page load time significantly. The settings for optimizing CSS and JavaScript files are found at Administer->Site Configuration->Performance. If "Optimize CSS files" is enabled, the CSS files are compressed by removing whitespace and line breaks, and are stored in the "css" directory within the "files" directory as set on the file system settings page. If "Optimize JavaScript files" is enabled, the JavaScript files are stored (without compression) in the "js" directory within the "files" directory. The "Download method" on the file system settings page has to be set to "Public" for these options to be available. It is better to turn on these options only in the production environment, as they can interfere with theme and module development. Also, if you are running a load balancer with two or more servers, please make sure that the cached JavaScript and CSS files are available and identical on all the servers.

Drupal's database-based cache: Drupal comes with a nice cache mechanism that is used by Drupal core. Drupal exposes this as the Cache API, which developers can use to add caching to their own modules. By default the Cache API stores the information to be cached in a table called 'cache'. If every module used the same table for caching, the table could grow so large that it adds overhead rather than reducing it, so it is better to have several cache tables. If needed, module developers can add their own cache table, although it should be identical in structure to the default cache table. It is also good practice to prefix your own cache table with cache_. Drupal core comes with seven cache tables.
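
Before looking at those seven tables, here is a minimal sketch of how a module might use the Cache API with its own table; the module, table, and rebuild function names are hypothetical:

// Fetch a value from our own cache table, rebuilding it when it is missing.
function mymodule_expensive_data() {
  $cid = 'mymodule:expensive_data';
  if ($cached = cache_get($cid, 'cache_mymodule')) {
    return $cached->data;
  }
  $data = mymodule_rebuild_expensive_data();
  // Keep the result for an hour.
  cache_set($cid, $data, 'cache_mymodule', time() + 3600);
  return $data;
}

/**
 * Implements hook_flush_caches() so "Clear cached data" empties our table.
 */
function mymodule_flush_caches() {
  return array('cache_mymodule');
}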

  1. cache: An “all purpose” table that can be used to store a few rows of cached data. Drupal uses this table to store the following:
    Variable data: The table “variable” stores most of the administrative settings. These settings are added to a PHP array(), serialized and stored as a single row in the cache table. This way the settings can be retrieved quickly and avoids making multiple database queries. All variables that uses variable_get() and variable_set() are cached in the cache table.
    Theme registry data: Each of the themes registries are cached in the cache table.

Schema data: Information about the structure of all the tables in the database is cached in the cache table.

    2. cache_block:
    Content generated by the blocks are cached in this table. This reduces Drupal from querying the database repeatedly to get the block contents. The block developer can choose based on the content displayed in the block, whether the block can be cached or not. If the developer decides to cache the block content, he/she has four ways of caching the block content.
      1. Cache block content separately for each available role combination
      2. Cache block content separately for each user
      3. Cache block content separately for each page
      4. Cache block content once for all the users
    Drupal, in addition to above caches block content separately for each theme and for each language supported.
    Block caching can be changed at Administer->Site Configuration->Performance. Also if any of the content access modules that restrict access to different kinds of content are enabled, then block caching is automatically disabled.

3. cache_filter:

    Filters are used to filter node content to make it more secure and consumable. This is a very expensive operation to perform every time a node is viewed. Filters are applied to the node content when they are saved or created, and the results are cached in the cache_filter table.

4. cache_form:

    The form API uses this table to cache the generated form. This saves Drupal from regenerating the unchanged forms.
    5. cache_menu:
    The menu system caches all created menus along with its hierarchies and path information in this table. This saves Drupal from having to regenerate the menu information for each page load. Menus are cached separately for each user and for each language.
    6. cache_update:
    Information about all the installed modules and themes are cached in this table.
7. cache_page:

    Page caching is one of the most important optimizations and can dramatically improve the performance of a heavily used website. Entire page views are cached in the cache_page table. Drupal's full page caching only covers anonymous users; authenticated user pages are not cached because they are usually customized for each user, which makes caching them much less effective. Page caching saves Drupal from repeatedly doing the expensive work of generating a page; instead, the cached page content can be retrieved with a single query. Administer->Site Configuration->Performance has several settings that affect page caching.

    Caching mode:
    Disabled: This disables page caching, although the other types of caching still continue.
    Normal: This mode offers a substantial improvement over the “Disabled” mode. Drupal’s bootstrap is designed in such a way that only the minimum required amount of code and queries is executed to render a page from the cache. Even with this minimum, the database system is initialized, access checks and sessions are initialized, the module system is initialized, and the hooks hook_boot and hook_exit are called on the modules that implement them. Only then is the cached page rendered. (A minimal sketch of these two hooks follows this section.)
    Aggressive: This mode skips the initialization of the module system, so hook functions are never called. This shaves valuable time off every anonymous page request, but it means that modules implementing those hooks may not work properly. Drupal warns you by listing all the modules that may not function properly if the Aggressive mode is enabled.

    Minimum cache lifetime:
    This sets the minimum lifetime of cached content. If a user changes content, other users have to wait until this lifetime expires before they see the change. If it is set to “none”, there is no wait and all users see the latest content immediately.

    Page compression:
    If enabled, the page contents are compressed before being cached. This saves bandwidth and improves download times. However, if the web server already performs compression, this should be disabled. In Apache you can use the mod_deflate module to turn on compression. IMHO it is better to use this Drupal functionality (perhaps enhanced with modules like css_gzip or javascript_aggregator) rather than mod_deflate, because the latter does not cache the compressed output.
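
Here is that sketch (with a hypothetical module name "example"); these hooks still run when a cached page is served in “Normal” mode but are skipped entirely in “Aggressive” mode:

/**
 * Implementation of hook_boot().
 *
 * Called on every request, including cached pages in "Normal" mode,
 * but never in "Aggressive" mode. Keep it as cheap as possible.
 */
function example_boot() {
  // Only lightweight work is safe here; the full module system
  // has not been loaded at this point.
  $GLOBALS['example_request_start'] = microtime(TRUE);
}

/**
 * Implementation of hook_exit().
 *
 * Also skipped in "Aggressive" mode.
 */
function example_exit($destination = NULL) {
  // A real module might record the timing information gathered
  // in example_boot() here.
}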

Note: Even if the page cache and/or block cache are disabled, the other types of caching (menus, filters, etc.) still continue to happen and cannot be disabled.

Clear cached data: On the performance page at Administer->Site Configuration->Performance, you can use the “Clear cached data” button to clear ALL of the caches in the system, including the CSS and JavaScript optimizations. If you have your own cache table, you can implement hook_flush_caches() so that it is cleared whenever “Clear cached data” is executed (see the sketch below).

Pluggable Cache: Drupal provides a way to plug in a customized caching solution, such as memory-based, file-based or hybrid (memory plus database for fail-safe) caching. There are two levels of plugging:

    1. Solutions that use the Drupal cache API, but store the information in a customized manner (such as memory) instead of the Drupal 'cache*' tables.
    Example: memcache module
    2. Solutions that provide their own cache API implementation together with customized storage of information, so that essentially the complete Drupal cache system is bypassed. This is called the 'fastpath' mode (or is it page_fast_cache?). When a cached page is rendered using this technique, most of the Drupal bootstrap is skipped, and hence Drupal's statistics are not updated; figures such as page views may therefore be inaccurate.
    Example: cacherouter module
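
Here is a minimal sketch of both ideas for a hypothetical module "example" with its own cache_example table; example_build_report() is a made-up placeholder for an expensive calculation:

/**
 * Fetch an expensive result, storing it in the module's own cache table.
 */
function example_get_report() {
  if ($cached = cache_get('example_report', 'cache_example')) {
    return $cached->data;
  }
  $report = example_build_report();  // Hypothetical expensive calculation.
  cache_set('example_report', $report, 'cache_example', CACHE_TEMPORARY);
  return $report;
}

/**
 * Implementation of hook_flush_caches().
 *
 * Tells Drupal to empty the cache_example table whenever the
 * "Clear cached data" button is pressed.
 */
function example_flush_caches() {
  return array('cache_example');
}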

Drupal's modules: There are many contributed Drupal modules that extend the Drupal cache or provide integration with an external caching solution. Some of them even work for parts of authenticated user pages, and it is certainly possible to combine several of them for better performance. Here are some of them with a very brief summary (in no particular order):

  • Memcache: Lets you use a memcache server to do the caching. The nice thing is that it provides two modes: memory only, and memory with database fallback. The latter gives a smaller performance improvement than the first, but it adds a fail-safe mechanism: if memcache is unavailable, the database is used for caching instead. (A settings sketch follows this list.)
  • Authcache: Can cache authenticated user pages when the pages are the same for a particular user role. Can be combined with Memcache or Cacherouter for customized caching.
  • Advcache: Extends caching to areas that the Drupal core does not cover. The main benefit is for authenticated non-admin users who have a single role. Can be combined with Memcache or Cacherouter for customized caching.
  • Cacheexclude: Lets you exclude certain anonymous user pages from being cached.
  • Cacherouter: Allows you to use a different caching technology for different cache tables, with native support for APC, Memcache, XCache, and even the database and file system.
  • Boost: Another external caching mechanism, mostly for anonymous users.
  • Cache: Similar to Cacherouter; provides a mechanism to use different caching technologies.
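
Here is a rough illustration of how such a module is wired in through settings.php; the exact include path and variable names depend on the module version, so treat this as an assumption to check against the Memcache module's README:

// settings.php: route Drupal's cache API through the memcache module.
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
// One memcached instance serving the default bin (illustrative address).
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
$conf['memcache_bins'] = array('cache' => 'default');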

References: Pro Drupal Development, Second Edition by John K. VanDyk

Feb 08 2010
Feb 08
To do list

Time to revisit the different types of Drupal sites to see where gains can be made. What type of site do you have? This quick reference recaps the previous articles and lists the areas where different types of Drupal sites can improve performance.

All Sites

  • Get the best server for your budget and requirements.
  • Enable CSS and JS optimization in Drupal
  • Enable compression in Drupal
  • Enable Drupal page cache and consider Boost
  • Install APC if available
  • Ensure no slow queries from rogue modules
  • Tune MySQL for decent query cache and key buffer
  • Optimize file size where possible

Server: Low resources

  • Boost avoids PHP execution and the Drupal bootstrap
  • Sensible module selection
  • Avoid node load in views lists
  • Smaller JVMs possibly if running Solr
  • Nginx has a smaller footprint than Apache
  • mod_fcgid has a smaller footprint than mod_php

Server: Farm

  • Split off Solr
  • Split off DB server, watch the latency
  • With Cache Router select Memcache over APC for shared pools
  • Master + slaves for DB
  • Load balancing across web servers

Size: Many Nodes

  • Buy more RAM for database indexes
  • Index columns, especially for views
  • Thoroughly check slow queries
  • Warm up database
  • Swap in Solr for search
  • Solr to handle taxonomy pages

Activity: Many requests

  • Boost or
  • Pressflow and Varnish
  • Nginx over Apache
  • InnoDB on cache tables

Users: Mainly logged in

  • View/Block caching
  • CacheRouter (APC or Memcache)

Contention: Many Writes

  • InnoDB
  • Watchdog to file

Content: Heavy

  • Optimized files
  • Well positioned server
  • CDN

Functionality: Rich

  • Well behaved modules
  • Not too many modules
  • View/Block caching

Page browsing: Dispersed

  • Boost over Varnish if RAM is tight

Audience: Dispersed

  • CDN
  • Well positioned server

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.

Feb 07 2010
Feb 07
Slow

The time it takes for a page to render in a user’s browser is made up of two factors. The first is the time it takes to build the page on the server. The second is the time it takes to send the page and render it, with all of its contained components, in the browser. This guide has mainly been concerned with the former, getting the most from your server; however, it is estimated that 80% to 90% of the end-user response time is spent in this second, front-end phase.

It’s no good serving a cached page in the blink of an eye if countless included files still need to be requested and many large images need to be transported across the globe. Optimizing page rendering time can make a noticeable difference to the user and is the icing on the cake of a well optimized site. It is therefore important to consider and optimize this final leg of the journey.

  • Improving Drupal’s page loading performance: Wim Leers covers all the bases on how to improve loading performance.
  • High Performance Web Sites: Essential Knowledge for Front-End Engineers: Steve Souders, Chief Performance Yahoo! and author of the YSlow extension, covers the Yahoo recommendations in this book.
  • Even Faster Web Sites: Performance Best Practices for Web Developers: Another Steve Souders book, covering JavaScript (AJAX), the network (image compression, chunked encoding) and the browser (CSS selectors, etc.).

It is worthwhile reviewing Yahoo’s YSlow recommendations to see all of the optimizations which are possible. We cover selected areas where the default Drupal install can be improved upon.

Combined Files

The Out of The Box section covered the inbuilt CSS and JS aggregation and file compression. The use of “combined files” is a significant factor in Drupal’s relatively good score in the YSlow tests. Make sure you have this enabled.

All sites: Enable CSS and JS aggregation.

CSS Sprites

CSS image sprites are another method of cutting down the number of requests. This approach combines a number of smaller images into one large image, which is then selectively displayed to the user through the use of background offsets in CSS. It is a useful approach for things such as small icons, which otherwise carry a relatively large amount of HTTP overhead per request. Something for the theme designers to consider.

Custom designs: Use CSS sprites if appropriate.

  • CSS Sprites: Image Slicing’s Kiss of Death: An overview of how CSS sprites work and how they can be used.
  • A lesson in the usefulness of CSS sprite generators: Covers commonly used sprite generators.

Content Delivery Network

Using a content delivery network is the number two recommended best practice in Yahoo's list.

A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.
http://developer.yahoo.com/performance/rules.html#cdn

Of all the CDN web services SimpleCDN seems to be getting positive press amongst Drupal folks as it is simple and cheap. It offers the “origin pull” Mirror Buckets service which will serve content from 3.9 cents to 1.9 cents per GB. At this price you will probably be saving money on your bandwidth costs as well as serving content faster.

The CDN integration module is the recommended module to use for integration with content delivery networks, as it supports “origin pull” as well as push methods. It supports content delivery for all CSS, JS, and image files (including ImageCache).

High traffic, geographically dispersed: use CDN

  • CDN integration module: Wim Leers’ fully featured module which integrates with a wide range of CDN servers.
  • SimpleCDN module: Simple CDN re-writes the URL of certain website elements (which can be extended using plugins) for use with a CDN Mirror service.
  • Drupal CDN integration: easier, more flexible and faster!: Slides covering the advantages of CDNs and possible implementations.
  • mod_cdn: An Apache2 module which shows some promise, but not much information is available about using it with Drupal.
  • Best Drupal CDN module?: A Drupal Groups discussion.

On a related note, many sites can benefit from judicious placement of the server if traffic tends to come from one place and no CDN is being used. Sites whose visitors are mostly in one region, for example, may find the proximity of a server hosted in that area worth the extra cost of hosting.

Expires Headers

When a file is served by a web server, an “Expires” header can be sent back to the client telling it that the content will expire at a certain date in the future and that it may be cached until that time. This speeds up page rendering because the client does not have to send a GET request to check whether the file has been modified.

By default, the .htaccess file in the root of Drupal contains rules which set a two-week expiry for all files (CSS, JS, PNG, JPG, GIF) except HTML, which is considered dynamic and therefore not cacheable.


# Requires mod_expires to be enabled.

# Enable expirations.
ExpiresActive On
# Cache all files for 2 weeks after access (A).
ExpiresDefault A1209600
# Do not cache dynamically generated pages.
ExpiresByType text/html A1

The Expires header will not be generated unless you have mod_expires enabled in Apache. To make sure it is enabled in Apache2, run the following as root.


# a2enmod expires
# /etc/init.d/apache2 restart

Ensuring this is enabled will elevate your YSlow score by about 10 points or so.

All sites: Configure Apache correctly for fewer requests.

Gzip Compression

You can gzip page output by enabling compression in the performance area of the admin. Alternatively, you can configure Apache to do it.

All Sites: Enable Gzip compression

Binary Files

Binary files do not shrink significantly under Gzip compression. Gains can instead be made by ensuring that rich media such as images, audio and video are (i) targeted at the correct display resolution and (ii) have an appropriate amount of lossy compression applied. Since these files will generally only be downloaded once, they do not benefit from caching in the client, so care must be taken to ensure that they are as small as reasonably possible.

All Sites: Compress binary files

Pngcrush: Pngcrush is an optimizer for PNG (Portable Network Graphics) files. It can be run from the command line in an MSDOS window, or from a UNIX or Linux command line.

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.

Feb 06 2010
Feb 06
Blue Tape

Benchmarking a system is a reliable way to compare one setup with another and is particularly helpful when comparing different server configurations. We cover a few simple ways to benchmark a Drupal website.

A performant system is not just one which is fast for a single request. You also need to consider how the system performs under stress (many requests) and how stable it is (memory). Benchmarking with tools such as ab allows you to stress the server with many concurrent requests, replicating the traffic of a site being slashdotted. With a more customised setup, such tools can also be used in more sophisticated ways to mimic traffic across a whole site.

Documentation which covers tools of the trade including Apache Bench (ab) and SIEGE.

ab is the most commonly used benchmarking tool in the community. It shows how many requests per second your site is capable of serving. Concurrency can be set to 1 to get end-to-end speed results, or increased to produce a more realistic load for your site. Look at the “failed requests” and “requests per second” results.

In order to test the speed of a single page, turn off page caching and run ab with concurrency of one to get a baseline.

ab -n 1000 -c 1 http://drupal6/node/1

To check scalability, turn on the page cache and ramp up the concurrent connections (10 to 50) to see how much the server can handle. You should also make sure keep-alives are turned on (-k), as this leads to a more realistic result for a typical web browser; at higher concurrency levels, making new connections can become a bottleneck. Also, set compression headers (-H), as most clients support this feature.

ab -n 1000 -c 10 -k -H 'Accept-Encoding: gzip,deflate' http://drupal6/node/1

  • Testing with ab and simple changes you can make within Drupal.
  • Covers server side tools and walks through ab options and use.
  • Demonstrates how to pull out the current session id and how to pass it to ab so that authenticated users can be tested.
  • Illustrative discussion where different Drupal setups are benchmarked with ab.

JMeter is a Java desktop app designed to test function and performance. It is the preferred testing tool of many administrators.

  • Perl script which runs a JMeter test on Drupal and provides graphs.
  • Some scripts to get you started testing with JMeter.

Benchmarking is essential if you wish to make an objective comparison between different setups. However, it is not the final measure of performance. Remember that page rendering times are what matter to users, and those need to be optimized too. Also, benchmarks tend to be artificial in the sense that they often measure unrealistic situations. Will all of your requests be for a single anonymous page? Maybe in the Slashdot situation, but there are obviously other considerations. Finally, it is easy to focus intently on the numbers, especially when it comes to caching scores, and forget that minor differences may not make much of a difference in real-life scenarios. Don’t forget the logged-in user.

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.
