Nov 07 2018
Jay
Nov 07

Determining how content on your site should be cached isn't a simple topic. Last time, I covered cache contexts and tags. Today, I'd like to get into a couple more advanced topics: The use of custom cache tags and of max-age.

Custom Cache Tags

Drupal's built-in selection of cache tags is large, and some contributed modules add additional tags appropriate to what they do, but for really refined control you might want to create your own custom cache tags. This is, surprisingly, quite easy. In fact, you don't have to do anything in particular to create the tag – you just have to start using it and, somewhere in your code, invalidate it:

 \Drupal\Core\Cache\Cache::invalidateTags(['my_module:my_custom_tag']);

 As an example, to continue the scenario from part one about a page showing recent articles, there is one thing about this page that the tags we've already looked at don't quite cover. What if a new article gets created with a publish date that should be shown on your page? Or, maybe an article which isn't currently displayed has its publish date updated, and now it should start showing up? It's impractical to include a node-specific tag for every article that might possibly have to show up on your page, especially since those articles might not exist yet. But we do want the page to update to show new articles when appropriate.

The solution? A custom cache tag. The name of the tag doesn't matter much, but might be something such as my_module:article_date_published. That tag could be added on the page, and it could be invalidated (using the function above) in a node_insert hook for articles and in a node_update anytime that the Date Published field on an article gets changed. This might invalidate the cached version of your page a little more frequently than is strictly necessary (such as when an article's publish date gets changed to something that still isn't recent enough to have it show up on your custom page), but it certainly shouldn't miss any such updates.
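
For illustration, here's a rough sketch of what those hooks could look like in a Drupal 8 custom module. The module name (my_module) and the machine name of the Date Published field (field_date_published) are assumptions, so adjust them to match your site:

use Drupal\Core\Cache\Cache;
use Drupal\node\NodeInterface;

/**
 * Implements hook_node_insert().
 */
function my_module_node_insert(NodeInterface $node) {
  if ($node->bundle() === 'article') {
    Cache::invalidateTags(['my_module:article_date_published']);
  }
}

/**
 * Implements hook_node_update().
 */
function my_module_node_update(NodeInterface $node) {
  // Only invalidate when the Date Published value actually changed.
  if ($node->bundle() === 'article' && isset($node->original)
    && $node->get('field_date_published')->getString() !== $node->original->get('field_date_published')->getString()) {
    Cache::invalidateTags(['my_module:article_date_published']);
  }
}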

This is a simple example of a custom cache tag, but they can be used in many other situations as well. The key is to figure out under what conditions your content needs to be invalidated, and then invalidate an appropriate custom tag when those conditions are met.

Rules for Cache Max-Age

Setting a maximum time for something to be cached is sort of a fallback solution – it's useful in situations where contexts and tags just can't quite accomplish what you need, but should generally be avoided if it can be. As I mentioned in a previous article, a great example of this is content which is shown on your site but which gets retrieved from a remote web service. Your site won't automatically know when the content on the remote site gets updated, but by setting a max-age of 1 hour on your caching of that content, you can be sure your site is never more than an hour out of date. This isn't ideal in cases where you need up-to-the-minute accuracy to the data from the web service, but in most scenarios some amount of potential "delay" in your site updating is perfectly acceptable, and whether that delay can be a full day or as short as a few minutes, caching for that time is better than not caching at all. 
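
In code, max-age is just another key under #cache in a render array. Here's a minimal sketch, assuming $remote_html holds the markup already fetched from the web service:

$build['remote_content'] = [
  '#markup' => $remote_html,
  '#cache' => [
    // Keep this render array cached for at most one hour (3600 seconds).
    'max-age' => 3600,
  ],
];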

However, there is one big caveat to using max-age: It isn't directly compatible with the Internal Page Cache module that caches entire pages for anonymous users. Cache contexts and tags "bubble up" to the page cache, but max-age doesn't. The Internal Page Cache module just completely ignores the max-age set on any parts of the page. There is an existing issue on Drupal.org about potentially changing this, but until that happens, it's something that you'll want to account for in your cache handling.

For instance, maybe you have a block that you want to have cached for 15 minutes. Setting a max-age on that block will work fine for authenticated users, but the Internal Page Cache will ignore this setting and, essentially, cause the block to be cached permanently on any page it gets shown on to an anonymous user. That probably isn't what you actually want it to do.

You have a few options in this case.

First, you could choose to not cache the pages containing that block at all (using the "kill switch" noted in part one). This means you wouldn't get any benefit from using max-age, and would in fact negate all caching on that page, but it would guarantee that your content wouldn't get out of date. As with any use of the "kill switch," however, this should be a last resort.

Second, you could turn off the Internal Page Cache module. Unfortunately, it doesn't seem to be possible to disable it on a page-by-page basis (if you know a way, please drop us a line and we'll update this post), but if most of your pages need to use a max-age, this may be a decent option. Even with this module disabled, the Internal Dynamic Page Cache will cache the individual pieces of your page and give you some caching benefits for anonymous users, even if it can't do as much as both modules together.

My preferred option for this is actually to not use a max-age at all and to instead create a custom, time-based cache tag. For instance, instead of setting a max-age of 1 hour, you might create a custom cache tag of "time:hourly", and then set up a cron task to invalidate that tag every hour. This isn't quite the same as a max-age (a max-age would expire 1 hour after the content gets cached, while this tag would be invalidated every hour on the hour) but the caching benefits end up being similar, and it works for anonymous users.
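
Here's a rough sketch of that cron invalidation, assuming a custom module named my_module and a State API key chosen just for this purpose:

use Drupal\Core\Cache\Cache;

/**
 * Implements hook_cron().
 */
function my_module_cron() {
  $state = \Drupal::state();
  $request_time = \Drupal::time()->getRequestTime();
  // Only invalidate the tag once per hour, even if cron runs more often.
  if ($request_time - $state->get('my_module.hourly_tag_invalidated', 0) >= 3600) {
    Cache::invalidateTags(['time:hourly']);
    $state->set('my_module.hourly_tag_invalidated', $request_time);
  }
}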

Up Next

Now that we've gotten an overview of how to determine what rules you should use to cache content on your site, it's time to get a little bit more technical. Next time, I'll be taking a look at how Drupal stores and retrieves cached data, which can be immensely useful to understanding why the cache works the way it does, and it's also quite helpful to know when fixing any caching-related bugs you might encounter. Watch this blog for all the details!


Oct 31 2018
Jay
Oct 31

Leveraging Drupal's cache effectively can be challenging at first, but the benefits for your site's performance make it well worth the effort. It all starts with figuring out what sort of rules your site should use to cache its content. The improved page load times that come from handling caching rules properly can help improve SEO and make the site more pleasant for visitors. In the right hands, the techniques outlined in this article can give a website a much more tailored caching setup.

I previously covered the fundamentals of how caching works in Drupal 8, including what the two core caching modules do and what cache tags, contexts, and max-age are for. If you're familiar with those things, then this post is for you; otherwise, check out my previous article and get up to speed before we dive into a slightly more in-depth topic: Figuring out how you should set up caching on your site.

If you're using a simple Drupal installation with no custom code and with well-maintained contributed modules, Drupal's Internal Page Cache and Dynamic Internal Page Cache modules will likely cover your caching needs. This article focuses on some more complex and custom scenarios which, nonetheless, come up with some frequency. 

The Guiding Principle

Perhaps the most frequent issue custom code has when it comes to caching is that it doesn't account for caching at all. This isn't ideal if you want to take advantage of Drupal's caching system to optimize your site's speed, and it points to one principle which can be tricky to learn and is critical to master: If you write custom code, always think about its caching implications. Always. 

Often, the implications will be minimal, if there are any at all. It's important not to assume that every bit of custom code will cache perfectly on its own - that's a mistake that could lead to something being cached either for too long or not at all. 

When to Disable the Cache 

Almost everything Drupal renders as output (to a web browser, to a RESTful web service, etc.) can be cached. However, sometimes the development time needed to ensure that your custom code handles caching precisely outweighs the performance benefits that caching might provide. So the first question when you're writing custom code is: should this be cached at all?


If you're building something like an administrative page that only a few users with special permissions will ever see, it may not be worth the time and effort to make sure it is cached perfectly, especially if the rules for doing so would be complicated. For these scenarios, Drupal has a "page cache kill switch" that can be triggered in code:

 \Drupal::service('page_cache_kill_switch')->trigger();

Calling the kill switch will stop both the page cache and the dynamic page cache from doing any caching on that page.

This should be used with caution though, and only as a last resort in situations where figuring out proper caching logic isn't worth the time. It should never be used on a page which is expected to be viewed by a large number of your website's visitors.

Rules for Cache Contexts

You should consider using a cache context if your content should look different when displayed in different situations. Let's look at a couple scenarios that benefit from using a cache context:

Say you have a site which can be accessed from two different domains and you want to display something a little different depending on which domain someone is looking at. Perhaps the site's logo and tagline change a little. In this case, the block containing the logo and tagline should be given the url.site context. With this context in place, Drupal will cache a separate version of the block for each domain and will show each domain's visitors the appropriate one.

Or, perhaps a block contains a bit of information about which content the currently logged-in user has permission to edit. This sounds like an excellent case for using the user.permissions context to indicate to Drupal that the block is different for each possible combination of permissions that a user might have. If two users have the same permission, the same cached version can be used for both of them.
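
In a render array, contexts are declared under the #cache key. Here's a minimal sketch of the permissions example; $links_markup is a placeholder for whatever the block actually renders:

$build['edit_links'] = [
  '#markup' => $links_markup,
  '#cache' => [
    // Cache a separate copy of this block per combination of user permissions.
    'contexts' => ['user.permissions'],
  ],
];

For the multi-domain logo block, you'd list 'url.site' in the same way.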

Many other contexts are available as well; take a look at the full list to see whether one or more of them applies to your code.

Rules for Cache Tags 

Cache tags are probably the most important caching mechanism available to custom code. Drupal includes countless cache tags which can be used to invalidate a cache entry when something about your site changes, and it is also very easy to create your own cache tags (which we'll get to in a minute). For now, I'm going to focus on some of the cache tags Drupal has by default.

Say you're creating a page which shows the top five most recently published articles on your site. Now, Drupal sites can often make use of the Views module for this sort of thing, but depending on your exact requirements Views may not be the best approach – for instance, maybe part of the content has to come from a remote service that Views can't readily integrate with. The most obvious tags needed for this page are the tags for the specific pieces of content that are being shown, which are tags in the format of node:<nid>, for instance, node:5 and node:38. With these tags in place, whenever the content gets updated, the cache entry for your page gets invalidated, and the page will be built from scratch with the updated information the next time somebody views it.

But that's not all there is to think about. Perhaps this page also shows what categories (using a taxonomy structure) each article is in. Now, the articles each have an entity reference field to their categories, so if a user changes what categories the article is in, the relevant node:<nid> tags already added to your page will get cleared. Easy enough. But what if somebody changes the name of the category? That involves editing a taxonomy term, not the article node, so it won't clear any node:<nid> tags. To handle this situation, you'd want to have appropriate taxonomy_term:<id> tags. If an article with ID 6 has terms with IDs 14 and 17, the tags you'd want are node:6, taxonomy_term:14, and taxonomy_term:17, and you'll want to do this for every article shown on your page.

Fortunately, most of the time, you don't need to worry about the specific tag names. Nodes, terms, and other cacheable objects have a getCacheTags() method that returns exactly the tags you should use for that object.
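
For example, when building the render array for the recent-articles page described above, you might collect the tags from each node and its category terms roughly like this ($articles is assumed to be the list of loaded article nodes, and field_category is an assumed field name):

use Drupal\Core\Cache\Cache;

$tags = [];
foreach ($articles as $node) {
  $tags = Cache::mergeTags($tags, $node->getCacheTags());
  foreach ($node->get('field_category')->referencedEntities() as $term) {
    $tags = Cache::mergeTags($tags, $term->getCacheTags());
  }
}
$build['#cache']['tags'] = $tags;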

These are all simple entity-based tags, but there are many more available as well. There are tags for when various aspects of Drupal configuration changes as well as for things such as when certain theme settings get changed. Unfortunately, since the available cache tags vary from site to site, there isn't a ready-made list of them available for you to use as a reference. You can, however, look at the "cachetags" table in your Drupal database to see a list of all the tags that have been invalidated at least once on the site. This will be pretty minimal if your site is brand-new, but as people use the site it will start filling up.

The basic idea of tags is this: If you render something on a page, and there's a chance that something displayed on it might change in the future, there should be an appropriate tag in place to watch for that change.

Up Next

This is a big topic, but it looks like we're out of time for today. Next time, we'll delve a bit deeper into cache tags by seeing how to create custom ones that perfectly fit your site's needs, and we'll also cover how to use max-age, including one important gotcha that makes it more complicated than it looks.


Jan 21 2017
Jan 21
You should cache the results of heavy functions, SQL queries, and markup whenever possible, because doing so reduces the overall load on the system. If you have a block that renders a lot of items fetched from a database, cache it. If you have a function that performs heavy calculations and is called many times during a script execution, cache its result.

Static cache


It's recommended to implement a static cache for functions that are called many times during a single script execution. It's easy to do: store the function's result the first time it is called, then skip the heavy calculation on subsequent calls and return the saved value from a static variable.
<?php

/**
 * Some function performs heavy calculations.
 *
 * @return mixed
 *   A result.
 */
function module_heavy_function() {
  // Function drupal_static() returns
  // a static variable by name. Usually
  // variable name equals a function
  // name.
  $result = &drupal_static(__FUNCTION__);

  if (!isset($result)) {
    // Perform the heavy calculation here,
    // producing $result_of_heavy_calculations.

    // After that assign result to a
    // static variable.
    $result = $result_of_heavy_calculations;
  }

  return $result;
}

Database cache


If you want to save a function's result for some period of time, you have to use the database cache. You can specify a cache lifetime (when it will expire) or invalidate the cache manually whenever you need to.
<?php

/**
 * Some function performs heavy calculations.
 *
 * @return mixed
 *   A result.
 */
function module_heavy_function() {
  $cache_name = 'module:module_heavy_function';
  $result_of_heavy_calculations = NULL;
  $cache = cache_get($cache_name);

  // Cache hit: load the cached data from the
  // database and return it.
  if ($cache) {
    $result_of_heavy_calculations = $cache->data;
  }
  // Cache miss: there is no cached data, the cache
  // has expired, or it was invalidated manually, so
  // do the heavy calculations and save the result
  // into the cache.
  else {
    // Perform the heavy calculation here,
    // producing $result_of_heavy_calculations.

    // Then set the cache. The data will be saved
    // into the 'cache' bin for one hour.
    cache_set($cache_name, $result_of_heavy_calculations, 'cache', time() + 60 * 60);
  }

  return $result_of_heavy_calculations;
}
Manual cache invalidation:
<?php

// You can invalidate all cached data whose
// cid starts with the prefix 'module:'.
cache_clear_all('module:', 'cache', TRUE);

// Or you can invalidate cache by specific
// cid.
cache_clear_all('module:content', 'cache');
You can also cache the output of a renderable array by specifying the #cache key:
<?php

$renderable_array['content'] = [
  '#cache' => [
    'cid' => 'module:content',
    'bin' => 'cache',
    'expire' => time() + 60 * 60,
  ],
  '#markup' => 'This markup will be cached',
];


Jul 28 2016
Jul 28

This Drupal cache case study began on a pleasant January afternoon in beautiful Goa. The entire company flew in for our annual retreat, enjoying the scenic beauty of the sandy beaches and the cool breeze. Between the fun times, we were ducking into our remote workspace, poring over a cachegrind report sent over by our client.

This report was generated for ajax responses coming from a product page on an e-commerce website. We could clearly see that the page was extremely slow to render and that this was true for all ajax responses.

The Cachegrind report as sent by the client.

Progressive Doping

Digging deeper to figure out what was causing the performance bottleneck, we used Xdebug to generate cachegrind files, which we could then analyze to determine which functions were costing us the most, and where there were functional paths that could be short-circuited via caching. Performing this process iteratively, we came up with a list of major pain areas. This list was very specific to our project.

  1. The "get nid from product_id" function was being called frequently, resulting in a resource-consuming data query.
  2. hook_commerce_checkout_pane_info_alter() was being executed on every request.
  3. Product filtering for a particular type of user was a resource-consuming activity.
  4. The "get product_id matching attributes" function was being called frequently, resulting in a resource-consuming query.
  5. The theme function theme_extra_bundles_radio() was processing some resource-consuming business logic.
  6. A lot of calls were made to entity_load().

Drupal Cache

It is a widely accepted fact that as great as the Drupal 7 core can be, it doesn’t scale well for web sites with a lot of content and a lot of users. To extract the best performance under such scenarios, it’s necessary to make use of several strategies, tools, and techniques. Keeping this in mind, we settled down with the following strategies:

  • Basic Drupal caching:

Drupal offers basic performance settings at:

Administration > Configuration > Development > Performance (admin/config/development/performance)

Here we enabled block and page caching, which caches the results of Drupal's complex, dynamic SQL queries for quick retrieval of content. We also enabled the "Compress cached pages", "Aggregate and compress CSS files", and "Aggregate JavaScript files" options, which further helped with bandwidth optimization by reducing the number of HTTP calls.

Lineup/Dealership cache.

You can also define your own panel cache method for your particular use case and cache the panel content. First, define a ctools plugin like this:

<?php

// Plugin definition.
$plugin = array(
  'title' => t("Example cache"),
  'description' => t('Provides a custom option to cache the panel content.'),
  'cache get' => 'my_module_cache_get_cache',
  'cache set' => 'my_module_cache_set_cache',
);

/**
 * Get cached content.
 */
function my_module_cache_get_cache($conf, $display, $args, $contexts, $pane = NULL) {
  $cache = cache_get('my_panel_data', 'cache_panels');
  if (!$cache) {
    return FALSE;
  }

  if ((time() - $cache->created) > $conf['lifetime']) {
    return FALSE;
  }

  return $cache->data;
}

/**
 * Set cached content.
 */
function my_module_cache_set_cache($conf, $content, $display, $args, $contexts, $pane = NULL) {
  cache_set('my_panel_data', $content, 'cache_panels');
}
  • Custom caching:

After looking at a few more cachegrind reports, we identified some custom functions that were causing performance problems, thrashing the database with complex queries and expensive calculations every time a user viewed specific pages. This prompted us to use Drupal's built-in caching APIs.

The rule is to never do something time-consuming twice if we can hold onto the result and reuse it. This can be illustrated with a simple example:

<?php

function my_module_function() {
  $data = &drupal_static(__FUNCTION__);
  if (!isset($data)) {
    // Here go the complex and expensive calculations
    // that populate $data with their output.
  }
  return $data;
}

The drupal_static() function provides temporary storage for data that sticks around even after the function is done executing. drupal_static() adds the magic to this function: it returns an empty value on the first call and preserves the data on subsequent calls within the same request. That means our function can determine whether the variable has already been populated. If it has, it returns the value and skips the expensive calculations.

The static variable technique stores data for only a single page load. To overcome this limitation, the Drupal cache API persists cached data in the database. The following code snippet illustrates the use of the Drupal cache API:

<?php

function my_module_function() {
  $data = &drupal_static(__FUNCTION__);

  if (!isset($data)) {
    $cache = cache_get('my_module_data', 'cache_my_module');
    if ($cache && time() < $cache->expire) {
      $data = $cache->data;
    }
    else {
      // Here go the complex and expensive calculations
      // that populate $data with their output.
      cache_set('my_module_data', $data, 'cache_my_module', time() + 360);
    }
  }

  return $data;
}

This version of the function uses database caching. The Drupal caching API provides three key functions: cache_get(), cache_set() and cache_clear_all(). The initial check is made against the static variable, and then the function checks Drupal's cache for data stored under a particular key. If data is found, $data is set to $cache->data.

If no cached version is found, the function performs the actual work to generate the data and saves it to the static variable at the same time, which means that on the next call within the same request it does not even need cache_get().

Finally, we get a slick little function that saves time whenever it can—first checking for an in-memory copy of the data, then checking the cache, and finally calculating it from scratch.

Conclusion:

After putting all of this together, we were able to reduce the total number of function calls to 741 and the execution time to 665 ms, giving us roughly a 1500% performance boost.

Cachegrind report generated after implementing Drupal caching.


This article was originally published in April 2015.

Jun 15 2016
Jun 15

Whilst working on a Drupal 8 project, we found that cache tags for a Block Content entity embedded in the footer weren't bubbling up to the page cache.

Read on to find out how we debugged this and how you can ensure this doesn't happen to you too.

The problem

On our site we had a Block Content entity embedded in the footer that contained an Entity Reference field which allowed site admins to highlight author profiles. The issue was that if the admin edited this block and changed the referenced authors, or the order of the authors - the changes didn't reflect for anonymous users until the page cache was cleared.

This immediately sounded like an issue with cache tags bubbling.

About cache tags

So what are cache tags? Well, let's quote the excellent handbook page:

Cache tags provide a declarative way to track which cache items depend on some data managed by Drupal.

So in our case, as the page is built, all of the content that is rendered has its cache tags bubble up to the page level cache entry. When any of the items that form the page are updated, all cache entries that match that item's tags are flushed. This ensures that if a node or block that forms part of a page is updated, the cache entry for the page is invalidated. For entities, the tags are in the format {entity type}:{entity id} - e.g. node:2 or block_content:7

Clearly this wasn't happening for our block.

Debugging cache tags

So the first step with debugging this issue is to see what cache tags were associated with the page.

Luckily, core lets you do this pretty easily.

In sites/default/services.yml you'll find this line:

http.response.debug_cacheability_headers: false

Simply change it to true, and rebuild your container (clear caches or drush cr). Then browse to your site and view the headers in the Network panel of your developer toolbar. You'll start seeing headers showing you the cache tags like so:

Response headers

Check out the X-Drupal-Cache-Tags header to see which tags make up the page.

So in our case we could see that the block we were rendering wasn't showing up in the cache tags.

Digging into the EntityViewBuilder for the block and block content entity, we could see that the right tags were being added to the $content variable, but they just weren't bubbling up to the page level.

Rendering individual fields in Twig templates

Now, this particular block had its own Twig template; we were controlling the markup to ensure that one field was rendered in a grid layout and another was rendered at the end. The block type had two fields, and we were rendering them using something like this:

<div{{ attributes.addClass('flex-grid__2-col') }}>
  {% if label %}
    <h2{{ title_attributes.addClass('section-title--light') }}>{{ label }}</h2>
  {% endif %}
  {% block content %}
    <div class="flex-grid">
      {{ content.field_featured_author }}
    </div>
    <div class="spacing--small-before">
      {{ content.field_more_link }}
    </div>
  {% endblock %}
</div>

i.e. we were rendering just field_featured_author and field_more_link from the content variable. And this is the gotcha: you have to render the content variable itself to ensure that its cache tags bubble up and end up in the page cache.

The fix

There were only two fields on this block content entity, and we wanted control over how they were output. But we also had to render the content variable to make sure the cache tags bubbled. This was a chance for Twig's without filter to save the day. The new markup was:

<div{{ attributes.addClass('flex-grid__2-col') }}>
  {% if label %}
    <h2{{ title_attributes.addClass('section-title--light') }}>{{ label }}</h2>
  {% endif %}
  {% block content %}
    {{ content|without('field_featured_author', 'field_more_link') }}
    <div class="flex-grid">
      {{ content.field_featured_author }}
    </div>
    <div class="spacing--small-before">
      {{ content.field_more_link }}
    </div>
  {% endblock %}
</div>

In other words, we still render the fields on their own, but we make sure we also render the top-level content variable, excluding the individual fields using the without filter.

After this change, we started seeing our block content cache tags in the page-level cache tags and as to be expected, changing the block triggered the appropriate flushes of the page cache.

Feb 25 2015
Feb 25

Drupal 8 comes with many improvements over its predecessor, which we have grown to both love and hate. Next to prominent changes such as Views in core, configuration management, or a useful translation service, there are also less known changes that are equally important to know about and use. One such improvement is the cache API, which solves many of the performance problems we have in Drupal 7.


In this article, I want to shine a bit of light over the new cache API. To this end, we are going to look at how we can use it in our custom modules as we are encouraged to do so much more in Drupal 8.

Additionally, I have prepared a little demonstration in the shape of a module you can install for testing the impact of the cache API. It’s a simple page that in its rendering logic makes an external API call (to a dummy JSON endpoint) and caches its results. The page then displays the actual time it takes for this to happen, contrasting the external call time vs. the cached version time.

The new cache API


Bins

The new cache API (with the default DatabaseBackend storage) is stored in multiple bins which map to tables that start with the prefix cache_. When interacting with the cache, we always start by requesting a cache bin:

$cache = \Drupal::cache();

Where $cache will be an instance of the DatabaseBackend object that represents the default bin (cache_default). To request a particular bin, we pass its name as a parameter:

$render_cache = \Drupal::cache('render');

Where $render_cache will represent the render cache bin (which is new in Drupal 8 and is supposed to improve render performance across the board).

As you can see, we are requesting the cache service statically using the \Drupal class. If we are working inside classes, it is best practice to inject the service from the container. You can do so by specifying as an argument to your service the relevant cache bin service (such as cache.default). Here you can get a list of all core services including the ones related to cache.

But for the sake of brevity, we will use it statically here.

Retrieving cached items

Once we know which bin we want to work with (for custom modules this will usually be the default bin), we can retrieve and store cache items.

$cache = \Drupal::cache()->get('my_value');

It’s that simple. $cache will be a stdClass object containing some metadata about the cache item plus the actual data available under the $cache->data property. The my_value parameter is the cache ID.

An important thing to keep in mind is that using the get() method without a second parameter will not return the cache item if it has been invalidated (either programmatically or through expiration). Passing the boolean TRUE as the second parameter forces it to return the data.
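
For instance, to retrieve the my_value item from above even after it has been invalidated:

// The second parameter allows invalid (expired or invalidated) items to be returned.
$cache = \Drupal::cache()->get('my_value', TRUE);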

Storing cache items

Although storing new items in the cache is just as easy as retrieving them, we have more options when doing so. To store an item we use the set() method (instead of get() like before), a method that takes 2 mandatory parameters and 2 optional ones:

  • the cache ID (the string by which we can later reference the item)
  • the data (a PHP value such as a string, array or object that gets serialised automatically and stored in the table – should not be over 1MB in size)
  • the expiration time (a timestamp in the future when this cache item will automatically become invalid or -1 which basically means this item never expires. It is best practice to use the Drupal\Core\Cache\CacheBackendInterface::CACHE_PERMANENT constant to represent this value)
  • tags (an array of cache tags this item can be later identified by)

As an example:

Drupal::cache()->set('my_value', $my_object, CacheBackendInterface::CACHE_PERMANENT, array('my_first_tag', 'my_second_tag'));

This will set a permanent cache item tagged with 2 tags and store a serialised version of $my_object as the data.

Cache invalidation and removal

Cache invalidation means that the respective items are no longer fresh and are unreliable in terms of what data they hold. They will be removed at the next garbage collection which can also be called using the garbageCollection() method on the CacheBackend object.

As mentioned above, when storing a cache item we can specify an expiration time. When this time lapses, the cache item becomes invalid but still exists in the bin and can be retrieved. However, we can also invalidate items manually using the invalidate(), invalidateMultiple() or invalidateAll() methods on the CacheBackend object.

Removing items altogether can be done using the delete(), deleteMultiple() or deleteAll() methods. These actions also happen only on the bin the CacheBackend is wrapping and completely remove the respective table records.
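
To tie those together, here's a quick sketch using the my_value item from before (my_other_value is just an illustrative second cache ID):

$cache = \Drupal::cache();
// Mark the item as invalid; it stays in the bin until garbage collection.
$cache->invalidate('my_value');
// Remove items from the bin completely.
$cache->deleteMultiple(['my_value', 'my_other_value']);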

Cache tags

Another cool new feature of the Cache API in Drupal 8 are the cache tags (the fourth parameter in the setter method). The role of the tags is to identify cache items across multiple bins for proper invalidation. The purpose is the ability to accurately target multiple cache items that contain data about the same object, page, etc. For example, nodes can appear both on a page and in a view (stored in different cache items in different bins but both tagged with the same node:nid formatted tag). This allows invalidating both cache items when changes happen to that node without having to know the cache ids.

To manually invalidate caches using the tags, we can use the invalidateTags() method statically on the \Drupal\Core\Cache\Cache class:

\Drupal\Core\Cache\Cache::invalidateTags(array('node:5', 'my_tag'));

This will call the cache invalidator service and invalidate all the cache items tagged with node:5 and my_tag.

Additionally, for Drupal entities we don’t have to create our own tags but can retrieve them from the entity system:

  • \Drupal\Core\Entity\EntityInterface::getCacheTags()
  • \Drupal\Core\Entity\EntityTypeInterface::getListCacheTags()

This keeps the tags for Drupal entities consistent across the board.
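
So instead of hard-coding a tag like node:5, you would typically do something along these lines ($data stands in for whatever value you are caching):

use Drupal\Core\Cache\CacheBackendInterface;
use Drupal\node\Entity\Node;

$node = Node::load(5);
// getCacheTags() returns something like ['node:5'].
\Drupal::cache()->set('my_value', $data, CacheBackendInterface::CACHE_PERMANENT, $node->getCacheTags());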

Demonstrating the cache API

As I mentioned before, I created a small module that allows us to see the benefits of caching data. You can find the module in this git repository but here is the crux of it:

Please note that in this example I access the cache backend service statically to save some space. For a dependency injection approach (the correct approach), take a look at the repository code.

A route file that adds a new route to the /cache-demo path:

cache_demo_page:
  path: '/cache-demo'
  defaults:
    _controller: '\Drupal\cache_demo\Controller\CacheDemoController::index'
    _title: 'Cache demo'
  requirements:
    _permission: 'access content'

And the controller class that returns the page inside src/Controller/CacheDemoController.php:

<?php

/**
 * @file
 * Contains \Drupal\cache_demo\Controller\CacheDemoController.
 */

namespace Drupal\cache_demo\Controller;

use Drupal\Core\Cache\CacheBackendInterface;
use Drupal\Core\Controller\ControllerBase;
use Drupal\Core\Url;
use GuzzleHttp\Client;
use Symfony\Component\HttpFoundation\Request;

/**
 * Cache demo main page.
 */
class CacheDemoController extends ControllerBase {

  public function index(Request $request) {
    $output = array();

    $clear = $request->query->get('clear');
    if ($clear) {
      $this->clearPosts();
    }

    if (!$clear) {
      $start_time = microtime(TRUE);
      $data = $this->loadPosts();
      $end_time = microtime(TRUE);

      $duration = $end_time - $start_time;
      $reload = $data['means'] == 'API' ? 'Reload the page to retrieve the posts from cache and see the difference.' : '';
      $output['duration'] = array(
        '#type' => 'markup',
        '#prefix' => '<div>',
        '#suffix' => '</div>',
        '#markup' => t('The duration for loading the posts has been @duration ms using the @means. @reload',
          array(
            '@duration' => number_format($duration * 1000, 2),
            '@means' => $data['means'],
            '@reload' => $reload
          )),
      );
    }

    if (($cache = \Drupal::cache()->get('cache_demo_posts')) && $data['means'] == 'cache') {
      $url = new Url('cache_demo_page', array(), array('query' => array('clear' => true)));
      $output['clear'] = array(
        '#type' => 'markup',
        '#markup' => $this->l('Clear the cache and try again', $url),
      );
    }

    if (!$cache = \Drupal::cache()->get('cache_demo_posts')) {
      $url = new Url('cache_demo_page');
      $output['populate'] = array(
        '#type' => 'markup',
        '#markup' => $this->l('Try loading again to query the API and re-populate the cache', $url),
      );
    }

    return $output;
  }

  /**
   * Loads a bunch of dummy posts from cache or API
   * @return array
   */
  private function loadPosts() {
    if ($cache = \Drupal::cache()->get('cache_demo_posts')) {
      return array(
        'data' => $cache->data,
        'means' => 'cache',
      );
    }
    else {
      $guzzle = new Client();
      $response = $guzzle->get('http://jsonplaceholder.typicode.com/posts');
      $posts = $response->json();
      \Drupal::cache()->set('cache_demo_posts', $posts, CacheBackendInterface::CACHE_PERMANENT);
      return array(
        'data' => $posts,
        'means' => 'API',
      );
    }
  }

  /**
   * Clears the posts from the cache.
   */
  function clearPosts() {
    if ($cache = \Drupal::cache()->get('cache_demo_posts')) {
      \Drupal::cache()->delete('cache_demo_posts');
      drupal_set_message('Posts have been removed from cache.', 'status');
    }
    else {
      drupal_set_message('No posts in cache.', 'error');
    }
  }

}

Inside the index() method we do a quick check to see whether the clear query parameter is present in the url and call the clearPosts() method responsible for deleting the cache item. If there isn’t one, we calculate how long it takes for the loadPosts() method to return its value (which can be either the posts from the cache or from the API). We use Guzzle to make the API call and when we do, we also store the results directly. Then we just output the duration of the call in milliseconds and print 2 different links depending on whether there is cache stored or not (to allow us to clear the cache item and run the API call again).

When you navigate to cache-demo for the first time, the API call gets made and the 100 posts get stored in the cache. You can then reload the page to see how long it takes for those posts to be retrieved from the cache. Upon doing that, you'll have a link to clear the cache (via a page refresh with the clear query string), followed by another link which refreshes the page without the clear query string and in turn makes the API call again. And so on, so you can test the contrast in duration.

Conclusion

In this article we’ve looked at how easy it is to use the Cache API in Drupal 8. There are some very simple class methods that we can use to manage cache items and it has become too straightforward for us not to start using it in our custom modules. I encourage you to check it out, play around with the API and see for yourself how easy it is to use.

Jul 16 2014
Jul 16

Since we integrated the EdgeCast CDN for one of our clients, and released a related EdgeCast Drupal module we have been encouraging more and more clients to consider a CDN layer to accelerate performance to multiple geographic locations and maintain an excellent uptime even during site maintenance periods.

A recent international client who is running many domains with federated content using the Domain module needed to make use of the content delivery network to improve performance and resilience for their sites.

The use of the Domain module created a problem with the purge process employed in the EdgeCast v1.x module. As with most of the cache purging modules available for Drupal, it sent each URL individually and waited for confirmation that the request had been successful.

With a large multi-domain site, this multiplied the page load time upon submitting a node edit form to the point of PHP timeouts and Nginx HTTP 500 errors. Increasing the timeouts was a short-term fix, but it still meant content editors were slowed down waiting for forms to submit.

There was also another effect caused by external feed aggregation, which created nodes, which in turn triggered URL purge requests and slowed down each node creation step.

We toyed with the idea of using a Drupal batch queue to process the purge requests during the regular cron call but this delayed the refreshing of content pages far too long for the content editors.

We settled on what appears to be the little-used multi cURL facility. This allowed us to build up an array of all the URLs using the Expire module and then send a single HTTP request to the EdgeCast API, ignoring the response that came back from the API. This created a much faster content editing experience for the users.
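
As a rough illustration of the idea (not the module's actual code), PHP's curl_multi functions let you dispatch purge requests for many URLs in parallel and simply never inspect the response bodies; the function name and options below are assumptions:

/**
 * Sends purge requests for many URLs in parallel, ignoring the responses.
 */
function my_module_purge_urls(array $urls) {
  $multi = curl_multi_init();
  $handles = array();

  foreach ($urls as $url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($handle, CURLOPT_TIMEOUT, 2);
    curl_multi_add_handle($multi, $handle);
    $handles[] = $handle;
  }

  // Run all transfers concurrently; the responses are never looked at.
  $running = 0;
  do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi);
  } while ($running > 0);

  foreach ($handles as $handle) {
    curl_multi_remove_handle($multi, $handle);
    curl_close($handle);
  }
  curl_multi_close($multi);
}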

Note: The Expire module doesn't currently implement Domain module purging - the code class is empty. See issue #2293217 for details. We worked around this using the Cache Expiration Alias module.

A side effect of the multi cURL development work was that it sped up purge requests for single-domain sites too!

The v2.0 EdgeCast module is available now for Drupal 7.

Find out more about the content delivery network from Edgecast.

Feb 25 2013
Feb 25

In the Drupal community, we always recommend using the Drupal API, and best practices for development, management and deployment. This is for many reasons, including modularity, security and maintainability.

But it is also for performance that you need to stick to these guidelines, refined for many years by so many in the community.

By serving many clients over many years and specifically doing Drupal Performance Assessments, we have seen many cases where these guidelines are not followed, causing site slowdowns and outages.

Here are some examples of how not to do things.

Logic in the theme layer

We often find that developers who are proficient in PHP, but new to Drupal, misuse its API in many ways.

In extreme cases, they don't know that they should write modules to house the application logic and data access, leaving only presentation to be done in the theme layer.

We saw a large site where all the application logic was in the theme layer, often in .tpl.php files. The logic even ended with an exit() statement!

This caused Drupal's page caching mechanism to be bypassed, resulting in all page accesses from crawlers and anonymous users being very heavy on the servers, and complicating the infrastructure, which was over-engineered to compensate for this development mistake.

Using PHP in content (nodes, blocks and views)

Another common approach that most developers start using as soon as they discover it, is placing PHP code inside nodes, blocks or views.

Although this is a quick and dirty approach, the initial time savings cause lots of grief down the road through the life cycle of the site. We wrote an article specifically about that, which you will find a link to below.

Heavy queries in the theme layer, when rendering views

In some cases, the logic for rendering individual nodes within a view is complex, and involves code in the view*.tpl.php file that has SQL queries, or calls to heavy functions, such as node_load() and user_load().

We wrote an article on this which you can find the link to below.

Conclusion

Following Drupal's best practices and community guidelines is always beneficial. Performance is just one of the benefits that you gain by following them.

Further reading

Feb 19 2013
Feb 19

When doing performance assessment for large and complex sites to assess why they are not fast or scalable, we often run into cases where modules intentionally disable the Drupal page cache.

Depending on how often it happens and for which pages, disabling the page cache can negatively impact the site's performance, be that in scalability, or speed of serving pages.

How to inspect code for page cache disabling

If you want to inspect a module to see if it disables the page cache, search its code for something like the following:

// Recommended way of disabling the cache in Drupal 7.x
drupal_page_is_cacheable(FALSE);

Or:

$GLOBALS['conf']['cache'] = 0;

Or:

$GLOBALS['conf']['cache'] = CACHE_DISABLED;

Or:

$conf['cache'] = FALSE;

Modules that disable the page cache

We have found the following modules that disable the page cache in some cases:

Bibliography Module

In biblio_init(), the module disables the page cache if someone is visiting a certain URL, such as "biblio/*" or "publications/*", depending on how the module is configured.

if ($user->uid === 0) {
  // Prevent caching of biblio pages for anonymous users
  // so session variables work and thus filtering works.
  $base = variable_get('biblio_base', 'biblio');
  if (drupal_match_path($_GET['q'], "$base\n$base/*")) {
    $conf['cache'] = FALSE;
  }
}

Flag Module

This code is in flag/includes/flag_handler_relationships.inc:

if (array_search(DRUPAL_ANONYMOUS_RID, $flag->roles['flag']) !== FALSE) {
  // Disable page caching for anonymous users.
  drupal_page_is_cacheable(FALSE);

Or in Drupal 6.x:

if (array_search(DRUPAL_ANONYMOUS_RID, $flag->roles['flag']) !== FALSE) {
  // Disable page caching for anonymous users.
  $GLOBALS['conf']['cache'] = 0;

Invite Module

case 'user_register':
  // In order to prevent caching of the preset 
  // e-mail address, we have to disable caching 
  // for user/register.
  $GLOBALS['conf']['cache'] = CACHE_DISABLED;

CAPTCHA Module

The CAPTCHA module disables the cache wherever a CAPTCHA form is displayed, be that in a comment or on the login form.

This is done via hook_element_info(), which sets a callback in the function captcha_element_process().

If you find other modules that are commonly used, please post a comment below about it.

May 21 2012
May 21


Caching is commonly used to boost the performance of Drupal-based websites, and there are many modules and options for fine-grained control over when items expire. However, the standard caching settings available under Performance are frequently misinterpreted, and different caching backends have subtly different behavior. Here is a breakdown of what the settings mean for page caching and how the two most common storage engines behave for particular events.

Cache Pages for Anonymous Users

The first caching setting under Performance is "Cache pages for anonymous users". This must be enabled for the other options below to take effect, and it tells Drupal it's OK to store the rendered version of a page generated for one unauthenticated user and serve it to another. Pages are stored with a cache ID of the URL of the request, technically $base_root . request_uri(), so a request for http://example.com/page_one and http://example.com/page_one?rand=1 are considered two different requests and generated separately, even if the query string portion has no effect on the rendering of the page.

Once page caching is enabled there are two settings to consider: Minimum cache lifetime and expiration of cached pages.

Minimum cache lifetime is often misinterpreted as meaning "pages will be regenerated after this much time has passed". What it actually means is that pages will not be regenerated until at least this much time has passed and a cache clearing event has happened. I'll discuss cache clearing events after covering expiration of cached pages.

Expiration of cached pages is also sometimes misinterpreted. This value controls what is sent as a max-age value in a Cache-Control header and thus advises proxy servers how long they may serve the page without asking your Drupal install for a new copy. This does not mean that the page will be regenerated after this much time, it just means that the proxy server must check back with Drupal to see if a new version of the page exists after this much time. Drupal will only regenerate a page after a cache clearing event occurs.

Cache Clearing Event

I've mentioned cache clearing events several times at this point, and they are the most important thing to understand when dealing with Drupal's page caching. Drupal will only regenerate a page when it has some reason to suspect that the results of the page regeneration will be different than the previous results. The things that make Drupal think the page generation results may be different are what I call cache clearing events, primarily because they cause the cache_clear_all() function to be called. What is considered a cache clearing event varies based on the cache storage engine being used, and how it interacts with the minimum cache lifetime will also vary.

The two most common cache storage engines are the database and Memcached (DrupalDatabaseCache class and MemCacheDrupal class). Below is a chart showing their behavior during typical cache clearing events.

  • Database:
      • Cron: Page cache will clear in minimum cache lifetime.
      • Cache Clear All: Page cache clears immediately and will clear again in minimum cache lifetime.
      • Content Editing: Page cache will clear in minimum cache lifetime.
  • Memcached:
      • Cron: Not considered a cache clearing event.
      • Cache Clear All: Page cache considers all items generated prior to the current time minus minimum cache lifetime expired (thus clearing them). Items generated after that are not considered for expiration until the next cache clearing event.
      • Content Editing: Page cache considers all items generated prior to the current time minus minimum cache lifetime expired (thus clearing them). Items generated after that are not considered for expiration until the next cache clearing event.

The treatment of a cron run as a cache clearing event when using the database for cache storage, combined with frequent cron runs, is what typically leads users familiar with Drupal to think of minimum cache lifetime as "my pages will be regenerated after this much time has passed". It is also a source of frustration for users who want pages to remain cached until content is actually edited. Understanding Drupal's caching behavior, its interactions with proxy servers like Varnish, and how your cache storage engine reacts in various situations are all critical to ensuring your content is regenerated when necessary and properly cached when not.


Apr 17 2012
Apr 17

A lot of very interesting things are happening to make Drupal's caching system a bit smarter. One of my favorite recent (albeit smaller) developments is a patch (http://drupal.org/node/1471200) for the Views module that allows for cached views to have no expiration date. This means that the view will remain in the cache until it is explicitly removed.

Before this patch landed, developers were forced to set an arbitrary time limit for how long Views would store the cached content. So even if your view's content only changed every six months, you had to choose a time limit from a list of those predefined by Views, the maximum of which was 6 days. Every six days, the view content would be flushed and regenerated, regardless of whether its contents had actually changed or not.

The functionality provided by this patch opens the door for some really powerful behavior. Say, for instance, that I have a fairly standard blog view. Since I publish blog posts somewhat infrequently, I would only like to clear this view's cache when a new blog post is created, updated, or deleted.

To set up the view to cache indefinitely, click on the "Caching" settings in your view and select "Time-based" from the pop-up.

Then, in the Caching settings form that follows, set the length of time to "Custom" and enter "0" in the "Seconds" field. You can do the same for the "Rendered output" settings if you'd like to also cache the rendered output of the view.

Once you save your view, you should be all set.

Next, we need to manually invalidate the cached view whenever its content changes. There are a couple different ways to do this depending on what sort of content is included in the view (including both of the modules linked to above). In this case, I'll keep it lightweight and act on hooks in a custom module:

/**
 * Implements hook_node_insert().
 */
function MY_MODULE_node_insert($node) {
  if ($node->type == 'blog') {
    cache_clear_all('blog:', 'cache_views', TRUE);
  }
}

The same applies for hook_node_update() and hook_node_delete().

And just like that, my view is only regenerated when it needs to be, and should be blazing fast in between.

The patch was committed to the 7.x-3.x branch of Views on March 31, 2012, so for now you will have to apply the patch manually until the next point release.

Happy caching!

Mar 01 2012
Mar 01

A client contacted us to assist them in finding a solution for slow page times for their site.

All the pages of the site were slow, and taking 2.9 to 3.3 seconds.

Upon investigation, we found that one view was responsible for most of that time.

However, the query execution itself was fast, around 11 ms.

But, the views rendering time was obscenely high: 2,603.48 ms!

So, when editing the view, you would see this at the bottom:

Query build time        2.07 ms
Query execute time     11.32 ms
View render time    2,603.48 ms

Since this view was on each page, in a block on the side bar, it was causing all the pages of the site to be slow.

The underlying reason was really bad coding in the views-view--viewname.tpl.php template, which is too long to explain. But the gist of it is that the view returned several thousand rows of taxonomy terms and was supposed to render them in a tree. However, the actual view template just looped through the dataset, did not do much with it, and displayed static HTML in the end!

The solution for this was quite simple: enable Views caching.

Global Caching

If most of your visitors to the site are anonymous (i.e. not logged in to Drupal), then it is simpler and better to use global caching.

To do this, go to the view's Defaults, then Basic settings, then Caching. Change to Time Based, then select at least 1 hour for each of Query results and Rendered output.

Block Caching

If your site has a significant portion of its visitors logged in, then you can be more granular in caching: per role, per user, etc.

To do this, go to the view's Block menu on the left, then Block settings, then Caching.

You can select various options, such as per role, or per user, depending on the specific block.

Remember that you have to do this for each block display, if you have multiple blocks for the same view.

Now, save the view, and you will see a positive impact on performance of your pages.



Aug 25 2011
Aug 25

Note: I am hosting a BoF at DrupalCon London about this: Join us in Room 333 on Thursday 25th August from 11:00 - 12:00 (second half).

Introduction

Varnish is a fast, really fast reverse-proxy and a dream for every web developer. It transparently caches images, CSS / Javascript files and content pages, and delivers them blazingly fast without much CPU usage. On the other hand Apache - the most widely used webserver for providing web pages - can be a real bottleneck and take much from the CPU even for just serving static HTML pages.

So that sounds like the perfect solution for our old Drupal 6 site here, right? (Or our new Drupal 7 site.)

We just add Varnish and the site is fast (for anonymous users) ...

The perfect solution?

But if you do just that, you'll be severely disappointed, because Varnish does not work with Drupal 6 out of the box, and even with Drupal 7 you can run into problems with contrib modules. You also need to install an extra Varnish module and learn VCL. And if you have a module using $_SESSION for anonymous users, you have to debug, find and fix all of that, because if Varnish sees any cookie it will, by default, not cache the page. The reason is that Varnish can't know whether the output differs for different cookie values, which is actually the case for the SESSION cookie in Drupal (logged-in users see different content from logged-out ones). That means those pages are not cached at all, and that is true for all pages on a stock (non-Pressflow) Drupal installation.

So Varnish is just for experts then? Okay, then we'll go with just Boost and forget about Varnish. Boost takes just a simple installation and some .htaccess changes to get up and running. And we'll just add more Apache machines to take the load (10 machines should suffice - no?).

Not any longer! Worry no more: Here comes the ultimate drop-in Varnish configuration (based on the recent Lullabot configuration) that you can just add. With minimal changes, it'll work out of the box.

That means that if you have Boost running successfully and can change your Varnish configuration (and install Varnish on some server), you can run Varnish, too.

How to Boost your site with Varnish

And here are the very simple steps to upgrade your site from Boost to Boosted Varnish.

1. Download Varnish configuration here: http://www.trellon.com/sites/default/files/boosted-varnish.vcl_.txt
2. Install and configure Boost (follow README.txt or see documentation on Boost project page)
3. Set Boost to aggressively set its Boost cookie
4. Set up Apache to listen on port 8080
5. Set up Varnish to listen on port 80
6. Replace default.vcl with boosted-varnish.vcl

Now we need to tweak the configuration a little:

There is a field in Boost where you can configure pages that should not be boosted. We want to make sure those pages don't get cached in Varnish either.

In Boost this will just be a list like:

user
user/*
my-special-page

In Varnish we have to translate this to a regular expression. Find this section in the configuration and change it:

##### BOOST CONFIG: Change this to your needs
       # Boost rules from boost configuration
       if (!(req.url ~ "^/(user$|user/|my-special-page)")) {
         unset req.http.Cookie;
       }
##### END BOOST CONFIG: Change this to your needs

And that's it. Now Varnish will cache all boosted pages for at least one hour and work exactly like Boost - only much faster and much more scalable.

On one site we worked on, page request times under high load went from 4s down to 0.17s.

The only caveat to be aware of is that pages are cached for at least one hour, so there can be up to an hour of delay until new content appears for anonymous users. But this can be set to 5 minutes instead, and you'll still profit from the Varnish caching. In general this setting is similar to the Minimum Cache Lifetime setting found in Pressflow.

The code line to change in boosted-varnish.vcl is:

##### MINIMUM CACHE LIFETIME: Change this to your needs
    # Set how long Varnish will keep it
    set beresp.ttl = 1h;
##### END MINIMUM CACHE LIFETIME: Change this to your needs

Even 5 minutes of minimum caching time gives tremendous scalability improvements.

Actually, with this technique I can instantly make any site on the internet that is running Boost much, much faster: I just set the backend to the site's IP, set the hostname in the VCL, and my server will serve those pages. So you could even share one Varnish instance for all of your sites and those of your friends, too. I experimented with EC2 micro instances and it worked, but for any serious site you should at least get a small instance. I'll spare the details for another blog post, though, if there is interest in exploring this further.

How and Why it works

The idea of this configuration is quite simple.

Boost is a solution which works well with many, many contrib modules out of the box. With Varnish you need to use Pressflow or Drupal 7, and you need to make sure no contrib modules are opening sessions needlessly, which can be quite a hassle to track down. (Check out varnish_debug to make this task easier: http://drupal.org/sandbox/Fabianx/1259074)

But Boost's behavior and rules can be emulated in Varnish: whenever Boost would serve a static HTML page, Varnish can just as well serve a static object out of its cache.

And the property that distinguishes boosted from non-boosted pages is the DRUPAL_UID cookie set by Boost.

The cookies (and thus the anonymous SESSION) are removed whenever Boost would have served a static HTML page. In that case Drupal never saw those cookies in the first place, so we can safely remove them.

The second thing, to prevent Drupal from needlessly creating session after session, is a very simple rule:

If no SESSION cookie was sent to the webserver, do not send a SESSION cookie to the client. If a SESSION cookie was sent to the webserver, return it to the client. So SESSION cookies will only be set on pages that are excluded from caching in Varnish, like the user/login pages, or on POST requests (e.g. forms). And because Drupal then has a pre-existing SESSION cookie, it does not need to create a new one.

To summarize those rules in a logic scheme:

# Logic is:
#
# * Assume: A cookie is set (we add __varnish=1 to client request to make this always true)
# * If boosted URL -> unset cookie
# * If backend not healthy -> unset cookie
# * If graphic or CSS file -> unset cookie
#
# Backend response:
#
# * If no (SESSION) cookie was sent in, don't allow a cookie to go out, because
#   this would overwrite the current SESSION.

Why Boost and Varnish?

Now the question that could come up is: If I have Varnish, why would I need Boost anymore?

Boost has some very advanced expiration features, which can be used to maintain a current, permanent cache of the site's pages on disk.

This can help pre-warm the Varnish cache after a Varnish restart. But as it turns out, you can use the stock .htaccess and boosted Varnish will still work - as long as the DRUPAL_UID cookie is set. As further work, it might be possible to write a contrib module that does exactly that.
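
As a rough sketch of what such a module might look like (the module name is made up, and the assumption that simply setting DRUPAL_UID for requests that don't have it is enough for the VCL rules is mine):

/**
 * Implements hook_init().
 *
 * Hypothetical helper module: set the DRUPAL_UID cookie that Boost
 * normally sets, so the boosted-varnish.vcl rules can distinguish
 * boosted pages even without Boost's disk cache.
 */
function boostedvarnish_init() {
  global $user;
  if (!isset($_COOKIE['DRUPAL_UID'])) {
    // 0 means anonymous; Varnish can then treat the request as boosted.
    setcookie('DRUPAL_UID', $user->uid, 0, base_path());
  }
}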

But Boost can also be really helpful in this particular configuration: you can set your Varnish cache to a minimum lifetime of, for example, 10 minutes, and instead of Drupal being hit every 10 minutes, Apache just happily serves the static HTML page Boost created until it expires.

The advantage of that is:

If there are 1000 requests to your frontpage, Apache will be hit just once and Varnish will serve the cached page to the other clients. So instead of Apache having to serve 1000 page requests, it only has to serve one every 10 minutes. Multiply that by page assets like images, CSS and JS files and you get big savings in traffic going to Apache.

Conclusion

Varnish is a great technology, but it has been difficult to configure and there are lots of caveats to think of (especially with Drupal 6). This blog post introduced a new technique called Boosted Varnish, which lets Varnish work with any page served by the Boost module, by temporarily adding it to Varnish's active working set and fetching it back from the permanent Boost cache on disk when needed. It is not aimed at those who are already running high-performance Drupal sites on the Mercury stack, but at those who are using Boost and want to make their site faster by putting Varnish in front of it, without having to worry about Varnish specifics.

I created a sandbox project where you can file any issues related to the configuration.

Have fun with the configuration! I am happy to hear from you, or see you tomorrow at my BoF session at DrupalCon London!

Aug 03 2010
Aug 03

On RVParking.com, we have a detailed directory listing page. It is a complex CCK node, themed, with an included Google map for the location. Also included are two embedded Views: one for photos (a slideshow-type photo display with AJAX paging) and a View of reviews (a review node type with Fivestar voting and attached pictures, among other things). Both use node references to attach them to the park.

Our Problem: Cached pages with embedded views show stale view data.

The crux of the problem is that a park page would not be rebuilt when a new review or photo was posted. At first I thought it was the view that wasn't being updated with the new photos or reviews. But even after turning off the view cache, the problem persisted.

So I looked higher up the cache ladder and figured out that the cached node page was the issue. Of course, how would the park page's cache entry know to expire when a review or photo referencing that park had been added or updated? But we can tell it to, using hook_nodeapi().

function rvparks_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  switch ($op) {
    case 'insert':
    case 'update':
      if ($node->type == 'review' || $node->type == 'park_photo') {
        // The page cache is keyed by the absolute URL of the page.
        $url = url('node/' . $node->field_park_reference[0]['nid'], array('absolute' => TRUE));
        // Delete the cached page for the referenced park.
        cache_clear_all($url, 'cache_page');
      }
      break;
  }
}

This code fragment executes during either the insert (node create) or update (node edit) operation. It checks whether a photo or review node has been created or updated. If so, it builds the full URL of the park node referenced in the node reference field. Drupal's page cache uses the full URL as the cache key, which is why 'absolute' => TRUE is needed.

cache_clear_all() then does the work of deleting the cache entry.

Now, when the RV park page is viewed after a new review is posted, the page is rebuilt and re-cached. Problem solved.

The next step in making our site faster will be to add Boost. I am pretty sure this method will still work under Boost. About to find out.

Apr 12 2010
Apr 12

I am looking for confirmations from other Drupal developers regarding details and corroborations. Comments are welcome here. PHBs need not worry, your Drupal site is just fine.

This post is about an inherent problem with Google’s recently announced “speed as a ranking feature” and how it interacts with content management systems like Drupal and WordPress. For an auto-generated website, Google is often the first and only visitor to a lot of pages. Since Drupal spends a lot of time on the first render of a page, Google will likely see that delay. This is due both to how Drupal generates pages and to a problem with Google’s metric.

Google recently announced that, as part of its quest to make the web a faster place, it will penalize slow websites in its rankings:

today we’re including a new signal in our search ranking algorithms: site speed. Site speed reflects how quickly a website responds to web requests.

Since Google’s nice enough to provide webmaster tools, I looked up how my site was doing, and got this disappointing set of numbers:

[Screenshot: site performance report from Google Webmaster Tools]

I’m aware 3 seconds is too long. Other Drupal folks have reported averages of around 600ms, and my site does under 1 second on average based on my own measurements. The discrepancy is probably because I occasionally have some funky experiments running in parts of the site that execute expensive queries. Still, some of the other results were surprising:

Investigating further, it looks like there are 3 problems:

[Screenshot: the three issues reported]

DNS issues & multiple CSS files: Google Analytics is on a large number of websites, so I’d expect its DNS entry to already be prefetched. The CSS is not an issue, since the two files are for different client media types (print / screen).

GZip compression: Now this is very odd. I’m pretty sure I have gzip compression enabled in Drupal (Admin > Performance > Compression), so why is Google reporting a lack of compression? To check, I ran some tests and discovered that since Google usually sees a page before it’s cached, it gets a non-gzipped version. This happens because of the way Drupal’s page cache behaves, and it is fixable. Ordinarily this is a small problem, since an uncached page is rendered only for its first visitor. But since Google is the first visitor to a majority of the pages on a less popular site, it concludes the entire site is uncompressed. I’ve started a bug report for the uncached-page gzip problem.

A flawed metric: The other problem is that Drupal (and WordPress, etc.) use a “fry” model: pages are generated on the fly, per request. Movable Type and friends, on the other hand, “bake” their pages beforehand, so anything served up doesn’t go through the CMS. Caching in fry-based systems is typically done on first render: the first visit to a page is generated from scratch and written to the database/filesystem, and any subsequent visitor to that page gets a render from the cache.

Since the Googlebot is usually the first (and only) visitor to many pages on a small site, the average crawl hits a large number of pages where Drupal has to write things to the cache for the next visitor. This means every page Googlebot visits costs a write to the database. As far as I know, Drupal runs page_set_cache() after rendering the entire page, so the user experience stays snappy; but I’m assuming Google counts time to connection close rather than time to the closing </html> tag, resulting in a bad rendering-time evaluation.

This means that Google’s Site Speed is not representative of the average user (i.e. the second, third, fourth, etc. visitors, who read from the cache); it only represents the absolute worst case for the website, which is hardly a fair metric. (Note that this is based on my speculation about what Site Speed means, given the existing documentation.)

Jan 07 2009
Jan 07

So I just submitted my first core patch, followed quickly by several revisions. It felt as good as I was told it would. Still tingling.

Anyways, this is an issue that's got to be resolved. It's been stuck for months on one particular problem which I don't think should be a problem at all: it's very problematic for developers of large-scale sites to be unable to adjust the expiration headers Drupal sends to the client. My goal with this patch is to give developers this ability, while intentionally not addressing the question (which, again, has delayed this patch for months) of how reverse proxies will deal with it. That's not Drupal's job; it's the job of whoever is connecting Drupal with the reverse proxy, and any attempt to solve it at the Drupal core level would require not using PHP sessions for users who aren't authenticated. Turn the page for the proof.

Consider this:

  1. Drupal can't discard sessions for anonymous users. How would shopping carts work?
  2. If we have the session cookie, we can't use Vary: Cookie in the response header (established in issue thread).
  3. If we can't use Vary: Cookie, we can't control how proxies will deal with the situation where users log in after browsing the site and accumulating pages in their cache.
  4. But individual developers controlling reverse proxies can configure an individual proxy to *not* forward on caching policies to individual browsers or forward proxies, or they can configure their site to redirect authenticated users to a different domain (e.g. www2.mysite.com). Either method will work, and neither can be achieved by any core patch, no matter how transcendentally amazing.
  5. But that requires sending proper caching headers from Drupal.
  6. But that requires a core patch.

The only difference between my patch and kbahey's original patch is that mine implements a hook which allows modules to control caching on a per-URL level, while his implements this ability via a site-wide system setting. I think per-URL is better and more powerful. For instance, I want to proxy blocks (e.g., top stories) with 300-second caching using AHAH; that's a contrib module I'm building right now for the new Observer.com, and it's why I patched drupal_page_header() as well as drupal_page_cache_header(). I also want to serve recent articles with 600-second caching and month-old articles with 86400-second caching. I also need the ability to purge content from the CDN in hook_nodeapi(), so I'm writing a module anyway, as will most users of CDNs. One site-wide caching setting won't work for a site like Observer.com.
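
To make the per-URL idea concrete, here is a rough sketch of how a module might implement such a hook. The hook name and the $headers structure are stand-ins of my own; the real ones are whatever the patch in the issue queue defines:

/**
 * Hypothetical per-URL caching hook (name and signature assumed).
 */
function observer_cdn_page_cache_headers_alter(&$headers) {
  if (arg(0) == 'node' && is_numeric(arg(1))) {
    $node = node_load(arg(1));
    if ($node && $node->type == 'article') {
      // Month-old articles get a day of caching, recent ones 10 minutes.
      $max_age = ($node->created < time() - 30 * 86400) ? 86400 : 600;
      $headers['Cache-Control'] = 'public, max-age=' . $max_age;
    }
  }
}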

But that isn't actually the cross I'm bearing. I care more about getting this issue unstuck: it shouldn't be Drupal's problem to assume anything about the existence of proxies between the server and the client, because Drupal isn't in a position to solve that problem. But that's not a good enough reason to keep delaying this.

Thoughts?

Apr 24 2008
Apr 24

Yes, this is yet another rant about how people incorrectly dismiss state-of-the-art databases. (Famous people have done it, why shouldn’t I?) It’s amazing how much the Web 2.0 crowd abhors relational databases. Some people have declared real SQL-based databases dead, while others have proclaimed them no longer cool. Amazon’s SimpleDB, Google’s BigTable and Apache’s CouchDB are trendy, bloggable ideas that are, to be honest, ideal for very specific, specialized scenarios. Most other use cases, and that comprises 95 out of 100 web startups, can do just fine with a memcached + Postgres setup, but there seems to be a constant attitude of “nooooo if we don’t write our code like Google they will never buy us…!” that just doesn’t seem to go away, spreading like a malignant cancer throughout the web development community. The constant argument is “scaling to thousands of machines” and “machines are cheap”. What about the argument “I just spent an entire day implementing the equivalent of a join and group by using my glorified key-value-pair library”? And what about the mantra “smaller code that does more”?

Jon Holland (who shares his name with the father of genetic algorithms) performs a simple analysis which points out a probable cause: People are just too stupid to properly use declarative query languages, and hence would rather roll their own reinvention of the data management wheel, congratulating themselves on having solved the “scaling” problem because their code is ten times simpler. It’s also a hundred times less useful, but that fact is quickly shoved under the rug.

It’s not that all web-related / open source code is terrible. If you look at Drupal code, you’ll notice the amount of sane coding that goes on inside the system: JOINs used where needed, caching / throttling assumed as part of core, and a schema flexible enough to do fun stuff. (Not to say I don’t have a bone to pick with Drupal core devs; the whole “views” and “workflow” ideas are soon going to snowball into a reinvention of Postgres’s ADTs, all written in PHP running on top of a database-abstraction layer over Postgres.)

If Drupal can do this, why can’t everyone else? Dear Web 2.0, I have a humble request. Pick up the Cow book if you have access to a library, or attend a database course in your school. I don’t care if you use an RDBMS after that, but at least you’ll reinvent the whole thing in a proper way.

Jan 10 2008
Jan 10

OK, I'm not sure you'd actually want to do this, but I thought I'd try to see if it can be done: setting up the Boost module to create static HTML pages of anonymous Drupal page views, then serving them from a completely different machine. It's kind of nice to have a bit of a static 'mirror' for your content, but I'm guessing Squid is better at all this.

So how?

First of all, set up Boost on the Drupal site. It was completely straightforward: just follow the README.txt. Running on default settings, it generates .html files and symlinks in the cache directory. These will be transferred, along with the files directory, regularly using rsync.

The rsync daemon is configured in /etc/rsyncd.conf:

  max connections = 2
  log file = /var/log/rsync.log
  timeout = 300
       
  [example-cache]
  comment = Example static html files
  path = /var/www/example.com/drupal-5/cache/example.com
  read only = yes
  list = no
  uid = nobody
  gid = nogroup

  [example-files]
  comment = Example files
  path = /var/www/example.com/drupal-5.5/files/example.com
  read only = yes
  list = no
  uid = nobody
  gid = nogroup

The other static files that need to be served are all the .css files, misc images and such. I rolled these up in a tarball, from the DocumentRoot:

find . -regex '.*\(\.js\|\.css\|\.gif\|\.png\)' | xargs tar -zcvf static_files.tar.gz

Next, over to the 'mirror'. First I copied the tarball and unpacked it in the mirror's document root. I also copied over the Boost .htaccess file and put it there too.

Then rsync'ed the cache and files directories:

/usr/bin/rsync -avz --delete rsync://drupal.server.example.com/example-cache /var/www/example.mirror/cache/example.com/
/usr/bin/rsync -avz --delete rsync://drupal.server.example.com/example-files /var/www/html/example.mirror/files/example.com

Once this worked, swapping v for q (verbose for quiet), the commands were put in crontab to run regularly.

Finally, set up a standard vhost that will serve the static content if it has it and the user is anonymous, or proxy to the Drupal server if not. The Boost .htaccess file takes care of finding the static content if it's there and overrides the vhost, so only this needed to be added for other content:

<Directory /var/www/example.com/>
    AllowOverride all
</Directory>

ProxyRequests Off

<Proxy *>
  Order deny,allow
  Allow from all
</Proxy>

ProxyPass /index.php http://drupal.server.example.com/index.php
ProxyPassReverse /index.php http://drupal.server.example.com/index.php

And it works!
