Feb 25 2015

Drupal 8 comes with many improvements over its predecessor, which we have grown to both love and hate. Next to prominent features such as Views in core, configuration management and a useful translation service, there are also lesser-known changes that are equally important to know and use. One such improvement is the cache API, which solves many of the performance problems we have in Drupal 7.


In this article, I want to shine a bit of light on the new cache API. To this end, we are going to look at how we can use it in our custom modules, something we are encouraged to do much more in Drupal 8.

Additionally, I have prepared a little demonstration in the shape of a module you can install for testing the impact of the cache API. It’s a simple page that in its rendering logic makes an external API call (to a dummy JSON endpoint) and caches its results. The page then displays the actual time it takes for this to happen, contrasting the external call time vs. the cached version time.

The new cache API


Bins

The new cache API (with the default DatabaseBackend storage) keeps its items in multiple bins, which map to database tables that start with the prefix cache_. When interacting with the cache, we always start by requesting a cache bin:

$cache = \Drupal::cache();

Here $cache will be an instance of the DatabaseBackend object that represents the default bin (cache_default). To request a particular bin, we pass its name as an argument:

$render_cache = \Drupal::cache('render');

Where $render_cache will represent the render cache bin (which is new in Drupal 8 and is supposed to improve render performance across the board).

As you can see, we are requesting the cache service statically using the \Drupal class. If we are working inside classes, it is best practice to inject the service from the container. You can do so by specifying as an argument to your service the relevant cache bin service (such as cache.default). Here you can get a list of all core services including the ones related to cache.

But for the sake of brevity, we will use it statically here.
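For comparison, the injected approach could look roughly like the sketch below. The class name PostsRepository, its method and the cache ID my_module_posts are made up for illustration. In a real module the constructor argument would be type-hinted with Drupal\Core\Cache\CacheBackendInterface and the service definition would pass @cache.default as its argument; the type hint is left out here only so the class can be read and run outside Drupal.

```php
<?php

/**
 * Hypothetical service class that receives its cache bin from the container.
 */
class PostsRepository {

  /**
   * The injected cache bin (the cache.default service in a real module).
   */
  protected $cache;

  public function __construct($cache) {
    $this->cache = $cache;
  }

  /**
   * Returns the cached posts, or an empty array when nothing is cached.
   */
  public function getCachedPosts() {
    // Same get() call as the static version, minus the \Drupal:: lookup.
    $item = $this->cache->get('my_module_posts');
    return $item ? $item->data : array();
  }

}
```

Because the bin arrives through the constructor, a test can hand the class any object with a compatible get() method instead of a real cache backend.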

Retrieving cached items

Once we know which bin we want to work with (for custom modules this will usually be the default bin), we can retrieve and store cache items.

$cache = \Drupal::cache()->get('my_value');

It’s that simple. On a cache hit, $cache will be a stdClass object containing some metadata about the cache item plus the actual data, available under the $cache->data property; on a miss, get() returns FALSE. The my_value parameter is the cache ID.

An important thing to keep in mind is that using the get() method without a second parameter will not return the cache item if it has been invalidated (either programmatically or through expiration). Passing the boolean true as a second parameter will force it to return the data.
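Since get() hands back either an object or FALSE, a small wrapper can save some repetition. This is only a sketch: cache_demo_read() is a hypothetical helper, not something core provides, and $bin is assumed to be whatever \Drupal::cache() returns.

```php
<?php

/**
 * Returns the cached data for $cid, or $default on a miss.
 *
 * Hypothetical helper. $bin is expected to behave like the object returned
 * by \Drupal::cache().
 */
function cache_demo_read($bin, $cid, $default = NULL, $allow_invalid = FALSE) {
  // get() returns FALSE on a miss, otherwise an object with a data property.
  // Passing TRUE as the second argument also returns invalidated items.
  $item = $bin->get($cid, $allow_invalid);
  return $item ? $item->data : $default;
}
```

Calling it with only the bin and the cache ID skips invalidated items; passing TRUE as the fourth argument asks for stale data as well.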

Storing cache items

Although storing new items in the cache is just as easy as retrieving them, we have more options when doing so. To store an item we use the set() method (instead of get() like before), a method that takes 2 mandatory parameters and 2 optional ones:

  • the cache ID (the string by which we can later reference the item)
  • the data (a PHP value such as a string, array or object that gets serialised automatically and stored in the table – should not be over 1MB in size)
  • the expiration time (a timestamp in the future when this cache item will automatically become invalid or -1 which basically means this item never expires. It is best practice to use the Drupal\Core\Cache\CacheBackendInterface::CACHE_PERMANENT constant to represent this value)
  • tags (an array of cache tags this item can be later identified by)

As an example:

\Drupal::cache()->set('my_value', $my_object, CacheBackendInterface::CACHE_PERMANENT, array('my_first_tag', 'my_second_tag'));

This will set a permanent cache item tagged with 2 tags and store a serialised version of $my_object as the data.
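In practice get() and set() usually travel together: look in the cache first and only build the value on a miss. A minimal sketch of that pattern follows. The helper name cache_demo_remember() and its $lifetime parameter are assumptions made for this example; the calls on $bin are the get() and set() methods described above.

```php
<?php

/**
 * Returns the cached value for $cid, building and storing it on a miss.
 *
 * Hypothetical helper. $bin is expected to behave like the object returned
 * by \Drupal::cache(); $build is any callable that produces the value.
 */
function cache_demo_remember($bin, $cid, callable $build, $lifetime = NULL, array $tags = array()) {
  if ($item = $bin->get($cid)) {
    return $item->data;
  }

  $data = $build();
  // -1 is the "never expires" value; in a module, prefer the
  // CacheBackendInterface::CACHE_PERMANENT constant over the literal.
  $expire = $lifetime === NULL ? -1 : time() + $lifetime;
  $bin->set($cid, $data, $expire, $tags);
  return $data;
}
```

With a lifetime of 3600 the item would expire an hour after it was built; leaving the lifetime out stores it permanently.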

Cache invalidation and removal

Cache invalidation means that the respective items are no longer fresh and are unreliable in terms of what data they hold. They will be removed at the next garbage collection which can also be called using the garbageCollection() method on the CacheBackend object.

As mentioned above, when storing a cache item we can specify an expiration time. When this time lapses, the cache item becomes invalid but still exists in the bin and can be retrieved. However, we can also invalidate items manually using the invalidate(), invalidateMultiple() or invalidateAll() methods on the CacheBackend object.

Removing items altogether can be done using the delete(), deleteMultiple() or deleteAll() methods. These actions also happen only on the bin the CacheBackend is wrapping and completely remove the respective table records.
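The difference between the two can be observed with a small experiment. The function below is a hypothetical helper written for this purpose; it assumes $bin behaves like the object returned by \Drupal::cache() and that an item with the given cache ID already exists.

```php
<?php

/**
 * Reports what get() returns after invalidating and after deleting an item.
 *
 * Hypothetical helper for experimenting; not part of core.
 */
function cache_demo_compare_removal($bin, $cid) {
  $report = array();

  $bin->invalidate($cid);
  // A plain get() skips the invalidated item...
  $report['after_invalidate'] = (bool) $bin->get($cid);
  // ...but passing TRUE as the second argument still returns it.
  $report['after_invalidate_allow_invalid'] = (bool) $bin->get($cid, TRUE);

  $bin->delete($cid);
  // After delete() the item is gone either way.
  $report['after_delete_allow_invalid'] = (bool) $bin->get($cid, TRUE);

  return $report;
}
```

Going by the behaviour described above, the three entries should come back as FALSE, TRUE and FALSE respectively.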

Cache tags

Another cool new feature of the Cache API in Drupal 8 is cache tags (the fourth parameter of the set() method). The role of the tags is to identify cache items across multiple bins for proper invalidation. The purpose is the ability to accurately target multiple cache items that contain data about the same object, page, etc. For example, nodes can appear both on a page and in a view (stored in different cache items in different bins but both tagged with the same node:nid formatted tag). This allows invalidating both cache items when changes happen to that node without having to know the cache IDs.

To manually invalidate caches using the tags, we can use the invalidateTags() method statically on the \Drupal\Core\Cache\Cache class:

\Drupal\Core\Cache\Cache::invalidateTags(array('node:5', 'my_tag'));

This will call the cache invalidator service and invalidate all the cache items tagged with either node:5 or my_tag.

Additionally, for Drupal entities we don’t have to create our own tags but can retrieve them from the entity system:

  • \Drupal\Core\Entity\EntityInterface::getCacheTags()
  • \Drupal\Core\Entity\EntityTypeInterface::getListCacheTags()

This keeps the tags for Drupal entities consistent across the board.
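As a sketch of how this ties into set(), the hypothetical helper below stores a value derived from an entity and tags it with whatever the entity reports through getCacheTags(), plus any tags of our own. The helper name and the $extra_tags parameter are assumptions for this example; $bin is expected to behave like the object returned by \Drupal::cache().

```php
<?php

/**
 * Caches a value derived from an entity, tagged with the entity's own tags.
 *
 * Hypothetical helper. $entity only needs to provide getCacheTags() the way
 * Drupal entities do.
 */
function cache_demo_cache_for_entity($bin, $entity, $cid, $data, array $extra_tags = array()) {
  // Merge the entity tags with module-specific ones and drop duplicates.
  $tags = array_values(array_unique(array_merge($entity->getCacheTags(), $extra_tags)));
  // -1 means "never expires" (CacheBackendInterface::CACHE_PERMANENT).
  $bin->set($cid, $data, -1, $tags);
  return $tags;
}
```

An item stored this way for node 5 would then be invalidated by the same invalidateTags() call that targets node:5, without our code ever needing to know its cache ID.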

Demonstrating the cache API

As I mentioned before, I created a small module that allows us to see the benefits of caching data. You can find the module in this git repository but here is the crux of it:

Please note that in this example I access the cache backend service statically to save some space. For a dependency injection approach (the correct approach), take a look at the repository code.

A route file that adds a new route to the /cache-demo path:

cache_demo_page:
  path: '/cache-demo'
  defaults:
    _controller: '\Drupal\cache_demo\Controller\CacheDemoController::index'
    _title: 'Cache demo'
  requirements:
    _permission: 'access content'

And the controller class that returns the page inside src/Controller/CacheDemoController.php:

<?php

/**
 * @file
 * Contains \Drupal\cache_demo\Controller\CacheDemoController.
 */

namespace Drupal\cache_demo\Controller;

use Drupal\Core\Cache\CacheBackendInterface;
use Drupal\Core\Controller\ControllerBase;
use Drupal\Core\Url;
use GuzzleHttp\Client;
use Symfony\Component\HttpFoundation\Request;

/**
 * Cache demo main page.
 */
class CacheDemoController extends ControllerBase {

  public function index(Request $request) {
    $output = array();

    $clear = $request->query->get('clear');
    if ($clear) {
      $this->clearPosts();
    }

    if (!$clear) {
      $start_time = microtime(TRUE);
      $data = $this->loadPosts();
      $end_time = microtime(TRUE);

      $duration = $end_time - $start_time;
      $reload = $data['means'] == 'API' ? 'Reload the page to retrieve the posts from cache and see the difference.' : '';
      $output['duration'] = array(
        '#type' => 'markup',
        '#prefix' => '<div>',
        '#suffix' => '</div>',
        '#markup' => t('The duration for loading the posts has been @duration ms using the @means. @reload',
          array(
            '@duration' => number_format($duration * 1000, 2),
            '@means' => $data['means'],
            '@reload' => $reload
          )),
      );
    }

    // $data only exists when the cache was not just cleared, so check $clear first.
    if (!$clear && $data['means'] == 'cache' && \Drupal::cache()->get('cache_demo_posts')) {
      $url = new Url('cache_demo_page', array(), array('query' => array('clear' => true)));
      $output['clear'] = array(
        '#type' => 'markup',
        '#markup' => $this->l('Clear the cache and try again', $url),
      );
    }

    if (!$cache = \Drupal::cache()->get('cache_demo_posts')) {
      $url = new Url('cache_demo_page');
      $output['populate'] = array(
        '#type' => 'markup',
        '#markup' => $this->l('Try loading again to query the API and re-populate the cache', $url),
      );
    }

    return $output;
  }

  /**
   * Loads a bunch of dummy posts from cache or API
   * @return array
   */
  private function loadPosts() {
    if ($cache = \Drupal::cache()->get('cache_demo_posts')) {
      return array(
        'data' => $cache->data,
        'means' => 'cache',
      );
    }
    else {
      $guzzle = new Client();
      $response = $guzzle->get('http://jsonplaceholder.typicode.com/posts');
      $posts = $response->json();
      \Drupal::cache()->set('cache_demo_posts', $posts, CacheBackendInterface::CACHE_PERMANENT);
      return array(
        'data' => $posts,
        'means' => 'API',
      );
    }
  }

  /**
   * Clears the posts from the cache.
   */
  private function clearPosts() {
    if ($cache = \Drupal::cache()->get('cache_demo_posts')) {
      \Drupal::cache()->delete('cache_demo_posts');
      drupal_set_message('Posts have been removed from cache.', 'status');
    }
    else {
      drupal_set_message('No posts in cache.', 'error');
    }
  }

}

Inside the index() method we do a quick check to see whether the clear query parameter is present in the URL and, if so, call the clearPosts() method responsible for deleting the cache item. If the parameter is absent, we measure how long it takes for the loadPosts() method to return its value (which can be either the posts from the cache or from the API). We use Guzzle to make the API call and, when we do, we also store the results in the cache right away. Then we just output the duration of the call in milliseconds and print one of 2 different links depending on whether there is a cache item stored or not (to allow us to clear the cache item and run the API call again).

When you navigate to cache-demo for the first time, the API call gets made and the 100 posts get stored in the cache. You can then reload the page to see how long it takes for those posts to be retrieved from the cache. Upon doing that, you’ll have a link to clear the cache (by a page refresh with the clear query string), followed by another link which refreshes the page without the clear query string and in turn makes the API call again. Repeat as often as you like to compare the durations.

Conclusion

In this article we’ve looked at how easy it is to use the Cache API in Drupal 8. There are some very simple class methods that we can use to manage cache items and it has become too straightforward for us not to start using it in our custom modules. I encourage you to check it out, play around with the API and see for yourself how easy it is to use.

Nov 15 2013

Cache invalidation is known as one of the very few hard things in computer science.

It seems to be a common misconception that Drupal's cache_get checks whether a given cache entry has expired, and won't return a stale result. In fact, in Drupal this is not always the case.

The docs for both D6 and D7 actually say that a specific timestamp given as the $expire parameter in a cache_set "Indicates that the item should be kept at least until the given time, after which it behaves like CACHE_TEMPORARY." [D6/D7]

So this does not say that cache entries will expire (i.e. cache_get will not return them) after this timestamp has passed; rather it says that "the item should be removed at the next general cache wipe."

What this actually means is that it's the responsibility of the code which does a cache_get to check whether any object that it gets back is still valid in terms of the time it should expire.

So, if you want to use Drupal's cache system in D6 or D7 to store a value for a short amount of time, but not wait for the cache entry to be cleared until "the next general cache wipe", you must check the expire timestamp on any cache object that you receive back from a cache_get.

Here's a little PHP script which illustrates this; we still get a cache object back even though it has expired:

<?php
 
define('TEST_CACHE_LIFETIME', 10); // seconds
if (!defined('REQUEST_TIME')) {
  // REQUEST_TIME is in D7 but not D6
  define('REQUEST_TIME', time());
}
print "\n###\nrunning cache test at " . REQUEST_TIME . "\n";
 
$reset_cache = FALSE;
if($cached = cache_get('test_cache_expiry', 'cache'))  {
  print 'this came from cache: ' . print_r($cached, TRUE);
  if ($cached->expire < REQUEST_TIME) {
    $reset_cache = TRUE;
    print "cached data has expired; resetting\n";
  }
}
else {
  $reset_cache = TRUE;
}
 
if ($reset_cache) {
  print 'setting this to cache: ' . ($data = md5(rand())) . "\n";
  cache_set('test_cache_expiry', $data, 'cache', REQUEST_TIME + TEST_CACHE_LIFETIME);
}

...and here's what happens if we run it a few times in quick succession:

$ for i in {1..8}; do drush scr cache_test.php; sleep 3; done
 
###
running cache test at 1384557409
setting this to cache: 5d9f014b374764e35220ead02102b1e7
 
###
running cache test at 1384557412
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => 5d9f014b374764e35220ead02102b1e7
    [created] => 1384557409
    [expire] => 1384557419
    [serialized] => 0
)
 
###
running cache test at 1384557416
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => 5d9f014b374764e35220ead02102b1e7
    [created] => 1384557409
    [expire] => 1384557419
    [serialized] => 0
)
 
###
running cache test at 1384557419
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => 5d9f014b374764e35220ead02102b1e7
    [created] => 1384557409
    [expire] => 1384557419
    [serialized] => 0
)
 
###
running cache test at 1384557422
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => 5d9f014b374764e35220ead02102b1e7
    [created] => 1384557409
    [expire] => 1384557419
    [serialized] => 0
)
cached data has expired; resetting
setting this to cache: a57b9e9734824207e0aa6d4d6a4b6973
 
###
running cache test at 1384557426
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => a57b9e9734824207e0aa6d4d6a4b6973
    [created] => 1384557422
    [expire] => 1384557432
    [serialized] => 0
)
 
###
running cache test at 1384557429
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => a57b9e9734824207e0aa6d4d6a4b6973
    [created] => 1384557422
    [expire] => 1384557432
    [serialized] => 0
)
 
###
running cache test at 1384557433
this came from cache: stdClass Object
(
    [cid] => test_cache_expiry
    [data] => a57b9e9734824207e0aa6d4d6a4b6973
    [created] => 1384557422
    [expire] => 1384557432
    [serialized] => 0
)
cached data has expired; resetting
setting this to cache: abbe82035a1bcaea187259f316f04309

Note that not all cache backends work the same - memcache doesn't seem to return cache entries after their expire timestamp has passed, for example.

We should assume, however, that cache_get might well give us back a cache object which has expired, so we should always check the expire property before assuming that the cache entry is valid.
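One way to make that check reusable is a small predicate like the sketch below. The function name mymodule_cache_is_fresh() is made up for this example. Unlike the test script above, it also accounts for items stored with CACHE_PERMANENT or CACHE_TEMPORARY, whose expire values (0 and -1 in D6 and D7) are not timestamps and would otherwise look expired.

```php
<?php

/**
 * Tells whether an object returned by cache_get() is still usable.
 *
 * Hypothetical helper. The literals 0 and -1 are the values of
 * CACHE_PERMANENT and CACHE_TEMPORARY in Drupal 6 and 7.
 */
function mymodule_cache_is_fresh($cached, $now) {
  if (!is_object($cached) || !isset($cached->expire)) {
    // cache_get() returns FALSE on a miss.
    return FALSE;
  }
  if ($cached->expire == 0 || $cached->expire == -1) {
    // Permanent and temporary items carry no timestamp to compare against.
    return TRUE;
  }
  // Same rule as the script above: expired once expire is in the past.
  return $cached->expire >= $now;
}
```

A caller would pass the result of cache_get() together with REQUEST_TIME and rebuild the data whenever the function returns FALSE.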

See https://drupal.org/node/534092 for some discussion as to whether this is a bug or a feature.

Oct 08 2012

In the Drupal community, you see caching discussions related to pages, blocks, reverse-proxies, opcodes, and everything in between. These are often tied to render- and database-intensive optimizations to decrease the load on a server and increase throughput. However, there is another form of caching that can have a huge impact on your site’s performance – module level data caching. This article explores Drupal 7 core caching mechanisms that modules can take advantage of.

When?

Not all modules require data caching, and in some cases due to “real-time” requirements it might not be an option. However, here are some questions to ask yourself to determine if module-level data caching can help you out:

  • Does the module make queries to an external data provider (e.g. web service API) that returns large datasets?
  • If the module pulls data from an external source, is it a slow or unreliable connection?
  • If calling a web service, are there limits to the number of calls the module can make (hourly, daily, monthly, etc.)? Also, if it is a pay service, is it a variable cost based on number of calls?
  • Does the hosting provider have penalties for large amounts of inbound data?
  • Does the data my module handles require significant processing (e.g. heavy XML parsing)?
  • Is the data the module loads from an external source relatively stable, rather than changing rapidly?

If you answered “yes” to more than a third of the questions above, module-level data caching can probably help your module’s performance by providing the following benefits:

  • Decrease external bandwidth
  • Decrease page load times
  • Reduce load on the site’s server
  • Provide reliable data services

Where?

OK, so you’ve decided your module could probably benefit from some form of module-level data caching. The next thing to determine is where to store it. You can always use some form of file-based caching, but to implement that with the proper abstractions to run on a variety of servers requires calls through the Drupal core File APIs, which can be a bit convoluted at times. File-based caching mechanisms also cannot take advantage of scalable performance solutions like memcache or multiple database server configurations that might be changed at any time.

Luckily, Drupal core provides a cache mechanism available to any module using the cache_get and cache_set functions, fully documented on http://api.drupal.org:

<?php
cache_get($cid, $bin = 'cache')
cache_set($cid, $data, $bin = 'cache', $expire = CACHE_PERMANENT)
?>

By default, these functions work with the core cache bin called simply “cache.” This is Drupal core’s main dumping ground for data that can persist beyond a single page request and is not tied to a session. However, many modules define their own cache bins so they can provide their own cache management processes. A few of the bins defined by core are:

  • cache_block
  • cache_field
  • cache_filter
  • cache_form
  • cache_menu
  • cache_page

Seeing as how several core Drupal modules implement their own cache bins, the next questions for your new module are:

  • Does the module need to manage its cache in a manner that is not consistent with the main cache bin?
  • Will its cache need to be flushed independently of the main cache at any time, or have some other expiration logic assigned to it that falls outside of the core cron cache clear calls?

If the answer to either of these questions is, “yes,” then a dedicated cache bin is probably a wise idea.

Cache bin management is abstracted in the Drupal system via classes implementing DrupalCacheInterface. The core codebase provides a default database-driven cache mechanism via DrupalDatabaseCache that is used for any cache bin type that has not been overridden with a custom class (see the documentation on DrupalCacheInterface for details on how to do that) and has a table in the database named the same as the bin. This table conforms to the same schema as the core cache tables. For reference, this is the core cache table schema in MySQL that we will use as the base for our module’s cache bin:

+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| cid        | varchar(255) | NO   | PRI |         |       |
| data       | longblob     | YES  |     | NULL    |       |
| expire     | int(11)      | NO   | MUL | 0       |       |
| created    | int(11)      | NO   |     | 0       |       |
| serialized | smallint(6)  | NO   |     | 0       |       |
+------------+--------------+------+-----+---------+-------+

How?

For the sake of simplicity, we will assume that our module is fine with using the default cache mechanism and database schema. As an exercise, we will also assume that we meet the criteria for defining our own cache bin so we can explore all the hooks required to implement a complete custom bin leveraging the default cache implementation. The sample module is called cachemod, and the cache bin name is cache_cachemod.

Define the cache bin schema

In order to add a table with the correct schema to the system, we borrow some code from the block module that copies the schema of the core cache table, and add it to our install hooks in cachemod.install:

<?php
/**
 * Implements hook_schema
 */
function cachemod_schema() {
  // Create new cache table using core cache schema
  $schema['cache_cachemod'] = drupal_get_schema_unprocessed('system', 'cache');
  $schema['cache_cachemod']['description'] = 'Cache bin for the cachemod module';
  return $schema;
}
?>

Now that we have defined a table for our cache bin that replicates the schema of the core cache table, we can make basic set and get calls using the following:

<?php
cache_get($cid, 'cache_cachemod');
cache_set($cid, $data, 'cache_cachemod');
?>

Using our new cache bin

Notice the CID (cache ID) parameter. This will need to be unique to the data being stored, so in the case of something like a web service, the CID might be built from the arguments being passed to the service and the data will be the returned data. One way to abstract this so you get consistent CID values for calls to cache_get and cache_set is to build a helper function. This sample assumes our service call takes an array of key-value pairs:

<?php
/**
 * Util function to generate cid from service call args
 */
function _cachemod_cid($args) {
  // Make sure we have a valid set of args
  if (empty($args)) {
    return NULL;
  }
  // Make sure we are consistently operating on an array
  if (!is_array($args)) {
    $args = array($args);
  }
  // Sort the array by key, serialize it, and calc the hash
  ksort($args);
  $cid = md5(serialize($args));
  return $cid;
}
?>
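One thing to watch with this approach is that ksort only orders the top level of the array. If the service arguments can contain nested arrays, a recursive variant along the following lines keeps the CID stable regardless of key order at any depth. Both function names are made up for this sketch and are not part of the sample module.

```php
<?php

/**
 * Sorts an array by key at every nesting level.
 *
 * Hypothetical helper for the CID variant below.
 */
function _cachemod_ksort_recursive(array &$array) {
  ksort($array);
  foreach ($array as &$value) {
    if (is_array($value)) {
      _cachemod_ksort_recursive($value);
    }
  }
}

/**
 * Variant of the CID helper that also normalizes nested arrays.
 */
function _cachemod_cid_recursive($args) {
  if (empty($args)) {
    return NULL;
  }
  if (!is_array($args)) {
    $args = array($args);
  }
  // Normalize key order at every level, then serialize and hash.
  _cachemod_ksort_recursive($args);
  return md5(serialize($args));
}
```

Two calls whose arguments differ only in the order of their keys then map to the same cache entry.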

Now we can implement a basic public web service function leveraging our cache like this:

<?php
/**
 * Public function to execute web service call
 */
function cachemod_call($args) {
  // Create our cid from args
  $cid = _cachemod_cid($args);

  // See if we have cached data already
  $cache = cache_get($cid, 'cache_cachemod');
  if ($cache) {
    // cache_get returns a cache object; the stored value is in ->data
    return $cache->data;
  }

  // No such luck, go try to pull it from the web service
  $data = _cachemod_call_service($args);
  if ($data) {
    // Great, we have data!  Store it off in the cache
    cache_set($cid, $data, 'cache_cachemod');
  }
  return $data;
}
?>

Note that there are several values for the optional expire parameter to the cache_set call that are fully documented in the API docs.

Hooking into the core cache management system

If you want your module’s cache bin to clear out when Drupal executes a cache wipe during cron runs or a general cache_clear_all, set the expire parameter in your cache_set call above to either CACHE_TEMPORARY or a Unix timestamp to expire after, and add the following hook to your module:

<?php
/**
 * Implements hook_flush_caches
 */
function cachemod_flush_caches() {
  $bins = array('cache_cachemod');
  return $bins;
}
?>

This will add your cache bin to the list of bins that Drupal’s cron task will empty.

Additionally, if you would like to add your cache bin to the list of caches that drush can selectively clear, add the following to your module in a file named cachemod.drush.inc:

<?php
// Implements hook_drush_cache_clear
function cachemod_drush_cache_clear(&$types) {
  $types['cachemod'] = '_cachemod_cache_clear';
}

// Util function to clear the cachemod bin
function _cachemod_cache_clear() {
  cache_clear_all('*', 'cache_cachemod', TRUE);
}
?>

Note that if you set the expiration of the cache item to CACHE_PERMANENT (the default), only an explicit call to cache_clear_all with the item’s CID, or with a wildcard that matches it, will remove it from the cache.

Conclusion

Sometimes it makes sense to have a module cache data for its own use, and even possibly in its own cache bin to maintain a finer-grained control of the data and cache management if something beyond the core cache management is required. Utilizing the cache abstraction built into Drupal 7 core and some custom classes, hooks, and drush callbacks can give your module a range of options for reducing data calls, processing overhead, and bandwidth consumption. For more detailed info, check out the API pages at http://api.drupal.org for the functions, classes and hooks mentioned above.

As a Senior Developer at Phase2, Robert Bates is able to pursue his interests in solving complex multi-tier integration challenges with elegant solutions. He has experience not only in traditional web programming languages such as PHP and ...
