Oct 07 2020


Here’s a dirty secret: most businesses are unsatisfied with their website. Research shows that 34% of website owners are unsatisfied with the amount of business their website generates for them. Loudhouse data suggests that 62% of business owners believe a more effective website would increase their sales. And millions of business websites deal with slow load times, inconsistent customer experiences, and problematic UI/UX issues.

There’s a reason that 36% of small businesses STILL don’t have a website. Creating an amazing, design-driven, customer-centric website is challenging. So, what do you do when your website isn’t making the cut? You look towards the source — your Content Management System (CMS). Every year, thousands of private and public entities migrate their website to a new CMS.

But, unfortunately, thousands more don’t. Migration is scary. It’s easier to stay with your current CMS and focus on redesigns or new templates. Here’s the problem: new coats of paint don’t fix broken engines. If you’re thinking about migrating from WordPress or Joomla to Drupal, you’ve probably heard rumors and myths regarding migrations.

Let’s clear those up. Here are 4 myths about migration that need to be squashed.

Myth #1: I’m Going to Lose All My Content/Data

This is, by far, the most common excuse against migrating. You’re worried that all of that precious content and data will fall off the ship when you switch ports. And you’re right to worry: it could, if you don’t migrate correctly. But it’s not inevitable. You can prevent data and content loss. In fact, we would consider any migration that loses data or content a failed migration; by definition, a successful migration keeps both intact.

Here are some handy-dandy steps you can take to ensure that your precious data doesn’t go overboard during your migration:

  • Crawl your site before migration and use the crawl data to check for URL issues. If you check each URL, you should be able to spot any missing content (and fix it!).
  • Keep your existing site stable until you’ve fully migrated.
  • When you migrate, check for duplicate content; plenty of site owners run into the opposite of losing content.

Myth #2: I Have to Invest in a Redesign

You’re migrating; you might as well invest in a redesign, right? Sure! You could. But it’s tricky. When you do a redesign and a migration, you’re no longer just matching URL-to-URL and content-to-content; you’re simultaneously rebuilding your website. Don’t get us wrong; there are advantages. It’s a great time to redesign from an SEO perspective (you’re already going to take a small hit during the migration; more on this in the next section), but it also requires significantly more planning, budget, and time.

If you want to do a redesign-migration, we heavily recommend that you touch base with your design company. You want to work through the kinks and create a best-in-class action plan to tackle any issues that may (or may not) pop up. The entire migration will be structured around the redesign, so it’s important to carefully weigh your options.

Myth #3: Goodbye SEO!

From an SEO perspective, migration sounds like a nightmare. You’ve worked diligently to build up your SEO. What happens when you frolic to a new location? Let’s get this out of the way: your SEO will take a temporary hit. But it shouldn’t last long. In fact, there’s a good chance you’re moving to another platform because it’s better at handling SEO. For example, Drupal has built-in SEO capabilities (e.g., title-based URL nodes, customizable meta tags, etc.); WordPress does not. Obviously, you can get SEO plugins for WordPress that add those SEO functionalities, but most of those plugins are also available for Drupal — so Drupal gives you a net gain.

Here’s a secret: migration can help your page rank. After the first awkward week (Google has to recrawl your website, recognize that it’s you, and give you back your ranking), migration can help you build a more powerful SEO framework.

Want to migrate without dumping your SEO overboard? Here are some tips:

  • Update your internal links
  • Benchmark your Google Analytics profile and compare it with your analytics post-migration to look for gaps
  • Keep any old domains and redirect your website
  • Check for broken or duplicate content that could tank your SEO
  • Manage your sitemaps
  • Update any PPC campaigns and ad creatives
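On the old-domains tip above, a site-wide permanent (301) redirect is what preserves most of your link equity. Here is a minimal Apache sketch, assuming mod_rewrite is enabled; the domain names are placeholders:

```apache
# .htaccess on the old domain: send every request to the new domain with a 301
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-example\.com$ [NC]
RewriteRule ^(.*)$ https://new-example.com/$1 [R=301,L]
```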

Myth #4: You Just Have to “Lift-and-Shift”

There are plenty of myths surrounding the difficulty of migration. But there are also a few myths making migration out to be super easy. And, without a doubt, the most prevalent “easy-peasy-lemon-squeezy” migration myth is the ever-coveted “lift-and-shift.” There is no one-size-fits-all strategy for migrating websites. Sometimes, it can be as easy as lifting content off of one website and putting it onto another website. But that’s seldom the case.

Generally, you need to set up test servers, check to see if website elements function correctly on the new platform, test out and utilize new CMS features, and a variety of other tasks before you can simply drop content from one place to another. In other words, lift-and-shift may work when you’re migrating a cloud environment, but it often doesn’t work with CMS migration.

Remember, just because everything worked perfectly in one environment doesn’t mean it will in another one. You may have to fix some website elements and carefully construct your new website ecosystem. At the same time, you’ll probably be playing around with the new features available to you on Drupal — so the “lift-and-shift” is usually more of a “lift-and-test-and-shift.”

Do You Need Help With Your Drupal Migration?

At Mobomo, we help private and public entities migrate to Drupal environments using proven migration strategies and best-in-class support. So, whether you’re looking to establish your website in a more secure, SEO-friendly environment or you’re looking to do a redesign-and-migrate, we can help you migrate pain-free. Are you ready to move to a brighter future?

Contact us. We’ve got your back.

Nov 28 2015

The keynote at DrupalCamp Bulgaria was planned to be left field from the get-go; however, it went a little further out after Paris came under attack on the night of 13 November 2015.

#JeSuiBaghdad #JeSuiParis #JeSuiBeirut #JeSuiChibok #JeSuiKarachi #JeSuiMadrid #JeSuiDamascus #JeSuiAnkara #JeSuiLondon #JeSuiMali: the list goes on. But beyond on-the-bench solidarity, what are we doing as individuals, as a community, to facilitate and help build a better, safer, cohesive, and pluralist society?

As a FOSS community we constantly talk of giving back, but are we engaged enough?

How could we take the strengths and learnings that make us a successful tech community to wider, non-tech audiences, with a view to creating social transformation that addresses the needs of our societies in these turbulent times? What can we learn from the transformation FOSS and the cloud have had on our ecosystem as technologists, and how can we export that beyond tech to heal and build a stronger society?

I have more questions for discussion than answers. However, there is an in-flight and successful start made by Peace Through Prosperity, which uses Agile, open source, and the cloud to deliver social transformation programs, and which could be a starting point for the Drupal community to engage with in their own geographies. The open source component of this program is in development and work in progress can be seen here; if you’d like to contribute and #GiveBack beyond our bubble, please get in touch over Twitter or LinkedIn.

Links shared within the keynote slides:

The presentation from the keynote:

Open Source and Cloud Beyond tech from Kubair Shirazee


Jul 15 2015

Regardless of industry, staff size, and budget, many of today’s organizations have one thing in common: they’re demanding the best content management systems (CMS) to build their websites on. With requirement lists that can range from 10 to 100 features, an already short list of “best CMS options” shrinks even further once “user-friendly”, “rapidly-deployable”, and “cost-effective” are added to the list.

There is one CMS, though, that not only meets the core criteria of ease-of-use, reasonable pricing, and flexibility, but a long list of other valuable features, too: Drupal.

With Drupal, both developers and non-developer admins can deploy a long list of robust functionalities right out-of-the-box. This powerful, open source CMS allows for easy content creation and editing, as well as seamless integration with numerous 3rd party platforms (including social media and e-commerce). Drupal is highly scalable, cloud-friendly, and highly intuitive. Did we mention it’s effectively priced, too?

In our “Why Drupal?” 3-part series, we’ll highlight some features (many of which you know you need, and others you may not have even considered) that make Drupal a clear front-runner in the CMS market.

For a personalized synopsis of how your organization’s site can be built on or migrated to Drupal with amazing results, grab a free ticket to Drupal GovCon 2015 where you can speak with one of our site migration experts for free, or contact us through our website.


SEO + Social Networking:

Unlike other content software, Drupal does not get in the way of SEO or social networking. By using a properly built theme, as well as add-on modules, a highly optimized site can be created. There are even modules that will provide an SEO checklist and monitor the site’s SEO performance. The Metatag module ensures continued support for the latest meta tags used by various social networking sites when content is shared from Drupal.



E-Commerce:

Drupal Commerce is an excellent e-commerce platform that uses Drupal’s native information architecture features. One can easily add desired fields to products and orders without having to write any code. There are numerous add-on modules for reports, order workflows, shipping calculators, payment processors, and other commerce-based tools.



Search:

Drupal’s native search functionality is strong. There is also a Search API module that allows site managers to build custom search widgets with layered search capabilities. Additionally, there are modules that enable integration of third-party search engines, such as Google Search Appliance and Apache Solr.

Third-Party Integration:

Drupal not only allows for the integration of search engines, but a long list of other tools, too. The Feeds module allows Drupal to consume structured data (for example, .xml and .json) from various sources. The consumed content can be manipulated and presented just like content that is created natively in Drupal. Content can also be exposed through a RESTful API using the Services module. The format and structure of the exposed content is also highly configurable, and requires no programming.

Taxonomy + Tagging:

Taxonomy and tagging are core Drupal features. The ability to create categories (dubbed “vocabularies” by Drupal) and then create unlimited terms within that vocabulary is connected to the platform’s robust information architecture. To make taxonomy even easier, Drupal even provides a drag-n-drop interface to organize the terms into a hierarchy, if needed. Content managers are able to use vocabularies for various functions, eliminating the need to replicate efforts. For example, a vocabulary could be used for both content tagging and making complex drop-down lists and user groups, or even building a menu structure.
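As a sketch of how lightweight this is programmatically, vocabularies and terms can also be created through Drupal 7’s core taxonomy API (the vocabulary and term names below are just examples, and this assumes a bootstrapped Drupal environment):

```php
<?php
// Create a vocabulary (hypothetical name), then a term inside it.
$vocabulary = (object) array(
  'name' => 'Topics',
  'machine_name' => 'topics',
);
taxonomy_vocabulary_save($vocabulary);

$term = (object) array(
  'name' => 'Drupal',
  'vid' => $vocabulary->vid, // populated by the save above
);
taxonomy_term_save($term);
```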



Workflow Management:

There are a few contributed modules that provide workflow functionality in Drupal. They all provide common functionality along with unique features for various use cases. The most popular options are Maestro and Workbench.


Security:

Drupal has a dedicated security team that is very quick to react to vulnerabilities that are found in Drupal core as well as contributed modules. If a security issue is found within a contrib module, the security team will notify the module maintainer and give them a deadline to fix it. If the module does not get fixed by the deadline, the security team will issue an advisory recommending that the module be disabled, and will also classify the module as unsupported.

Cloud, Scalability, and Performance:

Drupal’s architecture makes it incredibly “cloud friendly”. It is easy to create a Drupal site that can be set up to auto-scale (i.e., add more servers during peak traffic times and shut them down when not needed). Some modules integrate with cloud storage such as S3. Further, Drupal is built for caching. By default, Drupal caches content in the database for quick delivery; support for other caching mechanisms (such as Memcache) can be added to make the caching lightning fast.
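For example, routing Drupal 7’s cache through Memcache is only a few lines in settings.php. This is a sketch that assumes the contrib Memcache module is installed at its usual path, with a placeholder server address:

```php
<?php
// settings.php: route Drupal's default cache bins through Memcache.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// Keep the form cache in the database, as the module recommends.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
```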


Multi-Site Deployments:

Drupal is architected to allow for multiple sites to share a single codebase. This feature is built-in and, unlike WordPress, it does not require any cumbersome add-ons. This can be a tremendous benefit for customers who want to have multiple sites that share similar functionality. There are few–if any–limitations to a multi-site configuration. Each site can have its own modules and themes that are completely separate from the customer’s other sites.
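For instance, mapping two hostnames onto site directories that share one codebase is a small sites.php entry (Drupal 7 convention; the domains and directory names below are placeholders):

```php
<?php
// sites/sites.php: both hosts run from this one codebase, each with its own
// sites/<dir>/settings.php, modules, and themes.
$sites['www.example.com'] = 'example';
$sites['store.example.com'] = 'store';
```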

Want to know other amazing functionalities that Drupal has to offer? Stay tuned for the final installment of our 3-part “Why Drupal?” series!

Apr 10 2015

A Silex and Elasticsearch app powered by Drupal 7 for content management

In the previous article I started exploring the integration between Drupal 7 and the Elasticsearch engine. The goal was to see how we can combine these open source technologies to achieve a high performance application that uses the best of both worlds. If you’re just now joining us, you should check out this repository, which contains the relevant code for these articles.


We’ll now create a small Silex application that reads data straight from Elasticsearch and returns it to the user.

Silex app

Silex is a great PHP micro-framework developed by the same people behind the Symfony project. In fact, it mainly uses Symfony components, but at a simplified level. Let’s see how we can get started really quickly with a Silex app.

There is more than one way. You can add it as a dependency to an existing Composer-based project:

"silex/silex": "~1.2",

Or you can even create a new project using a nice little skeleton provided by the creator:

composer.phar create-project fabpot/silex-skeleton

Regardless of how your project is set up, in order to access Elasticsearch we’ll need to use its PHP SDK. That needs to be added to Composer:

"elasticsearch/elasticsearch": "~1.0",

And if we want to use Twig to output data, we’ll need this as well (if not already there of course):

"symfony/twig-bridge": "~2.3"

In order to use the SDK, we can expose it as a service to Pimple, the tiny Silex dependency injection container (much easier than it sounds). Depending on how our project is set up, we can do this in a number of places (see the repository for an example). But basically, after we instantiate the new Silex application, we can add the following:

$app['elasticsearch'] = function() {
  return new Client(array());
};

This creates a new service called elasticsearch on our app that instantiates an object of the Elasticsearch Client class. And don’t forget we need to use that class at the top:

use Elasticsearch\Client;

Now, wherever we want, we can get the Elasticsearch client by simply referring to that property in the $app object:

$client = $app['elasticsearch'];

Connecting to Elasticsearch

In the previous article we’ve managed to get our node data into the node index, with each node type giving the name of an Elasticsearch document type. So for instance, this will return all the article node types:

http://localhost:9200/node/article/_search
We’ve also seen how to instantiate a client for our Elasticsearch SDK. Now it’s time to use it somehow. One way is to create a controller:


<?php

namespace Controller;

use Silex\Application;
use Symfony\Component\HttpFoundation\Response;

class NodeController {

  /**
   * Shows a listing of nodes.
   *
   * @return \Symfony\Component\HttpFoundation\Response
   */
  public function index() {
    return new Response('Here there should be a listing of nodes...');
  }

  /**
   * Shows one node.
   *
   * @param $nid
   * @param \Silex\Application $app
   *
   * @return mixed
   */
  public function show($nid, Application $app) {
    $client = $app['elasticsearch'];
    $params = array(
      'index' => 'node',
      'body' => array(
        'query' => array(
          'match' => array(
            'nid' => $nid,
          ),
        ),
      ),
    );

    $result = $client->search($params);
    if ($result && $result['hits']['total'] === 0) {
      $app->abort(404, sprintf('Node %s does not exist.', $nid));
    }

    if ($result['hits']['total'] === 1) {
      $node = $result['hits']['hits'];
      return $app['twig']->render('node.html.twig', array('node' => reset($node)));
    }
  }
}

Depending on how you organise your Silex application, there are a number of places this controller class can go. In my case it resides inside the src/Controller folder and it’s autoloaded by Composer.

We also need to create a route that maps to this Controller though. Again, there are a couple of different ways to handle this but in my example I have a routes.php file located inside the src/ folder and required inside index.php:


<?php

use Symfony\Component\HttpFoundation\Response;

/**
 * Error handler.
 */
$app->error(function (\Exception $e, $code) {
  switch ($code) {
    case 404:
      $message = $e->getMessage();
      break;
    default:
      $message = 'We are sorry, but something went terribly wrong. ' . $e->getMessage();
  }

  return new Response($message);
});

/**
 * Route for /node.
 */
$app->get("/node", "Controller\\NodeController::index");

/**
 * Route for /node/{nid} where {nid} is a node id.
 */
$app->get("/node/{nid}", "Controller\\NodeController::show");

So what happens in my example above? First, I defined an error handler for the application, just so I can see the exceptions being caught and print them on the screen. Not a big deal. Next, I defined two routes that map to my two controller methods defined before. But for the sake of brevity, I only exemplified what the prospective show() method might do:

  • Get the Elasticsearch client
  • Build the Elasticsearch query parameters (similar to what we did in the Drupal environment)
  • Perform the query
  • Check for the results and if a node was found, render it with a Twig template and pass the node data to it.
  • If no results are found, abort the process with a 404 that calls our error handler for this HTTP code declared above.

If you want to follow this example, keep in mind that to use Twig you’ll need to register it with your application. It’s not so difficult if you have it already in your vendor folder through composer.

After you instantiate the Silex app, you can register the provider:

$app->register(new TwigServiceProvider());

Make sure you use the class at the top:

use Silex\Provider\TwigServiceProvider;

And add it as a service with some basic configuration:

$app['twig'] = $app->share($app->extend('twig', function ($twig, $app) {
  return $twig;
}));
$app['twig.path'] = array(__DIR__.'/../templates');

Now you can create template files inside the templates/ folder of your application. For learning more about setting up a Silex application, I do encourage you to read this introduction to the framework.

To continue with our controller example though, here we have a couple of template files that output the node data.

Inside a page.html.twig file:

<!DOCTYPE html>
<html>
<head>
    {% block head %}
        <title>{% block title %}{% endblock %} - My Elasticsearch Site</title>
    {% endblock %}
</head>
<body>
    <div id="content">{% block content %}{% endblock %}</div>
</body>
</html>

And inside the node.html.twig file we used in the controller for rendering:

{% extends "page.html.twig" %}

{% block title %}{{ node._source.title }}{% endblock %}

{% block content %}

    <h1>{{ node._source.title }}</h1>

    <div id="content">

        {% if node._source.field_image %}
            <div class="field-image">
                {% for image in node._source.field_image %}
                    <img src="{{ image.url }}" alt="{{ image.alt }}"/>
                {% endfor %}
            </div>
        {% endif %}

        {% if node._source.body %}
            <div class="field-body">
                {% for body in node._source.body %}
                    {{ body.value|striptags('<p><div><br><img><a>')|raw }}
                {% endfor %}
            </div>
        {% endif %}

    </div>

{% endblock %}

This is just some basic templating for getting our node data printed in the browser (not so fun otherwise). We have a base file and one that extends it and outputs the node title, images and body text to the screen.

Alternatively, you can also return a JSON response from your controller with the help of the JsonResponse class:

use Symfony\Component\HttpFoundation\JsonResponse;

And from your controller simply return a new instance with the values passed to it:

return new JsonResponse($node);

You can easily build an API like this. But for now, this should already work. By pointing your browser to http://localhost/node/5 you should see data from Drupal’s node 5 (if you have it). With one big difference: it is much, much faster. There is no bootstrapping, theming layer, or database query. On the other hand, you don’t have anything useful out of the box except what you build yourself using Silex/Symfony components. This can be a good thing or a bad thing depending on the type of project you are working on. But the point is you have the option of drawing some lines for this integration and deciding its extent.

One end of the spectrum could be building your entire front end with Twig or even Angular.js with Silex as the API backend. The other would be to use Silex/Elasticsearch for one Drupal page and use it only for better content search. Somewhere in the middle would probably be using such a solution for an entire section of a Drupal site that is dedicated to interacting with heavy data (like a video store or something). It’s up to you.


We’ve seen in this article how we can quickly set up a Silex app and use it to return some data from Elasticsearch. The goal was not so much to learn how any of these technologies work, but more of exploring the options for integrating them. The starting point was the Drupal website which can act as a perfect content management system that scales highly if built properly. Data managed there can be dumped into a high performance data store powered by Elasticsearch and retrieved again for the end users with the help of Silex, a lean and fast PHP framework.

Apr 03 2015

A Silex and Elasticsearch app powered by Drupal 7 for content management

In this tutorial I am going to look at the possibility of using Drupal 7 as a content management system that powers another high performance application. To illustrate the latter, I will use the Silex PHP microframework and Elasticsearch as the data source. The goal is to create a proof of concept, demonstrating using these three technologies together.


The article comes with a git repository that you should check out, which contains more complete code than can be presented in the tutorial itself. Additionally, if you are unfamiliar with any of the three open source projects being used, I recommend following the links above and checking out the documentation on their respective websites.

The tutorial will be split into two pieces, because there is quite a lot of ground to cover.

In this part, we’ll set up Elasticsearch on the server and integrate it with Drupal by creating a small, custom module that will insert, update, and delete Drupal nodes into Elasticsearch.

In the second part, we’ll create a small Silex app that fetches and displays the node data directly from Elasticsearch, completely bypassing the Drupal installation.


The first step is to install Elasticsearch on the server. Assuming you are using Linux, you can follow this guide and set it up to run when the server starts. There are a number of configuration options you can set here.

A very important thing to remember is that Elasticsearch has no access control so, once it is running on your server, it is publicly accessible through the (default) 9200 port. To avoid having problems, make sure that in the configuration file you uncomment this line:

network.bind_host: localhost

And add the following one:

script.disable_dynamic: true

These options make sure that Elasticsearch is not accessible from the outside, nor are dynamic scripts allowed. These are recommended security measures you need to take.


The next step is to set up the Drupal site on the same server. Using the Elasticsearch Connector Drupal module, you can get some integration with the Elasticsearch instance: it comes with the PHP SDK for Elasticsearch, some statistics about the Elasticsearch instance and some other helpful submodules. I’ll leave it up to you to explore those at your leisure.

Once the connector module is enabled, in your custom module you can retrieve the Elasticsearch client object wrapper to access data:

$client = elasticsearch_connector_get_client_by_id('my_cluster_id');

Here, my_cluster_id is the Drupal machine name that you gave to the Elasticsearch cluster (at admin/config/elasticsearch-connector/clusters). The $client object will now allow you to perform all sorts of operations, as illustrated in the docs I referenced above.
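As a quick sketch of what that $client object lets you do (the index and search methods come from the Elasticsearch PHP SDK bundled with the connector module; the index, type, and field values below are examples from this tutorial, and a running cluster is assumed):

```php
<?php
$client = elasticsearch_connector_get_client_by_id('my_cluster_id');

// Index a document.
$client->index(array(
  'index' => 'node',
  'type' => 'article',
  'body' => array('title' => 'Hello Elasticsearch'),
));

// Search for it again.
$result = $client->search(array(
  'index' => 'node',
  'body' => array(
    'query' => array('match' => array('title' => 'Hello')),
  ),
));
```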

Inserting data

The first thing we need to do is make sure we insert some Drupal data into Elasticsearch. Sticking to nodes for now, we can write a hook_node_insert() implementation that will save every new node to Elasticsearch. Here’s an example, inside a custom module called elastic:

/**
 * Implements hook_node_insert().
 */
function elastic_node_insert($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);

  if (!$params) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }

  $result = $client->index($params);
  if ($result && $result['created'] === false) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been saved to Elasticsearch.'));
}

As you can see, we instantiate a client object that we use to index the data from the node. You may be wondering what _elastic_prepare_node() is:

/**
 * Prepares a node to be added to Elasticsearch.
 *
 * @param $node
 *
 * @return array
 */
function _elastic_prepare_node($node) {
  if (!is_object($node)) {
    return;
  }

  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'body' => array(),
  );

  // Add the simple properties.
  $wanted = array('vid', 'uid', 'title', 'log', 'status', 'comment', 'promote', 'sticky', 'nid', 'type', 'language', 'created', 'changed', 'revision_timestamp', 'revision_uid');
  $exist = array_filter($wanted, function($property) use ($node) {
    return property_exists($node, $property);
  });
  foreach ($exist as $field) {
    $params['body'][$field] = $node->{$field};
  }

  // Add the body field if it exists.
  $body_field = isset($node->body) ? field_get_items('node', $node, 'body') : false;
  if ($body_field) {
    $params['body']['body'] = $body_field;
  }

  // Add the image field if it exists.
  $image_field = isset($node->field_image) ? field_get_items('node', $node, 'field_image') : false;
  if ($image_field) {
    $params['body']['field_image'] = array_map(function($img) {
      $img = file_load($img['fid']);
      $img->url = file_create_url($img->uri);
      return $img;
    }, $image_field);
  }

  return $params;
}

It is just a helper function I wrote, which is responsible for “serializing” the node data and getting it ready for insertion into Elasticsearch. This is just an example and definitely not a complete or fully scalable one. It is also assuming that the respective image field name is field_image. An important point to note is that we are inserting the nodes into the node index with a type = $node->type.

Updating data

Inserting is not enough, we need to make sure that node changes get reflected in Elasticsearch as well. We can do this with a hook_node_update() implementation:

/**
 * Implements hook_node_update().
 */
function elastic_node_update($node) {
  if ($node->is_new !== false) {
    return;
  }

  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);

  if (!$params) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  $params['id'] = $result['hits']['hits'][0]['_id'];
  $version = $result['hits']['hits'][0]['_version'];
  $index = $client->index($params);

  if ($index['_version'] !== $version + 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been updated in Elasticsearch.'));
}

We again use the helper function to prepare our node for insertion, but this time we also search for the node in Elasticsearch to make sure we are updating and not creating a new one. This happens using another helper function I wrote as an example:

/**
 * Helper function that returns a node from Elasticsearch by its nid.
 *
 * @param $client
 * @param $node
 *
 * @return mixed
 */
function _elastic_perform_node_search_by_id($client, $node) {
  $search = array(
    'index' => 'node',
    'type' => $node->type,
    'version' => true,
    'body' => array(
      'query' => array(
        'match' => array(
          'nid' => $node->nid,
        ),
      ),
    ),
  );

  return $client->search($search);
}

You’ll notice that I am asking Elasticsearch to return the document version as well. This is so that I can check if a document has been updated with my request.

Deleting data

The last (for now) feature we need is the ability to remove the data from Elasticsearch when a node gets deleted. hook_node_delete() can help us with that:

/**
 * Implements hook_node_delete().
 */
function elastic_node_delete($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');

  // If the node is in Elasticsearch, remove it.
  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem deleting this node in Elasticsearch.'));
    return;
  }

  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'id' => $result['hits']['hits'][0]['_id'],
  );

  $result = $client->delete($params);
  if ($result && $result['found'] !== true) {
    drupal_set_message(t('There was a problem deleting this node in Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been deleted in Elasticsearch.'));
}

Again, we search for the node in Elasticsearch and use the returned ID as a marker to delete the document.

Please keep in mind though that using early returns such as illustrated above is not ideal inside Drupal hook implementations unless this is more or less all the functionality that needs to go in them. I recommend splitting the logic into helper functions if you need to perform other unrelated tasks inside these hooks.

This is enough to get us started using Elasticsearch as a very simple data source on top of Drupal. With this basic code in place, you can navigate to your Drupal site and start creating some nodes, updating them and deleting them.

One way to check whether Elasticsearch actually gets populated is to temporarily disable the remote access restriction I mentioned above. Make sure you only do this in your local development environment. This way, you can perform HTTP requests directly from the browser and get JSON data back from Elasticsearch.

You can do a quick search for all the nodes in Elasticsearch by navigating to this URL:


…where localhost points to your local server and 9200 is the default Elasticsearch port.

For article nodes only:


And for individual articles, by the auto generated Elasticsearch ids:


Go ahead and check out the Elasticsearch documentation for all the amazing ways you can interact with it.
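The example URLs did not survive this page's formatting, so here is a sketch of what the three lookups above typically look like, assuming the node index and article type naming used earlier (the document id is a placeholder):

```shell
ES="http://localhost:9200"                 # local server, default port
echo "$ES/node/_search"                    # all nodes
echo "$ES/node/article/_search"            # article nodes only
echo "$ES/node/article/SOME_GENERATED_ID"  # one document by its auto-generated id
# Open any of these in a browser, or fetch with: curl -s "$ES/node/_search?pretty"
```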


We’ve seen in this article how we can start working to integrate Elasticsearch with Drupal. Obviously, there is far more we can do based on even the small things we’ve accomplished. We can extend the integration to other entities and even Drupal configuration if needed. In any case, we now have some Drupal data in Elasticsearch, ready to be used from an external application.

That external application will be the task for the second part of this tutorial. We’ll be setting up a small Silex app that, using the Elasticsearch PHP SDK, will read the Drupal data directly from Elasticsearch. As with part 1, above, we won’t be going through a step-by-step tutorial on accomplishing a given task, but instead will explore one of the ways that you can start building this integration. See you there.

Sep 21 2013

I've had several people already asking me about this so I figured I should add some details here. But first, let me be clear: Acquia does NOT yet support Drupal 8. It's a work in progress, and there's currently much testing and brainstorming on how to provide the best Drupal 8 experience on Acquia Cloud. When we say it's ready, it really will be. Stay tuned.

So, with this, how can you start playing with Drupal 8 on Acquia Cloud today? That's easy: use Acquia Cloud Free! If you don't know what that is, read my previous blog post.

Prepare your Drupal 8 repository

Start by cloning your git repo locally:

$ git clone [email protected]:d8.git
Cloning into 'd8'...  
Warning: Permanently added 'svn-3625.devcloud.hosting.acquia.com' (RSA) to the list of known hosts.  
remote: Counting objects: 29, done.  
remote: Compressing objects: 100% (27/27), done.  
remote: Total 29 (delta 5), reused 0 (delta 0)  
Receiving objects: 100% (29/29), 13.91 KiB, done.  
Resolving deltas: 100% (5/5), done.  

You'll get the following default structure:

$ cd d8 ; ll
total 32K  
drwxr-xr-x 6 www-data drupal 4.0K Sep 21 14:40 ./  
drwxrwxr-x 4 www-data drupal 4.0K Sep 21 14:40 ../  
drwxrwxr-x 2 www-data drupal 4.0K Sep 21 14:40 acquia-utils/  
drwxrwxr-x 2 www-data drupal 4.0K Sep 21 14:40 docroot/  
drwxrwxr-x 8 www-data drupal 4.0K Sep 21 14:40 .git/  
-rw-rw-r-- 1 www-data drupal  113 Sep 21 14:40 .gitignore
drwxrwxr-x 2 www-data drupal 4.0K Sep 21 14:40 library/  
-rw-rw-r-- 1 www-data drupal  598 Sep 21 14:40 README.acquia

Now, let's get to the real thing. Navigate to the docroot and add your Drupal 8 codebase there. Below, I'm simply rsyncing Drupal 8.0-alpha3 into the freshly checked out Acquia Cloud docroot. Note that I'm excluding the .git directory from the process, as well as deleting any file already in the repo. Proceed with care and always double-check the path you're running rsync from. A safer method would be to specify the full destination path instead of "." only. Your call.

$ cd docroot ; ls
acquialogo.gif  index.html  
$ rsync -avC --delete --exclude=.git /tmp/drupal-8.0-alpha3/ .

We can now commit our changes. I'm adding invisible files manually as it can sometimes be tricky with git. Note the -f (force) flag as well, to make sure git takes into account every new file that's been added to the repo.

$ git add -f *
$ git add -f .editorconfig .gitattributes .htaccess .jshintignore .jshintrc
$ git commit -am "First D8 commit"

We're making good progress. Let's push those changes upstream.

$ git push origin master
Counting objects: 29, done.  
Delta compression using up to 4 threads.  
Compressing objects: 100% (23/23), done.  
Writing objects: 100% (27/27), 33.33 KiB, done.  
Total 27 (delta 2), reused 0 (delta 0)  
remote: INFO: you are using 580kB out of 512000kB available.  
To [email protected]:d8.git  
   3a28dcc..dccf8a1  master -> master

Let's step back for a second. There's no need to move on with the remaining steps if you have an issue with your codebase. It should be part of your normal workflow to make sure there's no missing file or directory in your codebase. You have plenty of GUI tools to perform a directory diff, but you can also use the command line for that:

$ diff -bur /tmp/drupal-8.0-alpha3/ ~/Sites/d8/docroot/

Our codebase is not yet usable. We first need to create a settings.php file and give it correct permissions, and the same goes for the default directory. We need to make sure the installer is able to create the files and translations directories (if you're not using English by default).

$ chmod 775 sites/default/
$ cp sites/default/default.settings.php sites/default/settings.php
$ chmod 775 sites/default/settings.php

On top of that, to make a successful connection to the Acquia Cloud database, we have to edit the settings.php file and add the Acquia require() line at the very bottom of the file. You'll find the PHP snippet under your Databases page, after clicking on the Connect to database button.

We're almost there. Now we need to add the settings.php file to our repo and push it upstream. 

$ git add -f sites/default/settings.php
$ git commit -am "Added settings.php file" ; git push

Note: you might come across git attributes issues such as below when committing modifications to your repo:

[attr]drupaltext    text eol=lf whitespace=blank-at-eol,-blank-at-eof,-space-before-tab,tab-in-indent,tabwidth=2
 not allowed: docroot/.gitattributes:13
[attr]drupalbinary  -text diff
 not allowed: docroot/.gitattributes:18

This is because Drupal 8 ships with a .gitattributes file to improve git patches. If you're getting errors, you might want to review your Git configuration. Change notice is available here: https://drupal.org/node/1803766

Install Drupal 8

We're finally there! Visit your Acquia Cloud dashboard to see the commits in your activity stream, or directly visit http://d8.devcloud.acquia-sites.com/core/install.php (replace "d8" with your sitename, obviously). If you see the Drupal install page, run the installer and enjoy:

Remember to change permissions back to what they were before for security reasons (don't forget to commit the changes):

$ chmod 755 sites/default/
$ chmod 440 sites/default/settings.php

If you come across a WSOD (white screen of death), it'll be complicated to debug what's going on since you don't have access to an interactive shell. Fortunately, you can still access your log files from the Acquia Cloud dashboard and determine what's causing you trouble. Also, make sure to:

  • Run a diff (really) to make sure there's no missing file or directory
  • Confirm that your permissions are set correctly (remember that Git does not store permissions, apart from the executable bit)
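Both checks can be run from the shell. A quick sketch, with the diff paths taken from the example above and GNU coreutils stat flags; the stand-in files are created here only so the snippet runs anywhere:

```shell
# 1. Diff the pristine tarball against the repo checkout, ignoring whitespace:
#      diff -bur /tmp/drupal-8.0-alpha3/ ~/Sites/d8/docroot/
# 2. Git only stores the executable bit, so inspect octal permissions by hand.
mkdir -p sites/default && touch sites/default/settings.php  # stand-ins for a real checkout
chmod 755 sites/default
chmod 440 sites/default/settings.php
stat -c '%a %n' sites/default sites/default/settings.php
```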

Not everything will work out of the box, but you should be able to have some fun already. Depending on where we are with Drupal 8 support, this whole process might break at any time, so really only use this for testing purposes. Isn't that cool already?

Sep 20 2013

You might already know about the Acquia Cloud offering, and maybe you're also an Acquia customer already (thank you!). If you're not, and/or if you've always wanted to know more about the kind of Drupal hosting Acquia provides, I highly encourage you to check out Acquia Cloud Free. Yes, it's completely free Drupal hosting. There are a few limits, one of them being that you cannot run a live site on it. But it's perfect for discovering the product, or even playing with dev sites and leveraging great tools such as the Acquia Cloud development platform, QA tools (Acquia Insight), hosted Apache Solr search (Acquia Search), Mollom (content moderation) and the Acquia knowledge base.

Set up a new subscription

First things first, visit the https://insight.acquia.com/free page. You need to either log in to your existing Acquia Network account or create a new account on the fly before you can create your free subscription. The screen capture below shows the standard wizard when you're already logged in. Pretty straightforward, right? You also have the option to provision your subscription in the European Union, Australia or Singapore, so that you get minimal latency with the Amazon AWS availability zones, depending upon where you live.

Create a new free Acquia Cloud subscription

When you're ready, click the Create my free subscription button. This will provision your subscription within about two minutes.

Subscription is being provisioned on the Amazon AWS availability zone of your choice

Configure your Acquia Cloud Free subscription

Once your subscription is available, a Getting started window will show up to guide you through the most common tasks required to set up your Drupal site.

The tour helps new users make their way through the Acquia Cloud Free subscription

Basically, it requires two things: adding your public SSH key and cloning your Git repo locally. The "tour" will provide helpful tooltips so you immediately understand where to start.

The tour helps new users quickly see where to add their SSH public key

Once your SSH public key has been added, it'll show up under your Users and keys page from within your subscription. Next, you're ready to clone your repo locally, and again, the "tour" will assist you with this process.

The tour will guide you through cloning your git repository locally

That's it. In less than 10 minutes you've provisioned a completely free Drupal hosting environment and gained instant access to an outstanding platform and tools. Enjoy!

Apr 10 2013

The bulk of Drupal hosting for clients that we deal with is on virtual servers, whether they are marketed as "cloud" or not. Many eventually have to move to dedicated servers because of increased traffic, or because continually adding features increases complexity and bloat.

But, there are often common issues that we see repeatedly that have solutions which can prolong the life of your current site's infrastructure.

We assume that your staff, or your hosting provider, have full access to the virtual servers, as well as the physical servers they run on.

Disks cannot be virtualized

Even for dedicated servers, the server's disk(s) are often the bottleneck for the overall system. They are the slowest part. This is definitely true for mechanical hard disks with rotating platters, and even Solid State Disks (SSDs) are often slower than the CPU or memory.

For the above reasons, disks cannot be fully virtualized. Yes, you do get a storage allocation that is yours to use and no one else can use. But you cannot guarantee a portion of the I/O throughput, which is always a precious resource on servers.

So, other virtual servers that are on the same physical server as you will contend for disk I/O if your site (or theirs) is a busy one or not optimally configured.

In a virtual server environment, you cannot tell how many virtual servers are on the same physical server, nor if they are busy or not. You only deal with the effects (see below).

For a Drupal site, the following are some of the most common causes for high disk I/O activity:

  • MySQL, with either a considerable amount of slow queries that do file sorts and temporary tables; or lots of INSERT/UPDATE/DELETE
  • Lots of logging activity, such as a warning or a notice in a module that keeps reporting exceptions many times per disk access
  • Boost cache expiry, e.g. when a comment is posted

Xen based virtualization vs. Virtuozzo or OpenVZ

The market uses virtualization technologies much like airlines overbook flights: on the assumption that some passengers will not show up.

Similarly, not all virtual hosting customers will use all the resources allocated to them, so there is often plenty of unused capacity.

However, not all virtualization technologies are equal when it comes to resource allocation.

Virtuozzo and its free variant, OpenVZ, use the term "burst memory" to allocate unused memory from other instances, or even swap space when applications demand it on one instance. However, this can bring a server to its knees if swap usage causes thrashing.

Moreover, some Virtuozzo/OpenVZ hosts use vzfs, a virtualized file system, which is slow for Drupal when used for certain things, such as having all of web root on it, logs, and database files.

Xen does not suffer from any of the above. It guarantees that memory and CPU allocated to one virtual instance stays dedicated for that instance.

However, since physical disk I/O cannot be virtualized, it remains the only bottleneck with Xen.

Underpowered Instances

One issue that Amazon AWS EC2 users face is that the reasonably priced instances are often underpowered for most Drupal sites. These are the Small and Medium instances.

Sites with a low number of nodes/comments per day, and with mostly anonymous traffic, lend themselves to working well with proper Varnish caching enabled and set to long expiry times.

Other sites that rely on a large number of simultaneous logged in users, with lots of enabled modules, and with short cache expiry times do not work well with these underpowered instances. Such sites require the Extra Large instances, and often the High CPU ones too.

Of course, this all adds to the total costs of hosting.

Expensive As You Grow

Needless to say, if your site keeps growing then there will be added hosting costs to cope with this growth.

With the cloud providers, these costs often grow faster than with dedicated servers, as you add more instances, and so on.

Misconfigured Self-Virtualization

Some companies choose to self-manage physical servers colocated at a datacenter and virtualize them themselves.

This is often a good option, but can also be a pitfall. Sometimes the servers are badly misconfigured. We saw one case where the physical server was segmented into 12 VMware virtual servers for no good reason. Moreover, all of them were accessing a single RAID array. On top of that, Boost was used on a busy, popular forum. When a comment was posted, Boost was expiring pages, and that tied up the RAID array from doing anything useful for other visitors of the site.

Variability in Performance

With cloud and virtual servers, you often don't notice issues, but then suddenly variability creeps in.

An analogy ...

This happens because you have bad housemates who flush the toilet when you are in the shower. Except that you do not know who those housemates are, and can't ask them directly. The only symptom is this sudden cold water over your body. Your only recourse is to ask the landlord if someone flushed the toilet!

Here is a case in point: a Drupal site on a VPS with a popular cloud provider. It worked fine for several years. Then the host upgraded to another, newer version, and asked all customers to move their sites.

It was fine most of the time, but then extremely slow at other times. No pattern could be predicted.

For example, while getting a page from the cache for anonymous visitors usually takes a few tens of milliseconds at most, on some occasions it took much longer; in one case, 13,879 milliseconds, with a total page load time of 17,423 milliseconds.

Here is a sample of devel's output:

Executed 55 queries in 12.51 milliseconds. Page execution time was 118.61 ms.

Executed 55 queries in 7.56 milliseconds. Page execution time was 93.48 ms.

Most of the time is spent retrieving cached items.

ms where query
0.61 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'menu:1:en'
0.42 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_87_[redacted]'
0.36 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_54_[redacted]'
0.19 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'filter:3:0b81537031336685af6f2b0e3a0624b0'
0.18 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_88_[redacted]'
0.18 block_list SELECT * FROM blocks WHERE theme = '[redacted]' AND status = 1 ORDER BY region, weight, module

Then suddenly, same site, same server, and you get:

Executed 55 queries in 2237.67 milliseconds. Page execution time was 2323.59 ms.

This was a Virtuozzo host, and it was a sign of disk contention. Since this is a virtual server, we could not tell whether the cause was something inside the virtual host, or some other tenant on the same physical server flushing the toilet ...
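One crude way to probe for this from inside the guest is timing a synchronous write. This is a rough sketch with an arbitrary file name and size, not a benchmark:

```shell
# Time a 16 MB write flushed to disk; wildly varying numbers between runs
# suggest noisy neighbours contending for the same physical disks.
dd if=/dev/zero of=ddtest.bin bs=1M count=16 conv=fdatasync
ls -l ddtest.bin
rm -f ddtest.bin
```

Repeat it a few times across the day; consistent numbers on a quiet box versus wild swings point at contention rather than your own workload.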

The solution is in the following point.

Move your VPS to another physical server

When you encounter variable or poor performance, before wasting time on troubleshooting that may not lead anywhere, it is worthwhile to contact your host and ask for your VPS to be moved to a different physical server.

Doing so most likely will solve the issue, since you effectively have a different set of housemates.


Feb 09 2013

In this blog I will walk you through setting up a Drupal instance on an AWS EC2 micro instance, and setting up FTP access to it. Before this, of course, you have to register with AWS, which is straightforward.

So, what we will cover in this blog:

1. Choosing an OS and assigning security rules to our instance
2. How to access our instance and play around with it
3. Setting up LAMP on our AWS micro instance
4. Setting up FTP on our AWS micro instance
5. Managing your Drupal project over an FTP connection using FileZilla

Once you are registered with AWS, log in to your account. Among the AWS services, click on the EC2 link, which is nothing but a virtual server in the cloud. It will redirect you to the EC2 dashboard, where you can manage your entire instance. Now, to create a new instance, follow the steps below:

1. Click on Launch instance, select Classic wizard and click Continue.
2. Select an Amazon Machine Image (AMI) from one of the tabbed lists by clicking its Select button.
3. Let's say we select Ubuntu 12.04 LTS.
4. Leave the default settings, except select micro instance as the instance type, because it's free to use, and click Continue.
5. Leave the default settings again and click Continue.
6. Leave the default settings again and click Continue.
7. Give your instance a key and value. I recommend a key value name that makes sense for your project; then click Continue.
8. Select "Create a new key pair", since we don't have an existing key pair yet. Give your key pair file a name that makes sense for your project. Download your keypair.pem file and save it in a safe place, because we will need this file later.
9. Select "Create a new security group". Here we assign security rules enabling HTTP, SSH and FTP connections to our instance. The HTTP port is 80, the SSH port is 22, and for FTP select a custom TCP rule with port range 21-22. For the source you can give any IP range you need, or just leave the default for now.
10. Click Continue and launch the instance. It will be running shortly, as AWS takes some time to start it.

Now our EC2 micro instance is running. You can check it from your dashboard.

Now we set up LAMP on our Ubuntu 12.04 LTS instance. We will access the instance from the terminal and install LAMP there. Below are the steps.

1. Open your terminal, go to the directory where you stored your key pair (.pem) file, then run: ssh -i file_name.pem [email protected]

2. You will get an Ubuntu prompt in your terminal: you are now logged in to your Ubuntu machine and can do anything there. The main thing is that commands that touch the file system have to be run with "sudo" or as the root user.

3. To set up LAMP we install three packages. Run these commands from the terminal:

  • sudo apt-get install apache2

  • sudo apt-get install mysql-server

  • sudo apt-get install php5 php-pear php5-mysql php5-suhosin

That's it: your LAMP environment is ready. You can check it by navigating to your instance URL (which looks something like ec2-43-23-32.compute-1.amazonaws.com) in a browser. It will show Apache's default page confirming the server is working.

So far we have our LAMP environment running on our EC2 instance. Now we install Drupal 7 on it. The Ubuntu instance has no FTP connection yet, so we can either use scp to copy the Drupal tarball from our local machine, or use the wget utility to download Drupal from its URL. Below are the steps to install Drupal 7:

1. cd /var/www/

2. wget http://ftp.drupal.org/files/projects/drupal-7.12.tar.gz

3. tar xvf drupal-7.12.tar.gz

4. mv drupal-7.12 drupal
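The unpack-and-rename steps can be run as one short script. This is a self-contained dry run: the wget line is commented out and a stand-in tarball is created so the sketch runs anywhere, since drupal.org's real tarball requires network access:

```shell
# Real download (requires network):
#   wget http://ftp.drupal.org/files/projects/drupal-7.12.tar.gz
mkdir -p drupal-7.12 && touch drupal-7.12/index.php      # stand-in tarball contents
tar czf drupal-7.12.tar.gz drupal-7.12 && rm -rf drupal-7.12

# Unpack and rename so the docroot directory is simply "drupal".
tar xzf drupal-7.12.tar.gz
mv drupal-7.12 drupal
ls drupal/index.php   # sanity check that the tree unpacked
```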

Below is a link explaining how to install Drupal on Linux. Just follow all those steps.


Once you are set up with your MySQL database and Drupal configuration, just browse a link like, for example:


It will take you to your Drupal site.

Now we will see how to set up FTP for the Drupal instance. Why do we need FTP? To work on Drupal we use many modules, themes and libraries, so we have to upload those to the site. We could do this with scp, but that is more cumbersome since it is command-line only. Here we will see how to set up FileZilla FTP for a Drupal site on AWS EC2. Below are the steps to set up FileZilla as an FTP client.

1. Install FileZilla on your local system: sudo apt-get install filezilla

2. Open FileZilla and click File > Site Manager

3. Enter the details of your site:

  • Host: ec2-43-23-32.compute-1.amazonaws.com

  • Port: 22

  • Protocol: SFTP (SSH File Transfer Protocol)

  • Logon Type: Normal

  • User: ubuntu

  • Password: ubuntu. Then click OK; don't click Connect this time.

4. Now click Edit > Settings > SFTP > Add keyfile, and navigate to your .pem file. FileZilla will ask to convert the .pem file; select OK. FileZilla now has your instance credentials, and everything is ready to connect to your Drupal instance.

5. Now click File > Site Manager > Connect

Now you can transfer all files from your local to Drupal instance.

I hope you enjoyed this blog. Please feel free to comment and send queries to me.





May 24 2012

A trip to the Drupal Gardens Part 1

Posted on: Thursday, May 24th 2012 by Bono Fung

Drupal Gardens offers an alternative to traditional Drupal installation. If you:

  • Want a site on the fly and don't need a lot of code/module modifications
  • Don't have a lot of experience in Drupal or any other CMS
  • Don't have your own local hosting space/resources

Then Drupal Gardens might be worth looking into. It is fast to set up and easy to use. It comes with a lot of great pre-loaded features. It's a cloud service, so you don't have to set up and configure your own hosting space. It has a lower learning curve, so even someone who has never used Drupal will find the first-time experience less overwhelming. Drupal Gardens's official site also offers a lot of documents and tutorials, so anyone can go through them and learn all the basics in a relatively short time.

However, it does have some restrictions and limitations that one must take into consideration before deciding to go forward with Drupal Gardens. Please review the following carefully before committing to Drupal Gardens.

No additional contributed/custom modules

If you go forward with Drupal Gardens, you need to be clear that you cannot install or add any modules beyond what's already made available. Drupal Gardens offers quite a few modules by default compared to a regular installation; unless you need to do something very specific, you probably won't need anything else. Unfortunately, if you commit to Drupal Gardens and find yourself three months down the road realizing that you must add a custom module, you might need to migrate the site out of Drupal Gardens and into a regular Drupal 7 installation. This can cost additional time and effort.

No custom updates for modules

Take CCK for example. In Drupal 6 the default CCK 2 does not handle multigroup; therefore, if you wish to use multigroup you need to update to CCK 3. This is a relatively easy task in regular Drupal. In a similar situation in Drupal Gardens you won't be able to perform this update, even though CCK is already available in Drupal Gardens. As a result, you will be using the same version of each module as everyone else in the system.

Real-time updates with multiple users

If the site is updated by many people at the same time, it can turn into a version control nightmare if you are used to SVN or Git. Regular Drupal also suffers from this problem when it comes to editing page content, but at least all the themes, CSS, scripts and module code can be handled through version control. Basically, any Drupal Gardens user with permission to alter the site can work on the content, theme, etc., and overwrite other people's changes in real time without any notification. If you have a big group working on a Drupal Gardens site, it is better to delegate each person to a particular section, thereby avoiding overwriting each other's changes.

In Part 1 I have given an overview of Drupal Gardens. In Part 2 I will cover some tips and tricks that you won't find in the official tutorials and documents.

Mar 09 2012

The Gateway to 21st Century Skills (www.thegateway.org) is a semantic web enabled digital library that contains thousands of educational resources and as one of the oldest digital libraries on the web, it serves educators in 178 countries. Since 1996, educational activities, lesson plans, online projects, and assessment items have been contributed and vetted by over 700 quality organizations.

Given their rich pedigree, the site serves over 100,000 resources each month to educators worldwide. Since 2005, the Gateway has been managed by JES & Co., a 501(c)(3) non-profit educational organization. The original site was built on Plone several years ago. In recent years the constraints of the old site proved too great for the quality and quantity of content, and the needs of its increasingly engaged readership. It was becoming difficult and expensive to manage and update in its current configuration.

JES & Co., as an organization with a history of embracing innovation, decided to move the Gateway onto Drupal and looked to 10jumps to make the transition happen. The site had to be reliable, with very high uptime. Moreover, the site would have to handle millions of hits without batting an eyelid. And most importantly, the faceted search would have to work well with the semantically described records. Based on the requirements, Acquia's Managed Cloud seemed like the best approach: it can help a site scale across multiple servers, and Acquia provides high availability with full fail-over support.
“If something does go down, we know that Acquia 24x7 support has our back” - 10jumps

How they did it

There were several hosting options, but very few that met the requirements for the Gateway. And definitely none that made the development-testing-production migration seamless and easy. Usually there are too many manual steps raising the chances of error.

After a few rounds of technology and support evaluation calls, Acquia was retained to provide hosting and site support. A good support package, combined with the expertise of Acquia's support team, was a compelling reason to make the move. The technical team at 10jumps was also fairly confident that the move would be a good choice for their customer, the Gateway, freeing them to focus on site development. With Acquia's Managed Cloud and the self-service model, code, local files and databases can be migrated between development, testing and production systems literally with mouse clicks. With the seamless migration, the development cycle became shorter, and with Git in place, collaboration between developers became easier. Moreover, caching for anonymous content was provided out of the box, and the 10jumps developers did not have to navigate tricky cache settings. Moving the developers to the new platform was the first step, and soon the team was on an agile development track, able to develop and roll out features quickly.

The result

After the new site went live, we were certain that TheGateway.org would not be affected by traffic spikes, nor would the site be down because of a data center outage. More importantly, the semantically described data could be searched more efficiently because of the integration with Apache Solr search that comes with being in the Acquia cloud.
The development life cycle had gone from being clunky and broken to being smooth and agile. The redesigned site makes it simpler for the end users to navigate through large amounts of data and the powerful search is returning better results - improving overall user experience.

Jan 03 2012

Tips for Acquia Hosting Development

Posted on: Tuesday, January 3rd 2012 by Brandon Tate

Here at Appnovation, we frequently use the Acquia hosting platform for our clients. The Dev Cloud and Managed Cloud are impressive platforms that fit well for many Drupal sites being built today. I’ve listed some items below that have helped with the overall build quality and ease of use for these platforms.

Utilities - Drush aliases

Anyone who develops with Drupal should know about Drush by now. Acquia has gone ahead and installed Drush and the Drush aliases on your Acquia Hosting account automatically. For example, you can run the command “drush @sitename.stg cc all” from your local computer to clear all the caches within the Drupal site without having to log into the server for it to work! You can find the aliases under Sites > Managed (or Dev) Cloud > Utilities. After downloading the file to your $HOME/.drush/ directory and setting up an SSH key, you're good to go.

Enabling Git

Acquia uses SVN as its primary repository system. However, some of us like to use Git for its enhanced speed over SVN. Not a problem since Acquia supports both. Switching to Git is easy and can be found under Sites > Dev Cloud > Workflow. At the top of the page you’ll find an SVN URL by default with a gear icon located next to it. If you click that icon a drop down will appear and the option to switch to Git will be displayed. Simply click this link and your repository will switch from SVN to Git. For Managed Cloud platforms, you just have to request Git by filing a support ticket through Acquia and they will switch it for you.


Varnish and Pressflow

Acquia Dev Cloud and Managed Cloud implement Varnish to speed things up. To take advantage of this, I recommend you use Pressflow. Installation and setup are exactly the same as for Drupal, and Pressflow is 100% compatible with Drupal core modules. There are some issues with contrib modules, so make sure you check the functionality of these modules before assuming they work. Once Pressflow is installed and set up on the Acquia server, you'll notice a new option under Performance called "external". This allows Varnish to handle caching instead of Drupal's stock functionality.

Mobile sites and Varnish

I recently worked on a site that used Varnish to improve performance. The site was also configured to use a different theme when a mobile device was detected. I was using the global variable $custom_theme to switch themes when a mobile device was found. However, once Varnish was in front of the site, we noticed that the mobile site would sometimes get served to a desktop browser and vice versa. The problem was that Varnish was caching the page and never hitting the code that determined whether the device was mobile. To correct this, we switched from the global $custom_theme variable to Drupal’s multi-site folder architecture. Add a mobile site folder (m.domainname.com) to the sites directory within Drupal, then add the following line to that folder's settings.php file.

$conf['theme_default'] = 'mobile_theme';

This will force the site to use the mobile theme. Next, you’ll have to add the domain to the Acquia Hosting platform which is located under Sites > Managed (or Dev Cloud) > Domains. After that, you’ll have to get in contact with Acquia support so that they can configure Varnish to detect mobile devices and redirect to the correct sites folder.
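The resulting multi-site layout is minimal; the mobile folder needs only the settings.php shown above (m.domainname.com is a placeholder for your real mobile domain):

```
sites/
  default/
    settings.php        # desktop site
  m.domainname.com/
    settings.php        # sets theme_default to the mobile theme
```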

Memcache setup

Memcache is another technology that speeds up Drupal's performance. To enable it, download the Memcache module from Drupal.org, then add the following lines to your settings.php file.

if (!empty($conf['acquia_hosting_site_info']) && !empty($conf['memcache_servers'])) {
  $conf['cache_inc'] = './sites/all/modules/contrib/memcache/memcache.inc';
  $conf['memcache_key_prefix'] = 'domainname';
  $conf['memcache_bins'] = array(
    'cache' => 'default',
  );
}

First, note the memcache_key_prefix, which must be unique across environments (development, staging, production). This prevents the environments' cache entries from overwriting each other. Next, the memcache_bins key lets you map specific Drupal cache tables to memcache bins. In the example above I've left it at "default", but you could cache specific tables such as views, blocks, etc.
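For example, routing a few heavy cache tables to their own bin might look like this (a sketch; the "views" cluster name is hypothetical and would have to be defined in $conf['memcache_servers']):

```php
<?php
// Map Drupal cache tables (keys) to memcache clusters (values).
$conf['memcache_bins'] = array(
  'cache'       => 'default',
  'cache_views' => 'views',
  'cache_block' => 'views',
);
```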

Workflow - Lock out the files folder

Acquia has a drag-and-drop interface that allows the developer to drag the code base, files, and database between the development, staging, and production environments, making it easy to migrate your site between any of the servers. One thing I want to point out: once your site is on the production server, it's a good idea to lock the environment by setting it to Production Mode. This prevents the files directory and database from being overwritten, and also lets you pre-release the site to the client so they can add content while you update the code base. To do this, go to Sites > Managed (or Dev) Cloud > Workflow. There will be a “Prod” section with a gear beside it; clicking that gear provides a link to switch the site to Production Mode. This can easily be reverted by clicking the gear again and returning the site to Pre-Launch mode.

Those are some tips I’ve discovered while using Acquia that have eased the development and deployment of Drupal sites on their hosting platforms. If you have any questions or tips of your own, feel free to comment.

Dec 23 2011

About thegateway.org:

The Gateway has been serving teachers continuously since 1996, which makes it one of the oldest publicly accessible U.S. repositories of education resources on the Web. The Gateway contains a variety of educational resource types, from activities and lesson plans to online projects and assessment items.

The older version of the website ran on Plone. The team hired us to migrate it to Drupal, which was absolutely the right choice given the benefits Drupal brings.

We redesigned the existing website, giving it a new look on Drupal, and hosted it on Acquia Managed Cloud to boost performance and scalability. The new look is more compact, organized, and easier to use.

It was a very interesting project for us and our team is proud to be a part of such a great educational organization serving the nation.

Looking forward to a grand success of the new launch!

thegateway.org BEFORE:


thegateway.org NOW:

Mar 29 2011

Amazon AWS + Drupal

(Some familiarity with Amazon AWS is assumed.)

I have always wanted to set up a high-performance Drupal site on Amazon EC2. There are several advantages to running your website (or web application) on AWS. Amazon EC2 creates and provisions virtual Linux (or Windows) servers for you and charges an hourly rate for usage.

With AWS, it becomes easy to distribute and share a Drupal image with others. It is also much easier to scale, and definitely cheaper: you can have different virtual servers running the search engine, the database, and the application servers, all scaling independently of each other.

With the introduction of Micro instances and better yet, free micro instances, the barrier to entry for a new user has really dropped. 

I assume you have or can create an Amazon AWS account and use their management console. These aspects are very well covered in Amazon's site and I will not get into the details of creating an account, etc. Amazon has done a great job of creating the documentation and tutorials for getting started.

I will show how to:

1. Setup a LAMP stack on Ubuntu

2. Setup Drupal on the LAMP stack

3. How to install phpmyadmin

4. Configure the database to reside in the EBS instead of the ephemeral instance storage.

1. Setup a LAMP stack on Ubuntu:

I used a 64-bit Ubuntu image for my purposes. Amazon provides both 32-bit and 64-bit micro instances, but I wanted to start with 64-bit because their larger servers are 64-bit only, so I can use the same image to scale up to larger Amazon servers. I used the Ubuntu image as my base image; it is available in the US West region only. (Images are unique to regions, and you can get similar images for the region you want to use.)

Once your AWS account is set up, sign into the Amazon AWS console and click on the EC2 tab. Check the region you are running in; if you want to run in US West, select it and click Launch Instance. The popup that follows will allow you to select an image: search for the image ID ami-01772744. Click Start and continue with the default options. You will have to select a key pair and a security group. Make sure ports 80 and 22 are open in the security group you want to use: port 80 allows HTTP access, and port 22 allows SSH connectivity to the server.

You also need to know the location of Amazon's private key file (.pem) for the key pair. The Ubuntu server takes a few minutes to start and become available.

From the command line on your local machine, type:

        ssh -i [path to key file]/[key file name].pem ubuntu@[your public dns]

The part following ubuntu@ has to be replaced by your server's public DNS name, which Amazon provides on the console. Note that there is no root user; Ubuntu does this to avoid root logins, and you can access all the administrative functionality using sudo.

BTW, if the command above does not work due to a permissions problem on the key file, you might want to first run:

        chmod 600 [path to key file]/[key file name].pem

Once connected to the remote server console (your first big milestone, BTW), you can optionally create a password for the ubuntu user by typing:

sudo passwd ubuntu

If you want to enable SSH access via passwords, so that you don't need the .pem file every time, edit /etc/ssh/sshd_config to have:

PasswordAuthentication yes

Then restart the SSH daemon:

sudo service ssh restart
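If you prefer, the edit can also be scripted. This is a sketch that flips PasswordAuthentication on without opening an editor, demonstrated on a sample file; on the server you would run the same sed with sudo against /etc/ssh/sshd_config and then restart sshd:

```shell
# Create a sample file standing in for /etc/ssh/sshd_config.
printf '#PasswordAuthentication no\n' > sshd_config.sample

# Uncomment (or overwrite) the directive and set it to "yes".
sed -i 's/^#\{0,1\}PasswordAuthentication.*/PasswordAuthentication yes/' sshd_config.sample

cat sshd_config.sample   # -> PasswordAuthentication yes
```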

Now, with the basic logistics in place, let's set up the LAMP stack on this Ubuntu instance. I found this to be simpler than I had expected. Write down any usernames and passwords you create from this point on.

sudo tasksel install lamp-server

Drupal will need the rewrite functionality to provide clean URLs, so run the command

sudo a2enmod rewrite

That's it. Your LAMP stack is set up.

Go to http://[your public dns] and you should see some output from Apache.

BTW, what I also find really useful is to create some shortcuts in the .profile file. For example, instead of typing ls -al I can type la, and since I make spelling mistakes while typing sudo, I can point sodu to sudo as well. To do this, edit the /home/ubuntu/.profile file:

sudo vim /home/ubuntu/.profile

Add the line:

alias la='ls -al'
alias sodu='sudo'

2. Setup Drupal on the LAMP stack

Setting up Drupal on the LAMP stack is usually just a one-line command, after which we need to perform some basic configuration:

        sudo apt-get install drupal6

Edit the file /etc/apache2/sites-enabled/000-default and change it so that DocumentRoot now reads as follows:

        DocumentRoot /usr/share/drupal6

You can install Drupal anywhere and just point DocumentRoot to that location. Also comment out the block that starts with

        <Directory />

Also edit the file /etc/apache2/conf.d/drupal6.conf and comment out the line 

        Alias /drupal6 /usr/share/drupal6
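For reference, after those edits the site definition in 000-default ends up looking roughly like this (a trimmed sketch of the stock Apache 2 file; your file will differ):

```apacheconf
<VirtualHost *:80>
    DocumentRoot /usr/share/drupal6
    <Directory /usr/share/drupal6>
        Options FollowSymLinks
        AllowOverride All   # lets Drupal's .htaccess handle clean URLs
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
```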

Restart Apache so the above configuration changes take effect:

        sudo service apache2 restart

Now go to http://[your public dns]/install.php and voila, you are in business.

3. Setup phpmyadmin:

To access the database through phpmyadmin, install it and then browse to the application's URL. Again, this is optional; you can access all the SQL functionality from the command line as well. Installing phpmyadmin is trivial:

        sudo apt-get install phpmyadmin

And you are done. Follow the install options if any.

Go to the phpmyadmin application via:

http://[your public dns]/phpmyadmin

The user name is usually root.

4. Configure the database to reside in the EBS instead of the ephemeral instance storage:

Amazon's instances are ephemeral, and the storage on an instance is ephemeral as well: if the instance is shut down, the data on it goes away. That is not a very desirable configuration. However, Amazon allows you to mount persistent storage (EBS volumes) on top of the instance. You can mount any number of 1 TB drives, and you can choose the size of the mounted drive at instance startup time.

Essentially, there will already be a mounted drive, which you can find by typing:

        df -h

Then, on the mounted drive, you can create corresponding directories for the logs, DB files, and lib.

Next, you link the directories on the mounted drive to the corresponding directories on your instance. The set of commands is as follows:

Shut down MySQL first:

        sudo /etc/init.d/mysql stop

Then create the folders and link them:

        sudo mkdir /vol/etc /vol/lib /vol/log
        sudo mv /etc/mysql     /vol/etc/
        sudo mv /var/lib/mysql /vol/lib/
        sudo mv /var/log/mysql /vol/log/

        sudo mkdir /etc/mysql
        sudo mkdir /var/lib/mysql
        sudo mkdir /var/log/mysql

        echo "/vol/etc/mysql /etc/mysql     none bind" | sudo tee -a /etc/fstab
        sudo mount /etc/mysql

        echo "/vol/lib/mysql /var/lib/mysql none bind" | sudo tee -a /etc/fstab
        sudo mount /var/lib/mysql

        echo "/vol/log/mysql /var/log/mysql none bind" | sudo tee -a /etc/fstab
        sudo mount /var/log/mysql

Finally, start MySQL again:

        sudo /etc/init.d/mysql start

So, in summary, we saw how to set up the LAMP server, install Drupal, and make sure the DB runs on persistent storage. There is still work to harden the image and to create an image from the instance; that will be covered in a subsequent blog post.

Jan 21 2011

Drupal is widely recognized as a great content management system, but we strongly believe that Drupal offers a lot more than that: a framework, a platform, and a set of technologies to build and run enterprise applications, specifically on the cloud. This post is an attempt to explore the benefits and potential of Drupal on the cloud.


Elasticity

One of the last things customers should have to worry about is performance degradation due to a sudden spike in traffic. For years, customers had to size their servers to meet peak demand. They overpaid, and still failed to deliver on promise at peak load. The cloud solves this elasticity problem really well, and if you are using Drupal you automatically get the elasticity benefits, since Drupal’s modularized architecture (user management, web services, caching, etc.) is designed to scale up and down on the cloud under elastic load.


Platform as a Service

If Heroku’s $212 million acquisition by Salesforce.com is any indication, the future of PaaS is bright. Drupal, at its core, is a platform. Companies such as Acquia, through Drupal Gardens, are doing a great job delivering the power of Drupal by making it incredibly easy for people to create, run, and maintain their websites. This is not a full-blown PaaS, but I don’t see why they cannot make it one. We also expect to see a lot more players jumping into this category. PaaS players such as phpfog and djangy have started gaining popularity amongst web developers.

Time-to-market and time-to-value:

Drupal has helped customers move from concept to design to a fully functional, content-rich, interactive website in a relatively short period of time using built-in features and thousands of modules. The cloud further accelerates this process: Amazon and Rackspace have pre-defined high-performance Drupal images that customers can use to get started, and another option is to leverage PaaS as described above. The cloud not only accelerates time-to-market and time-to-value but also provides economic benefits in scale-up and scale-down situations.


Management tools

Cloud management tools experienced significant growth in the last two years, and this category is expected to grow even more as customers opt to simplify and unify their hybrid landscapes. With Drupal, customers can not only leverage the cloud management tools but also augment them with application-specific capabilities via Drupal modules such as Quant for tracking usage, Admin for managing administrative tasks, and Google Analytics for integration with Google Analytics. There is still a disconnect between the cloud-native management tools and the Drupal-specific ones, but we expect them to converge into a unified set of tools to manage the entire Drupal landscape on the cloud.

Open source all the way

Not only is Drupal completely open source, it also has direct integration with major open source components such as memcached and Apache Solr, and native support for jQuery. This provides additional scale and performance benefits to Drupal on the cloud, and the entire stack is backed by vibrant open source communities.


Security

It took a couple of years for customers to overcome their initial adoption concerns around cloud security; they are at least asking the right questions now. Anything that runs on the cloud is expected to be scrutinized for its security as well. We believe that developers should not have to explicitly code for security; their applications should be secured by the framework they use. Drupal not only leverages the underlying cloud security but also offers additional features to prevent attacks such as cross-site scripting, session hijacking, and SQL injection. See OWASP's complete list of the top 10 security risks.

Search and Semantic Web

One core piece of functionality that any content website needs is search, and developers shouldn’t have to reinvent the wheel. Integration with Solr is a great way to implement search without monumental effort. Drupal also has built-in support for RDF and SPARQL for developers interested in the Semantic Web.


NoSQL

The cloud is a natural platform for NoSQL, and there has been immense ongoing innovation in the NoSQL category. For modern applications and websites, using NoSQL on the cloud is in many cases a must-have requirement. The cloud is a great platform for NoSQL, and so is Drupal: Drupal has modules for MongoDB and Cassandra, and modules for other NoSQL stores are currently being developed.

Drupal started out as an inexpensive content management system, but it has crossed the chasm. Not only are developers extending Drupal by adding more modules and designing different distributions; more importantly, enterprise ISVs have started actively exploring Drupal to make their offerings more attractive, creating extensions and leveraging the multi-site feature to set up multi-tenant infrastructure for their SaaS solutions. We expect that the cloud, as a runtime platform, will help Drupal, ISVs, and customers deliver compelling content management systems and applications.

Jul 07 2010

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millennium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought for a few minutes while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, API’s, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (See the DrupalCon presentation, “MongoDB – Humongous Drupal”, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMS’s will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

May 18 2010

     In the last few years, the term “cloud computing” has become the ubiquitous new buzzword to toss around when talking about flexible, scalable hosting solutions. Unfortunately, as often happens when a new concept is introduced to a large market, it is often misunderstood, and its benefits and drawbacks are misrepresented (intentionally or not).

A recent schematic of 'The Cloud'

     The original concept of cloud computing was a utility-like service providing a pool of shared resources for clients to use on demand, allowing customers to leverage a large pool of resources without bearing the capital cost of building or maintaining the entire infrastructure themselves. In the web hosting world, what we refer to as “cloud computing” sometimes aligns with this original concept and sometimes doesn't, depending on the service in question.

     In this post, we hope to remove some of the mystery around cloud computing as it relates to web hosting services: pull back the curtain a bit and explain what is actually happening inside that misty and illusory “cloud”.

Virtualization and The Cloud
     All cloud products offered as a web hosting service have one thing in common: they are virtualized. "Virtualized" means that your “server” (whether called an “instance”, “slice”, or something else by your service provider) is actually a virtualized operating system (kernel), referred to as a “virtual machine” or VM. The VM runs inside a “container” system, which is in charge of managing the resources of the multiple VMs running alongside yours. Several products do this, some of which you may be familiar with; among the better known are VMware, VirtualBox, and Xen.

Although all cloud products are virtualized, not all virtualized hosting services are cloud services. What distinguishes the two is:

  • the ability to scale dynamically
  • adding or removing resources can be controlled by the customer through an API
  • have the utilized resources billed out by some small increment, such as by the hour or day (although some providers do have larger billable increments, such as week or months)

This flexibility is generally considered the largest advantage of a cloud product over a more standard virtual private server: it lets you dial resources up and down, on demand and as needed, without committing to a lot of hardware that carries unused overhead during off-peak hours.

The team said my post needed more pictures, so, eh, here is a guy 'Virtualizing'

     Once you are on a virtualized platform, your “server" becomes platform agnostic. It is now a VM, which means it will run anywhere the “container” will run; it doesn't matter what the hardware is. Compared to an operating system installed directly on hardware, it is trivially easy to move your VM from one container to another: from server to server, or even from datacenter to datacenter. The various virtualization technologies have different ways of doing this, with different requirements, but almost all have some way to “migrate” a VM from one host to another. Here is where an important distinction between virtual private server (“VPS”) services and cloud services comes into play.

Under the Hood
     In order for cloud services to scale in an elastic fashion, they must use some sort of network-attached storage. Network-attached storage is pretty much what it sounds like: a physical disk storage system that connects to a server over a network. In the case of the cloud, your VM runs on a server which provides the "brain": the RAM, CPUs, and other hardware necessary for the operating system to run and connect to the Internet. But instead of having hard drives inside that same server, the storage is on a networked storage system connected to the server running your VM (in many cases just using standard Ethernet interfaces and switching gear).

     Having network-based storage is generally a requirement for any type of “hot migration”, that is, moving a VM from one server to another seamlessly. In this configuration, using Xen for instance, you can migrate a running VM from one server to another without even a reboot. This is useful when you need to allocate more RAM or CPU cores to a VM on a fully utilized server (by migrating it to a server with more resources available), or when a server has a hardware failure and needs maintenance: all the VMs on it can be migrated with minimal interruption.

More Ins and Outs

Image of an actual I/O bottleneck in the cloud, taken by electron microscope and artificially colorized in Photoshop.

     There are several drawbacks to using network-attached storage, the primary one being I/O. I/O, very simply, is the rate at which data can be transferred "in" and "out" of a device. Various kinds of I/O affect performance in various ways, but in this case we are talking about disk I/O. There are several types of network-attached storage products out there, but the most popular is iSCSI, which is basically SCSI over Ethernet. This means that if you are running Gigabit Ethernet, you have 1,000 Mbit/s of I/O. Serial Attached SCSI (SAS) drives put you at 4,800 Mbit/s, and in a RAID configuration you can get much more than that. If you are running 10 Gigabit Ethernet, you have 10,000 Mbit/s.

     Comparing I/O rates between devices and standards is never an apples-to-apples comparison, because it depends completely on the hardware and network configuration. For example, it doesn't matter if you've spent all your money on a 10GbE network if you have connected eight servers with 8 CPU cores each to one 10GbE network connection: you are now sharing that connection among potentially 64 CPU cores, and that many cores could easily saturate your expensive 10GbE network several times over. Of course, even if each server had a dedicated direct-attached SCSI array you could still saturate your disk I/O, but you would have much more headroom to play with, and you could more easily dedicate resources to a specific VM or set of VMs.
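The back-of-the-envelope math behind that example is simple: divide the uplink's bandwidth by the number of cores contending for it.

```shell
# Eight servers with 8 cores each, all sharing one 10,000 Mbit/s storage uplink.
echo $(( 10000 / (8 * 8) ))   # Mbit/s per core if everything is busy at once
```

That works out to roughly 156 Mbit/s per core, a fraction of what even a single direct-attached disk array can deliver.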

     As always, there are some exceptions to the rule, but as of right now, most cloud products will not give you as much I/O as you can get out of dedicated direct attached storage. I/O is not everything, but this is one potential bottleneck.

Keep it down over there! - or - Noisy Neighbors
[Image: these crazy kids up all night saturating my network.]

     Perhaps more important than raw I/O is the fact that cloud products (and shared VPS services, for that matter) are based on shared resources. This is great when you don't want to pay for all the resources you have access to all the time, but what happens when someone sharing the same resources maxes them out? Your service suffers. If they saturate the network your storage is on, your service can suffer *a lot*. For disk I/O intensive applications (like Drupal) this can mean a slow, frustrating death. This is true not only for shared network resources but also for shared CPU cores (RAM can't be shared for technical reasons, so at least you are safe there). If someone sharing your resources uses more than their "fair share", your service will be impacted. This is called having “noisy neighbors”. Some people mitigate this by rebooting their VM, hoping it boots back up on a less resource-bound server. Some just wait it out. Many just watch the low load numbers on their VM and puzzle over the lackluster performance.
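One concrete way to spot noisy neighbors on a Linux guest is CPU "steal" time, which counts the cycles the hypervisor gave your vCPU to other tenants. A hedged sketch follows; the /proc/stat line is a hard-coded sample so the snippet runs anywhere, whereas on a real VM you would read /proc/stat twice and diff the counters over an interval:

```python
# Fields after "cpu" in /proc/stat, in order:
# user, nice, system, idle, iowait, irq, softirq, steal
# "steal" is time your vCPU was runnable but the hypervisor ran someone
# else -- a direct symptom of noisy neighbors. "iowait" hints at storage
# contention. Sample counters below are made up for illustration.
sample = "cpu  10132153 290696 3084719 46828483 16683 0 25195 175688"

fields = [int(x) for x in sample.split()[1:]]
total = sum(fields)
iowait = fields[4]
steal = fields[7]

steal_pct = 100 * steal / total
iowait_pct = 100 * iowait / total
print(f"steal: {steal_pct:.2f}%  iowait: {iowait_pct:.2f}%")
```

Sustained steal percentages in the double digits are a strong hint that a neighbor is hogging the physical CPUs, no matter how low your own load average looks.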

What should I do?!?
     Deciding what type of hosting service to use depends on your particular situation. In most cases we say “the proof is in the pudding”: if the service you are on is making you happy, and the price is sustainable, then stay there, for goodness' sake! But if you are experiencing performance problems on a budget hosting or cloud service that are costing you money or traffic, maybe it's time to look at a dedicated server environment.

     Drupal, especially with the default database-based cache enabled, can be particularly I/O heavy on the database. Many cloud providers offer a hybrid model, where you can have virtualized cloud servers that connect to external non-virtualized hardware with direct attached storage. This is a great solution if you can afford it – it allows you to put your database server on dedicated hardware, but keep your web servers on a more flexible solution.

     If you can't afford that, then look at Drupal performance solutions like Pressflow or Mercury. If that's not an option, the next stop may be a VPS with direct attached disks and dedicated resources.

     If you feel like you are having I/O or “noisy neighbor” issues, check whether you are on a “shared” resource plan, and ask your provider what type of infrastructure they are using and whether it's saturated. If they won't tell you, and you continue to have problems, it's probably time to start looking for another provider.

Next up: Caching and Drupal – examining APC, Varnish, Boost, memcache, and content delivery networks, and how they work together.
