Apr 10 2015
Apr 10

A Silex and Elasticsearch app powered by Drupal 7 for content management

In the previous article I started exploring the integration between Drupal 7 and the Elasticsearch engine. The goal was to see how we can combine these open source technologies to achieve a high performance application that uses the best of both worlds. If you’re just now joining us, you should check out this repository which contains relevant code for these articles.

esdrupalsilex

We’ll now create a small Silex application that reads data straight from Elasticsearch and returns it to the user.

Silex app

Silex is a great PHP micro framework developed by the same people that are behind the Symfony project. It is in fact using mainly Symfony components but at a more simplified level. Let’s see how we can get started really quickly with a Silex app.

There is more than one way. You can add it as a dependency to an existent composer based project:

"silex/silex": "~1.2",

Or you can even create a new project using a nice little skeleton provided by the creator:

composer.phar create-project fabpot/silex-skeleton

Regardless of how your project is set up, in order to access Elasticsearch we’ll need to use its PHP SDK. That needs to be added to Composer:

"elasticsearch/elasticsearch": "~1.0",

And if we want to use Twig to output data, we’ll need this as well (if not already there of course):

"symfony/twig-bridge": "~2.3"

In order to use the SDK, we can expose it as a service to Pimple, the tiny Silex dependency injection container (much easier than it sounds). Depending on how our project is set up, we can do this in a number of places (see the repository for an example). But basically, after we instantiate the new Silex application, we can add the following:

$app['elasticsearch'] = function() {
  return new Client(array());
};

This creates a new service called elasticsearch on our app that instantiates an object of the Elasticsearch Client class. And don’t forget we need to use that class at the top:

use Elasticsearch\Client;

Now, wherever we want, we can get the Elasticsearch client by simply referring to that property in the $app object:

$client = $app['elasticsearch'];

Connecting to Elasticsearch

In the previous article we’ve managed to get our node data into the node index with each node type giving the name of an Elasticsearch document type. So for instance, this will return all the article node types:

http://localhost:9200/node/article/_search

We’ve also seen how to instantiate a client for our Elasticsearch SDK. Now it’s time to use it somehow. One way is to create a controller:

<?php

namespace Controller;

use Silex\Application;
use Symfony\Component\HttpFoundation\Response;

class NodeController {

  /**
   * Shows a listing of nodes.
   *
   * @return \Symfony\Component\HttpFoundation\Response
   */
  public function index() {
    return new Response('Here there should be a listing of nodes...');
  }
  
  /**
   * Shows one node
   *
   * @param $nid
   * @param \Silex\Application $app
   * @return mixed
   */
  public function show($nid, Application $app) {
    $client = $app['elasticsearch'];
    $params = array(
      'index' => 'node',
      'body' => array(
        'query' => array(
          'match' => array(
            'nid' => $nid,
          ),
        ),
      )
    );
    
    $result = $client->search($params);
    if ($result && $result['hits']['total'] === 0) {
      $app->abort(404, sprintf('Node %s does not exist.', $nid));
    }
    
    if ($result['hits']['total'] === 1) {
      $node = $result['hits']['hits'];
      return $app['twig']->render('node.html.twig', array('node' => reset($node)));
    }
  }
}

Depending on how you organise your Silex application, there are a number of places this controller class can go. In my case it resides inside the src/Controller folder and it’s autoloaded by Composer.

We also need to create a route that maps to this Controller though. Again, there are a couple of different ways to handle this but in my example I have a routes.php file located inside the src/ folder and required inside index.php:

<?php

use Symfony\Component\HttpFoundation\Response;

/**
 * Error handler
 */
$app->error(function (\Exception $e, $code) {
  switch ($code) {
    case 404:
      $message = $e->getMessage();
      break;
    default:
      $message = 'We are sorry, but something went terribly wrong. ' . $e->getMessage();
  }

  return new Response($message);
});

/**
 * Route for /node
 */
$app->get("/node", "Controller\\NodeController::index");

/**
 * Route /node/{nid} where {nid} is a node id
 */
$app->get("/node/{nid}", "Controller\\NodeController::show");

So what happens in my example above? First, I defined an error handler for the application, just so I can see the exceptions being caught and print them on the screen. Not a big deal. Next, I defined two routes that map to my two controller methods defined before. But for the sake of brevity, I only exemplified what the prospective show() method might do:

  • Get the Elasticsearch client
  • Build the Elasticsearch query parameters (similar to what we did in the Drupal environment)
  • Perform the query
  • Check for the results and if a node was found, render it with a Twig template and pass the node data to it.
  • If no results are found, abort the process with a 404 that calls our error handler for this HTTP code declared above.

If you want to follow this example, keep in mind that to use Twig you’ll need to register it with your application. It’s not so difficult if you have it already in your vendor folder through composer.

After you instantiate the Silex app, you can register the provider:

$app->register(new TwigServiceProvider());

Make sure you use the class at the top:

use Silex\Provider\TwigServiceProvider;

And add it as a service with some basic configuration:

$app['twig'] = $app->share($app->extend('twig', function ($twig, $app) {
  return $twig;
}));
$app['twig.path'] = array(__DIR__.'/../templates');

Now you can create template files inside the templates/ folder of your application. For learning more about setting up a Silex application, I do encourage you to read this introduction to the framework.

To continue with our controller example though, here we have a couple of template files that output the node data.

Inside a page.html.twig file:

        <!DOCTYPE html>
        <html>
        <head>
            {% block head %}
                <title>{% block title %}{% endblock %} - My Elasticsearch Site</title>
            {% endblock %}
        </head>
        <body>
        <div id="content">{% block content %}{% endblock %}</div>
        </body>
        </html>

And inside the node.html.twig file we used in the controller for rendering:

{% extends "page.html.twig" %}

{% block title %}{{ node._source.title }}{% endblock %}

{% block content %}

    <article>
        <h1>{{ node._source.title }}</h1>

        <div id="content">

            {% if node._source.field_image %}
                <div class="field-image">
                    {% for image in node._source.field_image %}
                        <img src="http://www.sitepoint.com/integrate-elasticsearch-silex//{{ image.url }}" alt="img.alt"/>
                    {% endfor %}
                </div>
            {% endif %}

            {% if node._source.body %}
                <div class="field-body">
                    {% for body in node._source.body %}
                        {{ body.value|striptags('<p><div><br><img><a>')|raw }}
                    {% endfor %}
                </div>
            {% endif %}
        </div>
    </article>


{% endblock %}

This is just some basic templating for getting our node data printed in the browser (not so fun otherwise). We have a base file and one that extends it and outputs the node title, images and body text to the screen.

Alternatively, you can also return a JSON response from your controller with the help of the JsonResponse class:

use Symfony\Component\HttpFoundation\JsonResponse;

And from your controller simply return a new instance with the values passed to it:

return new JsonResponse($node);

You can easily build an API like this. But for now, this should already work. By pointing your browser to http://localhost/node/5 you should see data from Drupal’s node 5 (if you have it). With one big difference: it is much much faster. There is no bootstrapping, theming layer, database query etc. On the other hand, you don’t have anything useful either out of the box except for what you build yourself using Silex/Symfony components. This can be a good thing or a bad thing depending on the type of project you are working on. But the point is you have the option of drawing some lines for this integration and decide its extent.

One end of the spectrum could be building your entire front end with Twig or even Angular.js with Silex as the API backend. The other would be to use Silex/Elasticsearch for one Drupal page and use it only for better content search. Somewhere in the middle would probably be using such a solution for an entire section of a Drupal site that is dedicated to interacting with heavy data (like a video store or something). It’s up to you.

Conclusion

We’ve seen in this article how we can quickly set up a Silex app and use it to return some data from Elasticsearch. The goal was not so much to learn how any of these technologies work, but more of exploring the options for integrating them. The starting point was the Drupal website which can act as a perfect content management system that scales highly if built properly. Data managed there can be dumped into a high performance data store powered by Elasticsearch and retrieved again for the end users with the help of Silex, a lean and fast PHP framework.

Apr 03 2015
Apr 03

A Silex and Elasticsearch app powered by Drupal 7 for content management

In this tutorial I am going to look at the possibility of using Drupal 7 as a content management system that powers another high performance application. To illustrate the latter, I will use the Silex PHP microframework and Elasticsearch as the data source. The goal is to create a proof of concept, demonstrating using these three technologies together.

esdrupalsilex

The article comes with a git repository that you should check out, which contains more complete code than can be presented in the tutorial itself. Additionally, if you are unfamiliar with either of the three open source projects being used, I recommend following the links above and also checking out the documentation on their respective websites.

The tutorial will be split into two pieces, because there is quite a lot of ground to cover.

In this part, we’ll set up Elasticsearch on the server and integrate it with Drupal by creating a small, custom module that will insert, update, and delete Drupal nodes into Elasticsearch.

In the second part, we’ll create a small Silex app that fetches and displays the node data directly from Elasticsearch, completely bypassing the Drupal installation.

Elasticsearch

The first step is to install Elasticsearch on the server. Assuming you are using Linux, you can follow this guide and set it up to run when the server starts. There are a number of configuration options you can set here.

A very important thing to remember is that Elasticsearch has no access control so, once it is running on your server, it is publicly accessible through the (default) 9200 port. To avoid having problems, make sure that in the configuration file you uncomment this line:

network.bind_host: localhost

And add the following one:

script.disable_dynamic: true

These options make sure that Elasticsearch is not accessible from the outside, nor are dynamic scripts allowed. These are recommended security measures you need to take.

Drupal

The next step is to set up the Drupal site on the same server. Using the Elasticsearch Connector Drupal module, you can get some integration with the Elasticsearch instance: it comes with the PHP SDK for Elasticsearch, some statistics about the Elasticsearch instance and some other helpful submodules. I’ll leave it up to you to explore those at your leisure.

Once the connector module is enabled, in your custom module you can retrieve the Elasticsearch client object wrapper to access data:

$client = elastic_connector_get_client_by_id('my_cluster_id');

Here, my_cluster_id is the Drupal machine name that you gave to the Elasticsearch cluster (at admin/config/elasticsearch-connector/clusters). The $client object will now allow you to perform all sorts of operations, as illustrated in the docs I referenced above.

Inserting data

The first thing we need to do is make sure we insert some Drupal data into Elasticsearch. Sticking to nodes for now, we can write a hook_node_insert() implementation that will save every new node to Elasticsearch. Here’s an example, inside a custom module called elastic:

/**
 * Implements hook_node_insert().
 */
function elastic_node_insert($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);

  if ( ! $params) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }

  $result = $client->index($params);
  if ($result && $result['created'] === false) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been saved to Elasticsearch.'));
}

As you can see, we instantiate a client object that we use to index the data from the node. You may be wondering what _elastic_prepare_node() is:

/**
 * Prepares a node to be added to Elasticsearch
 *
 * @param $node
 * @return array
 */
function _elastic_prepare_node($node) {

  if ( ! is_object($node)) {
    return;
  }

  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'body' => array(),
  );

  // Add the simple properties
  $wanted = array('vid', 'uid', 'title', 'log', 'status', 'comment', 'promote', 'sticky', 'nid', 'type', 'language', 'created', 'changed', 'revision_timestamp', 'revision_uid');
  $exist = array_filter($wanted, function($property) use($node) {
    return property_exists($node, $property);
  });
  foreach ($exist as $field) {
    $params['body'][$field] = $node->{$field};
  }

  // Add the body field if exists
  $body_field = isset($node->body) ? field_get_items('node', $node, 'body') : false;
  if ($body_field) {
    $params['body']['body'] = $body_field;
  }

  // Add the image field if exists
  $image_field = isset($node->field_image) ? field_get_items('node', $node, 'field_image') : false;
  if ($image_field) {
    $params['body']['field_image'] = array_map(function($img) {
      $img = file_load($img['fid']);
      $img->url = file_create_url($img->uri);
      return $img;
    }, $image_field);
  }

  return $params;
}

It is just a helper function I wrote, which is responsible for “serializing” the node data and getting it ready for insertion into Elasticsearch. This is just an example and definitely not a complete or fully scalable one. It is also assuming that the respective image field name is field_image. An important point to note is that we are inserting the nodes into the node index with a type = $node->type.

Updating data

Inserting is not enough, we need to make sure that node changes get reflected in Elasticsearch as well. We can do this with a hook_node_update() implementation:

/**
 * Implements hook_node_update().
 */
function elastic_node_update($node) {
  if ($node->is_new !== false) {
    return;
  }

  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);

  if ( ! $params) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  $params['id'] = $result['hits']['hits'][0]['_id'];
  $version = $result['hits']['hits'][0]['_version'];
  $index = $client->index($params);

  if ($index['_version'] !== $version + 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }
  
  drupal_set_message(t('The node has been updated in Elasticsearch.'));
}

We again use the helper function to prepare our node for insertion, but this time we also search for the node in Elasticsearch to make sure we are updating and not creating a new one. This happens using another helper function I wrote as an example:

/**
 * Helper function that returns a node from Elasticsearch by its nid.
 *
 * @param $client
 * @param $node
 * @return mixed
 */
function _elastic_perform_node_search_by_id($client, $node) {
  $search = array(
    'index' => 'node',
    'type' => $node->type,
    'version' => true,
    'body' => array(
      'query' => array(
        'match' => array(
          'nid' => $node->nid,
        ),
      ),
    ),
  );

  return $client->search($search);
}

You’ll notice that I am asking Elasticsearch to return the document version as well. This is so that I can check if a document has been updated with my request.

Deleting data

The last (for now) feature we need is the ability to remove the data from Elasticsearch when a node gets deleted. hook_node_delete() can help us with that:

/**
 * Implements hook_node_delete().
 */
function elastic_node_delete($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');

  // If the node is in Elasticsearch, remove it
  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem deleting this node in Elasticsearch.'));
    return;
  }

  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'id' => $result['hits']['hits'][0]['_id'],
  );

  $result = $client->delete($params);
  if ($result && $result['found'] !== true) {
    drupal_set_message(t('There was a problem deleting this node in Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been deleted in Elasticsearch.'));
}

Again, we search for the node in Elasticsearch and use the returned ID as a marker to delete the document.

Please keep in mind though that using early returns such as illustrated above is not ideal inside Drupal hook implementations unless this is more or less all the functionality that needs to go in them. I recommend splitting the logic into helper functions if you need to perform other unrelated tasks inside these hooks.

This is enough to get us started using Elasticsearch as a very simple data source on top of Drupal. With this basic code in place, you can navigate to your Drupal site and start creating some nodes, updating them and deleting them.

One way to check if Elasticsearch actually gets populated, is to disable the remote access restriction I mentioned above you need to enable. Make sure you only do this on your local, development, environment. This way, you can perform HTTP requests directly from the browser and get JSON data back from Elasticsearch.

You can do a quick search for all the nodes in Elasticsearch by navigating to this URL:

http://localhost:9200/node/_search

…where localhost points to your local server and 9200 is the default Elasticsearch port.

For article nodes only:

http://localhost:9200/node/article/_search

And for individual articles, by the auto generated Elasticsearch ids:

http://localhost:9200/node/article/AUnJgdPGGE7A1g9FtqdV

Go ahead and check out the Elasticsearch documentation for all the amazing ways you can interact with it.

Conclusion

We’ve seen in this article how we can start working to integrate Elasticsearch with Drupal. Obviously, there is far more we can do based on even the small things we’ve accomplished. We can extend the integration to other entities and even Drupal configuration if needed. In any case, we now have some Drupal data in Elasticsearch, ready to be used from an external application.

That external application will be the task for the second part of this tutorial. We’ll be setting up a small Silex app that, using the Elasticsearch PHP SDK, will read the Drupal data directly from Elasticsearch. As with part 1, above, we won’t be going through a step-by-step tutorial on accomplishing a given task, but instead will explore one of the ways that you can start building this integration. See you there.

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web