Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough

Install and Integrate Elasticsearch with Drupal

Parent Feed: 

A Silex and Elasticsearch app powered by Drupal 7 for content management

In this tutorial I am going to look at the possibility of using Drupal 7 as a content management system that powers another high performance application. To illustrate the latter, I will use the Silex PHP microframework and Elasticsearch as the data source. The goal is to create a proof of concept, demonstrating using these three technologies together.

esdrupalsilex

The article comes with a git repository that you should check out, which contains more complete code than can be presented in the tutorial itself. Additionally, if you are unfamiliar with either of the three open source projects being used, I recommend following the links above and also checking out the documentation on their respective websites.

The tutorial will be split into two pieces, because there is quite a lot of ground to cover.

In this part, we’ll set up Elasticsearch on the server and integrate it with Drupal by creating a small, custom module that will insert, update, and delete Drupal nodes into Elasticsearch.

In the second part, we’ll create a small Silex app that fetches and displays the node data directly from Elasticsearch, completely bypassing the Drupal installation.

Elasticsearch

The first step is to install Elasticsearch on the server. Assuming you are using Linux, you can follow this guide and set it up to run when the server starts. There are a number of configuration options you can set here.

A very important thing to remember is that Elasticsearch has no access control so, once it is running on your server, it is publicly accessible through the (default) 9200 port. To avoid having problems, make sure that in the configuration file you uncomment this line:

network.bind_host: localhost

And add the following one:

script.disable_dynamic: true

These options make sure that Elasticsearch is not accessible from the outside, nor are dynamic scripts allowed. These are recommended security measures you need to take.

Drupal

The next step is to set up the Drupal site on the same server. Using the Elasticsearch Connector Drupal module, you can get some integration with the Elasticsearch instance: it comes with the PHP SDK for Elasticsearch, some statistics about the Elasticsearch instance and some other helpful submodules. I’ll leave it up to you to explore those at your leisure.

Once the connector module is enabled, in your custom module you can retrieve the Elasticsearch client object wrapper to access data:

$client = elastic_connector_get_client_by_id('my_cluster_id');

Here, my_cluster_id is the Drupal machine name that you gave to the Elasticsearch cluster (at admin/config/elasticsearch-connector/clusters). The $client object will now allow you to perform all sorts of operations, as illustrated in the docs I referenced above.

Inserting data

The first thing we need to do is make sure we insert some Drupal data into Elasticsearch. Sticking to nodes for now, we can write a hook_node_insert() implementation that will save every new node to Elasticsearch. Here’s an example, inside a custom module called elastic:

/**
 * Implements hook_node_insert().
 */
function elastic_node_insert($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);

  if ( ! $params) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }

  $result = $client->index($params);
  if ($result && $result['created'] === false) {
    drupal_set_message(t('There was a problem saving this node to Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been saved to Elasticsearch.'));
}

As you can see, we instantiate a client object that we use to index the data from the node. You may be wondering what _elastic_prepare_node() is:

/**
 * Prepares a node to be added to Elasticsearch
 *
 * @param $node
 * @return array
 */
function _elastic_prepare_node($node) {

  if ( ! is_object($node)) {
    return;
  }

  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'body' => array(),
  );

  // Add the simple properties
  $wanted = array('vid', 'uid', 'title', 'log', 'status', 'comment', 'promote', 'sticky', 'nid', 'type', 'language', 'created', 'changed', 'revision_timestamp', 'revision_uid');
  $exist = array_filter($wanted, function($property) use($node) {
    return property_exists($node, $property);
  });
  foreach ($exist as $field) {
    $params['body'][$field] = $node->{$field};
  }

  // Add the body field if exists
  $body_field = isset($node->body) ? field_get_items('node', $node, 'body') : false;
  if ($body_field) {
    $params['body']['body'] = $body_field;
  }

  // Add the image field if exists
  $image_field = isset($node->field_image) ? field_get_items('node', $node, 'field_image') : false;
  if ($image_field) {
    $params['body']['field_image'] = array_map(function($img) {
      $img = file_load($img['fid']);
      $img->url = file_create_url($img->uri);
      return $img;
    }, $image_field);
  }

  return $params;
}

It is just a helper function I wrote, which is responsible for “serializing” the node data and getting it ready for insertion into Elasticsearch. This is just an example and definitely not a complete or fully scalable one. It is also assuming that the respective image field name is field_image. An important point to note is that we are inserting the nodes into the node index with a type = $node->type.

Updating data

Inserting is not enough, we need to make sure that node changes get reflected in Elasticsearch as well. We can do this with a hook_node_update() implementation:

/**
 * Implements hook_node_update().
 */
function elastic_node_update($node) {
  if ($node->is_new !== false) {
    return;
  }

  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');
  $params = _elastic_prepare_node($node);

  if ( ! $params) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }

  $params['id'] = $result['hits']['hits'][0]['_id'];
  $version = $result['hits']['hits'][0]['_version'];
  $index = $client->index($params);

  if ($index['_version'] !== $version + 1) {
    drupal_set_message(t('There was a problem updating this node in Elasticsearch.'));
    return;
  }
  
  drupal_set_message(t('The node has been updated in Elasticsearch.'));
}

We again use the helper function to prepare our node for insertion, but this time we also search for the node in Elasticsearch to make sure we are updating and not creating a new one. This happens using another helper function I wrote as an example:

/**
 * Helper function that returns a node from Elasticsearch by its nid.
 *
 * @param $client
 * @param $node
 * @return mixed
 */
function _elastic_perform_node_search_by_id($client, $node) {
  $search = array(
    'index' => 'node',
    'type' => $node->type,
    'version' => true,
    'body' => array(
      'query' => array(
        'match' => array(
          'nid' => $node->nid,
        ),
      ),
    ),
  );

  return $client->search($search);
}

You’ll notice that I am asking Elasticsearch to return the document version as well. This is so that I can check if a document has been updated with my request.

Deleting data

The last (for now) feature we need is the ability to remove the data from Elasticsearch when a node gets deleted. hook_node_delete() can help us with that:

/**
 * Implements hook_node_delete().
 */
function elastic_node_delete($node) {
  $client = elasticsearch_connector_get_client_by_id('my_cluster_id');

  // If the node is in Elasticsearch, remove it
  $result = _elastic_perform_node_search_by_id($client, $node);
  if ($result && $result['hits']['total'] !== 1) {
    drupal_set_message(t('There was a problem deleting this node in Elasticsearch.'));
    return;
  }

  $params = array(
    'index' => 'node',
    'type' => $node->type,
    'id' => $result['hits']['hits'][0]['_id'],
  );

  $result = $client->delete($params);
  if ($result && $result['found'] !== true) {
    drupal_set_message(t('There was a problem deleting this node in Elasticsearch.'));
    return;
  }

  drupal_set_message(t('The node has been deleted in Elasticsearch.'));
}

Again, we search for the node in Elasticsearch and use the returned ID as a marker to delete the document.

Please keep in mind though that using early returns such as illustrated above is not ideal inside Drupal hook implementations unless this is more or less all the functionality that needs to go in them. I recommend splitting the logic into helper functions if you need to perform other unrelated tasks inside these hooks.

This is enough to get us started using Elasticsearch as a very simple data source on top of Drupal. With this basic code in place, you can navigate to your Drupal site and start creating some nodes, updating them and deleting them.

One way to check if Elasticsearch actually gets populated, is to disable the remote access restriction I mentioned above you need to enable. Make sure you only do this on your local, development, environment. This way, you can perform HTTP requests directly from the browser and get JSON data back from Elasticsearch.

You can do a quick search for all the nodes in Elasticsearch by navigating to this URL:


http://localhost:9200/node/_search

…where localhost points to your local server and 9200 is the default Elasticsearch port.

For article nodes only:


http://localhost:9200/node/article/_search

And for individual articles, by the auto generated Elasticsearch ids:


http://localhost:9200/node/article/AUnJgdPGGE7A1g9FtqdV

Go ahead and check out the Elasticsearch documentation for all the amazing ways you can interact with it.

Conclusion

We’ve seen in this article how we can start working to integrate Elasticsearch with Drupal. Obviously, there is far more we can do based on even the small things we’ve accomplished. We can extend the integration to other entities and even Drupal configuration if needed. In any case, we now have some Drupal data in Elasticsearch, ready to be used from an external application.

That external application will be the task for the second part of this tutorial. We’ll be setting up a small Silex app that, using the Elasticsearch PHP SDK, will read the Drupal data directly from Elasticsearch. As with part 1, above, we won’t be going through a step-by-step tutorial on accomplishing a given task, but instead will explore one of the ways that you can start building this integration. See you there.

Author: 
Original Post: 

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web