Jul 16 2018

Modern applications are expected to be equipped with powerful search engines. Drupal provides a core search module that is capable of doing a basic keyword search by querying the database. When it comes to storing and retrieving data, databases are very efficient and reliable. They can also be used for basic filtering and aggregation of data. However, they are not very efficient when it comes to searching for specific terms and phrases.



Performing inefficient queries on large sets of data can result in poor performance. Moreover, what if we want to sort the search results by relevance, implement advanced search techniques like autocompletion, full-text or fuzzy search, or integrate search with RESTful APIs to build a decoupled application?

This is where dedicated search servers come into the picture. They provide a robust solution to all these problems. There are a few popular open-source search engines to choose from, such as Apache Solr, Elasticsearch, and Sphinx. When to use which one depends on your needs and situation, and is a discussion for another day. In this article, we are going to explore how we can use Elasticsearch for indexing in Drupal.

What is Elasticsearch?

“Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time.” – elastic.co 

It is a search server built on top of Apache Lucene, a Java library, and it can be used to implement advanced search techniques and perform analytics on large sets of data without compromising performance.

“You Know, for Search”

It is a document-oriented search engine, that is, it stores and queries data in JSON format. It also provides a RESTful interface to interact with the Lucene engine. 

Many popular communities including GitHub, Stack Overflow, and Wikipedia benefit from Elasticsearch due to its speed, distributed architecture, and scalability.

Downloading and Running the Elasticsearch Server

Before integrating Elasticsearch with Drupal, we need to install it on our machine. Since it runs on Java, make sure you have Java 8 or later installed on the system. Also, the Drupal module currently supports version 5 of Elasticsearch, so download that version.

  • Download the archive from its website and extract it
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.10.tar.gz
$ tar -zxvf elasticsearch-5.6.10.tar.gz
  • Execute the “elasticsearch” bash script located inside the bin directory. If you are on Windows, execute the “elasticsearch.bat” batch file
$ elasticsearch-5.6.10/bin/elasticsearch

The search server should start running on port 9200 of localhost by default. To make sure it has been set up correctly, make a request to http://localhost:9200/ 

$ curl http://localhost:9200

If you receive a response like the following, you are good to go:

{
  "name" : "hzBUZA1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "5RMhDoOHSfyI4a9s78qJtQ",
  "version" : {
    "number" : "5.6.10",
    "build_hash" : "b727a60",
    "build_date" : "2018-06-06T15:48:34.860Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

Since Elasticsearch does not provide any access control out of the box, you must take care of that yourself when deploying it.
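For instance, Elasticsearch 5 binds to localhost by default; if you expose it on another interface, put it behind a firewall or an authenticating reverse proxy. Keeping the bind address explicit in the config is a simple safeguard (a minimal sketch, assuming the default config file location inside the extracted directory):

# bind only to the loopback interface so Elasticsearch is not exposed publicly
$ echo "network.host: 127.0.0.1" >> elasticsearch-5.6.10/config/elasticsearch.yml

Restart Elasticsearch after changing the configuration for the setting to take effect.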

Integrating Elasticsearch with Drupal

Now that we have the search server up and running, we can proceed with integrating it with Drupal. In Drupal 8, this can be done in two ways (unless you build your own custom solution, of course).

  1. Using Search API and Elasticsearch Connector
  2. Using the Elastic Search module

Method 1: Using Search API and Elasticsearch Connector

We will need the Search API and Elasticsearch Connector modules.

We also need two PHP libraries for this to work – des-connector and php-lucene. Let us download the modules using Composer, as it will take care of these dependencies.

$ composer require 'drupal/elasticsearch_connector:^5.0'
$ composer require 'drupal/search_api:^1.8'

Now, enable the modules using Drupal Console, Drush, or the admin UI.

$ drupal module:install elasticsearch_connector search_api

or

$ drush en elasticsearch_connector search_api -y

You can verify that the library has been correctly installed from the Status report available at admin/reports/status.

Viewing the status of the library under Status Reports

Configuring Elasticsearch Connector

Now, we need to create a cluster (a collection of Elasticsearch nodes) where all the data will be stored and indexed.

  1. Navigate to Manage → Configuration → Search and metadata → Elasticsearch Connector and click on the “Add cluster” button
  2. Fill in the details of the cluster. Give it an admin title, enter the server URL, optionally make it the default cluster, and make sure to keep the status as Active.
     Adding an Elasticsearch Cluster
  3. Click on the “Save” button to add the cluster
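Once the cluster is saved, you can double-check that the server it points to is reachable and healthy straight from the command line (this assumes the default http://localhost:9200 server used above):

$ curl 'http://localhost:9200/_cluster/health?pretty'

A “green” or “yellow” status means the cluster is up; “yellow” is normal for a single-node setup, since replica shards cannot be allocated anywhere.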

Adding a Search API server

In Drupal, Search API is responsible for providing the interface to a search server – in our case, Elasticsearch. We need to make the Search API server point to the recently created cluster.

  1. Navigate to Manage → Configuration → Search and metadata → Search API and click on the “Add server” button
  2. Give the server a suitable name and description. Select “Elasticsearch” as the backend and optionally adjust the fuzziness.
     Adding a Search API server
  3. Click on “Save” to add the server.
     Viewing the status of the newly added server

Creating a Search API Index and adding fields to it

Next, we need to create a Search API index. The terminology used here can be a bit confusing: a Search API index corresponds to an Elasticsearch type (and not to an Elasticsearch index). 

  1. On the same configuration page, click on the “Add Index” button
  2. Give an administrative name to the index. Select the entities in the data sources which you need to index.
     Adding the data sources of the search index
  3. Select the bundles and languages to be indexed while configuring the data source, and also select the indexing order.
     Configuring the added data sources
  4. Next, select the Search API server and check “Enabled”. You may want to disable immediate indexing. Then, click on “Save and add fields”.
     Configuring the search index options
  5. Now, we need to add the fields to be indexed. These fields will become the fields of the documents in our Elasticsearch index. Click on the “Add field” button.
  6. Click on the “Add” button next to the field you wish to add. Let’s add the title and click on “Done”.
     Adding the required fields to the index
  7. Now, configure the type of the field. This can vary with your application. If you are implementing a search functionality, you may want to select “Full-text”.
     Customizing the fields of the index
  8. Finally, click on “Save Changes”
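After saving, the connector should create a matching index on the Elasticsearch server. You can list the indices on the server to verify this (the index name will be derived from the machine name you gave the Search API index):

$ curl 'http://localhost:9200/_cat/indices?v'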

Processing of Data

This is an important aspect of how a search engine works: we need to perform certain operations on the data before indexing it into the search server. For example, consider the implementation of a simple full-text search bar in a view or a decoupled application. 

  1. To implement this, click on the “Processors” tab. Enable the following and arrange them in this order.
    1. Tokenization: Split the text into tokens or words
    2. Lowercasing: Change the case of all the tokens to lower case
    3. Removing stopwords: Remove noise words like ‘is’, ‘the’, ‘was’, etc.
    4. Stemming: Chop off or modify word endings like ‘-ing’, ‘-uous’, etc., reducing words to their root form

      Along with these steps, you may enable checks on content access and the publishing status of the entity, and enable result highlighting

  2. Scroll down to the bottom, arrange the order, and enable all the processors from their individual vertical tabs.
     Arranging the order of Processors
  3. Click on “Save” to save the configuration.

Note that the processors that need to be applied can vary with your application. For example, you should not remove the stopwords if you want to implement autocompletion.
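These processors mirror the text analysis that Elasticsearch itself performs at index time. If you want to see what such a chain does to a sentence, you can experiment with Elasticsearch’s _analyze API; the built-in “english” analyzer is used here purely as an illustration and is not the exact processor chain configured above:

$ curl -XPOST 'http://localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d '
{
  "analyzer": "english",
  "text": "The Quick foxes were jumping over the lazy dogs"
}'

The response lists the resulting tokens after lowercasing, stemming, and stopword removal.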

Indexing the content items

By default, Drupal cron will do the job of indexing whenever it executes. But for the time being, let’s index the items manually from the “View” tab.

Indexing the content items

Optionally alter the batch size and click on “Index now” button to start indexing.

Wait for the indexing to finish
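If you prefer the command line, Search API also ships with Drush integration, so indexing can be triggered without waiting for cron or clicking through the UI (a sketch, assuming Drush 8 and the command names provided by Search API 1.x):

$ drush search-api-index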

Now, you can view or browse the created index using the REST interface or a client like Elasticsearch Head or Kibana. 

$ curl 'http://localhost:9200/elasticsearch_drupal_content_index/_search?pretty=true&q=*:*'
Creating a view with full-text search

You may create a view with the search index or use the REST interface of Elasticsearch to build a decoupled application.

Example of a full-text search using a Drupal view
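For a decoupled front end, a minimal full-text query against the same index could look like the following (assuming the index name shown above and that the title field was indexed as full-text; adjust both to your own setup):

$ curl -XPOST 'http://localhost:9200/elasticsearch_drupal_content_index/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "title": "drupal"
    }
  }
}'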

Method 2: Using the Elastic Search module

As you may have noticed, there is quite a bit of terminology mismatch between Search API and Elasticsearch’s core concepts. Hence, we can alternatively use this method.

For this, we will need the Elastic Search module and three PHP libraries – elasticsearch, elasticsearch-dsl, and twlib. Let’s download the module using Composer.

$ composer require 'drupal/elastic_search:^1.2'

Now, enable it using Drupal Console, Drush, or the admin UI.

$ drupal module:install elastic_search

or

$ drush en elastic_search -y

Connecting to Elasticsearch Server

First, we need to connect the module with the search server, similar to the previous method.

  1. Navigate to Configuration → Search and metadata → Elastic Server
  2. Select the HTTP protocol, add the Elasticsearch host and port number, and optionally add the Kibana host. You may also add a prefix for the indices. The rest of the configuration can be left at the defaults.
     Adding the Elasticsearch server
  3. Click on “Save configurations” to add the server

Generating mappings and configuring them

A mapping is essentially a schema that defines the fields of the documents in an index. All the bundles of all entity types in Drupal can be mapped into indices.
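For reference, an Elasticsearch mapping is just a JSON document listing field names and their types. A trimmed, purely illustrative example for an article bundle (the field names here are hypothetical) might look like this:

{
  "mappings": {
    "article": {
      "properties": {
        "title":   { "type": "text" },
        "body":    { "type": "text" },
        "created": { "type": "date" },
        "status":  { "type": "boolean" }
      }
    }
  }
}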

  1. Click on “Generate mappings”
  2. Select the entity type, let’s say node. Then select its bundles. Optionally allow mapping of its children.
     Adding the entity and selecting its bundles to be mapped
  3. Click on the “Submit” button. It will automatically add all the fields; you may want to keep only the desired fields and configure them correctly. Their mapping DSL can also be exported.
     Configuring the fields of a bundle

Generating index and pushing the documents

Now, we can push the indices and the required documents to the search server.

  1. For that, move on to the indices tab, click on “Generate New Elastic Search Indices” and then click on “Push Server Indices and Mappings”. This will create all the indices on the server.
  2. Now index all the nodes using “Push All Documents”. You may also push the nodes for a specific index only. Wait for the indexing to finish.
     Managing the indices using the admin UI

Conclusion

Drupal entities can be indexed into Elasticsearch documents, which can then be used to create an advanced search system with Drupal views or to build a decoupled application using the REST interface of Elasticsearch.

While Search API provides an abstract approach, the Elastic Search module follows the conventions and principles of the search engine itself to index the documents. Either way, you can relish the flexibility, power, and speed of Elasticsearch to build your desired solution.

Jun 21 2018

There has been a rapid increase in the popularity of JavaScript frameworks since their introduction in the early 2010s. They provide powerful architectures to build fluid, responsive, and user-friendly web applications. Moreover, more people than ever are using their mobile devices to access digital content, so building native applications for your site makes sense. 


Drupal has realized the potential of this market and has added support for building RESTful APIs into core. But the RESTful Web Services module in Drupal core does not provide a very robust solution out of the box. You need to enable all the resources, configure the endpoints, verbs, and authentication mechanisms, and create views with a REST export to build the desired solution.


But even then, the APIs built this way do not necessarily follow any widely accepted guidelines or specifications like JSON API. You can always write custom logic, but luckily there is a contributed module for that. Before understanding how this module proves to be a robust solution for building decoupled applications, let us cover some basics.

What is an API?

In terms of web services, an API is an agreement or a contract of request and response between the server (provider) and the client (consumer) for the purpose of exchanging data. It is the element that bridges the front end and the back end. It defines which resources are accessible, who can access them, and how to access them.

What is JSON?

JavaScript Object Notation (JSON) is the most common format for exchanging data over web services. It has largely replaced XML due to its lightweight nature: it is easier for humans to read and for machines to parse. It is supported by almost every modern programming language and framework.
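For instance, a tiny JSON document describing an article might look like this:

{
  "title": "Drupal 8 is awesome",
  "published": true,
  "tags": ["drupal", "api"]
}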

JSON API specifications – what are they and why should you consider implementing them?

The JSON API specifications are a set of standards and conventions that describe how the APIs should be served by the servers and consumed by the clients for exchanging data in JSON format. The key benefits of implementing these specifications include:

  • Consistency
    The front end developers expect a consistent structure and behavior from the APIs while consuming them to build the applications.
     
  • Widely accepted and supported
    The specifications are widely accepted and implementations of client libraries can be found for almost every programming language and framework.
     
  • Efficiency
    The specification is designed to keep the number of requests and size of the data to a minimum.
     
  • Productivity
    There are numerous ways of designing an API and, as a developer, you will often find yourself in arguments about the best practices or conventions for building an API. By following this set of standards, you can eliminate these debates and focus on building the application. 

Now that we understand the foundations, let us see how the JSON API module helps in building a headless website in Drupal.


Downloading and installing JSON API module

The module has a dependency on the Serialization module, so enable it first and then download and install JSON API using any of the below methods:

Using Drush

$ drush en serialization -y

$ drush dl jsonapi && drush en jsonapi -y

Using Drupal Console  

$ drupal module:install serialization
$ drupal module:download jsonapi && drupal module:install jsonapi

Using UI

Enabling the JSON API and Serialization modules
  • Navigate to Manage → Extend → Install new module and enter the .tar.gz or .zip URL of the module and click on Install.
  • Once the download finishes, click on “Enable newly added modules”.
  • Select the Serialization and the JSON API module under the Web Services package and click on “Install”.

How does JSON API help?

The module provides an implementation of the above-discussed specifications for Drupal. The following are the main features of the API provided by the module.
 

  1. Zero configuration required – Production-ready out-of-the-box

    As soon as you enable the module, it exposes all the resources with appropriate verbs, endpoints, and fields. It does not allow any of this configuration to be modified (more on that later). This ensures that the API always follows the JSON API specifications and also makes the deployment quick and easy.

    All the bundles get their own unique endpoints; in case an entity type does not have bundles, the entity type is repeated. For example, in the case of articles, the endpoint is /jsonapi/node/article, but for users it is /jsonapi/user/user. To retrieve a single entity, we need to specify its UUID (Universally Unique Identifier); otherwise, we will receive a “collection” of entities.

    The standard Drupal permissions determine the accessibility of the resource. So, you may need to authenticate using Basic Auth or OAuth to perform certain operations.

    You can make the following standard requests to perform CRUD operations on the resources. Note that the configuration entities only support the read operation, i.e. only the GET request.
     

    Accept: application/vnd.api+json
    Authorization: Basic {base64 encoded username + password}
    Content-Type:application/vnd.api+json
    
    GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json
    GET /jsonapi/{entity-type}/{bundle-name}/{uuid}?_format=api_json 
    POST /jsonapi/{entity-type}/{bundle-name}?_format=api_json
    PATCH /jsonapi/{entity-type}/{bundle-name}/{uuid}?_format=api_json
    DELETE /jsonapi/{entity-type}/{bundle-name}/{uuid}?_format=api_json
    
    For example, the following request will return a JSON response of the collection of all the articles (paginated in sets of 50 articles per page).
    GET /jsonapi/node/article?_format=api_json
    
    To get a specific article you need to specify its UUID. For example: 
    GET /jsonapi/node/article/6a1571e0-26c7-423f-8ff5-04b2e4deb6d3?_format=api_json
    
    Retrieving a collection of articles
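    As a sketch of a write operation, a new article could be created by sending a JSON API document in the request body (this assumes Basic Auth is enabled and the authenticated user may create articles):

    POST /jsonapi/node/article?_format=api_json
    Content-Type: application/vnd.api+json
    Authorization: Basic {base64 encoded username + password}

    {
      "data": {
        "type": "node--article",
        "attributes": {
          "title": "Hello from JSON API"
        }
      }
    }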
  2. Include relationships

    A “relationships” object contains all the information related to the resource. This may be the author, entity references, image fields, revision details, etc. Usually, you would have to make additional requests to retrieve this further information. Instead, we can add the “include” request parameter and specify which fields of this related information should be retrieved.

    All the additional fields will be available in the “included” object. This ensures that we receive all the required data in a single request. You can even use nesting in the relationships, i.e. if the related object further has relationships of its own, they can be specified using the “.” operator.

    GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&include={relationships-object}
    
    For example, the article bundle has an image (entity reference) field, we can retrieve its path along with the article as follows.
    GET /jsonapi/node/article?_format=api_json&include=field_image
    
    This time, we will receive the JSON data with an “included” object along with the data. As you may notice, it further has a uid in its relationships object; we can retrieve that by using include=field_image.uid.
     "included": [
            {
                "type": "file--file",
                "id": "e7f9cd27-3cd0-43d3-b205-b46e88d09109",
                "attributes": {
                    "fid": 12,
                    "uuid": "e7f9cd27-3cd0-43d3-b205-b46e88d09109",
                    "langcode": "en",
                    "filename": "gen50F1.tmp.jpg",
                    "uri": "public://2018-04/gen50F1.tmp.jpg",
                    "filemime": "image/jpeg",
                    "filesize": 5652,
                    "status": true,
                    "created": 1523243077,
                    "changed": 1523243077,
                    "url": "/drupal-8.4.4/sites/default/files/2018-04/gen50F1.tmp.jpg"
                },
                "relationships": {
                    "uid": {
                        "data": {
                            "type": "user--user",
                            "id": "434ec884-0f9b-4593-8bc4-ef58e542ac0e"
                        },
                        "links": {
                            "self": "/drupal-8.4.4/jsonapi/file/file/e7f9cd27-3cd0-43d3-b205-b46e88d09109/relationships/uid",
                            "related": "/drupal-8.4.4/jsonapi/file/file/e7f9cd27-3cd0-43d3-b205-b46e88d09109/uid"
                        }
                    }
                },
                "links": {
                    "self": "/drupal-8.4.4/jsonapi/file/file/e7f9cd27-3cd0-43d3-b205-b46e88d09109"
                }
            }
        ]
    
  3. Filtering

    Filters can be applied to collections to retrieve only the required resources. They can be added using the “filter” request parameter. We need to specify the field on which the comparison has to be done, the value to compare against, and optionally the operator (the default is ‘=’).

     GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&filter[label][condition][path]={field}&filter[label][condition][operator]={operator}&filter[label][condition][value]={value}
    

    Or

     GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&filter[field][operator]={operator}&filter[field][value]={value}
    

    For example, we can search for articles whose title contains some keyword.

     GET /jsonapi/node/article?_format=api_json&filter[title][operator]=CONTAINS&filter[title][value]=search-keyword
    

    We can even use nested and grouped filters for advanced use cases, as sketched below; read the official documentation for a complete reference.
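    As a sketch of a grouped filter (the group label “or-group” below is arbitrary), the following request would fetch articles whose title contains either of two keywords; the query string is split across lines here only for readability:

     GET /jsonapi/node/article?_format=api_json
         &filter[or-group][group][conjunction]=OR
         &filter[first][condition][path]=title
         &filter[first][condition][operator]=CONTAINS
         &filter[first][condition][value]=drupal
         &filter[first][condition][memberOf]=or-group
         &filter[second][condition][path]=title
         &filter[second][condition][operator]=CONTAINS
         &filter[second][condition][value]=json
         &filter[second][condition][memberOf]=or-group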

  4. Paging

    Paging is a common technique used to divide a long listing of items into pages. It can be used to implement an infinite scroll, or simply a pager. This requires two parameters – limit and offset. Limit decides the maximum number of items per page (the default is 50) and offset (the default is 0) can be used to skip the first ‘n’ items or resources. The presence of “next” and “prev” links indicates our position in the pager.

    GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&page[offset]={offset}&page[limit]={limit}
    

    For example, we can build an infinite feed of articles. When the user scrolls to the bottom, we can use the following request asynchronously to fetch, let’s say, 10 more articles.

    GET /jsonapi/node/article?_format=api_json&page[offset]=10&page[limit]=10
    
  5. Sparse Fieldsets

    We can specify which fields of the resource are required when performing a GET request, using the “fields” parameter. This is useful when we need only limited information and want to save bandwidth.

    GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&fields[entity-type--bundle]={field(s)}
    

    For example, to display only the titles of all the articles we can use the following request.

    GET /jsonapi/node/article?_format=api_json&fields[node--article]=title
     
     "data": [
            {
                "type": "node--article",
                "id": "6a1571e0-26c7-423f-8ff5-04b2e4deb6d3",
                "attributes": {
                    "title": "Drupal 8 is awesome"
                },
                "links": {
                    "self": "drupal-8.4.4/jsonapi/node/article/6a1571e0-26c7-423f-8ff5-04b2e4deb6d3"
                }
            },
            {...}
    ]
    
  6. Sorting

    To sort a collection, we can add a “sort” parameter, specifying the field and the sort direction.

     GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&sort[label][path]={field}&sort[label][direction]={ASC/DESC}
    

    Or

     GET /jsonapi/{entity-type}/{bundle-name}?_format=api_json&sort=±{field}
    

    For example, to retrieve all the articles sorted by their created date and then by their titles, we can use this request:

     GET /jsonapi/node/article?_format=api_json&sort=created,title
    

    This is not an exhaustive list; please refer to the official documentation for more usage details and examples. But with the right mix of these features, we can easily implement all the required features in our headless website.

Customizing the API

The module itself does not provide any way to configure the resources and the API. We need to install an additional module, JSON API Extras, in order to customize the API. This module allows us to configure the endpoints and fields, and to enable or disable resources.

  1. Navigate to Manage → Configuration → Web services → JSON API Overwrites. This lists all the available resources. All of them are enabled by default.
     Admin UI for managing the resources
  2. Click on the “Overwrite” button next to the resource you wish to customize.
     
  3. You can alter the resource type, the path or endpoint, disable specific fields, and give aliases to fields. You may disable the resources that are not required; the rest of the configuration can mostly be left untouched.
     Configuration options for resources
  4. Click on “Save” when done to save the configuration.

To Conclude

The JSON API module provides a production-ready API out of the box. It offers standard HTTP methods to perform basic CRUD operations on entities, along with advanced features including paging, sorting, and filtering to retrieve all the required data in a single request.

However, it lacks a few features like registering a user, logging in a user, and checking the login status. But we can use Drupal core’s REST web services for these purposes and build a headless website or a native mobile application using the best of both worlds. 
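For example, logging a user in through core’s endpoint (available in recent Drupal 8 releases) can be done with a request like the following; the site URL and credentials here are placeholders:

$ curl -X POST 'https://example.com/user/login?_format=json' \
    -H 'Content-Type: application/json' \
    -d '{"name": "api_user", "pass": "secret"}'

The response contains the current user information along with CSRF and logout tokens for use in subsequent authenticated requests.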

At OpenSense Labs, we have worked on decoupled Drupal projects, drop a mail at [email protected] to connect with us. 
