
Make Search API work with Entity Translation


The Entity Translation module is the latest and greatest way to manage multilingual content in Drupal. However, no Drupal search solution currently supports the field-based translation it is built on. Here’s how we tackled this challenge, made field-based translations searchable and built the brand new Search API Entity Translation module. We’ll also look at how we can make it even better together.

The Situation

The Entity Translation (ET) module is essentially an interface and workflow for Drupal 7’s field translations, which are provided by the translation capability of core’s Field API. It will probably be the basis of translations in Drupal 8.

Managing multilingual content via Entity Translation offers interesting advantages over “old school” node-based core translation:

  • ET works with potentially all fieldable entities
  • one entity can have language-neutral fields like numbers side-by-side with translated fields like a description text
  • ET provides a language fallback mechanism when content isn’t available in the current language
  • ET allows for a semantically better data structure
  • ET keeps each piece of content together in a single entity
  • working with references between entities is much more efficient

So Entity Translation is definitely a great development and a great solution for many use cases.

However, as cutting-edge technology, it still leaves some gaps that Drupal developers have yet to fill. One of them is that no Drupal search module currently supports Entity Translation (or the field-based translation capability of core’s Field API that it builds on).

Wait a second: once you decide to use Entity Translation, you are stuck without any way to search through or find your precious translations? Yep. If you’re used to building on the latest technology, you’re also used to closing a lot of gaps. So we asked ourselves: what do we do about this? Entity Translation is a great module, so how can we all help make it better supported and integrated?

The answer is, of course, the Drupal way: help fix it and contribute.

The Problem

The problem is twofold. First, no Drupal search module “knows” about the concept of translatable fields yet, so whatever we do, we have to build this concept into the search solution.

Second, all search modules equate entities like nodes with search documents (search backends like Apache Solr organize their data in indexes consisting of documents), and many backends, Apache Solr among them, cannot manage one document in more than one version or language. One entity is one search document. One document is one language/version. Thus, one entity is one language.
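To make the equation concrete, here is a toy model in plain PHP, not actual Solr or Search API code. It assumes an index that stores documents in an array keyed by document ID; example_index_document() is a hypothetical helper. Indexing two language versions under the same ID leaves only the last one.

```php
<?php
// Toy model only: a search index as an associative array keyed by document ID.
// Neither Solr nor Search API stores data like this; the helper is hypothetical.

/**
 * Adds or replaces a document. The ID is the array key, so indexing a second
 * version under the same ID overwrites the first one.
 */
function example_index_document(array &$index, string $id, string $language, string $text): void {
  $index[$id] = ['language' => $language, 'text' => $text];
}

$index = [];
// With field translations, both language versions share one entity ID.
example_index_document($index, 'node/1', 'en', 'Welcome to our shop');
example_index_document($index, 'node/1', 'fr', 'Bienvenue dans notre boutique');

// Only one document survives, so only one language is searchable.
print count($index) . ' document(s), language: ' . $index['node/1']['language'] . "\n";
// Prints: 1 document(s), language: fr
```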

So searching multilingual content is no problem as long as each node carries only one of the translations. That’s exactly how the old node-based core translation mechanism works: it spreads multiple language versions of the same content across multiple nodes, which together form a translation set.

In contrast, the field-based translation mechanism used by Entity Translation manages multiple language versions of individual fields within one and the same entity. All translations live in one entity, so with this newer approach the English and French versions of, say, a node share the same ID.

However, for all current Drupal 7 search solutions (Core Search, Apache Solr and Search API), one node with one node ID is one atomic document in the search index, and is not supposed to have any language versions or individually translated fields. If a node with individually translated fields is indexed, it is always indexed in its original language only.
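The following sketch shows, in plain PHP, what such an entity looks like and why a translation-unaware indexer only sees one language. It models a Drupal 7 style entity as a stdClass object whose field values are keyed by language code and then by delta, with 'und' marking a language-neutral field. The field names and example_naive_index_text() are hypothetical, and the indexer is a simplification, not code from any of the search modules named above.

```php
<?php
// Drupal 7 style entity with field translations, modelled as a plain object.
// Field values are keyed by language code, then by delta; 'und' marks a
// language-neutral field. Field names are hypothetical.
$node = new stdClass();
$node->nid = 1;
$node->language = 'en'; // The entity's original language.
$node->field_price = ['und' => [['value' => 25]]];
$node->body = [
  'en' => [['value' => 'A sturdy oak table']],
  'fr' => [['value' => 'Une table en chêne robuste']],
];

/**
 * Hypothetical indexer that knows nothing about field translations: it reads
 * each field in the entity's original language, falling back to 'und'.
 */
function example_naive_index_text(stdClass $entity, array $field_names): string {
  $parts = [];
  foreach ($field_names as $field) {
    $values = $entity->$field;
    $langcode = isset($values[$entity->language]) ? $entity->language : 'und';
    foreach ($values[$langcode] ?? [] as $item) {
      $parts[] = (string) $item['value'];
    }
  }
  return implode(' ', $parts);
}

// The French body never reaches the index, although it shares nid 1.
print example_naive_index_text($node, ['body', 'field_price']) . "\n";
// Prints: A sturdy oak table 25
```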

Double bummer. So we somehow have to find a way to let the search solution 1) know about the field-based translations managed by Entity Translation, and 2) get around the limitation imposed by the equation “one entity is one document is one language”.

The Search API, with its extensive modularity and flexible architecture, seems to be the right Drupal search module to try this with.

The Concept

So, what’s the “minimum viable product” here? What does a solution at least have to provide?

Well, the end user primarily wants to find the content he is looking for, using a search term in a language of his choosing. How to do this?

Let's KISS: let’s simply make all translations of a piece of content accessible to Search API at once, not only the original language version. Why not simply concatenate the renderings of our entity in all its translations into one full-text field and funnel that into Search API? (We’ll answer the “why not” part later ;) Sounds blunt, but it’s worth a try.
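A minimal sketch of the idea in plain PHP, assuming a simplified entity array and a hypothetical example_render_translation() that stands in for entity_view() plus drupal_render(). It shows only the concatenation step, not the module’s actual getter.

```php
<?php
// Sketch of the "KISS" idea: render every translation of an entity and
// concatenate the results into one full-text string. All names are hypothetical.

/**
 * Hypothetical renderer: returns plain text for one language version.
 */
function example_render_translation(array $entity, string $langcode): string {
  return $entity['title'][$langcode] . ' ' . $entity['body'][$langcode];
}

/**
 * Concatenates the renderings of all translations into one full-text value.
 */
function example_unified_fulltext(array $entity): string {
  $fulltext = '';
  foreach (array_keys($entity['title']) as $langcode) {
    $fulltext .= example_render_translation($entity, $langcode) . "\n";
  }
  return $fulltext;
}

$entity = [
  'id' => 1,
  'title' => ['en' => 'Translation', 'de' => 'Übersetzung'],
  'body' => ['en' => 'All languages in one field.', 'de' => 'Alle Sprachen in einem Feld.'],
];

print example_unified_fulltext($entity);
```

The newline between renderings is our own choice, so that the last word of one translation cannot run into the first word of the next.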

Luckily, Search API builds on the Entity API module, which lets modules extend any entity, via hooks, with ‘properties’ whose values are retrieved through a ‘getter callback’. Such an entity property is automatically available as a search field in the Search API index UI, and we can do all the concatenation in the property’s getter callback. Lean. Mean. Let's do it!
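In plain PHP, the property mechanism boils down to a lookup plus a callback. This sketch assumes a property info array shaped like the one hook_entity_property_info_alter() works on, and passes the getter the arguments ($data, $options, $name, $type, $info), which is how we understand Entity API calls getter callbacks. example_get_property() and example_fulltext_get() are hypothetical; in Drupal, Entity API’s wrapper classes do the real dispatching.

```php
<?php
// Sketch of computed entity properties: property info maps a name to a
// 'getter callback', and a generic reader calls it. Names are hypothetical.

function example_fulltext_get($data, array $options, $name, $type, $info) {
  // A real getter would render all translations; this one just joins titles.
  return implode(' ', $data->titles);
}

$property_info = [
  'node' => [
    'properties' => [
      'unified_fulltext' => [
        'type' => 'text',
        'label' => 'Multilingual full text',
        'getter callback' => 'example_fulltext_get',
      ],
    ],
  ],
];

/**
 * Reads a property by delegating to its getter callback.
 */
function example_get_property(array $property_info, string $entity_type, $entity, string $name) {
  $info = $property_info[$entity_type]['properties'][$name];
  return call_user_func($info['getter callback'], $entity, [], $name, $entity_type, $info);
}

$node = new stdClass();
$node->titles = ['Hello', 'Bonjour', 'Hallo'];
print example_get_property($property_info, 'node', $node, 'unified_fulltext') . "\n";
// Prints: Hello Bonjour Hallo
```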

The Coding

By implementing hook_entity_property_info_alter(), we introduce the new entity property ‘unified_fulltext’ and delegate its retrieval to the getter callback ‘search_api_et_fulltext_get’.

function search_api_et_entity_property_info_alter(&$info) {
  foreach (search_api_et_entity_types() as $etype => $einfo) {
    $info[$etype]['properties']['unified_fulltext'] = array(
      'type' => 'text',
      'label' => t('Multilingual full text (all languages via entity translation)'),
      'sanitized' => TRUE,
      'getter callback' => 'search_api_et_fulltext_get',
    );
  }
}

The getter is called with the search $item (that is, the entity, for an entity-based Search API index like the one we use here). So we simply iterate through $item->translations and render the entity with entity_view(), using the view mode configured for the entity type. It defaults to “search_index”, which our module (re)introduces as well, in case the core Search module is disabled.

  $fulltext = '';
  // Iterate through $item->translations to render the entity in each language.
  foreach ($item->translations->data as $langcode => $translation) {
    // Skip unpublished translations.
    if ($translation['status']) {
      $render = entity_view($type, array($item), $view_mode, $langcode);
      $context = array('item' => $item, 'options' => $options, 'name' => $name, 'type' => $type, 'view_mode' => $view_mode, 'language' => $langcode);
      // Let other modules alter the render array before rendering...
      drupal_alter('search_api_et_fulltext_prerender', $render, $context);
      $render = drupal_render($render);
      // ...and the rendered markup after it.
      drupal_alter('search_api_et_fulltext_postrender', $render, $context);
      $fulltext .= $render;
    }
  }
  return $fulltext;

As it turns out, there are cases where you need to alter the entity before rendering, or its textual representation after rendering. So we let other modules do exactly that: a new hook_search_api_et_fulltext_prerender_alter() is invoked just before rendering (with the structured entity data and the render array available for alteration), and a new hook_search_api_et_fulltext_postrender_alter() just after it. (In our case, we had to add Panelizer pane content before rendering and strip &shy; soft-hyphen HTML entities after rendering for our indexing.)
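For illustration, a post-render alter along these lines could strip soft hyphens. It is a sketch for a hypothetical module called mymodule, not the code we actually used, and it assumes the rendered output may contain &shy;, its numeric forms, or the literal U+00AD character. Drupal would invoke it through drupal_alter(); here it is called directly so it can run outside Drupal.

```php
<?php
// Sketch of hook_search_api_et_fulltext_postrender_alter() in a hypothetical
// module "mymodule". drupal_alter() passes the rendered string and a context
// array by reference; here we call the function directly.

/**
 * Removes soft hyphens so that "Über&shy;setzung" is indexed as "Übersetzung".
 */
function mymodule_search_api_et_fulltext_postrender_alter(&$render, &$context) {
  // Strip the named entity, its numeric forms and the literal U+00AD character.
  $render = str_replace(['&shy;', '&#173;', '&#xAD;', '&#xad;', "\u{00AD}"], '', $render);
}

$render = '<p>Über&shy;setzung and Trans&#173;lation</p>';
$context = ['language' => 'de'];
mymodule_search_api_et_fulltext_postrender_alter($render, $context);
print $render . "\n";
// Prints: <p>Übersetzung and Translation</p>
```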

One more thing: when a translation is edited, no hook_node_*() is fired, and Search API doesn’t know that anything happened to this piece of content. So, finally, we implement Entity Translation’s hook_entity_translation_save() to react to translation edits by invalidating the respective entity in the search index, which schedules it for reindexing.
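The reindexing logic can be sketched independently of Drupal. This example assumes a hypothetical in-memory tracker (ExampleIndexTracker) and event handler (example_on_translation_save()); in a real module, the handler would call Search API’s change tracking instead (in Search API for Drupal 7 we believe that is search_api_track_item_change($type, $item_ids), but check your version). The point it illustrates: the search item is identified by the entity ID, so editing any translation schedules the whole entity for reindexing.

```php
<?php
// Sketch with a hypothetical in-memory change tracker. When a translation is
// saved, the owning entity is marked "dirty" so the next indexing run
// re-renders it in all languages. All names are illustrative.

final class ExampleIndexTracker {
  /** @var array<string, bool> Item keys waiting to be reindexed. */
  private array $dirty = [];

  public function trackItemChange(string $type, array $ids): void {
    foreach ($ids as $id) {
      $this->dirty["$type:$id"] = TRUE;
    }
  }

  /** Returns and clears the queue, as an indexing run would. */
  public function takeDirtyItems(): array {
    $items = array_keys($this->dirty);
    $this->dirty = [];
    return $items;
  }
}

/**
 * Hypothetical handler for a "translation saved" event.
 */
function example_on_translation_save(ExampleIndexTracker $tracker, string $entity_type, int $entity_id): void {
  // The entity ID, not the language, identifies the search item, so the
  // whole entity (all translations) is scheduled for reindexing.
  $tracker->trackItemChange($entity_type, [$entity_id]);
}

$tracker = new ExampleIndexTracker();
example_on_translation_save($tracker, 'node', 42);
example_on_translation_save($tracker, 'node', 42); // A second edit is deduplicated.
print implode(', ', $tracker->takeDirtyItems()) . "\n";
// Prints: node:42
```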

The Module

Introducing: the Search API Entity Translation module!

To search in and find all translations of your content, you can now simply get the search_api_et.module from Drupal.org, install and enable it, and enjoy the basic ability to search field-translated content in Search API.

All you have to do after enabling the module is add the new property “Multilingual full text” as a search field to your node-based Search API index. Then (re)index the content, et voilà: you can search and find content in all its entity translations. Sweet :)

The Result

So, how well does it work, given all the simplification and stupidity involved? Surprisingly well.

Of course, bluntly concatenating all translations into one full-text field comes with a bunch of potential drawbacks. First, searches are still not language aware: users can’t restrict a search to content in one specific language, only search all language versions at once. So they will find the content, but possibly also unwanted matches from other languages, which can clutter the results and confuse users. If a search for the term “war” matches English content (“war” as in warfare) and German content (“war” meaning “was”), users will see completely unrelated topics mixed together. This happens because any content whose concatenated full-text field (containing every language version) contains the search term shows up in the results. The solution offered here covers all language versions of a piece of content, but it is not language aware.
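The “war” example can be reproduced with a toy index in plain PHP. The two documents and example_search() are hypothetical, and matching is a naive case-insensitive whole-word regex rather than real Solr text analysis, but the effect is the same: because each document holds every language version, one query matches both the English and the German sense.

```php
<?php
// Toy index: each document's full text concatenates all its translations,
// as the unified full-text field does. Documents are hypothetical.
$index = [
  'node/1' => "The war ended in 1945.\nDer Krieg endete 1945.",
  'node/2' => "The weather was fine.\nDas Wetter war schön.",
];

/**
 * Returns IDs of documents whose full text contains $term as a whole word.
 */
function example_search(array $index, string $term): array {
  $hits = [];
  foreach ($index as $id => $fulltext) {
    if (preg_match('/\b' . preg_quote($term, '/') . '\b/iu', $fulltext)) {
      $hits[] = $id;
    }
  }
  return $hits;
}

// Matches English "war" (warfare) in node/1 and German "war" ("was") in node/2.
print implode(', ', example_search($index, 'war')) . "\n";
// Prints: node/1, node/2
```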

In reality, how visible these drawbacks are depends heavily on the site’s topics, the searches and the languages involved. In most cases they rarely surface. But a dirty solution stays dirty, and Murphy would agree that the dirt will hit the fan when you least expect it and are least able to tell what’s happening. So let’s fix this fix.

Make it better, do it right

Okay. We know this blunt instrument works, but it has to be replaced by a more refined, language-aware way of handling multilingual indexing and searching in Search API, because we don’t want its bluntness to come back and bite us at some point. But what exactly should we do, and how?

First, the what: the short-term goal for the Search API Entity Translation module is to implement real, language-specific full-text search in Search API (so, for a start, we are not specifically aiming at language-aware search at the field level). The requirement is that a user can find content in any specific language via a full-text search. So it must be possible to tell the language versions of a piece of content apart and to restrict search and retrieval to a specific one.

There currently seem to be several (at least two) ways to achieve this, which are outlined in http://drupal.org/node/1393058. The most promising one has been suggested by Search API maintainer Thomas Seidl: a data source plug-in that exposes the different field-translated language versions of an entity to Search API in some distinguishable form (e.g. as dedicated search items, or as language-dependent dynamic fields).
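To illustrate the first option (dedicated search items), here is a plain-PHP sketch that splits each field-translated entity into one item per language and filters by language at query time. The composite ID format (entity ID plus “/langcode”), example_translation_items() and example_search_language() are hypothetical; this is neither Search API’s data source plug-in API nor the design proposed in the issue.

```php
<?php
// Sketch: expose each translation as its own search item, assuming a
// hypothetical composite ID "<entity id>/<langcode>".

/**
 * Splits one field-translated entity into one item per language.
 */
function example_translation_items(string $entity_id, array $texts_by_language): array {
  $items = [];
  foreach ($texts_by_language as $langcode => $text) {
    $items["$entity_id/$langcode"] = [
      'entity_id' => $entity_id,
      'language' => $langcode,
      'text' => $text,
    ];
  }
  return $items;
}

/**
 * Full-text search restricted to one language; returns matching entity IDs.
 */
function example_search_language(array $items, string $term, string $langcode): array {
  $hits = [];
  foreach ($items as $item) {
    if ($item['language'] === $langcode
      && preg_match('/\b' . preg_quote($term, '/') . '\b/iu', $item['text'])) {
      $hits[] = $item['entity_id'];
    }
  }
  return array_values(array_unique($hits));
}

$items = example_translation_items('node/1', ['en' => 'The war ended in 1945.', 'de' => 'Der Krieg endete 1945.'])
  + example_translation_items('node/2', ['en' => 'The weather was fine.', 'de' => 'Das Wetter war schön.']);

print implode(', ', example_search_language($items, 'war', 'en')) . "\n"; // node/1
print implode(', ', example_search_language($items, 'war', 'de')) . "\n"; // node/2
```

Since each item carries exactly one language, the backends’ “one document is one language” rule is respected, while every item still points back to the same entity.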

Before we drive this further, we’d like to get your opinion and expertise on board. Just:

  • tell us whether you need more refined, language-aware search for entity translations at all
  • share your thoughts on this, and whether it all makes sense to you
  • suggest other ways of getting better Entity Translation support into Search API
  • help find and decide on further work in the project’s issue queue at http://drupal.org/node/1393058

Or just give your general feedback as comments below!

