
Improving search results with Search API Solr (Better search excerpts)


Introduction

At Finalist we use the Search API and the Search API Solr modules to provide the search functionality for most of our websites. With a little configuration you can get a great search experience that works perfectly for a basic website. However, sometimes customers want more than a standard implementation. In this series of blog posts I’ll explain some of the improvements we make and how they work. The following topics are covered:

  • Stemming
  • Partial search
  • Better search excerpts
  • Custom boosting

Better search excerpts

The Search API Solr module lets you configure your servers to provide an excerpt for each search result. To make this work, check ‘Return an excerpt for all results’ in the advanced section of the server configuration page (in the latest version of the module you also need to enable the ‘Highlighting’ processor on the ‘Filters’ tab of your search index). After that, you can configure your search views to display the excerpt for each search result.

We’ve had a lot of questions about these excerpts. By default they have several issues, but each of them is easy to solve.

Short words are ignored when returning excerpts

The spell field in the schema.xml file stores all text content of a document as one long string. The Search API Solr module also uses the spell field to build the excerpt for each search result. The following section of schema.xml defines textSpell, the field type used by the spell field:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LengthFilterFactory" min="3" max="25" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

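The LengthFilterFactory discards every token shorter than min or longer than max characters, so those words never end up in the spell field and can never be highlighted in the excerpts. With min="3", as above, keywords of one or two characters are dropped; with a min of 4, a search for ‘Apache Solr API’ would not highlight the word ‘API’ either. By lowering min (and raising max if your content contains very long words) you can make sure that all keywords in the search are properly highlighted.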
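As a sketch, the textSpell definition could be changed as follows so that short keywords are kept. The min and max values here are illustrative assumptions, not recommendations from the module, so pick limits that suit your content. Because the filter runs at index time, you will need to reload the Solr core and reindex your content after changing schema.xml.

```xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Illustrative limits (assumption): keep tokens of 1 to 50 characters,
         so short keywords such as "API" reach the spell field and can be highlighted. -->
    <filter class="solr.LengthFilterFactory" min="1" max="50" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
```

Stop words listed in stopwords.txt are still removed by the StopFilterFactory, so lowering min does not flood the excerpts with words like ‘a’ or ‘the’.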

When stemming is used, the excerpt for stemmed results is not returned

Stemming, as described in a previous blog post, lets a search for the word ‘cars’ also return results that only contain ‘car’. By default, however, the word ‘car’ is not highlighted in the excerpts of those results, because the textSpell field type does not stem its tokens. To fix this, the textSpell analyzer needs a stemming filter such as SnowballPorterFilterFactory or PorterStemFilterFactory. After adding the line below to the textSpell fieldType, stemmed words are highlighted as well:

<filter class="solr.PorterStemFilterFactory"/>
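Where the filter goes in the chain matters: Lucene’s Porter stemmer expects lowercase input, so the sketch below places it after the LowerCaseFilterFactory and before duplicate removal. It keeps the min and max values from the definition shown earlier. Whether the English Porter stemmer fits your site is an assumption; for other languages, SnowballPorterFilterFactory with a language attribute (for example language="Dutch") is an alternative. As with any analyzer change, reload the core and reindex afterwards.

```xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LengthFilterFactory" min="3" max="25" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Stem after lowercasing: the Porter stemmer expects lowercase input. -->
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
```

Because this single analyzer has no type attribute, Solr applies it at both index and query time, so ‘cars’ in the query and ‘car’ in the content are both reduced to ‘car’ and the highlighter can match them.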

The excerpts are unclear: random pieces of text are shown

Apache Solr provides an extensive list of highlighting parameters to help create better excerpts. The Search API Solr module sets reasonable defaults for them, but we change a few to make the excerpts easier to read. These parameters can be changed in hook_search_api_solr_query_alter(), which lets you alter all request parameters right before the query is sent to Solr. We change the following parameters:

hl.snippets: The maximum number of highlighted snippets Solr returns for each result, based on the keywords in your search. Drupal concatenates all snippets into one excerpt, separated by dots. Because the result looks like a single sentence, users get confused when there are too many snippets.

hl.fragsize: The approximate size, in characters, of each highlighted snippet. Snippets that are too short produce very strange excerpts, but snippets that are too long make the search page harder to scan for relevant results, especially when you also return multiple snippets through the hl.snippets parameter.

hl.mergeContiguous: Whether Solr collapses contiguous fragments into a single snippet. We’ve noticed that the excerpts are harder to read when this parameter is true.

Example hook_search_api_solr_query_alter()

Below is an example implementation of hook_search_api_solr_query_alter() that applies these settings, so you can easily implement this yourself.

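/**
 * Implements hook_search_api_solr_query_alter().
 *
 * Changes the Solr highlighting parameters for more readable excerpts.
 */
function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Return at most two highlighted snippets per result.
  $call_args['params']['hl.snippets'] = '2';
  // Make each snippet roughly 100 characters long.
  $call_args['params']['hl.fragsize'] = '100';
  // Keep contiguous fragments as separate snippets.
  $call_args['params']['hl.mergeContiguous'] = 'false';
}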