Improving search results with Search API Solr (Custom boosting)

Introduction

At Finalist we use the Search API and Search API Solr modules to provide the search functionality for most of our websites. With a little configuration you get a search experience that works well for a basic website. However, sometimes customers want more than a standard implementation. In this post I’ll explain some of the improvements we make and how they work. The following topics will be covered in a series of blog posts:

  • Stemming
  • Partial search
  • Better search excerpts
  • Custom boosting

Custom Boosting

The Search API Solr module lets you add a boost to individual fields, which helps Apache Solr determine the relevance of each search result. That relevance can be used to order the results, so users find the most important ones first. One important option is missing from the module's boosting settings, though: it does not let you boost specific values within a field. This is what hook_search_api_solr_query_alter() can be used for.

Solr accepts a bq (boost query) parameter in the search request. This parameter can be used to boost specific fields and/or values. You can read more about this parameter on the Apache Solr wiki.

Example hook_search_api_solr_query_alter()

Below is an example implementation of hook_search_api_solr_query_alter() that you can adapt. It adds an extra boost to search results of specific node types. A similar approach can be used to boost results based on taxonomy terms or other field values.

function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Boost news and blog nodes in Solr results.
  $call_args['params']['bq'][] = '(ss_type:"news"^4 OR ss_type:"blog"^2)';
}
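
If the list of boosted node types grows, writing the bq string by hand becomes error-prone. The sketch below assumes a hypothetical helper, mymodule_build_type_boost(), which is not part of Search API Solr. It turns a map of node type to boost factor into a single bq clause on the ss_type field used above.

```php
<?php

/**
 * Builds a Solr boost query clause for a set of node types.
 *
 * Hypothetical helper, not part of Search API Solr.
 *
 * @param array $boosts
 *   Boost factors keyed by node type machine name, e.g. array('news' => 4).
 * @param string $field
 *   Solr field that holds the node type; 'ss_type' is assumed here.
 *
 * @return string
 *   A clause such as (ss_type:"news"^4 OR ss_type:"blog"^2), or an empty
 *   string when no boosts are given.
 */
function mymodule_build_type_boost(array $boosts, $field = 'ss_type') {
  $clauses = array();
  foreach ($boosts as $type => $boost) {
    // Escape backslashes and double quotes inside the quoted value.
    $value = addcslashes($type, '"\\');
    // Cast to float so only numeric boosts end up in the query.
    $clauses[] = $field . ':"' . $value . '"^' . (float) $boost;
  }
  return $clauses ? '(' . implode(' OR ', $clauses) . ')' : '';
}
```

Inside the hook, the result of mymodule_build_type_boost(array('news' => 4, 'blog' => 2)) could then be appended to $call_args['params']['bq'] when it is not empty.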

Boosting exact matches

As explained in the previous section, Solr allows boosting on custom fields or conditions. While stemming helps you find more results, you probably want the results that match the exact search phrase to appear higher in the list.

FieldType in schema.xml

The basic fieldType text in the schema.xml file has filters that support stemming and similar processing. For exact match boosting this can be a problem, because stemmed tokens no longer distinguish the literal word from its variants. That’s why it is a good idea to create a separate fieldType with better support for exact matches.

<!-- Add textExact fieldType to boost exact matches. -->
<fieldType name="textExact" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LengthFilterFactory" min="2" max="100" />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LengthFilterFactory" min="2" max="100" />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="multiterm">
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LengthFilterFactory" min="2" max="100" />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
</fieldType>
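
To get a feel for what this analyzer chain does to a title, here is a rough PHP imitation of three of its steps: whitespace tokenizing, the length filter and lowercasing. It is only an illustration with a hypothetical function name, and it assumes the mbstring extension is available. Solr's own analysis, including the accent mapping and duplicate removal that are left out here, is what actually runs at index and query time.

```php
<?php

/**
 * Roughly imitates the textExact analyzer for illustration purposes.
 *
 * Hypothetical function; it skips the MappingCharFilter and the
 * RemoveDuplicatesTokenFilter steps of the real chain.
 */
function mymodule_text_exact_preview($text) {
  // WhitespaceTokenizerFactory: split on runs of whitespace only.
  $tokens = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
  $result = array();
  foreach ($tokens as $token) {
    // LengthFilterFactory: keep tokens of 2 to 100 characters.
    $length = mb_strlen($token, 'UTF-8');
    if ($length < 2 || $length > 100) {
      continue;
    }
    // LowerCaseFilterFactory: lowercase, but no stemming.
    $result[] = mb_strtolower($token, 'UTF-8');
  }
  return $result;
}
```

For the title 'Running a Drupal site' this yields running, drupal and site: the one-letter word is dropped and 'Running' is not reduced to a stem. Because the tokenizer only splits on whitespace, punctuation stays attached to the word it follows.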

After creating this fieldType, we need to apply this to all fields where we want exact matching. For this example we will add the textExact fieldType to the title field. We do this by making a copy of the original.

<field name="title" type="text" stored="true" indexed="true"/>
<!-- Add titleExact field to boost exact matches. -->
<field name="titleExact" type="textExact" indexed="true" stored="true" />
<copyField source="title" dest="titleExact"/>

Example hook_search_api_solr_query_alter()

After changing the schema.xml file (and reindexing the content so that the new field is filled), the search results will not have changed yet. This is because the Solr query doesn’t use our new field yet. To boost the exact search field, we can implement hook_search_api_solr_query_alter() and make sure our new field is used.

We add our new titleExact field to the query fields through the qf param, so that every unstemmed word match in the title adds to the score. Through the pf (phrase fields) param we add a boost for documents in which the search terms appear close together in this field. To give an extra boost to exact phrases, we fetch the search keywords from our query and use the bq param to give a big boost to titleExact values that match the entire keyword phrase.

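function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Search the unstemmed title and boost titles where the words appear together.
  $call_args['params']['qf'][] = 'titleExact^5';
  $call_args['params']['pf'][] = 'titleExact^5';
  // getKeys() may return NULL, a string or a nested array; only flat word lists are handled here.
  $keys = $query->getKeys();
  if (is_array($keys)) {
    unset($keys['#conjunction']);
    // Escape quotes and backslashes so the phrase stays a valid Solr string.
    $phrase = addcslashes(implode(' ', array_filter($keys, 'is_string')), '"\\');
    if ($phrase !== '') {
      $call_args['params']['bq'][] = 'titleExact:"' . $phrase . '"^10';
    }
  }
}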
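
For more complex queries, Search API can hand back nested key arrays instead of a flat list of words. As a sketch under that assumption, the hypothetical helper below collects all plain words from such a structure so they can be joined into one phrase for the bq clause. It skips entries whose key starts with '#' (such as #conjunction) and does not treat negated groups specially.

```php
<?php

/**
 * Collects the plain search words from a Search API keys structure.
 *
 * Hypothetical helper; negated groups are not handled separately.
 */
function mymodule_collect_key_words($keys) {
  if (is_string($keys)) {
    // An unparsed key string counts as a single phrase.
    return $keys === '' ? array() : array($keys);
  }
  $words = array();
  if (is_array($keys)) {
    foreach ($keys as $name => $value) {
      // Skip metadata entries such as '#conjunction' and '#negation'.
      if (is_string($name) && $name !== '' && $name[0] === '#') {
        continue;
      }
      // Nested arrays are sub-groups of keys; walk into them.
      $words = array_merge($words, mymodule_collect_key_words($value));
    }
  }
  return $words;
}
```

In the hook, implode(' ', mymodule_collect_key_words($query->getKeys())) would give the phrase, which still needs its quotes and backslashes escaped before it goes into the bq clause.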