Keeping Solr Search Index Updated with Shared Database Sites

Sharing database tables between Drupal sites, for example a shared users table, is a convenient way to share users between those sites. The same approach works for many other entity types. The tricky part is keeping each site's Solr index up to date for this shared content.

Search API marks an entity for Solr indexing when the entity is added on a site. This works well, but its limitation is that the record of which items are tracked for indexing (the search_api_item table) is stored in the current site's own database. Suppose Site A and Site B share a users table and Site A adds a user. Site A marks the user to be indexed, and Solr adds it to Site A's index. Site B never marks it, because as far as Site B is concerned the user has always existed. To solve this problem I created a cron hook, in a module enabled on both sites, that compares the items tracked for the local index with the rows in the shared table. Any items found in the shared table but missing from the local tracking table are marked for indexing.

This matters if, for example, you have views built on a Solr index: you want everything in the shared table to be in that index. In my case, user searches on Site A were missing users that had been created on Site B.

Why use a cron hook instead of an insert hook such as hook_node_insert? Each site needs to find the entities that it did not create. Cron runs on both sites regardless of where the entity was added, so each site can check for itself. An insert hook fires only on the site where the entity is added; the other site is never notified, even if both sites have the same module enabled. Site B will not respond to an insert that happened on Site A.

The most important part of this is search_api_entity_insert, which marks an item as needing to be indexed.

// Ensure that users are tracked for indexing on this site, regardless of which
// site created them. When the other site creates a user,
// search_api_entity_insert($entity, $type) never runs here, which results in
// an incomplete user index.
function MYMODULE_cron() {
  // Make sure the search_api module is enabled.
  if (module_exists('search_api')) {
    // This example handles users. Change the type to handle a different shared
    // entity type.
    $user_type = 'user';
    $conditions = array(
      'enabled' => 1,
      'item_type' => $user_type,
      'read_only' => 0,
    );

    // Each Search API index has its own ID, and that ID can differ between
    // sites even when they index the same item type. Load this site's enabled,
    // writable indexes for the type to get the local ID used in the query
    // below. In this setup only one index comes back.
    $indexes = search_api_index_load_multiple(FALSE, $conditions);
    if (!$indexes) {
      return;
    }

    foreach ($indexes as $index) {
      // Select all users from the shared users table that are not tracked for
      // this index on the local site.
      // Note that the users table is mapped to the shared database in
      // settings.php on both sites.
      // The Drupal query below is roughly equivalent to this SQL:
      //    select u.uid from SHARE_db.users as u
      //      where not exists
      //      (select NULL from THISSITE_db.search_api_item as i
      //       where i.index_id = ThisSiteUserIndexID and i.item_id = u.uid);

      $query = db_select('users', 'shared_u');
      $query->addField('shared_u', 'uid');

      $query_exists = db_select('search_api_item', 'indexed_users');
      $query_exists->condition('indexed_users.index_id', $index->id, '=');
      $query_exists->addExpression('NULL');
      $query_exists->where('indexed_users.item_id = shared_u.uid');

      $query->notExists($query_exists);
      // Fetch the uids into an array. The statement object returned by
      // execute() is always truthy, so it cannot be used to test for results.
      $uids = $query->execute()->fetchCol();

      // Log that users are about to be added. Since this runs in cron, the log
      // entry at least shows that the process started if something goes wrong.
      if ($uids) {
        watchdog('MYMODULE', 'Adding shared users to search index');
      }

      // $uids now holds every user that is not yet tracked for this site's
      // index.
      foreach ($uids as $uid) {
        // The user has not been marked for indexing on this site, so call
        // Search API's search_api_entity_insert() directly.
        $account = user_load($uid);
        if ($account) {
          search_api_entity_insert($account, $user_type);
        }
        // Uncomment the following line to log each user that is added.
        //watchdog('MYMODULE', 'Adding user @uid', array('@uid' => $uid));
      }
    }
  }
}
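
The NOT EXISTS query at the heart of the cron hook can be tried outside Drupal. The following is a minimal standalone sketch that assumes PHP with the pdo_sqlite extension. The rows, the index IDs 7 and 9, and the function name find_untracked_uids() are made up for illustration; only the table and column names follow the code above.

```php
<?php
// Standalone illustration only: an in-memory SQLite database stands in for
// the shared users table and this site's search_api_item tracking table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (uid INTEGER PRIMARY KEY)');
$pdo->exec('CREATE TABLE search_api_item (item_id INTEGER, index_id INTEGER)');

// Four users exist in the (pretend) shared table.
$insert_user = $pdo->prepare('INSERT INTO users (uid) VALUES (:uid)');
foreach (array(1, 2, 3, 4) as $uid) {
  $insert_user->execute(array(':uid' => $uid));
}

// Index 7 is this site's user index. Index 9 is some other index, so the
// row for user 3 must not count as "tracked" for index 7.
$insert_item = $pdo->prepare(
  'INSERT INTO search_api_item (item_id, index_id) VALUES (:item_id, :index_id)'
);
foreach (array(array(1, 7), array(2, 7), array(3, 9)) as $row) {
  $insert_item->execute(array(':item_id' => $row[0], ':index_id' => $row[1]));
}

/**
 * Returns the uids that have no tracking row for the given index.
 */
function find_untracked_uids(PDO $pdo, $index_id) {
  $stmt = $pdo->prepare(
    'SELECT u.uid FROM users u
     WHERE NOT EXISTS (
       SELECT 1 FROM search_api_item i
       WHERE i.index_id = :index_id AND i.item_id = u.uid
     )
     ORDER BY u.uid'
  );
  $stmt->execute(array(':index_id' => $index_id));
  return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

$missing = find_untracked_uids($pdo, 7);
print_r($missing);
```

With this sample data, users 3 and 4 come back: user 4 is not tracked anywhere, and user 3 is tracked only for a different index. That is why the subquery filters on index_id as well as item_id.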

As a follow-up note, this only handles added entities. If an entity is removed, a similar function is needed to un-index it. You can use search_api_entity_delete for that.
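
A rough sketch of that counterpart might look like the following. It reverses the query: it looks for tracked item IDs that no longer have a row in the shared users table. The function name MYMODULE_untrack_deleted_users() is hypothetical. Because a deleted user can no longer be loaded and passed to search_api_entity_delete, this sketch calls search_api_track_item_delete() with the IDs instead, on the assumption that your Search API 7.x-1.x version provides it; check your installed version before relying on it.

```php
<?php
/**
 * Hypothetical helper: stop tracking users that no longer exist in the shared
 * users table. It could be called from MYMODULE_cron() on both sites.
 */
function MYMODULE_untrack_deleted_users() {
  if (!module_exists('search_api')) {
    return;
  }

  $user_type = 'user';
  $conditions = array(
    'enabled' => 1,
    'item_type' => $user_type,
    'read_only' => 0,
  );
  $indexes = search_api_index_load_multiple(FALSE, $conditions);
  if (!$indexes) {
    return;
  }

  foreach ($indexes as $index) {
    // Select tracked item IDs for this index that have no matching row in
    // the shared users table.
    $query = db_select('search_api_item', 'indexed_users');
    $query->addField('indexed_users', 'item_id');
    $query->condition('indexed_users.index_id', $index->id, '=');

    $user_exists = db_select('users', 'shared_u');
    $user_exists->addExpression('NULL');
    $user_exists->where('shared_u.uid = indexed_users.item_id');

    $query->notExists($user_exists);
    $stale_ids = $query->execute()->fetchCol();

    if ($stale_ids) {
      watchdog('MYMODULE', 'Removing @count deleted users from search index',
        array('@count' => count($stale_ids)));
      // Assumption: search_api_track_item_delete($type, array $item_ids)
      // is available and stops tracking the given item IDs.
      search_api_track_item_delete($user_type, $stale_ids);
    }
  }
}
```

As with the insert case, this has to run on every site that shares the table, since each site keeps its own tracking data.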

If there is interest or if it would be helpful we could put together a Birds of a Feather at Drupalcon Portland (2013) for this. Let us know! 
