Using Search API Attachments with remote Solr extraction

Posted Aug 28, 2012

Search API Attachments is very similar to Apachesolr Attachments in that it lets you extract text from attachments using Apache Tika. It makes this text indexable and searchable so that documents on the site can be searched along with nodes and entities.

However, while Apachesolr Attachments lets you choose between a local copy of Tika and Tika installed on a remote Solr server, Search API Attachments doesn't offer the same choice: it only supports local Tika extraction. For large-scale sites this is a problem, because the resource-intensive extraction work takes capacity away from the web server.

There is a way to enable remote Solr extraction in Search API Attachments with just a few patches.

First, make sure you are using the 7.x-1.2 release of the Search API Attachments module. If you are not, upgrade to that version.

Next, apply the http://drupal.org/files/search_api_attachments-allow_external_extraction_and_cache_extraction-1289222-8.patch patch to your Search API Attachments module. This patch adds an option to the Search API Attachments configuration screen that lets you choose between remote Solr extraction and local Tika extraction, and it contains the code needed to make remote extraction work. It also adds a table that stores the extracted text, so that you don't need to send the files to the server again every time you re-index your site. Don't forget to run the database updates after applying this patch.
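To illustrate why storing the extracted text saves work on re-index, here is a minimal sketch in Python. It is not the patch's code: the table name `extracted_text`, the `ExtractionCache` class, and the choice to key entries by a hash of the file contents are all assumptions made for the example.

```python
import hashlib
import sqlite3


def file_key(content: bytes) -> str:
    """Hypothetical cache key: a SHA-256 hash of the file contents."""
    return hashlib.sha256(content).hexdigest()


class ExtractionCache:
    """Stores extracted text so each file is sent for extraction only once."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        # Assumed schema for illustration; the real patch defines its own table.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS extracted_text ("
            "file_key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def get_text(self, content: bytes, extract) -> str:
        """Return cached text, calling extract(content) only on a cache miss."""
        key = file_key(content)
        row = self.conn.execute(
            "SELECT body FROM extracted_text WHERE file_key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]
        body = extract(content)  # the expensive (possibly remote) step
        self.conn.execute(
            "INSERT INTO extracted_text (file_key, body) VALUES (?, ?)",
            (key, body),
        )
        self.conn.commit()
        return body
```

With a cache like this, a full re-index only pays the extraction cost for files that have not been seen before.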

Last, apply the http://drupal.org/files/search_api_solr-allow_abitrary_query-1580118-1.patch patch. As its filename indicates, it targets the Search API Solr module (not Search API Attachments). The previous patch depends on this one in order to send its query to the remote Solr server.
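For a sense of what a remote extraction query can look like, here is a rough Python sketch. It assumes the remote server exposes Solr's ExtractingRequestHandler at the conventional `/update/extract` path and uses its `extractOnly` parameter, which returns the extracted text without indexing the document. The function name and the base URL in the usage note are hypothetical, and the patches may build their request differently.

```python
import urllib.parse
import urllib.request


def build_extract_request(
    solr_base_url: str,
    file_bytes: bytes,
    content_type: str = "application/octet-stream",
) -> urllib.request.Request:
    """Build (but do not send) a POST asking Solr to extract text from a file."""
    # extractOnly=true asks Solr to return the text instead of indexing it.
    params = urllib.parse.urlencode({"extractOnly": "true", "wt": "json"})
    url = solr_base_url.rstrip("/") + "/update/extract?" + params
    return urllib.request.Request(
        url,
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The request could then be sent with `urllib.request.urlopen(build_extract_request("http://solr.example.com:8983/solr", data))`, where the host is a placeholder for your own Solr server.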

You'll want to re-index your site after you've made these changes if you are already using the local Tika extraction.

Web Developer Brad Blake brings a wealth of expertise to our team and our clients whenever he creates software tools and websites on the LAMP platform. For more than five years, he has been using PHP to build cutting-edge technologies that ...
