Using Search API Attachments with remote Solr extraction
Posted Aug 28, 2012
Search API Attachments is very similar to Apachesolr Attachments: it uses Apache Tika to extract text from file attachments and makes that text indexable and searchable, so documents on the site can be searched along with nodes and other entities.
However, Apachesolr Attachments lets you choose between a local copy of Tika and the Tika instance running on a remote Solr server, while Search API Attachments only supports local Tika extraction. On large sites this is a problem: text extraction is resource-intensive, and running it locally takes processing resources away from the web server.
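To make the local/remote distinction concrete, here is a minimal Python sketch of what each mode amounts to. It assumes local extraction runs the standalone tika-app jar on the web server, and that remote extraction POSTs the file to Solr's extracting request handler (Solr Cell), conventionally mapped at /update/extract, with extractOnly=true so Solr returns the text instead of indexing it. The function names, jar path, and Solr URL are hypothetical; this is not the module's code, and nothing is executed or sent.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request


def local_tika_command(jar_path: str, file_path: str) -> List[str]:
    """Command a local extraction would run, on the web server itself."""
    # --text tells the standalone tika-app jar to print plain text.
    return ["java", "-jar", jar_path, "--text", file_path]


def remote_solr_request(solr_base: str, file_name: str, data: bytes) -> Request:
    """Build, but do not send, a POST to Solr's extracting request handler."""
    params = urlencode({
        "extractOnly": "true",       # return the extracted text instead of indexing it
        "extractFormat": "text",     # plain text rather than the default XHTML
        "wt": "json",                # response format
        "resource.name": file_name,  # file name hint for Tika's type detection
    })
    url = f"{solr_base.rstrip('/')}/update/extract?{params}"
    return Request(url, data=data, method="POST",
                   headers={"Content-Type": "application/octet-stream"})
```

In remote mode the web server only pays for an HTTP upload; the JVM work happens on the Solr host.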
You can enable remote Solr extraction for Search API Attachments by applying two patches.
First, make sure you are using the 7.x-1.2 release of the Search API Attachments module; if you are not, upgrade to it.
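One way to confirm the installed release: modules packaged by Drupal.org record it in a version line in their .info file. The small Python sketch below reads that line. The helper names are hypothetical, and the module path in the example comment is an assumption to adjust for your site.

```python
import re
from typing import Optional

# Release the patches in this post were written against.
REQUIRED_RELEASE = "7.x-1.2"

# Matches a line such as: version = "7.x-1.2"
VERSION_LINE = re.compile(r'^[ \t]*version[ \t]*=[ \t]*"?([^"\s]+)"?[ \t]*$', re.MULTILINE)


def module_version(info_text: str) -> Optional[str]:
    """Return the version recorded in a module's .info text, or None if absent."""
    match = VERSION_LINE.search(info_text)
    return match.group(1) if match else None


def is_required_release(info_text: str) -> bool:
    """True only when the .info text declares exactly the required release."""
    return module_version(info_text) == REQUIRED_RELEASE


# Example (path is an assumption; adjust for your site):
# from pathlib import Path
# text = Path("sites/all/modules/search_api_attachments/search_api_attachments.info").read_text()
# print(module_version(text), is_required_release(text))
```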
Next, apply the http://drupal.org/files/search_api_attachments-allow_external_extraction_and_cache_extraction-1289222-8.patch patch to your Search API Attachments module. This patch adds an option to the Search API Attachments settings screen for choosing between remote Solr extraction and local Tika extraction, along with the code that makes remote extraction work. It also adds a database table that stores the extracted text, so files don't have to be sent to the Solr server again every time you re-index the site. Don't forget to run the database updates (through update.php or drush updatedb) after applying this patch.
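The post doesn't show the table the patch creates, so the following is only a hypothetical Python sketch of the caching idea, using sqlite3 and an invented table name, extracted_text_cache. Keying on a content hash as well as the file ID is an assumption of this sketch, chosen so that a replaced file gets re-extracted; the patch may key its cache differently.

```python
import hashlib
import sqlite3
from typing import Callable


class ExtractionCache:
    """Hypothetical cache of extracted text, keyed by file ID and content hash."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS extracted_text_cache ("
            " fid INTEGER PRIMARY KEY,"
            " sha256 TEXT NOT NULL,"
            " body TEXT NOT NULL)"
        )

    def get_text(self, fid: int, data: bytes, extract: Callable[[bytes], str]) -> str:
        """Return cached text if the file is unchanged, else extract and store it."""
        digest = hashlib.sha256(data).hexdigest()
        row = self.conn.execute(
            "SELECT sha256, body FROM extracted_text_cache WHERE fid = ?", (fid,)
        ).fetchone()
        if row is not None and row[0] == digest:
            return row[1]  # cache hit: no round trip to the extraction server
        text = extract(data)  # cache miss or changed file: extract again
        self.conn.execute(
            "INSERT OR REPLACE INTO extracted_text_cache (fid, sha256, body) VALUES (?, ?, ?)",
            (fid, digest, text),
        )
        self.conn.commit()
        return text
```

In practice, extract would be whatever sends the file to Solr; on a re-index of unchanged files it is never called.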
Last, apply the http://drupal.org/files/search_api_solr-allow_abitrary_query-1580118-1.patch patch to your Search API Solr module (not Search API Attachments). The previous patch depends on it to send the extraction query to the remote Solr server.
If you were already using local Tika extraction, re-index your site after making these changes.
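The steps above can be sketched as an ordered list of shell commands. This Python sketch only builds and prints the plan unless dry_run=False is passed. Several details are assumptions: modules live under sites/all/modules, the second patch targets the search_api_solr module (as its filename indicates), the patches apply with patch -p1 from each module directory, and the Search API drush commands are named search-api-reindex and search-api-index (confirm them with drush help on your site).

```python
import os
import shlex
import subprocess
from typing import List, Tuple

# Patch URLs exactly as given in the post.
ATTACHMENTS_PATCH = ("http://drupal.org/files/search_api_attachments-"
                     "allow_external_extraction_and_cache_extraction-1289222-8.patch")
SOLR_PATCH = "http://drupal.org/files/search_api_solr-allow_abitrary_query-1580118-1.patch"

Step = Tuple[str, List[str]]  # (working directory, argv)


def plan(drupal_root: str, reindex: bool = True) -> List[Step]:
    """Return the post's steps as commands, in order, without running them."""
    steps: List[Step] = []
    for module, url in (("search_api_attachments", ATTACHMENTS_PATCH),
                        ("search_api_solr", SOLR_PATCH)):
        # Assumed contrib location; adjust if your modules live elsewhere.
        module_dir = os.path.join(drupal_root, "sites", "all", "modules", module)
        patch_file = url.rsplit("/", 1)[-1]
        steps.append((module_dir, ["curl", "-O", url]))
        # Drupal.org patches are usually git-format, so -p1 strips the a/ and b/ prefixes.
        steps.append((module_dir, ["patch", "-p1", "-i", patch_file]))
    # Run pending database updates, which should create the table the first patch adds.
    steps.append((drupal_root, ["drush", "updatedb", "-y"]))
    if reindex:
        # Only needed if the site already indexed text extracted by local Tika.
        steps.append((drupal_root, ["drush", "search-api-reindex"]))
        steps.append((drupal_root, ["drush", "search-api-index"]))
    return steps


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"(cd {shlex.quote(cwd)} && {' '.join(shlex.quote(a) for a in argv)})")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
```

Calling run(plan("/var/www/drupal")) prints the commands for review first; check=True stops the sequence at the first failing step, such as a patch that doesn't apply cleanly.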
Web Developer Brad Blake brings a wealth of expertise to our team and our clients whenever he creates software tools and websites on the LAMP platform. For more than five years, he has been using PHP to build cutting-edge technologies that ...