Jul 18 2013

I’m betting on there being a better way to do this.  There just has to be.  But I haven’t figured it out, so here is how we’re using (abusing?) Views PHP and ApacheSolr Views to deliver results on a per-content-type basis.

apachesolr views with per content type results

To start with we create a new view

choose the correct solr index

After the new view is created, load a bunch of fields in. It may be useful to remember that “Bundle” returns the content type by name, “content” returns all of the fielded content in a single string, and “entity_id” holds the node ID.  One cool thing you can do with the node ID is pass it as an argument to another view using Views Field View - be aware that this is somewhat performance naughty, but it’ll work if you want access to images in your results and don’t feel like creating a custom module to index images.

exclude fields from display

After getting your fields loaded I set their values to “Exclude from display” in the field settings

Our view with a few fields loaded

And of course don’t forget to add a Global: PHP field – here’s our view now

In the Global PHP comes this fubar stank piece of code

<?php
$type = $row->bundle; // Here we'll get our content types
$content = $row->content; // This is all of the content per node that's been indexed
$title = $row->label; // This is the node title
if ($type == "mobile_resource")
{
  // Everything before the "URL:" label is the descriptive text.
  $stop = "URL:";
  $startI = 1;
  $stopI = strpos($content, $stop, $startI);
  $text = substr($content, $startI, $stopI - $startI);
  $trimtext = trim($text);
  // The URL sits between the "URL:" and "Platform:" labels.
  $startsAt = strpos($content, "URL:") + strlen("URL:");
  $endsAt = strpos($content, "Platform:", $startsAt);
  $result = substr($content, $startsAt, $endsAt - $startsAt);
  $trimurl = trim($result, chr(0xC2).chr(0xA0)); // This will save someone time - be sure to trim the nbsp spaces from your content when needed

  echo "<a href=\"$trimurl\">$title: $trimtext</a>"; // and then print stuff as you see fit
}
else if ($type == "cme_class") // and more of the same
{
  $stop = "Description:";
  $startI = 1;
  $stopI = strpos($content, $stop, $startI);
  $text = substr($content, $startI, $stopI - $startI);
  $trimtext = trim($text);
  $url = $row->path_alias;
  echo "<a href=\"$url\">$trimtext</a>";

  $startsAt = strpos($content, "Location:") + strlen("Location:");
  $endsAt = strpos($content, "Date and Time:", $startsAt);
  $location = substr($content, $startsAt, $endsAt - $startsAt);

  $startsAt2 = strpos($content, "Time:") + strlen("Time:");
  $endsAt2 = strpos($content, "Registration", $startsAt2);
  $time = substr($content, $startsAt2, $endsAt2 - $startsAt2);

  print("<h5>Time: $time</h5>");
  print("<h5>Place: $location</h5>");
}
else if ($type == "employee_bio") //etc
{
  $startsAt = 1;
  $endsAt = strpos($content, "My Education", $startsAt);
  $who = substr($content, $startsAt, $endsAt - $startsAt);
  $url = $row->path_alias;
  echo "<a href=\"$url\">$who</a>";
}
else
{
  $url = $row->path_alias;
  echo "<a href=\"$url\">$title</a>";
}
?>
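One small cleanup that might help while the bigger question stays open: the strpos/substr pairs above all do the same job, so they can be pulled into a helper. Below is a minimal sketch in plain PHP, assuming the indexed content really does contain the labels you search for. The function names example_trim_nbsp() and example_between() and the sample string are made up for illustration; they are not part of Views PHP or ApacheSolr Views. It returns an empty string when a label is missing instead of slicing from the wrong offset, and it trims the nbsp as a whole character rather than byte by byte.

```php
<?php
/**
 * Trims ASCII whitespace and UTF-8 non-breaking spaces from both ends.
 *
 * The /u flag makes the pattern work on whole characters, so a multibyte
 * character that merely shares a byte with the nbsp is left alone.
 * Assumes $text is valid UTF-8 (preg_replace() returns NULL otherwise).
 */
function example_trim_nbsp($text) {
  return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
}

/**
 * Returns the text between two labels, or '' when either label is missing.
 */
function example_between($content, $start_label, $stop_label) {
  $start = strpos($content, $start_label);
  if ($start === FALSE) {
    return '';
  }
  $start += strlen($start_label);
  $stop = strpos($content, $stop_label, $start);
  if ($stop === FALSE) {
    return '';
  }
  return example_trim_nbsp(substr($content, $start, $stop - $start));
}

// Hypothetical indexed content for a mobile_resource node.
$content = "Cool App URL:\xC2\xA0http://example.com/app\xC2\xA0Platform: iOS";
$url = example_between($content, 'URL:', 'Platform:');
print $url . "\n";
```

In the Global: PHP field this would shrink each branch to a couple of example_between() calls, though it is still string scraping and not a real fix for the per-bundle problem.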

And really… I’m pretty embarrassed by this.   It’s raunchy and rank.  It works.  The facets facet and so on… I just can’t believe that this is the way to do things.  I’ve looked at Solr Panels and feel like there must be a way to do better. Maybe having multiple views pre-faceted by bundle at the view level and then made into content panes?   Anyone with some sense of how to improve this is encouraged to chime in…. Bueller?  Bueller….

Mar 23 2012
Custom apachesolr results

We’ve been getting some questions about how we customize our apachesolr results by content type.  The answer:  very simply.  We use views php.  Views php gives us several new field types in views, including a place to put custom PHP. It also has the advantage of allowing you to pull in variables already loaded into your view.  Thus while it is possible to write queries to the database in the php area, it is not necessary.  Moreover, if you need access to tables not typically available in views, you may use the data module to get at them.

For the video inclined here is a link to our 2 minutes on views php http://www.youtube.com/watch?v=cAehdUVrVDs

While there are several code areas available in the Views PHP field, I tend to use the output area, with the actual complete results hidden from display.

Using available variables in views php

Hiding them makes it so that you can add a few lines of logic and have the results better reflect the information.  For our librarians we output their profile picture linked to their email address, for our web results we provide link descriptions etc.

Excluding results from display allows the results to be available to views php without cluttering the patron's view

Just remember to exclude your fields from display so that the user isn’t presented with  hundreds of fields and you’re good to go.

Feb 16 2012

Autocomplete for apachesolr views

For the folks who need a working copy of apachesolr autocomplete for use with apachesolr views here’s a repo that’ll help https://github.com/alibama/apachesolr-autocomplete-drupal – bear in mind that it’s just the latest version with patches #13 & 14 listed in the autocomplete issue queue applied.  Just figure it’ll save someone a minute of testing against the latest version of solr views…   It’s also attached in a zip at the end of this post.

Also in today’s two minute vid:  exposed form filters in a block = a great way to handle site search.  The exposed filters become available as a separate block that then points to the page.  We then take the newly created block over to our sitewide context and have a search box on every page that connects to the apachesolr search view….

Exposed form in block

Sitewide Contexts put blocks on every page...

Apachesolr Views Autocomplete Module

Jan 25 2012

It’s been a while since we looked at Apachesolr Drupal integration.  In large part that is because it “just works.”

It's solr, in a view, with exposed filters, facets, and it brews tea!

With the recent release of new code on Apachesolr Views (Big ups to dstuart, Ravi.J and ygerasimov for the recent contributions… everything seems to be aces) it’s time to revisit the subject.

If you have struggled with theming the search-result.tpl.php file and really don’t want to learn any more about getting great faceted search results, you are totally in luck!  Note to all: we’re using views 3.x-dev, apachesolr 3.x-dev, and apachesolr_views 3.x-dev.

For those of you who are video inclined here’s a ~4 minute screencast

For the rest of you Robert Douglass called it over two years ago in his “views 3 + apachesolr + acquia drupal = the future of search” post and for the most part that’s the deal – if you want to see more screen shots go there.

We’re also using ApacheSolr Custom Fields and the Batch Indexing module (thanks anarchivist)  (as mentioned previously).  Note that at this time the custom field module requires this fix to run… but still a handy module.

Learn to not code with the help of nice mods!

Custom Fields is well worth the minor effort in that it is another piece of the no-coding puzzle.  We also enjoyed having views php (a bit of coding is ok – we used it to set up displays per content type in the view)

Also using better exposed filters again to make the UI for the exposed filter better… gotta love naming conventions!

In any case the working view took about 10-15 minutes to produce, with ~5 minutes spent making tea.

Attached below is our working stack in a zip file

If anyone wants to download the entire package that we’re using, go ahead. There are a few known issues, including taxonomy facets that still need to be patched in this release, but for many use cases it is good to go.
 Solr Stack – Views + Apachesolr + Apachesolr Views + Batch Reindex + Custom Fields

Nov 30 2011

Achieve Internet Releases the Drupal 7 ApacheSolr Media Module

This module allows website administrators to index files of any type so they can be included in site-wide search results. This is very useful for enterprise websites that need to manage a large number of files, such as videos, PDFs, documents in Excel, Word, and PowerPoint, as well as images. ApacheSolr Media module can index any field within the file entities, including title, description, and taxonomy fields.

Why the ApacheSolr Media Module

Over the years Achieve Internet has built our fair share of publishing and media websites.  A few years back it was only the large entertainment companies like NBCU or publishers like Fastcompany.com that required complex media management.  A lot has changed over the last five years and today it seems everyone is a publisher of one kind or another. An even greater challenge is that enterprise organizations have gone global, and the need for files and media to be distributed in multiple languages is at an all-time high.   A great example of this issue is how organizations are managing their printed material, such as installation instructions, troubleshooting guides, catalogs, data sheets, brochures, and marketing material. It’s one thing for a website administrator to find those files; it’s a completely different challenge to make those files available via public facing search results. Achieve Internet’s new ApacheSolr Media module allows your website visitors access to all these files through a simple site search.

Example of This Module in Action on Hunterindustries.com

(The files below are PDF and assorted zip files, however this can be used for video, documents, even zipped files)

Screen Shot - ApacheSolr Media - Screen Shot 2.png

Screen Shot - ApacheSolr Media - Screen Shot 3.png

Engineering Challenges

One of the challenges faced in building the ApacheSolr Media module was to review all published nodes, detect all referenced files, and create a separate Solr document for every referenced file. Some nodes contained a large number of referenced files, so page timeouts were a real risk. Achieve solved this by counting each file attached to a node toward the node processing limit per cron run. For example, if the ApacheSolr Media module is configured to index up to 10 nodes per cron run, and the nodes have five files per node, then the module will index two nodes and 10 files per cron run.
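One way to read the arithmetic in that example is sketched below in plain PHP. This is an illustration only, not the module's code: the function name example_cron_batch() is made up, and it assumes that only the attached files count against the limit and that a node is skipped until all of its files fit in the run.

```php
<?php
/**
 * Counts how many nodes fit in one cron run when every attached file
 * counts toward the per-run limit (an assumed reading of the rule).
 *
 * @param int $limit
 *   Items allowed per cron run.
 * @param int[] $files_per_node
 *   Number of files attached to each queued node, in queue order.
 *
 * @return array
 *   The 'nodes' and 'files' processed in this run.
 */
function example_cron_batch($limit, array $files_per_node) {
  $nodes = 0;
  $files = 0;
  foreach ($files_per_node as $count) {
    // Stop as soon as the next node's files would exceed the limit.
    if ($files + $count > $limit) {
      break;
    }
    $files += $count;
    $nodes++;
  }
  return array('nodes' => $nodes, 'files' => $files);
}

// Ten queued nodes with five files each and a limit of 10.
$run = example_cron_batch(10, array_fill(0, 10, 5));
print $run['nodes'] . ' nodes, ' . $run['files'] . " files\n";
```

With those inputs the sketch lands on two nodes and 10 files, matching the example. A real implementation would also need a plan for a single node that has more files than the limit, which this sketch would never get past.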

The ApacheSolr Media module is the fourth module released by Achieve Internet in the fall of 2011.  The other modules are:

Media Updates

Fresh content is one key to great web experiences. This module simplifies the process of updating media content by allowing replacement of existing files. This release includes the capability to quickly and easily replace a media file currently in use at various locations on your site.   Media Updates Module Blog

Views Media Browser

One of the biggest problems in managing media is being able to find assets. This module enables views filtering, allowing you to refine by clarifying any type of field in your media files. Filtering by taxonomy terms and searching text fields are just two powerful examples. Having the capability to selectively screen information is extremely valuable. Views Media Browser Module Blog

Media Translation

This new module allows you to easily manage media files and taxonomy terms within multiple languages. Additional capability includes an automated “detect and replace” function that aligns files with the language mode displayed.  Media Translation Blog

Combining these modules with the original Drupal Media module can produce a powerful and rich media management experience. For example, by adding the Media Translation module to the ApacheSolr Media module, we can create “translation sets” of files, which group together all translated versions of the same file. By integrating the two modules, we can create a node, attach files to it, and then when the node is translated, the correct language version of the referenced files automatically displays on the correct language site, including in the search results.

Assumptions

Like every good module, this one needed some ground rules to accomplish our goal of helping publishing and media related websites manage their files.  The most important assumption is that the only files indexed for the search are files that are referenced by a node. For example, for one of our recent clients, the only files included in the search results are those that are attached to a published product node, support document, or other node. This makes it much easier to display only relevant and current content without having to frequently delete outdated file content from the site.

This module also assumes that you are using version 1 of the Media module, and that your nodes use media selector fields to reference the files.

Items to Consider

There are a few issues that need to be considered before installing the ApacheSolr Media module:

Files must be attached to at least one content type that is indexed by Solr, and the files to be indexed must be attached in a media selector field. There can be multiple media selector fields per node.

This module does not index the content of the files – for example, the PDF file itself, or the Excel file, etc. It indexes only the fields in the file entities.

All file types to be indexed for the search must use the same title field.

Installation and Setup

Getting the ApacheSolr Media module set up and ready to use is simple – just install the module.

To set up the remainder of the items to support this module:

Configure the File Types

1.     Go to Configuration and select File types (admin/config/media/file-types).

2.     Click on manage fields for the file type you want to configure.

3.     Create a generic File Title field of type Text or add an existing File Title field.

       You must have a single Title field that is used by all file types; otherwise your files will  not have a title in the search results.

4.     Add any additional fields needed.

5.     Repeat for all file types you want to index for the search.

Configure the Content Types

1.     Go to Structure and select Content types (admin/structure/types).

2.     Click on manage fields.

3.     Add one or more Multimedia asset fields to the content type. These are the fields that will reference the file entities to be indexed.

Note: You must have at least one content type that is indexed by Solr that contains a Multimedia asset field.

Achieve recommends reusing the same field across multiple content types.

Configure Solr Integration

1.     Go to Configuration and select Apache Solr Search (admin/config/search/apachesolr).

2.     Click on the Media tab.

3.     Select the field to use as the media file title.

4.     Select the media fields attached to nodes to include in the Solr index.

Rebuild the Solr Index

1.     Go to Configuration and select Apache Solr Search (admin/config/search/apachesolr).

2.     Click on the Search Index tab.

3.     Select either Queue content for reindexing or Delete the index and click Begin.

Future Plans

We are delighted to be co-maintainers of this module with shenzhuxi.

Given the time, Achieve and shenzhuxi would like to see a 2.x version of the ApacheSolr Media module use the upcoming File Entity module (under development in conjunction with the Media module version 2). This will make the ApacheSolr Media module much more flexible because it can work with any file management system.

The group would also like to expand the functionality of the ApacheSolr Media module to be able to index the file content itself (e.g., the actual PDF) instead of just the file entity fields. That however is an entirely different challenge and may need to be done separately from this module. 

This is only v.1 and like every good Drupal module the real future and power of this module will come from the community.  We would love your feedback, input and contribution to the ApacheSolr Media module. The power of Drupal comes from our collaboration!

For more information on Achieve Internet please visit our Drupal.org Market page. http://drupal.org/node/1123842

Nov 14 2011

The Apachesolr module (7.x) allows us to build sites that use powerful search technology. Besides being blazingly fast, Apachesolr has another advantage: faceted search.

Faceted search allows our search results to be filtered by defined criteria like category, date, author, location, or anything else that can come out of a field. We call these criteria facets. With facets, you can narrow down your search more and more until you get to the desired results.

Sometimes, however, the standard functionality is not enough. You might want or need to customize the way that facets work. This is controlled by Facet API. Unfortunately there is not much documentation for Facet API, and the API can be difficult to understand.

This post will change that!

In this post, you will learn how to use FacetAPI to create powerful custom faceted searches in Drupal 7. I will take a look at the UI of the module as well as its internals, so that we can define our own widgets, filters, and dependencies using some cool ctools features.

Introduction

This is a site that we built recently. It uses facets from SearchAPI instead of Apachesolr, but the general principle is the same.

As we move forward, I plan to answer the following questions:

  • How can I set the sort order of facet links? How can I create my own sort order?
  • How can I show/hide some facets depending on custom conditions? For example, how can I show a facet only if there are more than 100 items in the search results?
  • How can I exclude some of the links from the facets?
  • How can I change the facets from a list of links to form elements with autosubmit behavior?

I will start with showing the general settings that can be configured out of the box and then dive into the code to show how those settings can be extended.

Block settings options


Let's take a look at the contextual links of the facet block. We can see there are three special options:

  • Configure facet display
  • Configure facet dependencies
  • Configure facet filters

Let's take a closer look at each of these.

Facets UI

On the facet display configuration page we can choose the widget used to display the facet and set its soft limit, the widget setting that controls how many facet items are displayed. We can also use different sort options to make the links display in the order we prefer.

The next settings page is for Facet dependencies. Here we can choose different options that will make Drupal understand when to show or hide the facet block.

The third settings page is Facet filters. Here we can filter what facet items to show (or not show) in the block.

As you can see, there are plenty of options in the UI that we can use when we set up our facets. We can filter some options, change sort order, control visibility, and even change the facet widget to something completely different than a list of links.

Now let's take a look at the internals of Facet API and learn how we can define our own facet widgets, facet dependencies, and facet filters.

Facet API is built on the ctools plugins system, which is a very, very flexible instrument.

Build your own facet widget

Imagine we want to create a facet that will consist of a select list of options and a submit button. Better yet, let's hide the submit button and just autosubmit the form when we change the value by selecting an item from the list.

In order to define our own widget we need to implement the following hook:

<?php
/**
 * Implements hook_facetapi_widgets().
 */
function example_facetapi_widgets() {
  return array(
    'example_select' => array(
      'handler' => array(
        'label' => t('Select List'),
        'class' => 'ExampleFacetapiWidgetSelect',
        'query types' => array('term', 'date'),
      ),
    ),
  );
}
?>

With this we define a new widget called "example_select" which is bound to a class called "ExampleFacetapiWidgetSelect".

All our logic will be in this ExampleFacetapiWidgetSelect plugin class:

<?php
class ExampleFacetapiWidgetSelect extends FacetapiWidget {

  /**
   * Renders the form.
   */
  public function execute() {
    $elements = &$this->build[$this->facet['field alias']];
    $elements = drupal_get_form('example_facetapi_select', $elements);
  }
}
?>

Instead of rendering the links directly, we will load a form to show our select element.

Now let's see how to build the form:

<?php
/**
 * Generate form for facet.
 */
function example_facetapi_select($form, &$form_state, $elements) {
  // Build options from facet elements.
  $options = array('' => t('- Select -'));
  foreach ($elements as $element) {
    // Active items are already applied, so leave them out of the list.
    if ($element['#active']) {
      continue;
    }
    $options[serialize($element['#query'])] = $element['#markup'] . ' (' . $element['#count'] . ')';
  }

  $form['select'] = array(
    '#type' => 'select',
    '#options' => $options,
    '#attributes' => array('class' => array('ctools-auto-submit')),
    '#default_value' => '',
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Filter'),
    '#attributes' => array('class' => array('ctools-use-ajax', 'ctools-auto-submit-click')),
  );

  // Lets add autosubmit js functionality from ctools.
  $form['#attached']['js'][] = drupal_get_path('module', 'ctools') . '/js/auto-submit.js';
  // Add javascript that hides Filter button.
  $form['#attached']['js'][] = drupal_get_path('module', 'example') . '/js/example-hide-submit.js';

  $form['#attributes']['class'][] = 'example-select-facet';

  return $form;
}

/**
 * Submit handler for facet form.
 */
function example_facetapi_select_submit($form, &$form_state) {
  $form_state['redirect'] = array($_GET['q'], array('query' => unserialize($form_state['values']['select'])));
}
?>

In this form, the value of each select option is the serialized query for the page the user should be redirected to. So, in the submit handler we simply unserialize it and redirect the user to the proper page. We add autosubmit functionality via ctools' auto-submit javascript. Also, we add our own javascript, example-hide-submit.js, to hide the Filter button. This enables our facet to work even if javascript is disabled: in that case the button stays visible and the user just needs to submit the form manually.

Each element that we retrieved from FacetAPI has the following properties:

  • #active - Determines if this facet link is active
  • #query - The query parameters for the page that represents the search when this facet is active
  • #markup - The text of the facet
  • #count - The number of search results that will be returned when this facet is active.
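To make those properties concrete, here is a small stand-alone sketch in plain PHP of how the option list could be built from such elements and decoded again on submit. It is not the form builder from this post: the function names and sample elements are hypothetical, t() is left out so it runs outside Drupal, and it keys the options with http_build_query() and parse_str() instead of serialize() and unserialize(), as one possible way to avoid unserializing a submitted value.

```php
<?php
/**
 * Builds select options keyed by a URL-encoded query string.
 *
 * Active facet items are skipped, since they are already applied.
 */
function example_build_options(array $elements) {
  $options = array('' => '- Select -');
  foreach ($elements as $element) {
    if (!empty($element['#active'])) {
      continue;
    }
    // Pass the separator explicitly so php.ini settings cannot change it.
    $key = http_build_query($element['#query'], '', '&');
    $options[$key] = $element['#markup'] . ' (' . $element['#count'] . ')';
  }
  return $options;
}

/**
 * Turns a submitted option key back into a query array.
 */
function example_decode_option($key) {
  $query = array();
  parse_str($key, $query);
  return $query;
}

// Hypothetical facet elements, shaped like the properties listed above.
$elements = array(
  array(
    '#active' => FALSE,
    '#query' => array('f' => array('bundle:article')),
    '#markup' => 'Article',
    '#count' => 12,
  ),
  array(
    '#active' => TRUE,
    '#query' => array(),
    '#markup' => 'Page',
    '#count' => 3,
  ),
);
$options = example_build_options($elements);
print_r($options);
```

One caveat with this variant: parse_str() returns every value as a string, which is fine for query parameters but differs from the serialize() round trip.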

Please note how easy it is to add the autosubmit functionality via ctools. It is just a matter of setting the right class attributes on the form elements, where one is used as the sender (.ctools-auto-submit) and the other one is the receiver (.ctools-auto-submit-click). Then we just need to add the javascript in order to get the autosubmit functionality working.

Please use at least the beta8 version of the FacetAPI module.

Sorting order

All options in facets are sorted. On the settings page above, we saw different kinds of sorts. We can also define our own type of sorting if none of the options on the settings page meet our needs. As an example, I will show you how to implement a random sort order.

To reach this goal we need to implement hook_facetapi_sort_info():

<?php
/**
 * Implements hook_facetapi_sort_info().
 */
function example_facetapi_sort_info() {
  $sorts = array();

  $sorts['random'] = array(
    'label' => t('Random'),
    'callback' => 'example_facetapi_sort_random',
    'description' => t('Random sorting.'),
    'weight' => -50,
  );

  return $sorts;
}

/**
 * Sort randomly.
 */
function example_facetapi_sort_random(array $a, array $b) {
  return rand(-1, 1);
}
?>

The sort callback is passed to the uasort() function, so we need to return -1, 0, or 1. Note that you again get a facetapi element, which has the same properties as outlined above, so you could, for example, compare $a['#count'] and $b['#count'].
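As a quick illustration of that last point, here is a minimal sketch of a callback that compares $a['#count'] and $b['#count'] so the biggest facet items come first. The function name and the sample items are made up for this example; to use it in a module you would register it in hook_facetapi_sort_info() the same way the random sort is registered.

```php
<?php
/**
 * Sorts facet items by result count, highest first.
 */
function example_facetapi_sort_count_desc(array $a, array $b) {
  $a_count = isset($a['#count']) ? $a['#count'] : 0;
  $b_count = isset($b['#count']) ? $b['#count'] : 0;
  if ($a_count == $b_count) {
    return 0;
  }
  return ($a_count > $b_count) ? -1 : 1;
}

// Hypothetical facet items keyed by value.
$items = array(
  'page' => array('#markup' => 'Page', '#count' => 3),
  'article' => array('#markup' => 'Article', '#count' => 12),
  'bio' => array('#markup' => 'Bio', '#count' => 7),
);
uasort($items, 'example_facetapi_sort_count_desc');
print implode(', ', array_keys($items)) . "\n";
```

Because uasort() keeps the array keys, the items stay addressable by their facet value after sorting.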

In order to see the result we just need to disable all the other sort order plugins and enable our own.

Filter

When facet items are generated they are passed on to filters. If we want to exclude some of the items, we should look at the filters settings. We can also, of course, implement our own filter.

As an example, we will create a filter where we can specify what items (by labels) we want to exclude on the settings page.

<?php
/**
 * Implements hook_facetapi_filters().
 */
function example_facetapi_filters() {
  return array(
    'exclude_items' => array(
      'handler' => array(
        'label' => t('Exclude specified items'),
        'class' => 'ExampleFacetapiFilterExcludeItems',
        'query types' => array('term', 'date'),
      ),
    ),
  );
}
?>

Like with the widgets, we define a filter plugin and bind it to the ExampleFacetapiFilterExcludeItems class.

Now let's take a look at the ExampleFacetapiFilterExcludeItems class:

<?php
/**
 * Plugin that filters active items.
 */
class ExampleFacetapiFilterExcludeItems extends FacetapiFilter {

  /**
   * Filters facet items.
   */
  public function execute(array $build) {
    $exclude_string = $this->settings->settings['exclude'];
    $exclude_array = explode(',', $exclude_string);
    // Exclude item if its markup is one of excluded items.
    $filtered_build = array();
    foreach ($build as $key => $item) {
      if (in_array($item['#markup'], $exclude_array)) {
        continue;
      }
      $filtered_build[$key] = $item;
    }

    return $filtered_build;
  }

  /**
   * Adds settings to the filter form.
   */
  public function settingsForm(&$form, &$form_state) {
    $form['exclude'] = array(
      '#title' => t('Exclude items'),
      '#type' => 'textfield',
      '#description' => t('Comma separated list of titles that should be excluded'),
      '#default_value' => $this->settings->settings['exclude'],
    );
  }

  /**
   * Returns an array of default settings
   */
  public function getDefaultSettings() {
    return array('exclude' => '');
  }
}
?>

In our example, we define settingsForm to hold information about what items we want to exclude. In the execute method we parse our settings value and remove the items we don't need.

Again please note how easy it is to enhance a plugin to expose settings in a form: All that is needed is to define the functions settingsForm and getDefaultSettings in the class.
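If you want to try the exclude logic outside of Drupal, the core of execute() can be sketched as a plain function. This is an illustrative variant, not the attached module's code: the function name is hypothetical, and it additionally trims each label so that a setting typed as "Page, Bio" (with a space after the comma) still matches.

```php
<?php
/**
 * Removes items whose label appears in a comma-separated exclude list.
 *
 * Labels are trimmed and empty entries dropped before comparing.
 */
function example_exclude_items(array $build, $exclude_string) {
  $exclude = array_filter(array_map('trim', explode(',', $exclude_string)), 'strlen');
  $filtered = array();
  foreach ($build as $key => $item) {
    if (in_array($item['#markup'], $exclude, TRUE)) {
      continue;
    }
    $filtered[$key] = $item;
  }
  return $filtered;
}

// Hypothetical facet build array.
$build = array(
  'page' => array('#markup' => 'Page', '#count' => 3),
  'article' => array('#markup' => 'Article', '#count' => 12),
  'bio' => array('#markup' => 'Bio', '#count' => 7),
);
$remaining = example_exclude_items($build, 'Page, Bio');
print implode(', ', array_keys($remaining)) . "\n";
```

The same trimming could be dropped into the class's execute() method if editors tend to type spaces after commas.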

Dependencies

In order to create our own dependencies we need to implement hook_facetapi_dependencies() and define our own class. This is very similar to the implementation of creating a custom filter, so I am not going to go into great detail here. The main idea of dependencies is that it allows you to show facet blocks based on a specific condition. The main difference between using dependencies and using context to control the visibility of these blocks is that facets whose dependencies are not matched are not even processed by FacetAPI.

For example, by default there is a Bundle dependency that will show a field facet only if we have selected the bundle that has the field attached. This is very handy, for example, in the situation when we have an electronics shop search. Facets related to monitor size should be shown only when the user is looking for monitors (when he has selected the bundle monitor in the bundle facet). We could create our own dependency to show some facet blocks only when a specific term of another facet is selected. There are many potential use cases here. For example you can think of a Facet that really is only interesting to users interested in content from a specific country. So you could process and show this facet only if this particular country is selected. A practical example would be showing the state field facet only when the country "United States" is selected as you know that for other countries filtering by the state field is not useful. Being able to tweak this yourself gives you endless possibilities!

Here is a shortened code excerpt from FacetAPI that can be used as a sample. It displays the facet if the user has one of the selected roles. The main logic is in the execute() method.

<?php
class FacetapiDependencyRole extends FacetapiDependency {

  /**
   * Executes the dependency check.
   */
  public function execute() {
    global $user;
    $roles = array_filter($this->settings['roles']);
    if ($roles && !array_intersect_key($user->roles, $roles)) {
      return FALSE;
    }
  }

  /**
   * Adds dependency settings to the form.
   */
  public function settingsForm(&$form, &$form_state) {
    $form[$this->id]['roles'] = array(
      '#type' => 'checkboxes',
      '#title' => t('Show block for specific roles'),
      '#default_value' => $this->settings['roles'],
      '#options' => array_map('check_plain', user_roles()),
      '#description' => t('Show this facet only for the selected role(s). If you select no roles, the facet will be visible to all users.'),
    );
  }

  /**
   * Returns defaults for settings.
   */
  public function getDefaultSettings() {
    return array(
      'roles' => array(),
    );
  }
}
?>
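The check inside execute() is compact, so here is the same logic pulled out into a plain function you can run on its own. The function name is hypothetical, and it assumes the usual shape of Drupal checkbox values, where a selected role is stored as role ID => role ID and an unselected one as role ID => 0.

```php
<?php
/**
 * Mirrors the role check: passes when no roles are configured, or when
 * the user holds at least one of the configured roles.
 *
 * @param array $user_roles
 *   Role ID => role name, shaped like $user->roles.
 * @param array $settings_roles
 *   Checkbox values: role ID => role ID when selected, 0 otherwise.
 *
 * @return bool
 *   TRUE if the facet may be shown.
 */
function example_role_dependency_met(array $user_roles, array $settings_roles) {
  // array_filter() drops the unselected (0) checkboxes.
  $roles = array_filter($settings_roles);
  return !($roles && !array_intersect_key($user_roles, $roles));
}

// An authenticated user (role ID 2) against three hypothetical settings.
$user_roles = array(2 => 'authenticated user');
var_dump(example_role_dependency_met($user_roles, array(2 => 2, 3 => 0)));
var_dump(example_role_dependency_met($user_roles, array(2 => 0, 3 => 3)));
var_dump(example_role_dependency_met($user_roles, array(2 => 0, 3 => 0)));
```

The three calls cover a matching role, a non-matching role, and no roles selected at all, which is the "visible to all users" case from the settings description.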

Conclusion

As you can see, FacetAPI gives us the option to change anything we want about facets. We can change the display, alter the order, filter some facets out, and control facet block visibility and content based on dependencies we define.

I would like to thank the maintainers of this module, Chris Pliakas and Peter Wolanin, for their great work!

An example module is attached to this article. Thank you for reading, and if you have any further questions please let us know in the comments!

(Author: Yuriy Gerasimov, Co-Author: Fabian Franz, Editor: Stuart Broz)

May 02 2011

Last year Telenet successfully migrated its knowledge base and business site to Drupal. As a member of the Ausy/DataFlow group I'm proud to announce that Telenet today launched its multisite search using Drupal.

We continued on the Apache Solr based solution we had already created for the knowledge base while Robert Douglass was hired to work on a Nutch/Solr implementation to crawl the (for now) non-Drupal based sites in the company's portfolio. This powerful combination results in a multilingual multisite search that features autocompletion, spell checking and file indexing.

Contributions were made to the following projects:

Sep 20 2010

When you're setting up Tomcat to run Solr be aware that although Solr supports UTF-8 by default, Tomcat does not. You can enable the character encoding by doing the following in Tomcat's server.xml:

  • Add URIEncoding="UTF-8" to the correct Connector
  • Be sure to remove useBodyEncodingForURI from that Connector
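
For reference, a connector with that attribute might look roughly like the fragment below. The port, protocol and timeout values are only placeholders (assumptions for illustration); keep whatever your installation already uses and just add the URIEncoding attribute.

```xml
<!-- Sketch of an HTTP connector in Tomcat's conf/server.xml.
     Only URIEncoding is the point here; the other values are placeholders. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```

Restart Tomcat after the change so the connector picks up the new setting.
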
Jul 07 2010

Drupal is in love with Solr, as can be seen by the absolutely great session proposals that have been submitted for DrupalCon Copenhagen, in August. If you want to see a healthy dose of search goodness happening in Denmark, here are the links to go vote.

    Feb 27 2010


    This week I gave a talk for the Vancouver Island Java User Group on integrating Apache Solr search into web applications. Since the group is, of course, Java-focused, I didn't dwell overly much on Drupal except to demo a non-trivial example of integration showing some of the more advanced capabilities of Solr search, including faceted search, search spelling correction, "find similar content", and so on - all available out of the box with Robert Douglass, pwolanin, claudiu.cristea et al.'s excellent ApacheSolr module for Drupal.

    Slides are available here.

    Since I was originally scheduled to give the talk in November of 2008, this was a great opportunity to look back over the past year and a half or so and see what has changed in the Solr and ApacheSolr world.

    Solr has had two major point releases, going from version 1.2 to 1.4, adding substantial performance improvements, replication, multi-select faceting, range queries (e.g., date between Sep 2004 and Oct 2006), nested queries, multiple cores, a more flexible architecture, and much more. The number of installations and the community of developers both seem to be growing steadily - I'd estimate that the numbers have at least doubled in the past 16 months.

    ApacheSolr has had steady development releases, leading to full DRUPAL-5--2, DRUPAL-6--1 and DRUPAL-6--2 releases. More than 240 issues and feature requests have been addressed since Jan 2009. Many issues, including indexing of attached documents, access control, and implementation of various Solr features, have been addressed in one or more ways by this module or by various associated modules.

    One issue that comes up relatively frequently is the difficulty of using Solr's fuzzy or wildcard matching out of the box. The ApacheSolr module uses the DisMax query handler rather than Solr's "Standard" query handler, in order to better deal with weighted fields, if I understand the rationale correctly (i.e., a core use case trumps a special use case). This situation may soon improve with proposed improvements to the DisMax handler. Let's hope so (better yet, in the magical event of a sudden rush of free time or significant client interest, pitch in and help make this happen!).

    Dec 10 2009

    At the recent Drupal Camp Prague 09, I was introduced to ApacheSolr as a replacement for the standard Drupal search or Google CSE.

    On most sites the basic Google CSE setup is sufficient. For some of the more serious "work" websites, however, my colleague Nick and I started experimenting with Drupal's ApacheSolr module.

    Here is a quick and rough writeup of how it was implemented at work and on my personal website (janaksingh.com).

    Please feel free to share your experiences and tips.

    Basic Installation

    1) Install Java on CentOS if you haven't already

    2) Install the ApacheSolr Drupal module

    cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -d apachesolr -r DRUPAL-6--2 contributions/modules/apachesolr/
    

    3) Check out the Solr PHP client library inside the ApacheSolr module folder

    svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient
    

    4) Create a temp folder in your home directory

    cd ~
    mkdir temp
    cd temp
    

    5) Download Apache Solr into the temp folder
    SVN Method

    svn checkout http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 apache-solr
    

    TAR Method
    Grab the tarball from http://www.apache.org/dyn/closer.cgi/lucene/solr/

    wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/lucene/solr/1.4.0/apache-solr-1.4.0.zip
    

    unzip apache-solr-1.4.0.zip
    mv apache-solr-1.4.0 apache-solr
    

    I moved my "apache-solr" folder to "/usr/local/share/", but I am not sure whether this matters.

    6) Rename 2 default files in apache-solr/example/solr/conf/

    sudo mv solrconfig.xml solrconfig.bak
    sudo mv schema.xml schema.bak
    

    7) Copy the schema.xml and solrconfig.xml from the Drupal apachesolr module folder into apache-solr/example/solr/conf/
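
Steps 6 and 7 can be combined into one small shell function. This is a rough sketch, not part of the original setup: the function name install_drupal_solr_conf is made up, and it assumes that both XML files sit directly in the module folder you pass in.

```shell
# Hypothetical helper: back up Solr's stock config files and install the
# versions that ship with the Drupal apachesolr module.
#   $1 = apachesolr module folder (assumed to contain schema.xml and solrconfig.xml)
#   $2 = Solr conf folder, e.g. apache-solr/example/solr/conf
install_drupal_solr_conf() {
  module_dir="$1"
  conf_dir="$2"
  for f in schema.xml solrconfig.xml; do
    # Stop if the module copy is missing
    [ -f "$module_dir/$f" ] || { echo "missing $module_dir/$f" >&2; return 1; }
    bak="$conf_dir/${f%.xml}.bak"
    # Keep the stock file as .bak (step 6), but never overwrite an existing backup
    if [ -f "$conf_dir/$f" ] && [ ! -e "$bak" ]; then
      mv "$conf_dir/$f" "$bak" || return 1
    fi
    # Install the Drupal version (step 7)
    cp "$module_dir/$f" "$conf_dir/$f" || return 1
  done
}
```

Usage would look like `install_drupal_solr_conf /path/to/apachesolr apache-solr/example/solr/conf`, with the first path replaced by wherever the module lives on your server. Prefix the call with sudo if the conf folder is not writable by your user.
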

    Multisite / Multicore Setup

    8) If you want multisite search (multicore), here is what I did:

    • duplicate the example folder to "sites"
    • create a folder for each new website
    • copy the conf folder from the sites folder into each one of the website folders
    • create a solr.xml file in the "sites" folder and define a core for each of the sites. Here is what my solr.xml file looks like:
      <solr persistent="false">
        <!--
        adminPath: RequestHandler path to manage cores.
          If 'null' (or absent), cores will not be manageable via request handler
        -->
        <cores adminPath="/admin/cores">
          <core name="website1_core" instanceDir="website1.com" />
        </cores>
      </solr>
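
The per-site part of step 8 (create the folder, copy conf, write the core entry) is repetitive, so here is a rough shell sketch of it. The function name add_site_core and its argument order are my own invention, and you pass the conf source and the target folder explicitly, since the exact layout depends on how you copied the example folder.

```shell
# Hypothetical helper: create one website folder with its own copy of conf
# and print the matching <core> line for solr.xml.
#   $1 = conf folder to copy from
#   $2 = folder that holds the per-site folders
#   $3 = site folder name, e.g. website1.com
#   $4 = core name, e.g. website1_core
add_site_core() {
  conf_src="$1"
  cores_dir="$2"
  site="$3"
  core_name="$4"
  [ -d "$conf_src" ] || { echo "no conf folder at $conf_src" >&2; return 1; }
  # Do not copy into an existing conf folder (cp -R would nest it)
  if [ -e "$cores_dir/$site/conf" ]; then
    echo "$cores_dir/$site/conf already exists" >&2
    return 1
  fi
  mkdir -p "$cores_dir/$site" || return 1
  cp -R "$conf_src" "$cores_dir/$site/conf" || return 1
  # This line still has to be pasted into the <cores> element by hand
  printf '<core name="%s" instanceDir="%s" />\n' "$core_name" "$site"
}
```

For the example above, the call would be something like `add_site_core sites/solr/conf sites website1.com website1_core`, assuming the conf folder sits at sites/solr/conf after duplicating the example folder.
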
      

    9) Finally, in the Drupal ApacheSolr module settings, update the "Solr path" to match the core name you defined for the site in the solr.xml file. If you are using the example setup as outlined in the readme file that ships with the ApacheSolr Drupal module, you do not need to change anything, because the default path the module ships with is correct. You only need to change this if you define your own cores.

    Testing

    If you are using the example site, test your installation as follows:

    cd apache-solr/example
    sudo java -jar start.jar
    

    If you are using the multicore setup from step 8, test as follows:

    cd apache-solr/sites
    sudo java -jar start.jar
    

    Check the ApacheSolr admin page by visiting http://mydomain:8983/solr/admin/ or if you are using multicore http://mydomain:8983/solr/mycore_name/admin/
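
If you prefer to check from the command line rather than the browser, something like the following works as a rough sketch. The function name solr_admin_url is hypothetical; it only builds the two URLs mentioned above, with port 8983 hard-coded as in the example setup.

```shell
# Hypothetical helper: build the Solr admin URL for a host and optional core.
#   $1 = host name, e.g. mydomain
#   $2 = core name (optional, only for the multicore setup)
solr_admin_url() {
  host="$1"
  core="${2:-}"
  if [ -n "$core" ]; then
    printf 'http://%s:8983/solr/%s/admin/\n' "$host" "$core"
  else
    printf 'http://%s:8983/solr/admin/\n' "$host"
  fi
}

# Usage (needs a running Solr and curl installed):
#   curl -s -f "$(solr_admin_url localhost)" > /dev/null && echo "Solr is up"
```
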

    Auto start ApacheSolr on server reboot

    You can follow the instructions in "Want to start Apache solr automatically after a reboot?", but depending on how you defined the cores above you will need to alter the path, e.g.:

    SOLR_DIR="/opt/apache-solr/example"
    

    or
    SOLR_DIR="/opt/apache-solr/sites"
    

    Don't forget to change the permissions on the Solr bash file you created (/etc/init.d/solr), and remember to install the script using chkconfig (all outlined in the guide above).

    Security

    By default, Apache Solr does not ship with any kind of port protection, so you are advised to secure your server ports using iptables or a dedicated firewall (if you have one). For CentOS, see the iptables guidelines.
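
As a rough idea of what such rules could look like, the sketch below only prints two iptables commands so you can review them before running anything as root. The function name solr_firewall_rules is made up, the default port 8983 comes from the example setup, and the allowed address is whatever machine runs your Drupal site (127.0.0.1 if it is the same server). Rule order matters in iptables, so check the output against the rules you already have.

```shell
# Hypothetical helper: print iptables commands that allow one address to
# reach the Solr port and drop everyone else. Nothing is executed.
#   $1 = IP address that may talk to Solr
#   $2 = Solr port (optional, defaults to 8983)
solr_firewall_rules() {
  allowed_ip="$1"
  port="${2:-8983}"
  # Accept traffic to the Solr port from the allowed address
  printf 'iptables -A INPUT -p tcp -s %s --dport %s -j ACCEPT\n' "$allowed_ip" "$port"
  # Drop all other traffic to the Solr port
  printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}
```

Running `solr_firewall_rules 127.0.0.1` prints the two commands for a Solr instance that should only be reachable from the local machine.
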


    About Drupal Sun

    Drupal Sun is an Evolving Web project. It allows you to:

    • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
    • Facet based on tags, author, or feed
    • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
    • View the entire article text inline, or in the context of the site where it was created

    See the blog post at Evolving Web

    Evolving Web