Oct 24 2016

In this blog post I will present how, in a recent e-Commerce project built on top of Drupal 7 (the previous major version of the Drupal CMS), we made Drupal 7, SearchAPI and Commerce play together to efficiently retrieve grouped results from Solr through SearchAPI, with no duplication of indexed data.

We used the SearchAPI and FacetAPI modules to build a search index for products; so far so good: available products and product variations can be searched and filtered using a set of pre-defined facets. Later, a new need arose from our project owner: provide a list of products where the results include, in addition to the product details, a picture of one of the available product variations, while keeping the ability to apply facets on products for the listing. Furthermore, the product-variation picture displayed in the list must also match the filters applied by the user: this avoids confusing users and provides a better user experience.

An example use case here is simple: allow users to get the list of available products and filter them by the color/size/etc. fields of the available product variations, while displaying a picture of an available variation rather than a generic sample picture.

For the sake of simplicity and consistency with Drupal’s Commerce module terminology, I will use the term “Product” to refer to any product-variation, while the term “Model” will be used to refer to a product.

Solr Result Grouping

We decided to use Solr (the well-known, fast and efficient search engine built on top of the Apache Lucene library) as the backend of the eCommerce platform: the reason lies not only in its full-text search features, but also in the possibility to build a fast retrieval system for the huge number of products we were expecting to be available online.

To solve the request about the display of product models, facets and available products, I intended to use the Solr feature called Result Grouping, as it seemed suitable for our case: Solr is able to return just a subset of results by grouping them on a “single value” field (previously indexed, of course). The facets can then be configured to be computed from the grouped set of results, from the ungrouped items, or just from the first result of each group.

This handy Solr feature can be used in combination with the SearchAPI module by installing the SearchAPI Grouping module. The module returns results grouped by a single-valued field while, configurably, still building the facets on all the results matched by the query. (A sketch of the underlying Solr parameters follows after the list below.)

That allowed us to:

  • group the available products by the referenced model and return just one model;
  • compute the attribute’s facets on the entire collection of available products;
  • reuse the data in the product index for multiple views based on different grouping settings.
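
Under the hood, the grouping boils down to a handful of Solr Result Grouping parameters added to the outgoing request. As a rough, hand-rolled sketch of the same idea (not the module’s actual code), those parameters could be set through the hook_search_api_solr_query_alter() hook provided by the search_api_solr module; the module name, search ID and Solr field name below are illustrative:

/**
 * Implements hook_search_api_solr_query_alter().
 *
 * Groups products by the single-valued field referencing their model,
 * while leaving facet counts computed on all matching products.
 */
function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Only alter our product listing search (illustrative search ID).
  if ($query->getOption('search id') != 'product_listing') {
    return;
  }
  $params = &$call_args['params'];
  // Enable Solr Result Grouping on the model reference field.
  $params['group'] = 'true';
  $params['group.field'] = 'is_model_id';
  // Return only the best-matching product of each model.
  $params['group.limit'] = 1;
  // Also report the number of groups, i.e. of distinct models.
  $params['group.ngroups'] = 'true';
  // group.facet is left at its default (false), so facets are computed
  // on the whole set of matching products, not per group.
}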

Result Grouping in SearchAPI

Due to some limitations of the SearchAPI module and its query-building components, this plan was not doable with the configuration at hand, as it would have required us to create a copy of the product index just to apply a specific Result Grouping configuration for each view.

The reason is that the SearchAPI Grouping module is built on top of the “Alterations and Processors” functions of SearchAPI. Those are a set of specific functions that can be configured and invoked by the SearchAPI module both at indexing time and at querying time. In particular, Alterations allow you to programmatically alter the content sent to the underlying index, while Processor code is executed when a search query is built, when it is executed, and when the results are returned.
Those functions can be defined and configured only per index.
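
To make this concrete: a SearchAPI Processor in Drupal 7 is a class that can hook into each of those moments. The skeleton below is a minimal sketch based on the SearchApiAbstractProcessor base class shipped with SearchAPI (the class name and method bodies are illustrative):

/**
 * Skeleton of a SearchAPI Processor (Drupal 7).
 */
class MymoduleGroupingProcessor extends SearchApiAbstractProcessor {

  /**
   * Indexing time: alter the items before they are sent to the server.
   */
  public function preprocessIndexItems(array &$items) {
  }

  /**
   * Query time: alter the search query before it is executed.
   */
  public function preprocessSearchQuery(SearchApiQuery $query) {
  }

  /**
   * Result time: alter the raw results returned by the server.
   */
  public function postprocessSearchResults(array &$response, SearchApiQuery $query) {
  }
}

Such a class is registered through hook_search_api_processor_info() and then becomes available in the index configuration shown in the picture below.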

As visible in the following picture, the SearchAPI Grouping module can be configured solely in the index configuration, not per query.

Image 1: SearchAPI configuration for the Grouping Processor.

As the SearchAPI Grouping module is implemented as a SearchAPI Processor (it needs to be able to alter the query sent to Solr and to handle the returned results), it would have forced us to create a new index for each different result-grouping configuration.

Such a limitation would have introduced a lot of useless data duplication in the index, with a consequent decrease of performance whenever products are saved and then indexed into multiple indexes.
The duplication is all the more striking because the changes performed by the Processor are merely an alteration of:

  1. the query sent to Solr;
  2. the handling of the raw data returned by Solr.

This shows that there is no need to index the same data multiple times.

Since the possibility to define per-query Processors sounded really promising, and such a feature could be used extensively in the same project, a new module was implemented and published on Drupal.org: the SearchAPI Extended Processors module (thanks to SearchAPI’s maintainer, DrunkenMonkey, for the help and review :) ).

The Drupal SearchAPI Extended Processor

The new module extends the standard SearchAPI behavior for Processors and lets administrators configure the execution of SearchAPI Processors per query, not only per index.

With the new module, any index can now be used with multiple, different Processor configurations: no new indexes are needed, and data duplication is avoided.

The new configuration is exposed, as visible in the following picture, while editing a SearchAPI view under “Advanced > Query options”.
The SearchAPI Processors can be altered and redefined for the given view, and a checkbox allows you to completely override the current index settings rather than merely provide additional Processors.

Image 2: View’s “Query options” with the SearchAPI Extended Processors module.

Conclusion: the new SearchAPI Extended Processors module has now been used for a few months in a complex eCommerce project at Liip, and it allowed us to easily implement new search features without creating multiple, separate indexes.
We are able to index product data in one single (and compact) Solr index, and to use it with different grouping strategies to build product listings, model listings and model-category navigation pages without duplicating any data.
Since all those listings leverage the Solr FilterQuery parameter to select the correct set of products to display, Solr can make use of its internal caches, specifically the filterCache, to speed up subsequent searches and facet computations. This, in addition to the use of only one index, allows caches to be shared among multiple listings, which would not be possible if separate indexes were used.

For further information, questions or curiosities, drop me a line; I will be happy to help you configure Drupal SearchAPI and Solr for your needs.

Sep 14 2016

Handling clients with more than one site involves lots of decisions. And yet, it can sometimes seem like ultimately all that doesn’t amount to a hill of beans to the end-user, the site visitor. They won’t care whether you use Domain module, multi-site, separate sites with a common codebase, and so on. Because most people don’t notice what’s in their URL bar. They want ease of login, and ease of navigation. That translates into things such as the single sign-on that drupal.org uses, and common menus and headers, and also site search: they don’t care that it’s actually sites search, plural, they just want to find stuff.

For the University of North Carolina, who have a network of sites running on a range of different platforms, a unified search system was a key way of giving visitors the experience of a cohesive whole. The hub site, an existing Drupal 7 installation, needed to provide search results from across the whole family of sites.

This presented a few challenges. Naturally, we turned to Apache Solr. Hitherto, I’ve always considered Solr to be some sort of black magic, from the way in which it requires its own separate server (http not good enough for you?) to the mysteries of its configuration (both Drupal modules that integrate with it require you to dump a bunch of configuration files into your Solr installation). But Solr excels at what it sets out to do, and the Drupal modules around it are now mature enough that things just work out of the box. Even better, Search API module allows you to plug in a different search back-end, so you can develop locally using Drupal’s own database as your search provider, with the intention of plugging it all into Solr when you deploy to servers.

One possible setup would have been to have the various sites each send their data into Solr directly. However, with the Pantheon platform this didn’t look to be possible: in order to achieve close integration between Drupal and Solr, Pantheon locks down your Solr instance.  That left talking to Solr via Drupal.

SearchAPI lets you define different datasources for your search data, and comes with one for each entity type on your site. In a datasource handler class, you can define how the datasource gets a list of IDs of things to index, and how it gets the content. So writing a custom datasource was one possibility.
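
For context, a custom datasource in Drupal 7’s Search API amounts to an item type declared in hook_search_api_item_type_info() plus a controller class that identifies and loads the items. Below is a rough sketch, assuming the SearchApiExternalDataSourceController base class bundled with Search API; all the names and method bodies are illustrative:

/**
 * Implements hook_search_api_item_type_info().
 */
function mymodule_search_api_item_type_info() {
  return array(
    'remote_page' => array(
      'name' => t('Remote page'),
      'datasource controller' => 'MymoduleRemotePageDataSourceController',
    ),
  );
}

/**
 * Tells Search API how to identify and load the remote items.
 */
class MymoduleRemotePageDataSourceController extends SearchApiExternalDataSourceController {

  /**
   * Declares which field uniquely identifies an item.
   */
  public function getIdFieldInfo() {
    return array('key' => 'id', 'type' => 'string');
  }

  /**
   * Loads the full items for the given IDs, e.g. from a remote service.
   */
  public function loadItems(array $ids) {
    $items = array();
    // Fetch each remote document here and key it by its ID.
    return $items;
  }
}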

Enter the next problem: the external sites we needed to index only exposed their content to us in one format: RSS. In theory, you could have a Search API datasource which pulls in data from an RSS feed. But then you need to write a SearchAPI datasource class which knows how to parse RSS and extract the fields from it.  That sounded like we’d be reinventing Feeds, so we turned to that to see what we could do with it. Feeds normally saves data into Drupal entities, but maybe (we thought) there was a way to have the data be passed into SearchAPI for indexing, by writing a custom Feeds plugin?  However, we found we had a funny problem of the sort that you don’t consider the existence of until you stumble on it: Feeds works on cron runs, pulling in data from a remote source and saving it into Drupal somehow. But SearchAPI also works on cron runs, pulling data in, usually entities. How do you get two processes to communicate when they both want to be the active participant?

With time pressing, we took the simple option: we defined a custom entity type for Feeds to put its data into, and SearchAPI to read its data from. (We could have just used a node type, but then there would have been an ongoing burden of needing to ensure that type was excluded from any kind of interaction with nodes.)
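
In Drupal 7 terms, such a bucket entity type needs little more than a base table and a hook_entity_info() implementation. Roughly, with illustrative names:

/**
 * Implements hook_entity_info().
 *
 * Declares a bare-bones entity type that Feeds writes into and
 * Search API reads from.
 */
function mymodule_entity_info() {
  return array(
    'search_bucket_item' => array(
      'label' => t('Search bucket item'),
      'base table' => 'mymodule_search_bucket_item',
      'entity keys' => array(
        'id' => 'id',
        'label' => 'title',
      ),
      'controller class' => 'DrupalDefaultEntityController',
      'fieldable' => TRUE,
    ),
  );
}

(The table itself is declared in hook_schema(), and Feeds needs a processor plugin that saves into this type; both are omitted here.)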

Essentially, this custom entity type acted like a bucket: Feeds dumps data in, SearchAPI picks data out. As solutions go, not the most massively elegant, at first glance. But if you think about it, if we had gone down the route of SearchAPI fetching from RSS directly, then re-indexing would have been a really lengthy process, and could have had consequences for the performance of the sites whose content was being slurped up. A sensible approach would then have been to implement some sort of caching on our server, either of the RSS feeds as files, or the processed RSS data. And suddenly our custom entity bucket system doesn’t look so inelegant after all: it’s basically a cache that both Feeds and SearchAPI can talk to easily.

There were a few pitfalls. With Search API, our search index needed to work on two entity types (nodes and the custom bucket entities), and while Search API on Drupal 7 allows this, its multiple entity type datasource handler had a few issues we needed to iron out or learn to live with.

The good news though is that the Drupal 8 version of Search API has the concept of multi-entity type search indexes at its core, rather than as a side feature: every index can handle multiple entity types, and there’s no such thing as a datasource for a single entity type.

With Feeds, we found that not all the configuration is exportable to Features for easy deployment. Everything about parsing the RSS feed into entities can be exported, except the actual URL, which is a separate piece of setup and not exportable. So we had to add a hook_update_N() to take care of setting that up.
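
For a standalone importer, that update hook can stay very small. A sketch, assuming Feeds’ feeds_source() API and an HTTP fetcher; the importer ID and URL are illustrative:

/**
 * Sets the source URL of the remote pages importer.
 */
function mymodule_update_7001() {
  $source = feeds_source('remote_pages');
  $source->addConfig(array(
    'FeedsHTTPFetcher' => array('source' => 'http://example.com/feed.rss'),
  ));
  $source->save();
}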

The end result though was a site search that seamlessly returns results from multiple sites, allowing users to work with a network of disparate sites built on different technologies as if they were all the same thing. Which is what they were probably thinking they were all along anyway.

Author: Joachim Noreiko

Apr 21 2016
This blog post describes how to add a custom field to a Search API Solr index in Drupal 7.

Suppose we need a new field: we can add new fields for a content type at /admin/structure/types. All fields are then shown at /admin/config/search/search_api/index/default_node_index/fields, where you can add the desired fields to the Solr index.

Suppose you want to show a custom field in Search API results, but that field is not attached to any specific content type. Is that possible with Search API? Yes, you can do it using hook_entity_property_info_alter().

/**
 * Returns the number of views recorded for a given node.
 */
function phponwebsites_get_nodecountviews_nid($nid) {
  $result = db_query("SELECT COUNT(*) AS count FROM {nodeviewcount} WHERE nid = :nid", array(':nid' => $nid))->fetchAssoc();
  return $result['count'];
}

/**
 * Implements hook_entity_property_info_alter().
 */
function phponwebsites_entity_property_info_alter(&$info) {
  $info['node']['properties']['is_nodeviewcount'] = array(
    'type' => 'integer',
    'label' => t('Node view count'),
    'description' => t('Number of views.'),
    'sanitized' => TRUE,
    'getter callback' => 'phponwebsites_get_is_nodeviewcount_callback',
  );
}

/**
 * Getter callback: returns the view count of a node as an integer.
 */
function phponwebsites_get_is_nodeviewcount_callback($item) {
  $count = phponwebsites_get_nodecountviews_nid($item->nid);
  return (int) $count;
}


After adding the above code to your custom module, go to /admin/config/search/search_api/index/default_node_index/fields. You will see the new custom field displayed, as in the image below.

Add custom fields to search API solr index in Drupal 7

Now you can add your custom field to the Search API Solr index and index that field. The field is also listed in the Views "Add fields" section, so you can include it in your search results.
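
As a quick sanity check, the new property can also be read directly through the Entity API's metadata wrapper (Search API depends on the Entity API module, which provides entity_metadata_wrapper()):

// Read the custom property off a loaded node via the Entity API wrapper.
$wrapper = entity_metadata_wrapper('node', $node);
$view_count = $wrapper->is_nodeviewcount->value();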

May 12 2014

Most high-traffic or complex Drupal sites use Apache Solr as the search engine. It is much faster and more scalable than Drupal's search module.

In this article, we describe one way of many for getting a working Apache Solr installation for use with Drupal 7.x, on Ubuntu Server 12.04 LTS. The technique described should work with Ubuntu 14.04 LTS as well.

In a later article, we describe how to install other versions of Solr, using the Ubuntu/Debian way.

Objectives

For this article, we focus on having an installation of Apache Solr with the following objectives:

  • Use the latest stable version of Apache Solr
  • Least amount of software dependencies, i.e. no installation of Tomcat server, and no full JDK, and no separate Jetty
  • Least amount of necessary complexity
  • Least amount of software to install and maintain
  • A secure installation

This installation can be done on the same host that runs Drupal, if it has enough memory and CPU, or it can be on the database server. However, it is best if Solr is on a separate server dedicated for search, with enough memory and CPU.

Installing Java

We start by installing the Java Runtime Environment, and choose the headless server variant, i.e. without any GUI components.

sudo aptitude update
sudo aptitude install default-jre-headless

Downloading Apache Solr

Second, we need to download the latest stable version of Apache Solr from a mirror near you. At the time of writing this article, it is 4.7.2. You can find the closest mirror to you at Apache's mirror list.

cd /tmp
wget http://apache.mirror.rafal.ca/lucene/solr/4.7.2/solr-4.7.2.tgz

Extracting Apache Solr

Next we extract the archive, while still in the /tmp directory.

tar -xzf solr-4.7.2.tgz

Moving to the installation directory

We choose to install Solr in /opt, because that directory is intended for software that is not installed from Ubuntu's repositories via the apt dependency management system, and hence not tracked for security updates by Ubuntu.

sudo mv /tmp/solr-4.7.2 /opt/solr

Creating a "core"

Apache Solr can serve multiple sites, each served by a "core". We will start with one core, simply called "drupal".

cd /opt/solr/example/solr
sudo mv collection1 drupal


Now edit the file ./drupal/core.properties and change the name= to drupal, like so:

name=drupal

Copying the Drupal schema and Solr configuration

We now have to copy the Drupal Solr configuration into Solr. Assuming your site is installed in /var/www, these commands achieve the task:

cd /opt/solr/example/solr/drupal/conf
sudo cp /var/www/sites/all/modules/contrib/apachesolr/solr-conf/solr-4.x/* .

Then edit the file /opt/solr/example/solr/drupal/conf/solrconfig.xml, and comment out or delete the following section:

<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>

Setting Apache Solr Authentication, using Jetty

By default, a Solr installation listens on the public Ethernet interface of a server, and has no protection whatsoever. Attackers can access Solr, and change its settings remotely. To prevent this, we set password authentication using the embedded Jetty that comes with Solr. This syntax is for Apache Solr 4.x. Earlier versions use a different syntax.

The following settings work well for a single core install, i.e. search for a single Drupal installation. If you want multi-core Solr, i.e. for many sites, then you want to fine tune this to add different roles to different cores.

First, edit the file /opt/solr/example/etc/jetty.xml, and add this section:

<!-- ======= Securing Solr ===== -->
<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Solr</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>

Then edit the file: /opt/solr/example/etc/webdefault.xml, and add this section:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>

Finally, create a new file named /opt/solr/example/etc/realm.properties, and add the following section to it:

user_name: password, search-role


Note that "search-role" must match what you put in webdefault.xml above.

Instead of "user_name", use the user name that will be used to log in to Solr. Also, replace "password" with a real, strong, hard-to-guess password.

Finally, make sure that the file containing the password is not world-readable.

chmod 640 /opt/solr/example/etc/realm.properties

Changing File Ownership

We then create a user for solr.

sudo useradd -d /opt/solr -M -s /dev/null -U solr

And finally change ownership of the directory to solr

sudo chown -R solr:solr /opt/solr

Automatically starting Solr

Now you need Solr to start automatically when the server is rebooted. To do this, download the attached file, and copy it to /etc/init.d

sudo cp solr-init.d.sh.txt /etc/init.d/solr
sudo chmod 755 /etc/init.d/solr

And now tell Linux to start it automatically.

sudo update-rc.d solr start 95 2 3 4 5 .

For now, start Solr manually.

sudo /etc/init.d/solr start

Now Solr is up and running.

Verify that it is running by accessing the following URL:

http://x.x.x.x:8983/solr/


Replace x.x.x.x by the IP address of the server that is running Solr.

You can also view the logs at:

tail -f /opt/solr/example/logs/solr.log

Configuring Drupal's Apache Solr module

After you have successfully installed, configured and started Solr, you should configure your Drupal site to interact with the Solr server. First, go to this URL: admin/config/search/apachesolr/settings/solr/edit, and enter the information for your Solr server. Use a URL of the following form:

http://user_name:password@x.x.x.x:8983/solr/drupal

Now you can proceed to reindex your site, by sending all the content to Solr.

Removing Solr

If you ever want to cleanly remove the Apache Solr instance that you installed using the above instructions, run the following sequence of commands:

sudo /etc/init.d/solr stop

sudo update-rc.d solr disable

sudo update-rc.d solr remove

sudo rm /etc/init.d/solr

sudo userdel solr

sudo rm -rf /opt/solr

sudo aptitude purge default-jre-headless

Additional Resources

Attachment: solr-init.d.sh_.txt (1.23 KB)
May 11 2012

Use Case: We want patrons to find our forms easily. If they need help finding something, they should be able to go to a search bar, type in “consultation request” and get to our consultation request form.

The Challenge: First off, Solr doesn’t index the node/add/your-content-type-here pages by default – I’m certain there’s some workaround, but it’s not obvious (dear readers are welcome to retort). Secondly, none of the form results should be indexed by Solr in the first place – we certainly don’t need our patrons’ requests showing up anywhere, ever.

The Hack: Form Block allows you to make content types available as blocks. Pretty straightforward: click a few buttons, go to the context editor, and woo-ha – forms in a block. We took the pages that were already being returned for the queries we were interested in and just added the form to those pages.

Here’s the VERY brief overview – a minute and a half or so worth: http://www.youtube.com/watch?v=gzvq-t1m03A. It goes over turning a content type into a block, as well as adding the block to a context using the context editor that comes with the admin toolbar.

Form Block settings in content type

Now is also a good time to mention that if you are using contexts, the Admin toolbar is almost required – it really extends the UI, giving you a drag-and-drop interface for your blocks within every context active on a given page.

Adding node form in the context editor

Form Nodes in context Editor

A form in a block!

Sep 22 2011

At the last Drupal Indy User Group meeting, I did a presentation on integrating Openlayers and Apache Solr. In the presentation, I walk you through how to set up and configure the modules necessary to display search results on a map in Drupal 7. This is the same basic process which we used on Energy.gov.

The video is about 45 minutes long, so here is a general outline of the presentation. Be sure to watch the entire video or you will not see how all of the pieces are integrated:

  1. How to Geocode Content
  2. Sending Geocoded Content to Solr
  3. Local Solr with Apache Solr
  4. Openlayers
  5. Adding Solr Results to a Map
  6. Placing a Map on the Search Page
  7. Viewing Search Results on the Map

Modules covered in the video

Update...

To help facilitate setting up Apache Solr with Local Solr, I created this repository: https://github.com/treehouseagency/local-solr-config. It contains all of the files necessary to run Solr with Local Solr on Mac OS X.

Instructions to Run:

  1. Checkout the repo.
  2. Change to the "examples" directory
  3. run "java -jar start.jar"
  4. Profit!
Aug 02 2010
tim

The Drupal.org redesign team, including recently hired contractors, met in Portland last week to coordinate their efforts and generate a workable project plan.

Among the agreements in place are the scheduling of a development release on August 7th, and the planning of subsequent releases every two weeks after that until project completion.  Additionally, the team will be tracking issues in the public queues and meeting by phone three times each week.

The hiring of contractors was a difficult decision for the Drupal Association, but it was necessary in order to accelerate Drupal development and to enable the project team to appropriately address the highly complex technical issues involved in the redesign.

Achieve Internet is leading the Solr development portion of the project. Solr is a robust search platform used by many large Drupal web sites to improve search speed and capability. Solr works by housing an index of a site's Drupal data and arranging it for easy access via the Lucene Java search library.

Drupal.org is a huge site, with nearly a million pages, nearly a million users, and at least 8600 projects. Project manager Chris Strahl noted that this puts the Drupal.org redesign project on the same scale as some of the largest corporate website projects in the world.

The redesign goes far beyond aesthetic upgrades to include key functional improvements. The project module, for example, is evolving from a Drupal.org-specific tool to a true project management module with robust Views integration. Also, site searching will include the ability to search all *.drupal.org sites, so the information distributed in various areas (groups.drupal.org, for example) will be accessible via one search box. Add to this increased functionality the implications of the GIT migration, and it’s easy to see how significant this Drupal development effort is.

With such a huge project, no task is truly discrete, and sorting out some of the dependencies has been challenging. For example, one of the nagging issues for Bill O’Connor, Achieve’s CTO, has been the difficulty of running an adequate development environment for his Solr work. Also, since searching is so intimately intertwined with the development of the Drupal project module, it has been difficult to build a Solr index on a moving target.

When the team met in Portland, they were able to address these sorts of issues, prioritize aspects of the project, and lay out clear communication paths for development. The infrastructure team has recently been able to provide a Solr index that can be stored locally, removing a key obstacle to Bill’s progress. And Bill and the project team have created a much clearer plan for ensuring that project field types, for example, meet Solr indexing requirements.

Achieve Internet’s part of the project is somewhat different from some of the others, in that there is less Drupal community volunteer coordination and involvement. Strahl noted that “Since Bill is one of the few people in the world who can work on Solr for a site of this complexity, he’s hip-deep in Drupal.org Solr development.” Bill’s been focused on 1) ensuring that content indexing enables robust faceted searching and 2) enabling the ability to build dynamic, complex queries in Drupal across multiple sites.

Strahl believes the hiring of contractors and the meeting in Portland have had the desired effect. “More issues have been checked out in the last two weeks than in the last several months combined,” he said. “And the August 7th release will most likely include 50 resolved issues.”

Three of the team members--GIT migration lead Sam Boyer, Bill O'Connor, and Chris Strahl--were in San Diego at Achieve Internet on Friday to participate face-to-face in the weekly scrum, with the others joining by phone for a productive meeting.

“There is no lack of brain power on this team,” Strahl said after the call. “With a team of people who are used to being right, sometimes things can be difficult. But so far communication has been great, and I’m excited about the progress.”

Jul 07 2010

Drupal is in love with Solr, as can be seen by the absolutely great session proposals that have been submitted for DrupalCon Copenhagen, in August. If you want to see a healthy dose of search goodness happening in Denmark, here are the links to go vote.

    Jul 08 2009

    We have installed a 'solr' server for the use of our Drupal sites. It has been an interesting experience so far. Many thanks to all who have blazed the trail ahead of us.

    I started getting the error

    "500" Status: Internal Server Error in apachesolr_cron

    on our main testing Drupal site recently. The error was showing up in Administration -> Reports -> Recent log entries. I ran a few Google searches and didn't find an answer. The index on this system is on the order of about 100K nodes. We had done some updates that required a 'rebuild' of the index. The error started to show up late into the rebuild process.
    Well, I was looking in the wrong place. The 500 error was NOT coming from our Drupal server. It was coming from the Solr server - something that just wasn't obvious from the error message that I could see.

    Anyway, turned out that the error was very simple. The 'rebuild' of the index ran the system out of disk space. I killed off some old logs and all the RPMs that were used to install the solr system, and that provided enough space for the rebuild to finish and the old index to be removed.

    All is well again but, for all the Google searching I did on this error and the lack of answers, I thought it would be a good thing to share what fixed it for me.

    Enjoy

    Kurt

    Jul 07 2009

    Acquia Launches Cloud-based Solr Search Indexing

    Posted by Jeffrey Scott - TypeHost Web Development | Tuesday, July 7th, 2009

    Acquia, the start-up company founded by Dries Buytaert, the lead developer and founder of Drupal, has announced that it is now providing paid search indexing for Drupal sites on a subscription basis aimed at enterprise sites. Similar to Mollom, Acquia’s anti-spam software for CMS platforms, Acquia Search will also work for those running other open source software like WordPress, Joomla, TYPO3, etc., as well as for sites with proprietary code. Acquia Search is based on Apache Lucene and Solr, and essentially works by having Acquia index your site’s content on their computers and then send it, encrypted, on demand to answer your users’ search queries through an integrated Acquia Search module. According to the announcement, Acquia is using Solr server farms on Amazon EC2 to power this cloud architecture.

    Many people have complained about Drupal’s core search functionality over the years, but Solr and Lucene require a Java stack that most people are not equipped to manage on their existing IT architecture, staff, or budget. So Acquia is offering these search functionalities as SaaS, or Software as a Service, on a remote-hosted, pre-configured basis. If you want to do it yourself, see:
    http://drupal.org/project/apachesolr

    Reference: http://en.wikipedia.org/wiki/Solr

    According to Dries:

    “Acquia Search is included for no additional cost in every Acquia Network subscription. Basic and Professional subscribers have one ‘search slice’ and Enterprise subscribers have five ‘search slices’. A slice includes the processing power to index your site, to do index updates, to store your index, and to process your site visitors’ search queries. Each slice includes 10MB of indexing space – enough for a site with between 1,000 and 2,000 nodes. Customers who exceed the level included with their subscription may purchase additional slices. A ten-slice extension package costs an additional $1,000/year, and will cover an additional 10,000 – 20,000 nodes in an index of 100MB. For my personal blog, which has about 900 nodes at the time of this writing, a Basic Acquia Network subscription ($349 USD/year) would give me all the benefits of Acquia Search, plus all the other Acquia Network services.”1

    Put in this perspective, most Drupal users likely won’t be switching to Acquia Search anytime soon. But, for the most part… they have little need to. For small sites or social networks, Drupal’s core search is going to be generally sufficient. Drupal will index your site automatically on cron runs, and keep this index of keywords and nodes in a table of your MySQL database. If you are working a lot with taxonomy and CCK fields, then Faceted Search is a recommended choice: http://drupal.org/project/faceted_search

    I have used Faceted Search on a number of sites and it is excellent for building a custom search engine around your site’s own custom vocabularies, hierarchies, and site structures. Faceted Search is also important in a number of Semantic Web integrations working with RDF data and other micro-tags attached to data fields. Acquia Search is designed to work in this way as well as to facilitate the number crunching involved when high traffic sites with extremely large databases of content need to sift through search archives quickly to return results from user queries. Consider the example of Drupal.org in this context – Acquia Search is the solution to managing over 500,000 nodes and millions of search queries on an extremely active site.

    “Reality is that for a certain class of websites — like intranets or e-commerce websites — search can be the most important feature of the entire site. Faceted search can really increase your conversions if you have an e-commerce website, or can really boost the productivity of your employees if you have a large intranet. For those organizations, Drupal’s built in search is simply not adequate. We invested in search because we believe that for many of these sites, enterprise-grade search is a requirement… The search module shipped with Drupal core has its purpose and target audience. It isn’t right for everyone, just as Acquia Search is not for everyone. Both are important, not just for the Drupal community at large, but also for many of Acquia’s own customers. Regardless, there is no question that we need to keep investing and improving Drupal’s built-in search.”2

    In summary, Acquia Search is mostly targeted at enterprise level Drupal users with extremely large databases and high traffic, and is a cloud based solution that should not only speed up the rate of return on results, it should also improve the quality of the material returned based on faceted keywords & vocabularies. For those using Acquia’s personal or small business subscription accounts, the new search should appear as an additional “free bonus” with your monthly package of services. For users, even on a small site, the efficiency of faceted search may make information more accessible for visitors.

    To learn more, visit: http://buytaert.net/acquia-search-benefits-for-visitors

    1. http://buytaert.net/acquia-search-available-commercially
    2. http://buytaert.net/acquia-search-versus-drupal-search


    May 31 2009

    I keep meaning to write shorter blog posts. For once this one is easy to write as a short piece. Solr is a powerful search server, and there has been some great work making a Solr module that integrates really cleanly with Drupal. I've kept putting off trying it out because of the expected pain of having to get the correct Java version, and then fight with Tomcat to get it to work properly, etc. But, in short...

    It works!

    It works out of the box with the openjdk-1.6.0 RPMs, with the packaged Jetty container. Follow the README to get the PHP library and swap over the configs in the example as described. Run java -jar start.jar and you're in business! The docs explain that the packaged Jetty is good even for a (smaller) single-instance production site.

    But it works with Tomcat too! Not quite out of the box for me, but using the tomcat5 RPM and following the SolrTomcat "Configuring Solr Home with JNDI" section, just:

    1. copying the apache-solr-nightly/example/solr/ as the /my/solr/home and
    2. copying the apache-solr-nightly/dist/apache-solr-nightly.war as /some/path/solr.war
    3. both to somewhere Tomcat could get to them (it's running as user tomcat)
    4. swapping over the configs just as before with the Jetty README version

    Plus the not-quite-out-of-the-box bit. It was giving me the error SEVERE: Exception starting filter SolrRequestFilter Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: No XPathFactory implementation found for the object model: http://java.sun.com/jaxp/xpath/dom. The libraries for this XML handling are installed as dependencies (xalan-j2 and xerces-j2), but clearly not in a path that was expected somewhere. So for now I just symlinked them from their location (/usr/share/java/.) into the (/usr/share/tomcat5/webapps/solr/WEB-INF/lib) directory. But even with Tomcat, with the RPMs, that was it!

    Brilliant!
