Feb 13 2017

The definition of “what a search page is” varies from project to project. Some clients are happy with the core Search module; others want a full-blown search engine.

Drupal offers a wide range of options when it comes to building custom search pages. You can create a basic search page using the core Search module or, if you’re looking for something more advanced, you could use Search API.

In the recorded webinar I cover the following:

  • Search in Drupal 7 (1:24)
  • What’s new in Drupal 8 (8:37)
  • Create Search Page using Core Search (15:24)
  • Create Search Page using Views (19:20)
  • How to Modify Search Results (22:34)
  • Introduction to Search API (25:28)
  • How to Create Facets (38:20)


Oct 24 2016

In this blog post I will present how, in a recent e-Commerce project built on top of Drupal 7 (the previous major version of the Drupal CMS), we made Drupal 7, SearchAPI and Commerce work together to efficiently retrieve grouped results from Solr through SearchAPI, with no duplication of indexed data.

We used the SearchAPI and FacetAPI modules to build a search index for products. So far, so good: available products and product variations could be searched and filtered using a set of predefined facets. Later, our project owner came up with a new requirement: provide a list of products where each result includes, in addition to the product details, a picture of one of the available product variations, while keeping the ability to apply facets to the listing. Furthermore, the variation picture displayed in the list must match the filter applied by the user, so as not to confuse users and to provide a better user experience.

An example use case is simple: allow users to get the list of available products and filter it by the color, size, etc. fields of the available product variations, while displaying a picture of a matching variation rather than a generic sample picture.

For the sake of simplicity and consistency with Drupal’s Commerce module terminology, I will use the term “Product” to refer to any product-variation, while the term “Model” will be used to refer to a product.

Solr Result Grouping

We decided to use Solr (the well-known, fast and efficient search engine built on top of the Apache Lucene library) as the backend of the eCommerce platform, not only for its full-text search features but also because it let us build a fast retrieval system for the huge number of products we expected to have online.

To display product models, facets and available products as requested, I intended to use the Solr feature called Result Grouping, as it seemed suitable for our case: Solr can return just a subset of the results by grouping them on a single-valued field (previously indexed, of course). Facets can then be configured to be computed from the grouped set of results, from the ungrouped items, or from just the first result of each group.
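For reference, Result Grouping is switched on through Solr request parameters. The sketch below only illustrates which parameters are involved (group, group.field, group.limit and, optionally, group.truncate); it is not how the SearchAPI Grouping module builds its request, and the field names model_id and color are hypothetical.

```php
<?php
// Sketch: Solr request parameters for Result Grouping.
// The field names used below (model_id, color) are hypothetical.
function build_grouped_query(string $keywords, string $groupField, string $facetField, array $filterQueries): string
{
    $params = [
        'q' => $keywords,
        'group' => 'true',            // enable Result Grouping
        'group.field' => $groupField, // must be a single-valued, indexed field
        'group.limit' => 1,           // one document (product) per group (model)
        // Facets are computed on all matching documents by default;
        // adding 'group.truncate' => 'true' computes them on the top document of each group.
        'facet' => 'true',
        'facet.field' => $facetField,
    ];
    $query = http_build_query($params);
    // fq may be repeated, so each filter query is appended on its own.
    foreach ($filterQueries as $fq) {
        $query .= '&fq=' . urlencode($fq);
    }
    return $query;
}

echo build_grouped_query('shirt', 'model_id', 'color', ['color:red']), PHP_EOL;
// q=shirt&group=true&group.field=model_id&group.limit=1&facet=true&facet.field=color&fq=color%3Ared
```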

This handy Solr feature can be used with the SearchAPI module by installing the SearchAPI Grouping module. The module returns results grouped by a single-valued field while still building the facets from all the results matched by the query; this behavior is configurable.

That allowed us to:

  • group the available products by the referenced model and return one result per model;
  • compute the attribute facets on the entire collection of available products;
  • reuse the data in the product index for multiple views based on different grouping settings.

Result Grouping in SearchAPI

Due to some limitations of the SearchAPI module and its query-building components, this plan was not feasible with the existing setup: it would have required us to create a copy of the product index for each view, just to apply that view’s specific Result Grouping settings.

The reason is that the SearchAPI Grouping features are implemented on top of SearchAPI’s “Alterations and Processors”. These are specific functions that the SearchAPI module can configure and invoke both at indexing time and at query time: Alterations programmatically alter the content sent to the underlying index, while Processors run when a search query is built and executed and when its results are returned.
Both can be defined and configured only per index.

As shown in the following picture, the SearchAPI Grouping module can be configured only in the index configuration, not per query.


Image 1: SearchAPI configuration for the Grouping Processor.

As the SearchAPI Grouping module is implemented as a SearchAPI Processor (as it needs to be able to alter the query sent to Solr and to handle the returned results), it would force us to create a new index for each different configuration of the result grouping.

This limitation would introduce a lot of useless data duplication in the index, with a consequent drop in performance when products are saved and then indexed into multiple indexes.
The duplication is all the more pointless because the changes performed by the Processor merely alter:

  1. the query sent to Solr;
  2. the handling of the raw data returned by Solr.

In other words, there is no need to index the same data multiple times.

Since the possibility of defining per-query processors sounded really promising, and such a feature could be used extensively in the same project, we implemented a new module and published it on Drupal.org: the SearchAPI Extended Processors module (thanks to SearchAPI’s maintainer, DrunkenMonkey, for the help and review :) ).

The Drupal SearchAPI Extended Processor

The new module extends the standard SearchAPI behavior for Processors and lets admins configure which SearchAPI Processors run per query, not only per index.

With the new module, a single index can be used with multiple, different Processor configurations; no new indexes are needed, which avoids data duplication.

As shown in the following picture, the new configuration is exposed under “Advanced > Query options” when editing a SearchAPI view.
The SearchAPI processors can be altered and redefined for that view; a checkbox lets you completely override the index settings instead of adding processors on top of them.
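Conceptually, the checkbox decides between two ways of combining the index’s processors with the view’s. The following is a hypothetical sketch of that decision; the function name, the processor IDs and the array shape (processor ID => settings) are illustrative assumptions, not the module’s actual API.

```php
<?php
// Hypothetical sketch: which processors run for a view's query.
// Arrays map processor IDs to their settings; all names are illustrative.
function effective_processors(array $indexProcessors, array $viewProcessors, bool $override): array
{
    if ($override) {
        // Checkbox ticked: the view's configuration replaces the index's entirely.
        return $viewProcessors;
    }
    // Otherwise the view's processors are added to the index's; with string keys,
    // array_merge() lets a view entry replace an index entry with the same ID.
    return array_merge($indexProcessors, $viewProcessors);
}

$index = ['html_filter' => ['status' => 1]];
$view = ['grouping' => ['status' => 1, 'field' => 'model_id']];
print_r(array_keys(effective_processors($index, $view, false)));
print_r(array_keys(effective_processors($index, $view, true)));
```

Either way the index itself is untouched, which is what makes a single shared index possible.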


Image 2: View’s “Query options” with the SearchAPI Extended Processors module.

Conclusion: the new SearchAPI Extended Processors module has now been used for a few months in a complex eCommerce project at Liip, and it allowed us to easily implement new search features without creating multiple, separate indexes.
We are able to index product data in one single (and compact) Solr index and use it with different grouping strategies to build product listings, model listings and model-category navigation pages without duplicating any data.
Since all those listings use the Solr filter query (fq) parameter to select the set of products to display, Solr can make use of its internal caches, specifically the filterCache, to speed up subsequent searches and facets. This, together with the use of a single index, allows caches to be shared among multiple listings, which would not be possible with separate indexes.

For further information, questions or curiosity, drop me a line; I will be happy to help you configure Drupal SearchAPI and Solr for your needs.

Jun 30 2016

I am really excited to have cleared the mid-term requirement for my Google Summer of Code (GSoC) project. The results of the mid-term evaluations were announced on June 28 at 00:30 IST. This was the evaluation for the first phase of GSoC. In this evaluation process, set up by the GSoC organisers, students and mentors share their feedback about the current progress of the project, and mentors give a pass/fail grade. Students can continue coding once they clear the evaluation successfully.

I have been working on porting the Search Configuration module to Drupal 8. Please go through my previous posts if you would like to look at the earlier activities in this port.

Last week I worked on testing some units of this module using the PHPUnit framework. Testing plays a crucial role in any software development process: it helps us understand and improve our software to the required level by exercising it with various test cases. We feed in various values and check whether the tests pass according to the requirements. If any condition fails to meet our expectations, we make the required changes to suit the application’s needs.

PHPUnit tests are generally used to test individual units of an application: to check whether the implemented functions give the expected output, how they behave in various test cases, and how they handle different types of arguments, so that errors or flaws can be found and the application improved.

We need to install PHPUnit first. You could follow this documentation for the installation; it also gives a detailed description of PHPUnit tests.

Once the installation is completed, we can start writing unit tests for the functionality we have implemented. The tests are generally stored in the tests/src/Unit directory of the module. Each unit test file is named in the format xyzTest.php: all tests are suffixed with ‘Test’, and ‘xyz’ is replaced by the functionality you are going to test.

The following is a simple test, sumTest.php, that checks the sum of two numbers:

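<?php

/**
 * A minimal PHPUnit test case: the class name ends in "Test".
 */
class SumTest extends PHPUnit_Framework_TestCase
{
  public function testSum()
  {
    // assertEquals() takes the expected value first, then the actual value.
    $this->assertEquals(4, 2 + 2);
  }
}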

As the snippet above shows, we create a class whose name is suffixed with ‘Test’ and which extends PHPUnit_Framework_TestCase. The tests are written inside it as public methods; PHPUnit executes the methods whose names start with test. Here we check the sum of two numbers, which is a very simple demonstration.

The tests are run with the phpunit command, i.e.:

$ phpunit tests/src/Unit/sumTest.php

The output generated on running the above test is:

PHPUnit 5.4.6 by Sebastian Bergmann and contributors.

. 1 / 1 (100%)

Time: 252 ms, Memory: 13.25MB

OK (1 test, 1 assertion)

Stay tuned for future updates on this module port.

Jun 14 2016

Google Summer of Code (GSoC ’16) is entering the mid-term evaluation stage. I have been working on porting the Search Configuration module to Drupal 8 for the past three weeks.

The Search Configuration module helps configure the search functionality in Drupal, which is a really important feature for a content management system like Drupal. I am almost midway through the port, as indicated in the Google Summer of Code timeline.

It has been a great feeling to learn Drupal concepts this summer. I would like to take this opportunity to share with you some key aspects I had to deal with in the past week.

If changes are made to a module after it has been installed, we do not need to reinstall the module or rewrite its installation code. Instead, Drupal provides hook_update_N(), in which we write the needed update, and the database schema is updated accordingly when pending updates are run. Since my module is only heading into its first release after this port, I do not need to write an update function yet; I can make such changes directly in the earlier code. The same hook exists in both Drupal 7 and 8.
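As an illustration, an update function for a Drupal 8 port could look like the sketch below, placed in the module’s .install file. The table search_config_example and its weight column are hypothetical; N is a sequence number, and Drupal 8 update numbers conventionally start at 8001. The docblock summary is what update.php lists as the update’s description.

```php
<?php
// Sketch of an update function in search_config.install (Drupal 8).
// The table and column names are hypothetical.

/**
 * Adds a hypothetical "weight" column to the search_config_example table.
 */
function search_config_update_8001() {
  $spec = [
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
    'description' => 'Hypothetical example column.',
  ];
  \Drupal::database()->schema()->addField('search_config_example', 'weight', $spec);
  // The returned message is shown to the administrator after the update runs.
  return t('Added the weight column to search_config_example.');
}
```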

Another hook is hook_node_insert(), which is invoked when a new node has been inserted into the database; here, the module writes its own record about that node to the database. In Drupal 7, such records were typically written with drupal_write_record(). In D8, this has been replaced by merge queries and the Entity API. Merge queries support both insert and update operations on the database.

Another node-related hook is hook_node_update(), which is invoked when an existing node has been updated, so the module can update the record it stored for that node. It takes as its argument the node being updated.
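Here is a sketch of both node hooks in a Drupal 8 .module file, sharing one merge query so the same code covers the insert and the update case. The search_config_example table, its columns and the helper function are hypothetical.

```php
<?php
// Sketch for search_config.module (Drupal 8).
// The table "search_config_example" and its columns are hypothetical.

use Drupal\node\NodeInterface;

/**
 * Implements hook_node_insert().
 */
function search_config_node_insert(NodeInterface $node) {
  search_config_example_save($node);
}

/**
 * Implements hook_node_update().
 */
function search_config_node_update(NodeInterface $node) {
  search_config_example_save($node);
}

/**
 * Hypothetical helper: the merge query inserts a row if none exists for
 * this nid yet, and updates the existing row otherwise.
 */
function search_config_example_save(NodeInterface $node) {
  \Drupal::database()->merge('search_config_example')
    ->key('nid', $node->id())
    ->fields(['changed' => $node->getChangedTime()])
    ->execute();
}
```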

As the name suggests, hook_uninstall() is invoked when a module is uninstalled. It removes the variables used by the module, so that no stale settings are left in the database, and modifies existing tables if required. Drupal 7 used variable_del() to remove variables.

For instance,

// Drupal 7 code
variable_del($nameOfVariable);

In Drupal 8, this has been replaced by the delete() method of the Configuration API, i.e.:

\Drupal::service('config.factory')->getEditable('search_config.settings')->delete();

Here, search_config.settings is the module’s default configuration object.
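Wrapped in a full hook_uninstall() implementation in the module’s .install file, that Drupal 8 call could look like the sketch below. Note that Drupal 8 generally deletes configuration a module ships in its config/install directory automatically when the module is uninstalled, so an explicit delete() mainly matters for configuration created some other way.

```php
<?php
// Sketch of search_config.install (Drupal 8): only a function definition,
// which Drupal calls while uninstalling the module.

/**
 * Implements hook_uninstall().
 */
function search_config_uninstall() {
  // Drupal 8 counterpart of Drupal 7's variable_del(): remove the whole
  // search_config.settings configuration object.
  \Drupal::service('config.factory')->getEditable('search_config.settings')->delete();
}
```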

I will post updates on this port regularly. Stay tuned for future posts.

May 25 2016

At this month’s Sydney Drupal meetup, I did a presentation about Search in Drupal 8. In the video, I explain three ways you can create a search page:

1. Core Search

The core Search module which comes with Drupal has some new functionality in Drupal 8. The biggest change is the ability to create custom search pages without using any other module.

2. Views Filter

A common way to build search pages in Drupal 7 was to create a Views page and use the “Search Keywords” filter. This can still be done in Drupal 8, and, best of all, Views is now part of core.

3. Search API

The Search API module is used to create powerful search pages and it’s highly extensible. It is the module to learn and use for building search pages.

I hope this presentation helps and that you enjoy the video.

May 20 2016

I have been selected for Google Summer of Code ’16 with Drupal for the project “Port search configuration module to Drupal 8”. Thanks to all the developers in the #Drupal IRC channel for guiding me into this summer project by sharing their ideas and suggestions.

The search configuration feature is presently available in Drupal 7 and its preceding versions. This is really a cool feature which helps us a lot in improving the search and enhancing it for better search results. This summer, I will be engaged in porting this module to Drupal 8.

The GSoC projects were announced on April 22, 2016. All selected students have a community bonding period until May 22. This is when students get closer to the organisation, learn its code base, interact with their mentors, and plan the project and its deadlines for the coding period, which starts soon after community bonding.

I have been blessed with three experienced mentors from Drupal: Naveen Valecha, Neetu Morwani and Karthik Kumar. I have been discussing the project plan with them for the last few weeks. Meanwhile, I was also asked to learn some basic Drupal concepts, like hooks, hook_permission() and forms, which are the real building blocks of my project. This helped me a lot to understand the coding methodologies I need to adopt. I was also able to go through the Drupal 7 code base of the module, which helped me collect more ideas for the project.

I also got the opportunity to hack on some simple modules by creating a sandbox project on Drupal.org and pushing commits for the sample modules I developed to learn the basics. I have created a project on Drupal.org for the search configuration port and have added the tasks I need to complete as part of this process.

I will be posting regular updates about my GSoC project here.

Best of luck to all the selected students.

Looking forward to a bright summer ahead, coding for Drupal.

Thank you.

May 10 2016

Building a search page isn’t as straightforward as you’d think. At first, a client will want something users can use to search content; then they may want to modify the search results or even change the ranking of certain content. Long story short, something you thought would be as simple as enabling a module ends up taking twice as long.

If you need to create custom search pages in Drupal 7, more often than not you use Search API or create a search page using Views. But the core Search module in Drupal 8 is more powerful than it was in Drupal 7. One of the big changes for site builders in Drupal 8 is the ability to create custom search pages.

Fig 1.0

However, there are a few limitations to creating a search page this way. First, its path will have a “search/” prefix, although the full URL can be changed by creating a URL alias. Second, you can only adjust the content ranking on these pages. If you want to index extra fields or remove some from being indexed, you’ll still need Search API.

In this tutorial, you’ll learn how to create a custom search page and how to modify the search results by overriding a template.

Getting Started

Just make sure you’ve installed Drupal using the Standard installation profile. By using the Standard profile, it’ll automatically install and configure the core Search module.

If you’re using the Minimal profile, then you’ll need to install the Search module manually.

Generating Test Content

It’s useful to have real content indexed while building a search page. If you’re like me, you don’t like to spend too much time creating dummy content only to have it blown away.

So to generate test content, I’ll use the “Devel generate” sub-module which ships with Devel.

Just install the sub-module and then go to Configuration, “Generate content”. Select which content types you want generated and click on Generate.

How to Create a Search Page

1. Go to Configuration, and click on “Search pages” from within the “Search and Metadata” section.

Fig 1.1

2. Scroll down to the “Search pages” section and from here you can manage the existing pages and create new ones.

Fig 1.2

3. From the “Search page type” select Content then click on “Add new page”.

4. Enter “Site search” in the Label field and add site to Path.

Fig 1.3

Then scroll to the bottom and click on “Add search page”.

5. Once you’re back on the “Search pages” page, you can reorder how they’re displayed.

Fig 1.4

And, you can set the default page by clicking on the down arrow in the “Operations” column.

Fig 1.5

Index Content

Before we can test the search page, make sure the site content has been indexed. If it hasn’t, nothing will appear in the search results.

To check the index status, go back to the “Search pages” page we were on previously and look at the “Indexing progress” section.

Fig 1.6

If you see “0% of the site has been indexed.”, then index the site by running cron. Cron can be run manually by going to “admin/config/system/cron” and clicking on “Run cron”.

Once indexed you should see “100% of the site has been indexed.”.

How to Search for Content

To access the search page directly go to /search and you should see all the search pages as tabs. If you’re logged in as an administrator, you’ll see Content, Users and “Site search” (the one we created).

Fig 1.7

If you’re an anonymous user, by default you won’t see the “Advanced search” field-set and the Users tab. This can be changed by adjusting permissions in Drupal.

To search for content, enter a keyword into “Enter your keywords” and click the search icon.

Permissions

I mentioned in the last section that certain parts of the search page, i.e., “Advanced search” and the Users tab are only accessible if you have permission.

The Search module comes with three permissions: “Administer search”, “Use advanced search” and “Use search”.

Administer search

This permission lets you access the “Search pages” configuration page.

Use advanced search

This permission allows access to the “Advanced search” field-set.

Use search

This permission is required to access the search. This permission is granted to anonymous and authenticated users when you use the Standard installation profile.

Where is the Permission for Users Tab?

You may have noticed there’s no specific permission for the Users tab. That’s because there is none. Instead, the “View user information” permission from the User module is used to access the tab. This is a bit confusing and I had to look in the code to see which permission is required to access the tab.

So if you want users to search account profiles, give them the “View user information” permission.

How to Modify Search Results

Sometimes you may want to change the markup that’s rendered in the search results. This can be achieved by overriding a specific template: just copy the search-result.html.twig file, found in the Search module or your base theme, into your own theme and modify it there.

Summary

The functionality of a search page can vary from site to site. Sometimes the out-of-the-box solution is enough; other times it’s not. The core Search module should only be used if you’re happy with what it offers. If you need something custom, then you’ll need to build your own page using Views (we’ll look at this in the next tutorial) or Search API.

FAQs

Q: Nothing appears when I search for something?

Make sure you’ve indexed the content by running cron.

Q: Anonymous users can’t access the advanced search section?

Assign the Anonymous role the “Use advanced search” permission and they’ll have access to the advanced search form.

Q: Can I change the URL to a search page?

Yes. Just create a URL alias for the search page you want to change.

Nov 09 2013

This post explains which Drupal URLs you would want to hide from visitors, what should not get indexed by search engines, and how to avoid duplicate paths and other URL confusion. The main tools used to accomplish this are the robots.txt file and the Global Redirect, Pathauto and Rabbit Hole modules.

By default, Drupal generates a lot of URL addresses. Taxonomy creates a page for each tag, each node usually gets at least 3 different ways to access it, not to mention all the custom and contrib stuff. All this can result in messy search results, confusion for the user, and unwanted pages being public. In one of Mekaia’s projects, our motivation to clean this mess up came from using the Google Site Search engine as the page search – this needed a very clean URL structure to make sure the user got exactly those results for his queries that we wanted to give him. The motivation to use the methods described here can come from a lot of different places – but this post is about the solutions, not the problems.

Let’s look at three ways to improve this situation. Cleanup comes first, as it reduces the amount of stuff we need to hide from users or search engines later. Then content visibility, as this matters in more cases and also reduces the amount of stuff you’ll later want to hide from search. And then what to actually hide from search: this is really fine-tuning, probably relevant only in some specific cases.

Cleanup

Let’s take a common node. Clean URLs are turned on, the node ID is “2”, and the node has the URL alias “My happy times”. These are SOME of the ways to access this node:

  • /?q=node/2
  • /node/2
  • /node/2/
  • /en/node/2
  • /my-happy-times

All of them point to the same content. These variants can of course be combined, resulting in even more URLs, when you usually only want to use (and allow) one: /my-happy-times.

A module called Global Redirect helps solve this by offering settings such as “Deslash”, “Non-clean to clean” and “Language path checking”. It also has other settings, like handling case sensitivity or using canonical URLs to take load off search engine crawling. The last one is quite a big topic in itself and probably deserves a separate post.
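To make the “Deslash”, “Non-clean to clean” and “Language path checking” settings concrete, here is a hypothetical PHP sketch of the normalization they perform, collapsing the URL variants listed above into the single alias. The function name, the alias table and the ‘en’ language prefix are assumptions for illustration; the module itself answers with a redirect to the canonical URL rather than just computing it.

```php
<?php
// Hypothetical sketch of Global Redirect-style normalization: map every
// variant of a node URL to its single canonical alias.
function canonical_path(string $url, array $aliases, array $languages = ['en']): string
{
    $parts = parse_url($url);
    $path = $parts['path'] ?? '/';
    // "Non-clean to clean": /?q=node/2 becomes /node/2.
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
        if (isset($query['q'])) {
            $path = '/' . $query['q'];
        }
    }
    // "Deslash": drop leading/trailing slash noise, keep one leading slash.
    $path = '/' . trim($path, '/');
    // "Language path checking": strip a known language prefix such as /en/.
    foreach ($languages as $lang) {
        if (strpos($path, "/$lang/") === 0) {
            $path = substr($path, strlen($lang) + 1);
        }
    }
    // Replace the system path with its alias, if one exists.
    return $aliases[$path] ?? $path;
}

$aliases = ['/node/2' => '/my-happy-times'];
echo canonical_path('/en/node/2/', $aliases), PHP_EOL; // /my-happy-times
```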

The module is easy to use and pretty self explanatory, so, you know… use it!

Another module I’d file under the Cleanup section is Pathauto. This allows you to create path patterns for your content, which is pretty much common Drupal knowledge. But in our case, we disabled taxonomy tag page aliases, to have only one single way to access those pages and make it easier to hide them from search engines. We also disabled the default user page alias pattern, because we did not want the user names to be public.

What is visible

Often in Drupal there is some content that gets used in views or in complex layouts but has no purpose as a separate node page. For example, a Banner content type: the banners are shown in a carousel on the front page. What about the node page? It’s not meant to be used, so it’s not styled and looks like crap, but by default it is still public at /node/xxx. You can use an interesting little module called Rabbit Hole, which offers these options for hiding that default node page:

Rabbithole settings


So you can set it to “Page not found” and don’t have to worry about visitors wandering into unformatted pages or those pages appearing in search results.
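The idea behind Rabbit Hole can be sketched as a per-content-type lookup that decides how a node page responds. This is a hypothetical illustration: the function, the behavior keys and the settings array are not the module’s API, and the redirect status code is assumed to be 301.

```php
<?php
// Hypothetical sketch of a Rabbit Hole-style decision for a node page.
// Behavior keys and the settings array are illustrative, not the module's API.
function node_page_status(string $contentType, array $settings): int
{
    $behavior = $settings[$contentType] ?? 'display';
    switch ($behavior) {
        case 'access_denied':
            return 403;
        case 'page_not_found':
            return 404;
        case 'redirect':
            return 301; // assumed status; a redirect target would also be configured
        default:
            return 200; // display the node page as usual
    }
}

$settings = ['banner' => 'page_not_found'];
echo node_page_status('banner', $settings), PHP_EOL;  // 404
echo node_page_status('article', $settings), PHP_EOL; // 200
```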

What is searchable

If you have already cleaned up your URLs and hidden any unwanted node pages, your search results are already better, but there’s still some fine tuning to do.

Let’s say that the user searches for “red+bears+eat+green+apples”. You have one news item on your site that matches this phrase exactly. But this news item (or part of it) is shown in many different places: in the front page top news block, on the filtered news archive page, and finally on the news item’s node page. Search engines have crawled all these pages, so all of them are shown in the results, and the node page might not even be the top result, even though that’s probably the only correct result, as it’s the only one with the full text of the news item.

This is important for SEO, and even more important if you use an external search engine, such as Google Site Search, as your site’s internal search mechanism. Crawling can be controlled with the robots.txt file (it’s in your Drupal site root folder). For the example above, you can have something like:

Disallow: /news
Allow: /news/
Disallow: /*?*

The first line says that your general news archive page, i.e. mysite.com/news, is not crawled. The second line says that the news item node pages, such as mysite.com/news/what-bears-eat, should however be crawled. The third line is more of a global rule: it says that any URLs resulting from Views exposed filters, such as mysite.com/news?year=2011, will not get crawled. Here is a nice post about how Google handles robots.txt wildcards and such.
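To see why these three lines behave as described, here is a simplified PHP model of Google’s documented precedence rule: among the rules matching a path, the one with the longest path pattern wins, and Allow wins a tie. The functions are hypothetical and ignore user-agent groups, percent-encoding and other details of real robots.txt parsing.

```php
<?php
// Simplified model of Google's robots.txt precedence (hypothetical helpers).

// Does a path pattern (with * wildcards and an optional $ anchor) match the path?
function robots_rule_matches(string $pattern, string $path): bool
{
    $anchored = substr($pattern, -1) === '$';
    $core = $anchored ? substr($pattern, 0, -1) : $pattern;
    // Escape regex metacharacters, then turn the escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($core, '#'));
    return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $path) === 1;
}

// Longest matching pattern wins; on a tie, Allow wins. No match means allowed.
function is_crawl_allowed(string $path, array $rules): bool
{
    $best = null; // [pattern length, is Allow]
    foreach ($rules as [$type, $pattern]) {
        if (!robots_rule_matches($pattern, $path)) {
            continue;
        }
        $length = strlen($pattern);
        $allow = ($type === 'Allow');
        if ($best === null || $length > $best[0] || ($length === $best[0] && $allow)) {
            $best = [$length, $allow];
        }
    }
    return $best === null ? true : $best[1];
}

$rules = [['Disallow', '/news'], ['Allow', '/news/'], ['Disallow', '/*?*']];
foreach (['/news', '/news/what-bears-eat', '/news?year=2011'] as $path) {
    echo $path, ' => ', is_crawl_allowed($path, $rules) ? 'crawled' : 'blocked', PHP_EOL;
}
```

The second line wins for node pages only because “/news/” is one character longer than “/news”; the order of the lines in the file does not matter under this rule.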

The front page will still get crawled and the correct method there is to hide only the “top news” block from search engines, and not the whole page, but there does not seem to be a good way to do that. Or is there?

Happy cleaning!

Sep 15 2013

Some good articles about using the Google Search with your Drupal site have already been written, for example here. Most of the standard process involved is not that hard to figure out anyway. So this post tries to focus on a certain deeper, maybe a bit less discussed part of the topic, specifically on Google Site Search and how to turn that into filtered search for Drupal.

The background of this is that we had to build a complex in-site search functionality for a client, including such lovely stuff as cross-domain search, predictions, lemmatizing (with support for the Estonian language), document indexing and search filters. We chose to use Google Search, more specifically the paid Google Site Search (GSS). Its main advantages for us over the free Google Custom Search were search results in raw XML format and, therefore, a custom interface for displaying the results. Trying to stay agile, we didn’t worry too much about analyzing how to implement all the needed functionality on GSS, but just counted on the raw XML data to provide enough flexibility to implement pretty much anything.

As always with Drupal, we started off by Adding A Module: the Google Site Search module for Drupal. Most of the above-mentioned functionality was easily achieved and, needless to say, we saved a huge number of development hours compared to building everything from scratch. Then we got to the point where we had to implement filters.

The plan

The users of our site were meant to be able to filter search results by date, domain, content type and taxonomy. So an abstract user interface would be something like this:

Searchbox and various filters


Domain was easy: we could just use the “site:” restriction built into Google Search. For the other three, our first approach was to get the XML search results and then filter them right before showing them to the user. So if your search word is “cow” and the filter is set to show only results with the “farm” tag, we would send the “cow” search request to Google, get the raw results, and then have a PHP script run through them to throw out anything that did not point to a Drupal node with the “farm” tag attached.

This approach quickly ran into problems. It did not scale well, and because Google returns results page-by-page, pagination became messy.

So instead we looked into ways to make Google understand the context and details of our pages, so that we could directly use Google’s built-in functions without any result twisting and keep it clean and nice. Google’s answer for making your pages machine-readable is called Structured Data. It allows you to “explain” the metadata and context of your content to Google’s indexing system.

There are a number of ways to provide Structured Data of your page. We considered three of them to be more or less suitable for our purposes – PageMaps, meta tags, and RDFa.

PageMaps seem to be Google’s own way to describe Structured Data. It’s basically XML in your page’s <head> section, where you can describe page data in free-form attributes such as “author”, “date”, “subject” or whatever. Any other format is eventually transformed into PageMaps.

<meta> tags have been around for a long time. So long that they actually felt outdated to us – and a bit limited in their usage.

RDFa on the other hand allows for pretty much unlimited freedom. And since we had to provide basic RDF support for our site anyway, we eventually chose this over PageMaps. After all, RDF is meant to be THE standard for machine readability.

The execution

We already had the GSS for Drupal module in place. Next task at hand was to provide RDF annotations for our content.

Drupal 7 has built in RDF support via the core RDF module. But we needed custom annotations for fields like Date or Subject. The easiest way to achieve this is to install the RDF Extensions module. Go to your Content Type edit page, click on the new “RDF Mappings” tab, and you can add RDF predicates to every field:

The RDF mapping interface in Drupal 7

Mapping RDF predicates for a content type

We chose dc:subject for the Subject field, dc:date for Date, and dc:type for Content Type. So now we had some new code in our page source (as we didn’t want to show this metadata to the end user, we did some templating first to hide the values of the RDF annotations and put them into an RDFa “content” attribute):

<span property="dc:subject" content="Science and research"></span>
<span property="dc:type" content="news"></span>
<span property="dc:date" content="20130509"></span>
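If you would rather generate these hidden annotations in code than in templates, a small helper like the hypothetical one below does the job; it escapes the values and writes explicit closing tags, since span is not a void element in HTML. The function name and input array are illustrative only.

```php
<?php
// Hypothetical helper: render hidden RDFa annotations for a node.
// Predicates follow the post (dc:subject, dc:type, dc:date).
function render_rdfa_annotations(array $values): string
{
    $html = '';
    foreach ($values as $property => $content) {
        $html .= sprintf(
            '<span property="%s" content="%s"></span>' . "\n",
            htmlspecialchars($property, ENT_QUOTES),
            htmlspecialchars($content, ENT_QUOTES)
        );
    }
    return $html;
}

echo render_rdfa_annotations([
    'dc:subject' => 'Science and research',
    'dc:type' => 'news',
    'dc:date' => '20130509',
]);
```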

To test this out, you can use Google’s own Rich Snippets testing tool. There’s a special tab for Custom Search engine and this will show you which tags the indexer has read in:

Output from Rich Snippets testing tool


and how it has translated them to its own PageMap model:

Pagemap data read by Rich Snippets tool


Google actually reads a lot more data from your page automatically, for example Open Graph annotations you might have added for Facebook, meta viewport tags, derived dates, etc. I have removed all that from the screenshot to keep it simple.

The PageMap model presented by the Rich Snippets tool is important, because this is the actual syntax used in the Google search box to restrict the search results, exactly as you would use the “site:” restriction. So you can write something like

cow more:pagemap:item-subject:farm

into your GSS search box, and Google would understand that you only want to display results that have the “farm” Subject tag attached. This is called “Filtering by Attribute” in Google lingo, and is described in more detail at the Google Developers “Filtering and sorting search results” page.

That is the core of what you need to know. Now you can write a bit of background code that adds the (hidden, if you want) “more:pagemap:” restriction to the search box whenever a user clicks a filter. There are some additional details. For the Date, for example, you would want to specify a range criterion, which requires the “Restrict to Range” method. For some reason this does not work directly in the search box, so you have to patch the GSS module a bit to make such a query possible.
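As a rough illustration of that background code, the PHP sketch below builds the search-box string from the user's search terms and the filters they have clicked. The function name and the $filters array shape are assumptions for this example, not part of the GSS module, and it does not cover the date range case described above.

```php
<?php
/**
 * Hypothetical helper: appends one "more:pagemap:" restriction per active
 * filter to the user's search terms. The "item-<attribute>" naming follows
 * the PageMap shown by the Rich Snippets tool (e.g. item-subject); check the
 * names your own pages produce before relying on them.
 */
function example_gss_query($terms, array $filters) {
  $parts = array();
  $terms = trim($terms);
  if ($terms !== '') {
    $parts[] = $terms;
  }
  foreach ($filters as $attribute => $value) {
    $value = trim($value);
    if ($value === '') {
      // Skip filters the user has not set.
      continue;
    }
    $parts[] = 'more:pagemap:item-' . $attribute . ':' . $value;
  }
  return implode(' ', $parts);
}

echo example_gss_query('cow', array('subject' => 'farm')), "\n";
// cow more:pagemap:item-subject:farm
```

Values containing spaces are passed through unchanged here; test how such values behave in the search box before relying on them.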

Useful notes

Google Site Search pricing only goes up to 500,000 queries. This limit can fill up fast on bigger sites, because a query is not only a request for search results; queries are made much more often than that, for example when changing the search results page or getting predictions while you type. So plan ahead: at this time (mid-2013), the only option above 500K queries is the Google Search Appliance server-side platform, and that’s a very different level of cost.

When developing for GSS, you’ll constantly need to test, and that means you’ll constantly need to re-index your site. Register your site at Webmaster Tools and submit a sitemap, so that you can trigger an on-demand crawl of your whole site with a single click; there’s a button for this in the GSS admin interface.

Jan 02 2013

Happy New Year! We're kicking off 2013 with some FREE videos to get people up and running with our Drupal community tools. There are a lot of aspects of the Drupal community that many people take for granted. Even something as "simple" as figuring out what community websites are out there, and how to use them, is often overlooked when talking to people new to Drupal. So, if you want to really dive into this Drupal thing in 2013, here is a gentle orientation to help get you started. We've added three new videos to our free Community category that walk you through the various community websites, how to get an account, and what you can do with it, from customizing your dashboard to editing and creating new documentation on Drupal.org. We also take a look at how to use the main search on Drupal.org so you can start finding the things you're looking for.

We hope you find these videos helpful, and we plan to keep creating more community videos over the coming months. Let us know if there is something in particular about our community that is mysterious to you, and we'll add it to our list.

Dec 13 2012

If you've searched for anything online, you're probably familiar with the handy "number of results" counter that's often displayed alongside the matching documents. You know -- that nice "Displaying results 11–20 of 196"? Somehow, Drupal 7's core Search module still doesn't include that information on its standard results page!

A lot of sites handle search with Apache Solr or use Search API to display search results in a View, and both make it easier to show a result count. For simple sites without many nodes, the core Search works just fine… except for the glaring omission of a result count. I wanted a quick solution, and I found one that worked for me.

The best method I tried is to sneak a peek at the pager information. For any pager displayed (be it for search results, a View of blog posts, or nodes promoted to the front page with no Views at all), the total number of items is stored in a global variable. Drupal uses it to determine the total number of pages in the set, so that it knows where to take you when you click the "Last" page link. This is an imperfect solution, but it was the best I've found. I started with Eric London's 2009 post for Drupal 6 and made some improvements from there. For example, the format of the displayed result count depends on the total number of results:

  • 1: "Displaying 1 result"
  • 2 to 10 (one page): "Displaying 5 results"
  • 11 and above (multiple pages): "Displaying 11 - 20 of 44 results"

There are three steps to add this result count to the page:

1. Add a preprocess hook. I won't get into the details of preprocess hooks here (you can learn more about that from Drupalize.me). Open template.php in your theme directory, and paste in the preprocess code provided below. Replace "THEMENAME" with the name of your theme.

2. Copy search-results.tpl.php into your theme directory. You'll find the original in modules/search. Leave that intact and make a copy.

3. Print the result count in the template. Edit your copy of search-results.tpl.php to print the search result count wherever you want. I put mine right below the "Search results" header tag.

<?php print $search_totals; ?>

Preprocess code

/**
 * Implements hook_preprocess_search_results().
 */
function THEMENAME_preprocess_search_results(&$vars) {
  // search.module shows 10 items per page (this isn't customizable).
  $itemsPerPage = 10;

  // Determine which page is being viewed. pager_find_page() returns the
  // zero-based ?page= value (0 when it is not set), so add 1.
  $currentPage = pager_find_page() + 1;

  // Get the total number of results from the global pager.
  $total = isset($GLOBALS['pager_total_items'][0]) ? $GLOBALS['pager_total_items'][0] : 0;

  // Determine which results are being shown ("Showing results x through y").
  $start = ($itemsPerPage * ($currentPage - 1)) + 1;
  // If on the last page, only go up to $total, not the total that COULD be
  // shown on the page. This prevents things like "Displaying 11-20 of 17".
  $end = min($itemsPerPage * $currentPage, $total);

  // If there is more than one page of results:
  if ($total > $itemsPerPage) {
    $vars['search_totals'] = t('Displaying @start - @end of @total results', array(
      '@start' => $start,
      '@end' => $end,
      '@total' => $total,
    ));
  }
  else {
    // Only one page of results, so make it simpler. format_plural() picks
    // the "result" wording for one item and "results" for several.
    $vars['search_totals'] = format_plural($total, 'Displaying 1 result', 'Displaying @count results');
  }
}
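To check the arithmetic outside Drupal, here is a framework-free sketch of the same calculation. The function name is hypothetical, $page is zero-based like the ?page= query parameter, and the strings are built directly rather than through t().

```php
<?php
/**
 * Hypothetical, framework-free version of the preprocess arithmetic.
 * $page is zero-based, as in Drupal's ?page= query parameter.
 */
function example_search_totals($total, $page, $per_page = 10) {
  if ($total <= 0) {
    return '';
  }
  if ($total <= $per_page) {
    // One page only: "Displaying 1 result" / "Displaying 5 results".
    return 'Displaying ' . $total . ($total == 1 ? ' result' : ' results');
  }
  $start = $page * $per_page + 1;
  // Clamp the end of the range on the last page ("11 - 17", not "11 - 20").
  $end = min(($page + 1) * $per_page, $total);
  return "Displaying $start - $end of $total results";
}

echo example_search_totals(44, 1), "\n"; // Displaying 11 - 20 of 44 results
echo example_search_totals(17, 1), "\n"; // Displaying 11 - 17 of 17 results
echo example_search_totals(1, 0), "\n";  // Displaying 1 result
```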

If you'd like a core solution for the search result count, consider rolling a patch on this old issue. The last attempt was nearly four years ago, so you may need to start from scratch. Feature freeze for Drupal 8 was just pushed back to Feb 18, so you've got time to get it done!

Nov 14 2012

This is a small tidbit of information in case you want to alter the Drupal search results page: you can add a custom CSS class to the last search result item (for whatever reason you may have). In my case, I wanted to remove the border-bottom from the last result, so I had to add a special CSS class to do this.

Just follow these simple steps:

  1. Override template_preprocess_search_results

    Here is how to alter the code. This goes in your template.php:

    function yourthemename_preprocess_search_results(&$variables) {
      $variables['search_results'] = '';
      if (!empty($variables['module'])) {
        $variables['module'] = check_plain($variables['module']);
      }
      // Count the results on this page so we can tell which one is last.
      $num_results = count($variables['results']);
      $counter = 0;
      foreach ($variables['results'] as $result) {
        $counter++;
        $item = array('result' => $result, 'module' => $variables['module']);
        if ($counter == $num_results) {
          // This is the last result, so pass an extra "last" variable along
          // to search-result.tpl.php.
          $item['last'] = 'last';
        }
        $variables['search_results'] .= theme('search_result', $item);
      }

      $variables['pager'] = theme('pager', array('tags' => NULL));
      $variables['theme_hook_suggestions'][] = 'search_results__' . $variables['module'];
    }

  2. Now override search-result.tpl.php. Create this file and put it in your custom theme folder.

    <li class="<?php print $classes; ?><?php if (!empty($last)) { print ' ' . $last; } ?>"<?php print $attributes; ?>>
        <?php print render($title_prefix); ?>
        <h3 class="title"<?php print $title_attributes; ?>>
        <a href="<?php print $url; ?>"><?php print $title; ?></a>
        </h3>
        <?php print render($title_suffix); ?>
        <div class="search-snippet-info">
        <?php if ($snippet): ?>
        <p class="search-snippet"<?php print $content_attributes; ?>><?php print $snippet; ?></p>
        <?php endif; ?>
        <?php if ($info): ?>
        <p class="search-info"><?php print $info; ?></p>
        <?php endif; ?>
        </div>
        </li>

  3. Clear your cache

Now if you search for something, you will notice that your very last search result has the CSS class of "last".

This also works for search results that have a pager. That is, the last result on every page will have the class of "last".
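The "last" logic itself is simple enough to check outside Drupal. The hypothetical function below is a framework-free sketch of the loop in step 1: it builds a class string for each result on the current page and appends "last" to the final one. Because it only sees the current page's results, each page gets its own "last" item.

```php
<?php
/**
 * Hypothetical sketch of the loop in step 1: returns a class string per
 * result, adding "last" to the final result on the current page.
 */
function example_result_classes(array $results, $base = 'search-result') {
  $classes = array();
  $count = count($results);
  $i = 0;
  foreach ($results as $key => $result) {
    $i++;
    $classes[$key] = ($i === $count) ? $base . ' last' : $base;
  }
  return $classes;
}

print_r(example_result_classes(array('first hit', 'second hit', 'third hit')));
```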

Aug 01 2012

In this lesson we cover using the search display portion of Display Suite. This allows you to have a custom layout for your search results as well as your user search results. Display Suite offers lots of functionality here that you just can't get with a core configuration of search.

Jun 25 2012

Apache Solr is one of the great solutions for providing search functionality on a site, and there are many Drupal modules that integrate with it. The Search API and ApacheSolr modules are two great examples. I became a huge fan of the options that Search API provided until I ran into one major roadblock: the sites I was implementing search for were hosted on Acquia and used Acquia's Solr service, which only supports the ApacheSolr module.

So, what's the drawback? Out of the box, ApacheSolr only indexes nodes. Taxonomy terms and other entities are ignored, yet indexing them was necessary for many of the requirements of the sites I was working on. Was it the end of the world? Not at all. After a little research I found that taxonomy terms and other entities can be added to the index by implementing hook_apachesolr_entity_info_alter.

Although I'm not demonstrating it directly, taxonomy terms can be included the same way, since Drupal (for the most part) treats taxonomy terms like any other entity.

Here's an example of using the hook to add some custom photo and text entities for indexing:

/**
 * Implements hook_apachesolr_entity_info_alter().
 *
 * Adds Text and Photo items to the search index.
 */
function tr_search_apachesolr_entity_info_alter(&$entity_info) {
  $entity_info['text_item']['indexable'] = TRUE;
  $entity_info['text_item']['status callback'] = 'tr_search_status_callback';
  $entity_info['text_item']['document callback'][] = 'tr_search_solr_document';
  $entity_info['text_item']['reindex callback'] = 'tr_search_solr_reindex_text_item';
  $entity_info['text_item']['index_table'] = 'apachesolr_index_entities_text_item';
  
  $entity_info['photo_item']['indexable'] = TRUE;
  $entity_info['photo_item']['status callback'] = 'tr_search_status_callback';
  $entity_info['photo_item']['document callback'][] = 'tr_search_solr_document';
  $entity_info['photo_item']['reindex callback'] = 'tr_search_solr_reindex_photo_item';
  $entity_info['photo_item']['index_table'] = 'apachesolr_index_entities_photo_item';  
}

In that hook, we declare that the two new entity types should be included for indexing.

The $entity_info array is made up of sub-arrays keyed by entity type name, each holding a series of values for the appropriate callbacks for that entity type.

For those elements, let's step through what each means with an example of a callback function.

  • "Indexable" - indicates to ApacheSolr that the entity can be indexed and exposes the entity within the ApacheSolr configuration pages in Drupal's admin.
  • "status callback" - indicates whether the entity is enabled and available in the system
    Example:
    /**
     * Callback for the search status. Since all text and photo items are 'published', this is always true.
     */
    function tr_search_status_callback($entity_id, $type) {
      return TRUE;
    }

    In this case, we just return TRUE, but more advanced logic could be implemented to determine the status, depending on your needs.
  • "document callback" - callback to provide the content/values that will be sent to Solr server for indexing.
    /**
     * Builds the information for a Solr document.
     *
     * @param ApacheSolrDocument $document
     *   The Solr document we are building up.
     * @param stdClass $entity
     *   The entity we are indexing.
     * @param string $entity_type
     *   The type of entity we're dealing with.
     */
    function tr_search_solr_document(ApacheSolrDocument $document, $entity, $entity_type) {
      // Headline
      $document->label = apachesolr_clean_text((!empty($entity->field_tr_headline[LANGUAGE_NONE][0])) ? $entity->field_tr_headline[LANGUAGE_NONE][0]['value'] : '');
     
      // Credit
      $document->ss_credit = (!empty($entity->field_tr_credit[LANGUAGE_NONE][0])) ? $entity->field_tr_credit[LANGUAGE_NONE][0]['value'] : '';
     
      // Slug
      $document->ss_slug = (!empty($entity->field_tr_slug[LANGUAGE_NONE][0])) ? $entity->field_tr_slug[LANGUAGE_NONE][0]['value'] : '';
      $document->tus_slug = (!empty($entity->field_tr_slug[LANGUAGE_NONE][0])) ? $entity->field_tr_slug[LANGUAGE_NONE][0]['value'] : '';
     
      // Content
      if ($entity_type == 'photo_item') {
        $document->content = apachesolr_clean_text((!empty($entity->field_tr_caption[LANGUAGE_NONE][0])) ? $entity->field_tr_caption[LANGUAGE_NONE][0]['value'] : '');
      }
      else {
        $document->content = apachesolr_clean_text((!empty($entity->field_tr_text_body[LANGUAGE_NONE][0])) ? $entity->field_tr_text_body[LANGUAGE_NONE][0]['value'] : '');
      }
     
      $documents = array();
      $documents[] = $document;
      return $documents;
    }

    We check each entity field we want to send, copy its value onto the Solr document object as a property, and return the document in an array.
  • "reindex callback" - callback to provide instructions for when the content is being re-indexed in the system
    /**
     * Reindexing callback for ApacheSolr, for text items
     */
    function tr_search_solr_reindex_text_item() {
      $indexer_table = apachesolr_get_indexer_table('text_item');
      $transaction = db_transaction();
      $env_id = apachesolr_default_environment();
      try {
        db_delete($indexer_table)
          ->condition('entity_type', 'text_item')
          ->execute();
          
        $select = db_select('text_item', 't');
        $select->addField('t', 'eid', 'entity_id');     
        $select->addExpression(REQUEST_TIME, 'changed');
     
        $insert = db_insert($indexer_table)
          ->fields(array('entity_id', 'changed'))
          ->from($select)
          ->execute();
      }
      catch (Exception $e) {
        $transaction->rollback();
        watchdog_exception('Apache Solr', $e);
        return FALSE;
      }
     
      return TRUE;
    }
  • "index_table" - the name of the database table in which ApacheSolr stores local references to the entities it tracks for indexing.
    You define this table in a *.install file using Drupal's Schema API. Its structure should mirror the table that ApacheSolr already defines for nodes.
    /**
     * Implements hook_schema().
     */
    function tr_search_schema() {
      // Initialize the array before filling it in the loop below.
      $schema = array();
      $types = array(
        'text_item' => 'apachesolr_index_entities_text_item',
        'photo_item' => 'apachesolr_index_entities_photo_item',
      );
      foreach ($types as $key => $type) {
        // Drupal 7 schema descriptions are plain strings, not passed through t().
        $schema[$type] = array(
          'description' => 'Stores a record of when an entity changed to determine if it needs indexing by Solr.',
          'fields' => array(
            'entity_type' => array(
              'description' => 'The type of entity.',
              'type' => 'varchar',
              'length' => 128,
              'not null' => TRUE,
              'default' => $key,
            ),
            'entity_id' => array(
              'description' => 'The primary identifier for an entity.',
              'type' => 'int',
              'unsigned' => TRUE,
              'not null' => TRUE,
            ),
            'bundle' => array(
              'description' => 'The bundle to which this entity belongs.',
              'type' => 'varchar',
              'length' => 128,
              'not null' => TRUE,
              'default' => $key,
            ),
            'status' => array(
              'description' => 'Boolean indicating whether the entity is visible to non-administrators (eg, published for nodes).',
              'type' => 'int',
              'not null' => TRUE,
              'default' => 1,
            ),
            'changed' => array(
              'description' => 'The Unix timestamp when an entity was changed.',
              'type' => 'int',
              'not null' => TRUE,
              'default' => 0,
            ),
          ),
          'indexes' => array(
            'changed' => array('changed', 'status'),
          ),
          'primary key' => array('entity_id'),
        );
      }
      return $schema;
    }

With this base code you can add any Drupal entity to ApacheSolr for indexing. Not everything demonstrated may fit your exact needs, but the premise remains the same: entities that are not supported out of the box can be added to the search index.
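The document callback above repeats the same !empty(...) ternary for every field. A small helper like the hypothetical one below could replace those lines. It is a sketch that assumes single-value, language-neutral fields (LANGUAGE_NONE), as in the callback, and it defines LANGUAGE_NONE itself only so it can run outside Drupal.

```php
<?php
// Drupal 7 defines LANGUAGE_NONE as 'und'; define it here only when running
// this sketch outside Drupal.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

/**
 * Hypothetical helper: returns the first value of a language-neutral field,
 * or '' when the field or its value is missing.
 */
function example_first_field_value($entity, $field_name) {
  if (isset($entity->{$field_name}[LANGUAGE_NONE][0]['value'])) {
    return $entity->{$field_name}[LANGUAGE_NONE][0]['value'];
  }
  return '';
}

$entity = new stdClass();
$entity->field_tr_slug = array(LANGUAGE_NONE => array(array('value' => 'cow-story')));
echo example_first_field_value($entity, 'field_tr_slug'), "\n"; // cow-story
```

Inside the callback, a line such as `$document->ss_slug = example_first_field_value($entity, 'field_tr_slug');` would then replace the ternary, while apachesolr_clean_text() can still wrap the result where the original uses it.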

(Thanks to Brad Blake for contributing to the code as well.)

Geoff is a developer at Phase2 who originally studied computer science in college before switching to culinary arts. Thus began a career that led Geoff to become an executive chef for different restaurants in Columbia, SC.

After close ...

Dec 19 2011

If you are familiar with Drupal Views, you might have come across a very nifty feature called exposed filters. If you expose one of the fields as a filter, Views provides a widget and a search option where that field can be searched. There is a good video about filters here.

However, we needed to extend the functionality so that all the fields of the content type are ‘searchable’, not just the exposed one. If we exposed every field explicitly, the Views filter would create a textbox for each field, which is not pretty. In our example, we wanted to search the ‘Person’ content type: if the user entered the first name, last name, address, or any other field value of a Person, a result should be returned.

To do this, we implemented hook_views_query_alter(&$view, &$query):

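/**
 * Implements hook_views_query_alter().
 *
 * MODULENAME is a placeholder for your own module's name.
 */
function MODULENAME_views_query_alter(&$view, &$query) {
  if ($view->name == 'people_list') {
    // startsWith() is not a PHP function; strpos() === 0 performs the same
    // "does the clause begin with search_index.word" check.
    if (strpos($query->where[0]['clauses'][0], 'search_index.word') === 0) {
      $query->where[0]['clauses'][0] = "search_index.word LIKE '%s'";
      $query->where[0]['args'][0] = '%' . $query->where[0]['args'][0] . '%';
    }
  }
}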

Essentially we are modifying the where clause of the query to search with LIKE %<search term>% which will search across all the fields in the content type.
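To see what the alter actually changes, the sketch below applies the same rewrite to a plain PHP array shaped like the Views 2 $query->where structure used in the hook. The starting clause "search_index.word = '%s'" and the function name are assumptions for illustration; inspect your own view's query to see the real clause.

```php
<?php
/**
 * Hypothetical, framework-free illustration of the hook's rewrite, applied
 * to an array shaped like $query->where (clauses with '%s' placeholders
 * plus a parallel list of args).
 */
function example_widen_search_clause(array $where) {
  if (!isset($where[0]['clauses'][0])) {
    // Nothing to rewrite.
    return $where;
  }
  // Only touch the condition that starts with search_index.word.
  if (strpos($where[0]['clauses'][0], 'search_index.word') === 0) {
    $where[0]['clauses'][0] = "search_index.word LIKE '%s'";
    // Wrap the term in % wildcards so partial matches are found.
    $where[0]['args'][0] = '%' . $where[0]['args'][0] . '%';
  }
  return $where;
}

$where = array(array(
  'clauses' => array("search_index.word = '%s'"),
  'args' => array('smith'),
));
print_r(example_widen_search_clause($where));
```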

This simple extension of the hook will enable us to search across all the fields of the content type, in addition to the one that is exposed.

In addition to the code above, the view must also have ‘Search: Search Terms’ added as an exposed filter. This makes sure the view uses Drupal's search index to search and filter the content type.

Nov 30 2011

Achieve Internet Releases the Drupal 7 ApacheSolr Media Module

This module allows website administrators to index files of any type so they can be included in site-wide search results. This is very useful for enterprise websites that need to manage a large number of files, such as videos, PDFs, documents in Excel, Word, and PowerPoint, as well as images. ApacheSolr Media module can index any field within the file entities, including title, description, and taxonomy fields.

Why the ApacheSolr Media Module

Over the years Achieve Internet has built our fair share of publishing and media websites. A few years back, only large entertainment companies like NBCU or publishers like Fastcompany.com required complex media management. A lot has changed over the last five years, and today it seems everyone is a publisher of one kind or another. An even greater challenge is that enterprise organizations have gone global, and the need to distribute files and media in multiple languages is at an all-time high. A great example is how organizations manage their printed material, such as installation instructions, troubleshooting guides, catalogs, data sheets, brochures, and marketing material. It’s one thing for a website administrator to find those files; it’s a completely different challenge to make them available in public-facing search results. Achieve Internet’s new ApacheSolr Media module gives your website visitors access to all these files through a simple site search.

Example of This Module in Action on Hunterindustries.com

(The files below are PDFs and assorted zip files; however, the module can also be used for videos and documents.)

ApacheSolr Media screenshot 2

ApacheSolr Media screenshot 3

Engineering Challenges

One of the challenges in building the ApacheSolr Media module was to review all published nodes, detect all referenced files, and create a separate Solr document for every referenced file. Some nodes reference a large number of files, so page timeouts are a real issue. Achieve solved this by counting each file attached to a node toward the per-cron-run processing limit. For example, if the ApacheSolr Media module is configured to index up to 10 nodes per cron run, and each node has five files, then the module will index two nodes and 10 files per cron run.
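The example above can be expressed as a small budgeting function. This is a hypothetical sketch under one reading of the example, not the module's code: each node "costs" as many slots as it has files (at least one), and nodes are taken in order until the next one would exceed the limit. The module's actual counting may differ.

```php
<?php
/**
 * Hypothetical per-cron budget: returns array(nodes indexed, files indexed).
 * Assumes each node costs max(1, its file count) slots against $limit.
 */
function example_cron_batch($limit, array $files_per_node) {
  $used = 0;
  $nodes = 0;
  $files = 0;
  foreach ($files_per_node as $count) {
    $cost = max(1, $count);
    // Always take at least one node, so a node with more files than the
    // limit cannot stall indexing forever.
    if ($nodes > 0 && $used + $cost > $limit) {
      break;
    }
    $used += $cost;
    $nodes++;
    $files += $count;
  }
  return array($nodes, $files);
}

// The example from the text: limit of 10, five files per node.
print_r(example_cron_batch(10, array(5, 5, 5, 5)));
```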

The ApacheSolr Media module is the fourth module released by Achieve Internet in the fall of 2011.  The other modules are:

Media Updates

Fresh content is one key to great web experiences. This module simplifies the process of updating media content by allowing replacement of existing files. This release includes the capability to quickly and easily replace a media file currently in use at various locations on your site.   Media Updates Module Blog

Views Media Browser

One of the biggest problems in managing media is being able to find assets. This module enables Views filtering, allowing you to refine results by any type of field in your media files. Filtering by taxonomy terms and searching text fields are just two powerful examples. Having the capability to selectively screen information is extremely valuable. Views Media Browser Module Blog

Media Translation

This new module allows you to easily manage media files and taxonomy terms within multiple languages. Additional capability includes an automated “detect and replace” function that aligns files with the language mode displayed.  Media Translation Blog

Combining these modules together with the original Drupal Media module can produce a powerful and rich media management experience. An example of that power is by adding the Media Translation module to the ApacheSolr Media module, we can create “translation sets” of files, which will group together all translated versions of the same file. By integrating the two modules, we can create a node, attach files to it, and then when the node is translated, the correct language version of the referenced files automatically display on the correct language site – including in the search results. 

Assumptions

Like every good module, this one needed clear parameters for Achieve to accomplish our goal of helping publishing and media websites manage their files. The most important assumption, and the one that dictates the outcome of this module, is that the only files indexed for search are files referenced by a node. For example, for one of our recent clients, the only files included in the search results are those attached to a published product node, support document, or other node. This makes it much easier to display only relevant and current content without having to frequently delete outdated files from the site.

This module also assumes that you are using version 1 of the Media module, and that your nodes use media selector fields to reference the files.

Items to Consider

There are a few issues that need to be considered before installing the ApacheSolr Media module:

Files must be attached to at least one content type that is indexed by Solr, and the files to be indexed must be attached in a media selector field. There can be multiple media selector fields per node.

This module does not index the content of the files – for example, the PDF file itself, or the Excel file, etc. It indexes only the fields in the file entities.

All file types to be indexed for the search must use the same title field.

Installation and Setup

Getting the ApacheSolr Media module set up and ready to use is simple – just install the module.

To set up the remainder of the items to support this module:

Configure the File Types

1.     Go to Configuration and select File types (admin/config/media/file-types).

2.     Click on manage fields for the file type you want to configure.

3.     Create a generic File Title field of type Text or add an existing File Title field.

       You must have a single Title field that is used by all file types; otherwise your files will not have a title in the search results.

4.     Add any additional fields needed.

5.     Repeat for all file types you want to index for the search.

Configure the Content Types

1.     Go to Structure and select Content types (admin/structure/types).

2.     Click on manage fields.

3.     Add one or more Multimedia asset fields to the content type. These are the fields that will reference the file entities to be indexed.

Note: You must have at least one content type that is indexed by Solr that contains a Multimedia asset field.

Achieve recommends reusing the same field across multiple content types.

Configure Solr Integration

1.     Go to Configuration and select Apache Solr Search (admin/config/search/apachesolr).

2.     Click on the Media tab.

3.     Select the field to use as the media file title.

4.     Select the media fields attached to nodes to include in the Solr index.

Rebuild the Solr Index

1.     Go to Configuration and select Apache Solr Search (admin/config/search/apachesolr).

2.     Click on the Search Index tab.

3.     Select either Queue content for reindexing or Delete the index and click Begin.

Future Plans

We are delighted to be co-maintainers of this module with shenzhuxi.

Given the time, Achieve and shenzhuxi would like to see a 2.x version of the ApacheSolr Media module use the upcoming File Entity module (under development in conjunction with version 2 of the Media module). This will make the ApacheSolr Media module much more flexible, because it can work with any file management system.

The group would also like to expand the functionality of the ApacheSolr Media module to be able to index the file content itself (e.g., the actual PDF) instead of just the file entity fields. That however is an entirely different challenge and may need to be done separately from this module. 

This is only v.1 and like every good Drupal module the real future and power of this module will come from the community.  We would love your feedback, input and contribution to the ApacheSolr Media module. The power of Drupal comes from our collaboration!

For more information on Achieve Internet please visit our Drupal.org Market page. http://drupal.org/node/1123842

Nov 14 2011

The Apachesolr module (7.x) allows us to build sites that use powerful search technology. Besides being blazingly fast, Apachesolr has another advantage: faceted search.

Faceted search allows our search results to be filtered by defined criteria like category, date, author, location, or anything else that can come out of a field. We call these criteria facets. With facets, you can narrow down your search more and more until you get to the desired results.

Sometimes, however, the standard functionality is not enough. You might want or need to customize the way that facets work. This is controlled by Facet API. Unfortunately there is not much documentation for Facet API, and the API can be difficult to understand.

This post will change that!

In this post, you will learn how to use FacetAPI to create powerful custom faceted searches in Drupal 7. I will take a look at the UI of the module as well as its internals, so that we can define our own widgets, filters, and dependencies using some cool ctools features.

Introduction

This is a site that we built recently. It uses facets from SearchAPI instead of Apachesolr, but the general principle is the same.

As we move forward, I plan to answer the following questions:

  • How can I set the sort order of facet links? How can I create my own sort order?
  • How can I show/hide some facets depending on custom conditions? For example, how can I show a facet only if there are more than 100 items in the search results?
  • How can I exclude some of the links from the facets?
  • How can I change the facets from a list of links to form elements with autosubmit behavior?

I will start with showing the general settings that can be configured out of the box and then dive into the code to show how those settings can be extended.

Block settings options


Let's take a look at the contextual links of the facet block. We can see there are three special options:

  • Configure facet display
  • Configure facet dependencies
  • Configure facet filters

Let's take a closer look at each of these.

Facets UI

On the facet display configuration page, we can choose the widget used to display the facet and set its soft limit, a widget setting that controls how many facet items are displayed. We can also use different sort options to display the links in the order we prefer.

The next settings page is for facet dependencies. Here we can set conditions that determine when Drupal shows or hides the facet block.

The third settings page is for facet filters. Here we can filter which facet items to show (or hide) in the block.

As you can see, there are plenty of options in the UI that we can use when we set up our facets. We can filter some options, change sort order, control visibility, and even change the facet widget to something completely different than a list of links.

Now let's take a look at the internals of Facet API and learn how we can define our own facet widgets, facet dependencies, and facet filters.

Facet API is built on the ctools plugin system, which is a very flexible tool.

Build your own facet widget

Imagine we want to create a facet that will consist of a select list of options and a submit button. Better yet, let's hide the submit button and just autosubmit the form when we change the value by selecting an item from the list.

In order to define our own widget we need to implement the following hook:

<?php
/**
 * Implements hook_facetapi_widgets().
 */
function example_facetapi_widgets() {
  return array(
    'example_select' => array(
      'handler' => array(
        'label' => t('Select List'),
        'class' => 'ExampleFacetapiWidgetSelect',
        'query types' => array('term', 'date'),
      ),
    ),
  );
}
?>

With this we define a new widget called "example_select" which is bound to a class called "ExampleFacetapiWidgetSelect".

All our logic will be in this ExampleFacetapiWidgetSelect plugin class:

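<?php
class ExampleFacetapiWidgetSelect extends FacetapiWidget {

  /**
   * Renders the form.
   */
  public function execute() {
    // Replace the facet's render array (the list of links) with our form,
    // passing the current facet items to the form builder.
    $elements = &$this->build[$this->facet['field alias']];
    $elements = drupal_get_form('example_facetapi_select', $elements);
  }
}
?>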

Instead of rendering the links directly, we will load a form to show our select element.

Now let's see how to build the form:

<?php
/**
 * Generate form for facet.
 */
function example_facetapi_select($form, &$form_state, $elements) {
  // Build options from facet elements, skipping items that are already active.
  $options = array('' => t('- Select -'));
  foreach ($elements as $element) {
    if ($element['#active']) {
      continue;
    }
    $options[serialize($element['#query'])] = $element['#markup'] . ' (' . $element['#count'] . ')';
  }

  $form['select'] = array(
    '#type' => 'select',
    '#options' => $options,
    '#attributes' => array('class' => array('ctools-auto-submit')),
    '#default_value' => '',
  );

  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Filter'),
    '#attributes' => array('class' => array('ctools-use-ajax', 'ctools-auto-submit-click')),
  );

  // Let's add autosubmit JS functionality from ctools.
  $form['#attached']['js'][] = drupal_get_path('module', 'ctools') . '/js/auto-submit.js';

  // Add JavaScript that hides the Filter button.
  $form['#attached']['js'][] = drupal_get_path('module', 'example') . '/js/example-hide-submit.js';

  $form['#attributes']['class'][] = 'example-select-facet';

  return $form;
}

/**
 * Submit handler for facet form.
 */
function example_facetapi_select_submit($form, &$form_state) {
  // Redirect to the current path with the chosen facet item's query.
  $form_state['redirect'] = array($_GET['q'], array('query' => unserialize($form_state['values']['select'])));
}
?>

In this form, the value of each select option is the serialized query (the #query array) of the page the user should be redirected to. So, in the submit handler we simply unserialize the selected value and redirect the user to the current path with that query. We add autosubmit functionality via ctools' auto-submit JavaScript. We also add our own JavaScript file, example-hide-submit.js, to hide the Filter button. Because the button is only hidden by JavaScript, the facet still works when JavaScript is disabled: in that case, the user just needs to submit the form manually.

Each element that we retrieved from FacetAPI has the following properties:

  • #active - Determines if this facet link is active
  • #query - The query parameters of the page that represents the search when this facet is active
  • #markup - The text of the facet
  • #count - The number of search results that will be returned when this facet is active.
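To make the mapping between these properties and the select options concrete, here is a standalone PHP sketch of the option-building loop that runs without Drupal. The function name example_build_select_options() and the sample element values are hypothetical, and a plain string stands in for t(). It shows that only inactive items become options, and that the submit handler can recover the original #query array by unserializing the chosen key.

```php
<?php
/**
 * Hypothetical standalone version of the option-building loop in
 * example_facetapi_select(). Keys are serialized #query arrays.
 */
function example_build_select_options(array $elements) {
  // Plain string stands in for t('- Select -') outside Drupal.
  $options = array('' => '- Select -');
  foreach ($elements as $element) {
    // Active items are already applied, so they are not offered again.
    if ($element['#active']) {
      continue;
    }
    $options[serialize($element['#query'])] = $element['#markup'] . ' (' . $element['#count'] . ')';
  }
  return $options;
}

// Made-up sample elements using the properties listed above.
$elements = array(
  array('#active' => FALSE, '#query' => array('f' => array('color:red')), '#markup' => 'Red', '#count' => 12),
  array('#active' => TRUE, '#query' => array(), '#markup' => 'Blue', '#count' => 5),
);

$options = example_build_select_options($elements);

// What the submit handler does with the selected key.
$keys = array_keys($options);
$query = unserialize($keys[1]);
```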

Please note how easy it is to add the autosubmit functionality via ctools. It is just a matter of setting the right classes on the form elements: one element acts as the sender (.ctools-auto-submit) and the other as the receiver (.ctools-auto-submit-click). Then we just need to add the JavaScript to get the autosubmit functionality working.

Please use at least the beta8 version of the FacetAPI module.

Sorting order

All options in facets are sorted. On the settings page above, we saw different kinds of sorts. We can also define our own type of sorting if none of the options on the settings page meet our needs. As an example, I will show you how to implement a random sort order.

To reach this goal we need to implement hook_facetapi_sort_info():

<?php
/**
 * Implements hook_facetapi_sort_info().
 */
function example_facetapi_sort_info() {
  $sorts = array();

  $sorts['random'] = array(
    'label' => t('Random'),
    'callback' => 'example_facetapi_sort_random',
    'description' => t('Random sorting.'),
    'weight' => -50,
  );

  return $sorts;
}

/**
 * Sort randomly.
 */
function example_facetapi_sort_random(array $a, array $b) {
  return rand(-1, 1);
}
?>

The sort callback is passed to PHP's uasort() function, so it must return an integer less than, equal to, or greater than zero, depending on whether $a should sort before, together with, or after $b. Note that each argument is again a FacetAPI element with the same properties as outlined above, so you could, for example, compare $a['#count'] and $b['#count'].
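Returning rand(-1, 1) does not give uasort() a consistent ordering, so the result is arbitrary rather than a proper shuffle; that is fine for a demo. For a deterministic custom sort, a comparator on #count could look like the sketch below. The callback name and sample items are hypothetical, and FacetAPI already offers a count sort in the UI, so this only illustrates the comparator contract. It runs as plain PHP; to use it as a plugin, it would be registered in hook_facetapi_sort_info() the same way as the random sort.

```php
<?php
/**
 * Hypothetical sort callback: orders facet items by descending #count,
 * using the label as a tie-breaker so the order is fully deterministic.
 */
function example_facetapi_sort_count_desc(array $a, array $b) {
  if ($a['#count'] != $b['#count']) {
    // Negative when $a has more results, so $a sorts first.
    return $b['#count'] - $a['#count'];
  }
  return strcmp($a['#markup'], $b['#markup']);
}

// Made-up facet items keyed the way a facet build array is keyed.
$items = array(
  'red'   => array('#markup' => 'Red', '#count' => 3),
  'blue'  => array('#markup' => 'Blue', '#count' => 10),
  'green' => array('#markup' => 'Green', '#count' => 3),
);

// uasort() reorders the values while preserving the array keys.
uasort($items, 'example_facetapi_sort_count_desc');
```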

In order to see the result we just need to disable all the other sort order plugins and enable our own.

Filter

When facet items are generated they are passed on to filters. If we want to exclude some of the items, we should look at the filters settings. We can also, of course, implement our own filter.

As an example, we will create a filter where we can specify what items (by labels) we want to exclude on the settings page.

<?php
/**
 * Implements hook_facetapi_filters().
 */
function example_facetapi_filters() {
  return array(
    'exclude_items' => array(
      'handler' => array(
        'label' => t('Exclude specified items'),
        'class' => 'ExampleFacetapiFilterExcludeItems',
        'query types' => array('term', 'date'),
      ),
    ),
  );
}
?>

Like with the widgets, we define a filter plugin and bind it to the ExampleFacetapiFilterExcludeItems class.

Now let's take a look at the ExampleFacetapiFilterExcludeItems class:

<?php
/**
 * Plugin that excludes items whose labels are listed in the settings.
 */
class ExampleFacetapiFilterExcludeItems extends FacetapiFilter {

  /**
   * Filters facet items.
   */
  public function execute(array $build) {
    $exclude_string = $this->settings->settings['exclude'];
    // Split the comma-separated list and trim spaces around each label.
    $exclude_array = array_map('trim', explode(',', $exclude_string));

    // Exclude item if its markup is one of excluded items.
    $filtered_build = array();
    foreach ($build as $key => $item) {
      if (in_array($item['#markup'], $exclude_array)) {
        continue;
      }
      $filtered_build[$key] = $item;
    }

    return $filtered_build;
  }

  /**
   * Adds settings to the filter form.
   */
  public function settingsForm(&$form, &$form_state) {
    $form['exclude'] = array(
      '#title' => t('Exclude items'),
      '#type' => 'textfield',
      '#description' => t('Comma separated list of titles that should be excluded'),
      '#default_value' => $this->settings->settings['exclude'],
    );
  }

  /**
   * Returns an array of default settings.
   */
  public function getDefaultSettings() {
    return array('exclude' => '');
  }
}
?>

In our example, we define settingsForm to hold information about what items we want to exclude. In the execute method we parse our settings value and remove the items we don't need.
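One detail worth handling is whitespace: an administrator will often type "Red, Blue", and a plain explode(',') would then yield " Blue" with a leading space, which never matches a label. The standalone sketch below (hypothetical function name, no Drupal classes) trims each label, drops empty entries, and compares strictly before applying the same exclusion loop as execute().

```php
<?php
/**
 * Hypothetical standalone version of the exclusion logic in
 * ExampleFacetapiFilterExcludeItems::execute().
 */
function example_exclude_items(array $build, $exclude_string) {
  // Split on commas, trim each label and drop empty entries, so that
  // "Red, Blue" excludes both items and an empty setting excludes nothing.
  $exclude_array = array_filter(array_map('trim', explode(',', $exclude_string)), 'strlen');

  $filtered_build = array();
  foreach ($build as $key => $item) {
    // Strict comparison avoids PHP's loose numeric-string matches ('1' == '01').
    if (in_array($item['#markup'], $exclude_array, TRUE)) {
      continue;
    }
    $filtered_build[$key] = $item;
  }
  return $filtered_build;
}

// Made-up facet items.
$build = array(
  'red'   => array('#markup' => 'Red'),
  'blue'  => array('#markup' => 'Blue'),
  'green' => array('#markup' => 'Green'),
);

$result = example_exclude_items($build, 'Red, Blue');
```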

Again, note how easy it is to make a plugin expose settings in a form: all that is needed is to define the settingsForm and getDefaultSettings methods in the class.

Dependencies

In order to create our own dependencies, we need to implement hook_facetapi_dependencies() and define our own class. This is very similar to creating a custom filter, so I am not going to go into great detail here. The main idea of dependencies is that they allow you to show facet blocks based on a specific condition. The main difference between using dependencies and using the Context module to control the visibility of these blocks is that facets whose dependencies are not met are not even processed by FacetAPI.

For example, by default there is a Bundle dependency that will show a field facet only if we have selected the bundle that has the field attached. This is very handy in, for example, an electronics shop search: facets related to monitor size should be shown only when the user is looking for monitors (that is, when they have selected the "monitor" bundle in the bundle facet). We could create our own dependency to show some facet blocks only when a specific term of another facet is selected. There are many potential use cases here. For example, a facet may only be interesting to users looking for content from a specific country, so you could process and show it only when that country is selected. A practical example would be showing the state field facet only when the country "United States" is selected, since filtering by state is not useful for other countries. Being able to tweak this yourself gives you endless possibilities!

Here is a shortened code excerpt from FacetAPI that can be used as a sample. It displays the facet only if the user has one of the selected roles. The main logic is in the execute() method.

<?php
class FacetapiDependencyRole extends FacetapiDependency {

  /**
   * Executes the dependency check.
   */
  public function execute() {
    global $user;
    $roles = array_filter($this->settings['roles']);
    if ($roles && !array_intersect_key($user->roles, $roles)) {
      return FALSE;
    }
  }

  /**
   * Adds dependency settings to the form.
   */
  public function settingsForm(&$form, &$form_state) {
    $form[$this->id]['roles'] = array(
      '#type' => 'checkboxes',
      '#title' => t('Show block for specific roles'),
      '#default_value' => $this->settings['roles'],
      '#options' => array_map('check_plain', user_roles()),
      '#description' => t('Show this facet only for the selected role(s). If you select no roles, the facet will be visible to all users.'),
    );
  }

  /**
   * Returns defaults for settings.
   */
  public function getDefaultSettings() {
    return array(
      'roles' => array(),
    );
  }
}
?>
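The core of execute() is the array_filter()/array_intersect_key() pair. The plain-PHP sketch below isolates that check so it can be run outside Drupal; the function name and role IDs are made up for illustration. It assumes the stored checkbox settings follow the Drupal 7 convention of rid => rid for checked roles and rid => 0 for unchecked ones, and that $user->roles is keyed by role ID.

```php
<?php
/**
 * Hypothetical standalone version of the check in
 * FacetapiDependencyRole::execute(). Returns FALSE when roles are
 * configured and the user has none of them, TRUE otherwise (the real
 * plugin returns nothing instead of TRUE).
 */
function example_role_dependency_passes(array $role_settings, array $user_roles) {
  // Unchecked checkboxes are stored as 0, so array_filter() keeps
  // only the checked roles.
  $roles = array_filter($role_settings);
  // Both arrays are keyed by role ID, so compare keys rather than values.
  if ($roles && !array_intersect_key($user_roles, $roles)) {
    return FALSE;
  }
  return TRUE;
}

// Made-up data: only role 3 ("editor") is checked in the settings.
$settings = array(2 => 0, 3 => 3);
$editor = array(2 => 'authenticated user', 3 => 'editor');
$member = array(2 => 'authenticated user');
```

With no roles checked, $roles is empty and the facet is shown to everyone, which matches the form's description text.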

Conclusion

As you can see, FacetAPI gives us the option to change anything we want about facets. We can change the display, alter the order, filter some facets out, and control facet blocks visibility and content based on dependencies we define.

I would like to thank the maintainers of this module, Chris Pliakas and Peter Wolanin, for their great work!

An example module is attached to this article. Thank you for reading, and if you have any further questions please let us know in the comments!

(Author: Yuriy Gerasimov, Co-Author: Fabian Franz, Editor: Stuart Broz)

May 02 2011
May 02

Last year Telenet successfully migrated its knowledge base and business site to Drupal. As a member of the Ausy/DataFlow group I'm proud to announce that Telenet today launched its multisite search using Drupal.

We built on the Apache Solr-based solution we had already created for the knowledge base, while Robert Douglass was hired to work on a Nutch/Solr implementation to crawl the (for now) non-Drupal sites in the company's portfolio. This powerful combination results in a multilingual multisite search that features autocompletion, spell checking and file indexing.

Contributions were made to the following projects:

Sep 23 2010
Sep 23

A few posts back, I wrote about creating a list of articles that link to some other article, i.e. a list of "backlinks". That post back then was focused on Drupal 7.

I'm a bit surprised, though, that nobody noticed or mentioned that this feature already exists in Drupal 6. I had no idea either, so this heads-up is to set things straight.

When you have the core Search module enabled and have Views installed and enabled, you should get a default view called "backlinks".


Sorry for my bad researching skills!

Sep 06 2010
Sep 06

For one of our clients, we are running a Drupal site with about a million nodes. Before launch, those nodes are imported from another database and then indexed into Apache SOLR. The total time to index all of these nodes in an empty SOLR instance is measured in days rather than hours or minutes.

That is a bit too long to run this import regularly. So my (Xdebug) profiler and I delved into the Apache SOLR module code to see where we could scrape off a few hours or days of execution time.

It turned out that, in our case, three components were responsible for a large share of the execution time. Let's have a look.

By the way, we are using the latest dev build of version 2 of the Apache SOLR module.

Tip 1: Not indexing $document->body

When indexing nodes, the SOLR module needs to construct an Apache_Solr_Document object for each node. It puts all fields and metadata of the node into that document. The heaviest part of constructing this document is assembling the $document->body field: the module uses the node_build_content() and drupal_render($node->content) functions to generate the body of the node.

In our case, we didn't really need the body, since we were indexing companies with fields like name, address, manager and so on. So we decided to remove the code from apachesolr_node_to_document that calculates the body. Although this gave us a major performance boost, it might not be applicable in your case: we could only do it because we didn't need the body of a node.

Also keep in mind that all other fields and metadata are assembled into the body as well (depending on your search build mode configuration).

Tip 2: Add static caching to apachesolr_add_taxonomy_to_document

Another heavy operation while generating the Apache_Solr_Document object is fetching the taxonomy terms in apachesolr_add_taxonomy_to_document. For each term, the ancestors are calculated. If your vocabulary is not hierarchical, you could remove that code; if it is, you can benefit a lot from static caching. You might have millions of nodes, but you probably have only a handful of terms (hundreds), so the ancestors of the same term will be calculated many times.
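A minimal sketch of the static caching idea, in plain PHP so it runs on its own. The function name is hypothetical, a $parents array (term ID => parent term ID, single parent only) stands in for the taxonomy tables, and the $lookups counter stands in for the database queries a real implementation would make. It illustrates the pattern only and is not the module's actual code.

```php
<?php
/**
 * Hypothetical memoized ancestor lookup. The static $cache lives for the
 * whole PHP process, so every node tagged with the same term reuses the
 * result computed for the first one.
 */
function example_term_ancestors($tid, array $parents, &$lookups) {
  static $cache = array();
  if (isset($cache[$tid])) {
    return $cache[$tid];
  }
  $ancestors = array();
  $current = $tid;
  // Walk up the hierarchy; each step stands in for one parent query.
  while (!empty($parents[$current])) {
    $lookups++;
    $current = $parents[$current];
    $ancestors[] = $current;
  }
  return $cache[$tid] = $ancestors;
}

// Made-up hierarchy: term 3 -> term 2 -> term 1 (root, parent 0).
$parents = array(1 => 0, 2 => 1, 3 => 2);
$lookups = 0;

// Simulate indexing 1000 nodes that are all tagged with term 3.
for ($i = 0; $i < 1000; $i++) {
  $ancestors = example_term_ancestors(3, $parents, $lookups);
}
```

Only the first call walks the hierarchy; the remaining 999 are served from the cache. Because the cache is per process, it only pays off if the process stays alive across many nodes.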

Keep in mind, though, that you won't benefit much from static caching if you're indexing through batch processing with small batches, since the static cache is rebuilt for each batch step. So we wrote a Drush command to do the indexing. This way we keep the static cache for the whole run.

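/**
 * Drush command callback: indexes all pending nodes in a single PHP
 * process, so static caches are kept for the whole run.
 */
function drush_slimkopen_solr_index() {
  $cron_limit = variable_get('apachesolr_cron_limit', 50);

  // Fetch and index nodes in chunks until nothing is left to index.
  while ($rows = apachesolr_get_nodes_to_index('apachesolr_search', $cron_limit)) {
    apachesolr_index_nodes($rows, 'apachesolr_search');
  }
}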

Tip 3: Don't check excluded content types

The SOLR module has a nice feature that allows you to exclude certain content types from being indexed. It turns out the check for excluded content types is pretty expensive. It happens in the apachesolr_get_nodes_to_index('apachesolr_search', $limit) call, where the apachesolr_search_node table is joined with the node table. For the initial import, we removed the check for excluded types (the join with node) and indexed all nodes. We removed the excluded ones after indexing.

This was possible in our case since the bulk of nodes (99.9% of them) needed to be indexed.

Conclusion

Drupal and its modules are developed to work in a lot of environments and situations. So besides implementing what they're designed to do, they also contain a lot of code that checks whether a certain condition or context applies. But when you are using or deploying a module, you know what the context is, so you may be able to remove some code. Keep in mind that hacking core and contributed module code is generally a bad idea, but there are a few practices that can help here!

For those curious about the kind of performance gain these tricks might give you: in our case it was about 50%, but it depends heavily on the site's implementation.

Aug 22 2010
Aug 22

A popular feature request for sites that deal with a lot of content is to see, for each page, which other pages link back to it. This can be helpful when doing SEO work or when cleaning up and rewriting old content.

In Drupal 7 this is easily done, though the feature lives where you wouldn't immediately expect it: the core Search module. Somewhat mimicking how search engines like Google rank pages, Drupal 7's search module now takes the number of nodes linking to a node into account when calculating the score of a result.

To keep track of which nodes (and other items) link to which other nodes, the search indexer stores all links in a table called search_node_links. This table is only used internally by the search module, but if you enable the Views module, you can enable a default view called "Backlinks".
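Conceptually, a backlinks list is the link table read in reverse: instead of asking which nodes a node links to, you ask which nodes link to it. The plain-PHP sketch below illustrates that inversion on a hypothetical list of (source, target) node ID pairs; it does not query search_node_links or mirror its actual columns.

```php
<?php
/**
 * Hypothetical sketch: invert (source, target) link pairs into a
 * backlinks index keyed by target node ID.
 */
function example_backlinks(array $links) {
  $backlinks = array();
  foreach ($links as $link) {
    list($source, $target) = $link;
    // Keying by source records each linking node only once per target.
    $backlinks[$target][$source] = $source;
  }
  // Turn each inner set into a plain list of source node IDs.
  return array_map('array_values', $backlinks);
}

// Made-up links: node 10 links to nodes 1 and 2 (to node 1 twice).
$links = array(array(10, 1), array(11, 1), array(10, 2), array(10, 1));
$index = example_backlinks($links);
```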

The Backlinks view has two displays: a block and a page. You can put the block wherever you want; the page is added as an extra tab on each node next to "view" and "edit".

You can also create your own views using the "Search: Links to" and "Search: Links from" arguments and filters.

Let me finish this quick tip by mentioning that this feature works fine with url aliases too.


Note: It appears this was already possible in Drupal 6. Check out the update.

Dec 10 2009
Dec 10

At the recent Drupal Camp Prague 09, I was introduced to ApacheSolr as a replacement for the standard Drupal search or Google CSE.

On most sites the basic Google CSE setup is sufficient; however, for some of the more serious "work" websites, my colleague Nick and I started experimenting with Drupal's ApacheSolr module.

Here is a quick and rough write-up of how it was implemented on our work sites and on my personal website (janaksingh.com).

Please feel free to share your experiences and tips.

Basic Installation

1) Install Java on CentOS if you haven't already

2) Install ApacheSolr drupal module

cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -d apachesolr -r DRUPAL-6--2 contributions/modules/apachesolr/

3) Checkout the ApacheSolr PHP Lib inside the ApacheSolr module folder

svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient

4) Create a temp folder in your home directory

cd ~
mkdir temp
cd temp

5) Download ApacheSolr in the temp folder
SVN Method

svn checkout http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 apache-solr

Archive Method
Grab the archive from http://www.apache.org/dyn/closer.cgi/lucene/solr/

wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/lucene/solr/1.4.0/apache-solr-1.4.0.zip

unzip apache-solr-1.4.0.zip
mv apache-solr-1.4.0 apache-solr

I have moved my "apache-solr" folder to "/usr/local/share/", but I am not sure if this matters.

6) Rename the two default files in apache-solr/example/solr/conf/

sudo mv solrconfig.xml solrconfig.bak
sudo mv schema.xml schema.bak

7) Copy the schema.xml and solrconfig.xml from the drupal apachesolr module folder into apache-solr/example/solr/conf/

Multisite / Multicore Setup

8) If you want multisite search (multicore), here is what I did:

  • duplicate the example folder to "sites"
  • create a folder for each new website
  • copy the conf folder from sites folder into each one of the website folders
  • create a solr.xml file in the "sites" folder and define a core for each of the sites. Here is what my solr.xml file looks like:
    <solr persistent="false">
      <!--
      adminPath: RequestHandler path to manage cores.
        If 'null' (or absent), cores will not be manageable via request handler
      -->
      <cores adminPath="/admin/cores">
        <core name="website1_core" instanceDir="website1.com" />
      </cores>
    </solr>
    

9) Finally, in the Drupal ApacheSolr module settings, update the "Solr path" to use the core name you defined for the site in the solr.xml file. If you are using the example setup as outlined in the README file that ships with the ApacheSolr Drupal module, you need not change anything, as the module's default path is correct. You only need to change this if you define your own cores.

Testing

If you are using the example site, test your installation as follows:

cd apache-solr/example
sudo java -jar start.jar

If you are using the multicore setup from step 8, test as follows:

cd apache-solr/sites
sudo java -jar start.jar

Check the ApacheSolr admin page by visiting http://mydomain:8983/solr/admin/ or if you are using multicore http://mydomain:8983/solr/mycore_name/admin/

Auto start ApacheSolr on server reboot

You can follow the instructions in "Want to start Apache solr automatically after a reboot?", but depending on how you defined the cores above, you will need to alter the path, e.g.:

SOLR_DIR="/opt/apache-solr/example"

or
SOLR_DIR="/opt/apache-solr/sites"

Don't forget to change the permissions on the solr init script you created (/etc/init.d/solr), and remember to install the script using chkconfig (all outlined in the guide above).

Security

By default, ApacheSolr does not ship with any kind of port protection, so you are advised to secure your server ports using iptables or a dedicated firewall (if you have one). For CentOS iptables guidelines, click here.

May 12 2008
May 12

For a project at work, we needed to be able to manage, and therefore search, large-scale taxonomies (10,000+ terms). Users needed to be able to search term names, descriptions and synonyms, so I figured a module using the Drupal search API was the best bet. I dove in deep and came back up with Taxonomy Search 2.x.

I've just released version 2.0 of the module, and you can check it out at the Taxonomy Search project page. I'd love some feedback, as this is my first module that uses the search API, and there may be some rough edges. Please take a look and let me know what you think!

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web