Dec 17 2015

Introduction

Drupal provides a lot of functionality for building community-driven websites. For the award-winning platform ModeMuze, we developed a platform that exposes the fashion collections of several Dutch museums. Besides exposing the collection items, one of the goals is to engage the Dutch fashion community and enrich the metadata of the collection.

We want to make it easy for people to join the community, so a captcha was not really an option. We enabled anonymous comments, with the option of creating an account through the Comment registration module. The standard registration form was enabled as well. Once registered, users are able to create theme-related exhibitions of collection items and help out with tagging these items.

It would be great if all users were enthusiastic people who create beautiful content, but unfortunately this is not always the case. Every community website is going to be targeted by malicious users who create spam and/or try to hack their way into the site one way or another. This is why Drupal has a lot of options to secure websites and fight spam.

For this particular project, we found that the setup described below worked really well in stopping the creation of spam users, content and comments.

Honeypot

The Honeypot module is the most basic form of protection; we add it to virtually every site we build. The module adds a honeypot field and a timestamp to forms (you can configure which forms you want to protect). In a nutshell, when a user submits a form too fast, or fills in a hidden field that shouldn't contain a value, the module stops the form submission from completing.

Comment verification

The Comment verification module is used to add an extra check for comments by anonymous users. When an anonymous user adds a comment, they need to verify it through a link sent by e-mail.

Spambot

The Spambot module protects the user registration form from spammers and spambots by verifying registration attempts against the Stop Forum Spam online database. It also adds some useful features to help deal with spam accounts. The module allows up to 20,000 checks per day. In the end, this module helped the most. It is also possible to delay the request for malicious users, which helped bring the number of blocked spam registrations down from about 10 per minute to about 3 per minute.

Userone

The Userone module's main purpose is to protect the user with uid 1, an important and special account in any Drupal installation. The module also has an important extra feature which helped a lot in stopping hackers: it can automatically block IP addresses after a certain number of failed login attempts.

Cloudflare

The Cloudflare module provides better Drupal integration with the online Cloudflare service. Cloudflare is a free reverse proxy, firewall, and global content delivery network with a ton of features. Besides improving performance by caching your pages for anonymous users, it can also provide SSL options (even in the free version). Since your domain points to Cloudflare's reverse proxy, hackers cannot easily find out the IP address of your server, which makes it harder to attack your site. It also provides options to serve a captcha to users when it detects malicious behaviour. The performance is the most important feature, but all the bonus options are really nice to have.

Mollom

Last but not least, it is worth mentioning the Mollom module, which provides integration with the external Mollom service. This service checks user input for possible spam and is very effective in stopping malicious users. For this project we chose not to use Mollom: the client did not feel comfortable with an external service checking their content. That is something to consider seriously when using external services like Mollom.

Jun 28 2015

Introduction

At Finalist we use the Search API and the Search API Solr modules to provide the search functionality for most of our websites. With a little bit of configuration you can get a great search experience that works perfectly for a basic website. However, sometimes customers want more than a standard implementation. In this post I'll explain some of the improvements we make and how they work. The following topics will be covered in a series of blog posts:

  • Stemming
  • Partial search
  • Better search excerpts
  • Custom boosting

Custom Boosting

The Search API Solr module lets you add a boost to different fields to help Apache Solr determine the relevance of each search result. The relevance can be used to order your search results and help the user find the most important ones. One important thing is missing from the boosting options in the Search API Solr module: it does not allow you to add a boost for specific values within fields. This is where hook_search_api_solr_query_alter() comes in.

Solr has the option to add values to the bq (boost query) parameter of the search request. This parameter can be used to boost specific fields and/or values. You can read more about this parameter on the Apache Solr wiki.

Example hook_search_api_solr_query_alter()

Below you find an example of hook_search_api_solr_query_alter() that you can use to implement this yourself. This example allows you to add an extra boost to search results for specific node types. A similar approach can be used to boost results based on taxonomy terms, etc.

function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Boost news and blog nodes in Solr results.
  $call_args['params']['bq'][] = '(ss_type:"news"^4 OR ss_type:"blog"^2)';
}

Boosting exact matches

As explained in the previous section, Solr allows boosting on custom fields or conditions. While you might want to find more results thanks to stemming, you probably want the results matching the exact search phrase to appear higher in the search results.

FieldType in schema.xml

The basic fieldType text in the schema.xml file has some filters to support stemming etc. For exact match boosting this can be a problem. That's why it is a good idea to create a separate fieldType with better support for exact matches.

<!-- textExact fieldType used to boost exact matches -->
<fieldType name="textExact" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LengthFilterFactory" min="2" max="100" />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LengthFilterFactory" min="2" max="100" />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="multiterm">
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LengthFilterFactory" min="2" max="100" />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
</fieldType>

After creating this fieldType, we need to apply it to all fields where we want exact matching. For this example we will add a titleExact field of the textExact type, which is a copy of the original title field.

<field name="title" type="text" stored="true" indexed="true"/>
<!-- titleExact field used to boost exact matches -->
<field name="titleExact" type="textExact" indexed="true" stored="true" />
<copyField source="title" dest="titleExact"/>

Example hook_search_api_solr_query_alter()

After changing the schema.xml file, the search results will not have changed yet, because the Solr query does not use our new field. To boost on the exact match field, we can implement hook_search_api_solr_query_alter() and make sure our new field is used.

We need to add our new titleExact field to the query fields through the qf param. We can add a boost for matches in this field through the pf param. This boosts results based on every exact word match in the title, but doesn't boost entire phrase matches yet. To give an extra boost to exact phrases, we can fetch the search keywords from our query and use the bq param to give a big boost to results whose titleExact field matches the entire keyword phrase.

function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Boost exact title matches.
  $keys = $query->getKeys();
  // getKeys() can return either a string or a parsed array of keywords.
  if (is_array($keys)) {
    unset($keys['#conjunction']);
    $keys = implode(' ', $keys);
  }
  $call_args['params']['qf'][] = 'titleExact^5';
  $call_args['params']['pf'][] = 'titleExact^5';
  $call_args['params']['bq'][] = 'titleExact:"' . $keys . '"^10';
}
Jun 21 2015

Introduction

At Finalist we use the Search API and the Search API Solr modules to provide the search functionality for most of our websites. With a little bit of configuration you can get a great search experience that works perfectly for a basic website. However, sometimes customers want more than a standard implementation. In this post I'll explain some of the improvements we make and how they work. The following topics will be covered in a series of blog posts:

  • Stemming
  • Partial search
  • Better search excerpts
  • Custom boosting

Better search excerpts

The Search API Solr module allows you to configure your servers to provide excerpts for each search result. On the server configuration page you need to check 'Return an excerpt for all results' in the advanced section to make it work (in the latest version of the module you also need to enable the 'Highlighting' processor on the 'Filters' tab of your search index!). After this you can configure your search views to show an excerpt for each search result.

We’ve had a lot of questions about these excerpts. By default there are several issues which can be easily solved.

Words of 3 characters (or less) are ignored when returning excerpts

The spell field in the schema.xml file is used to store a long string containing all the content of a document. This field is also used by the Search API Solr module to generate excerpts for each search result. The following section in the schema.xml file contains the fieldType definition used by the spell field:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory" />
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
       <filter class="solr.LengthFilterFactory" min="3" max="25" />
       <filter class="solr.LowerCaseFilterFactory" />
       <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
</fieldType>

The LengthFilterFactory limits the length of each word that can be used in the excerpts. Since the min length is 3, searching for the keywords ‘Apache Solr API’ will not highlight the word ‘API’ in all excerpts. By changing the min and max values you can make sure that all keywords in the search are properly highlighted.
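
A sketch of what that change could look like in the textSpell fieldType shown above; the 2 and 100 limits are only an assumption, so pick a range that covers your keywords and re-index afterwards:

<!-- Allow shorter (and longer) words to be stored and highlighted in excerpts. -->
<filter class="solr.LengthFilterFactory" min="2" max="100" />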

When stemming is used, the excerpt for stemmed results is not returned

Stemmed words, as described in a previous blog post, can return search results, but by default when a user searches for the word 'cars', the word 'car' will not be highlighted in the search result excerpts. To fix this, we need something like the SnowballPorterFilterFactory to provide stemming in our excerpts as well. By adding the line below to the analyzer of the textSpell fieldType, stemmed words will also be highlighted:

<filter class="solr.PorterStemFilterFactory"/>

The excerpts are not very clear; random pieces of text are shown

Apache Solr provides an extensive list of highlighting parameters to help create better excerpts. Drupal has some pretty good defaults for these, but there are several parameters that we change to make the excerpts a little better for the user. These parameters can be changed in hook_search_api_solr_query_alter(), which allows you to change all the query parameters right before the query is sent to Solr. We like to change the following parameters.

hl.snippets

The snippets parameter defines how many different highlight snippets Apache Solr should return, based on the keywords in your search. Drupal concatenates all snippets into one excerpt, separated by a couple of dots. Since the excerpt looks like a full sentence, users get confused when there are too many snippets.

hl.fragsize

The fragsize parameter defines how many characters long each highlight snippet should be. Making the snippets too short gives very strange excerpts, but making them too long makes the search page harder to scan for relevant results, especially when you also return multiple highlight snippets through the hl.snippets parameter.

hl.mergeContiguous

The mergeContiguous parameter determines whether Solr should merge contiguous highlight fragments into a single snippet. We've noticed the excerpts are harder to read when this parameter is set to true.

Example hook_search_api_solr_query_alter()

Below you find an example of hook_search_api_solr_query_alter() that you can use to implement this yourself.

function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Change hl settings for better excerpts.
  $call_args['params']['hl.snippets'] = '2';
  $call_args['params']['hl.fragsize'] = '100';
  $call_args['params']['hl.mergeContiguous'] = 'false';
}
Jun 14 2015

Introduction

At Finalist we use the Search API and the Search API Solr modules to provide the search functionality for most of our websites. With a little bit of configuration you can get a great search experience that works perfectly for a basic website. However, sometimes customers want more than a standard implementation. In this post I'll explain some of the improvements we make and how they work. The following topics will be covered in a series of blog posts:

  • Stemming
  • Partial search
  • Better search excerpts
  • Custom boosting

Partial search

Partial search is supported by Apache Solr, but it is not activated by default. Partial search can be added through the so-called NGramFilterFactory. This could look something like this:

<filter class="solr.NGramFilterFactory" mingramsize="3" maxgramsize="25"/>

The NGramFilterFactory uses the minGramSize and maxGramSize attributes to define how big each n-gram can be. Implementing partial search can have a big impact on performance, so please test this properly with different n-gram sizes. The quote below (read more in the original blog) explains where the performance impact comes from:

“There is a high price to be paid for n-gramming. Recall that in the earlier example, Tonight was split into 15 substring terms, whereas typical analysis would probably leave only one. This translates to greater index sizes, and thus a longer time to index. Note the ten-fold increase in indexing time for the artist name, and a five-fold increase in disk space. Remember that this is just one field!”

The best place to add this in the schema.xml file for Apache Solr is the following section:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">

Basically, add it in the same places where the SnowballPorterFilterFactory is already added.
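
As a rough sketch of where the filter ends up: the real text fieldType in the Search API Solr schema.xml contains more filters than shown here, so this trimmed-down index analyzer (reusing the 3 to 25 gram sizes from the example above) is only an illustration.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
       <!-- Added for partial search: index n-grams of 3 to 25 characters. -->
       <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
   </analyzer>
   <!-- Add the same NGramFilterFactory line to the query analyzer of this fieldType as well. -->
</fieldType>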

Jun 07 2015

Introduction

At Finalist we use the Search API and the Search API Solr modules to provide the search functionality for most of our websites. With a little bit of configuration you can get a great search experience that works perfectly for a basic website. However, sometimes customers want more than a standard implementation. In this post I'll explain some of the improvements we make and how they work. The following topics will be covered in a series of blog posts:

  • Stemming
  • Partial search
  • Better search excerpts
  • Custom boosting

Stemming

As Wikipedia puts it: “Stemming is the term used in linguistic morphology and information retrieval to describe the process for reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form.”

Basically, stemming helps your users find what they are looking for. A search for 'cars' will also return results for the word 'car', and searching for 'working' will also return results for 'work' or 'worked', etc.

Apache Solr uses the so-called SnowballPorterFilterFactory to add stemming. In your schema.xml file for Solr you will probably find something like this:

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

As you can see, the stemming algorithm needs a language to make sure the stemming is accurate. For example, the rules for reducing English words to their root form are different from the rules for Dutch words. Most languages are supported; you can find a list of supported languages in the documentation.

Dutch stemming

For the Dutch language (which a lot of our customers use), there are two supported stemmers: Dutch and Kp (Kraaij-Pohlmann). We've found the Kp stemmer provides much better results than the Dutch stemmer.
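
As a sketch, switching to the Kp stemmer is just a matter of changing the language attribute of the SnowballPorterFilterFactory line shown earlier ('Kp' is the value Solr uses for the Kraaij-Pohlmann stemmer):

<!-- Dutch stemming using the Kraaij-Pohlmann (Kp) variant. -->
<filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords.txt"/>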
