
Solr Streaming Expressions
Solr 5.1 introduced a new Streaming API, and Solr 5.2 adds Streaming Expressions on top of it. If you have ever wondered how to run nested queries in Solr or how to use its parallel computing capabilities, this could be the answer.
Streaming Expressions provide a simple query language for SolrCloud that merges search with parallel computing. Under the covers, Streaming Expressions are backed by a Java Streaming API that provides a fast map/reduce implementation for SolrCloud. Streaming Expressions are composed of functions. All functions behave like streams, which means that they don't hold all the data in memory at once. Read more about the basics at https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
Setup:
This assumes a Debian-based system such as Ubuntu 12.04 or 14.04. If you have not installed Solr 5.2, grab the latest release (for example, from http://apache.mirror1.spango.com/lucene/solr/5.2.1/) and extract it.
Set up Solr in cloud mode.
Cloud mode lets you create collections and nodes. See https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud for more details. To launch the interactive cloud example, run:
bin/solr -e cloud
Enter the port and other details when prompted.
To start a single node, use:
bin/solr start -cloud -s example/cloud/node1/solr -p 8983
Streaming API:
Now comes the interesting part. The following streaming functions are available:
- Search
- Merge
- Unique
- Group
- Top
- Parallel
I am going to write about Search, Merge and Unique. Let us assume we have two fields, called id and city.
Search:
Search is the basic streaming function. It runs a single query against a collection and returns the matching tuples as a stream.
curl --data-urlencode 'stream=search(gettingstarted,q="*:*",fl="id, city", fq="city:San Pedro",sort="id asc")' http://localhost:8983/solr/gettingstarted/stream
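'gettingstarted' is the collection name. The fl parameter lists the fields to return, fq filters the results, and sort orders them. Depending on how the city field is analyzed, a multi-word value such as San Pedro may need to be quoted as a phrase to match as intended.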
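If you run many searches like the one above, it can help to wrap the expression and the curl call in small shell functions. The following is only a sketch: search_expr and run_stream are hypothetical helper names, not part of Solr, and the host, port and q="*:*" are assumed to match the example setup used in this post.

```shell
# Hypothetical helper: build a search(...) expression string.
# Arguments: collection, field list (fl), filter query (fq), sort.
search_expr() {
  printf 'search(%s,q="*:*",fl="%s",fq="%s",sort="%s")\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: POST an expression to the /stream handler.
# Assumes Solr listens on localhost:8983 as in the setup above.
# Arguments: collection, expression.
run_stream() {
  curl --data-urlencode "stream=$2" "http://localhost:8983/solr/$1/stream"
}
```

With these in place, a search could be run as: run_stream gettingstarted "$(search_expr gettingstarted 'id,city' 'city:Stockbridge' 'id asc')"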
Merge:
Merges two Streaming Expressions and maintains the ordering of the underlying streams.
curl --data-urlencode 'stream=merge(search(gettingstarted,q="*:*",fl="id, city", fq="city:San Pedro",sort="id asc", rows=5), search(gettingstarted, q="*:*",fl="id, city", fq="city:Stockbridge",sort="id asc", rows=10), on="id asc" )' http://localhost:8983/solr/gettingstarted/stream
Here, two expressions are merged into a single stream. Note the rows parameter inside each search: it limits the number of records for each expression separately. By default, merge supports only two expressions. To merge more than two, nest the merge calls.
For example:
curl --data-urlencode 'stream=merge(search(gettingstarted,q="*:*",fl="id, city", fq="city:San Pedro",sort="id asc"), merge(search(gettingstarted,q="*:*",fl="id, city", fq="city:Stockbridge",sort="id asc", rows=5),search(gettingstarted,q="*:*",fl="id, city", fq="city:Stockbridge",sort="id asc", rows=5),on="id asc"), on="id asc")' http://localhost:8983/solr/gettingstarted/stream
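Writing nested merges by hand gets error-prone as the number of streams grows. As a rough sketch, the nesting can be generated in the shell. nest_merge is a hypothetical helper name, and it assumes every input expression is already sorted on the field given in the first argument. It nests to the left, merge(merge(a,b),c), whereas the example above nests to the right, but both combine the same streams.

```shell
# Hypothetical helper: fold any number of expressions into
# nested two-way merge(...) calls.
# Arguments: the "on" sort spec, then one or more expressions.
# The body runs in a subshell so its variables do not leak.
nest_merge() (
  on=$1
  shift
  expr=$1
  shift
  for next in "$@"; do
    expr="merge(${expr},${next},on=\"${on}\")"
  done
  printf '%s\n' "$expr"
)
```

The resulting string can be passed to curl in the same way as the hand-written expressions, for example with --data-urlencode "stream=$(nest_merge 'id asc' "$a" "$b" "$c")", where a, b and c hold search expressions.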
Unique:
Wraps a Streaming Expression and emits a unique stream of Tuples based on the over parameter.
curl --data-urlencode 'stream=unique(merge(search(gettingstarted,q="*:*",fl="id, city", fq="city:San Pedro",sort="id asc"), search(gettingstarted, q="*:*",fl="id, city", fq="city:Stockbridge",sort="id asc"), on="id asc"), over="id asc")' http://localhost:8983/solr/gettingstarted/stream
Note that I have used merge inside unique. You can do a lot by combining functions this way.
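To get a feel for what merge followed by unique does to the data, here is a local analogy that needs no Solr at all. It is only an illustration of the idea, not how Solr implements it: two small CSV files stand in for two streams already sorted by id, sort -m plays the role of merge, and a short awk filter plays the role of unique by dropping a row whose id repeats the previous one. The function name merge_unique_demo and the sample rows are made up for this sketch.

```shell
# Local analogy of unique(merge(a, b, on="id asc"), over="id asc").
merge_unique_demo() {
  tmp=$(mktemp -d) || return 1
  # Two "streams", each already sorted by id (first CSV column).
  printf '1,San Pedro\n3,San Pedro\n5,San Pedro\n' > "$tmp/a.csv"
  printf '2,Stockbridge\n3,Stockbridge\n6,Stockbridge\n' > "$tmp/b.csv"
  # sort -m merges pre-sorted inputs without re-sorting them;
  # awk then keeps a row only if its id differs from the previous row's id.
  LC_ALL=C sort -m -t, -k1,1n "$tmp/a.csv" "$tmp/b.csv" |
    awk -F, '
      NR == 1 || $1 != prev { print }
      { prev = $1 }
    '
  rm -rf "$tmp"
}
```

Calling merge_unique_demo prints one row per id, in id order. As with the filter in this sketch, the stream wrapped by unique should already be sorted on the over field.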