Jun 02 2013
Jun 02

Last week, I described a technique to query and display nodes in all available translations. This worked well enough, but a performance-minded reader pointed out that the query generated by Views (that includes N self-joins for N enabled languages) would not scale to a large number of nodes.

My usual approach when implementing new ideas is to ensure the logic works first, and only handle optimization when needed. It's a strategy that has worked well for me in the past. So I set out to test this hypothesis, and to optimize the query if it was needed. Here's what happened:

The first obstacle was to generate a large set of nodes and their translations. Devel Generate, the Devel sub-module that generates Drupal objects for development purposes, does not support content translation at the time of writing. I submitted a D7 patch to the 2 years old feature request to achieve this. I tested it with 10K nodes, and it seems to work well. Your review is appreciated!

Having generated 10K nodes and their translations to Arabic and French (30K nodes in total), I cloned the Proverbs view from last time to query and display this content. The result was quite explicit: the view page never finished loading! Clearly, the Views-generated query was not scaling. And for good reason: 3 SQL JOINS of 30,000 records each is a performance black hole. Optimization was needed.

My goal for optimizing the query was to retain all the advantages that Views offers in terms of theming query results, integrating into Drupal pages, etc. - these are indispensable features when creating real-world applications. In short, I wanted to transparently override the Views-generated query. To do so, I needed to:

  • Remove the peformance-killing JOINs from the query
  • Perform an optimized query to find node translations
  • Re-insert the results from the optimized query into the Views results, to allow it to proceed with the display

The code I used follows. I will explain the important parts below.

Remove the peformance-killing JOINs from the query

The function demo_i18n_views_query_alter() removes from the Views query object all references to the SQL JOINs, which are called "relationships" in Views parlance. Views core invokes this hook just before converting the query object into an SQL statement. The resulting query that Views will execute looks like this:

SELECT node.nid AS nid, node.created AS node_created
FROM {node} node
WHERE (( (node.status = '1') AND (node.type IN  ('multilingual_node')) AND (node.tnid = node.nid OR node.tnid = 0) ))

Perform an optimized query to find node translations

The query as modified above will only return nodes that are translation sources. It's now up to me to query the node translations, by waiting for Views to execute the modified query, and then gathering the nids to find their translations (as stored in {node}.tnid). This is a simple query using the SQL IN operator. I call this hand-made query in the demo_i18n_views_post_execute() function, which is invoked by Views after it executes its own query.

Re-insert the results from the optimized query into the Views results

The challenge with the new query is that it returns one node translation per row, as opposed to the original query which returned all translations on the same row. In addition, the results need to be copied into the view::result object, with the right key names that Views expects. In order to find the right key names, I first displayed the results from the unmodified Views query and noted the result keys. With this information, I then proceeded to loop over the optimized query results, and find the corresponding entry in the Views result array that would receive them. This loop is also implemented in the demo_i18n_views_post_execute() function.

The results were impressive! The view page loaded in very acceptable time (ApacheBench reports a mean time of ~1350ms, against ~650ms in the case of a view with just 4 nodes), and Views happily themed the translated nodes as if it had queried them itself. You can see this code in action on my i18n demo site.

The approach of hand-crafting Views queries has been on my mind for a long time, and I'm glad I took the first step. So far, I am not sure that a generic module can be created out of this, mainly due to the necessity to transform the result set after the optimized query is run. In any case, I'll be applying this technique in my projects!

Dec 19 2009
Dec 19

There are several modules I have used during building sites. I am not going to get into details of the modules because you should visit the modules. But by far, the most important / useful modules for me have been:

1. Views ; You can think of views as output to your custom query (except you don't write the query). There are several ways to present the views and Calendar is one example. So this is a query builder which can execute queries at run time and present the data however you want.

2. : You can add custom fields to a node or define a new content type based on CCK, This is powerful functionality with no coding. You can order the fields according to your preference. And CCK will become very powerful once is introduced. As of now, Multigroup is in alpha, but I have been using it with no problems. It lets you create a composite or a compound field, consisting of different base CCK fields.

3. : The module renders all administrative menu items below 'administer' in a clean, attractive and purely CSS-based menu at the top of your website. This is a life saver if you are administering a site. Essentially it is a overlay of the entire admin functionality as a ribbon on the top of your web site.

4. : I have to give it to katrib for this module. This is great if you want spreadsheet functionality in your website. We did some pretty advanced things with the module like embedding it within a node as a tab and YUI integration. We also configured a java bridge to be able to upload excel sheets from desktop to the website. Google spread sheet integration is pretty straightforward. This is based on SocialText platform

5. Enables users to create and manage their own groups (like forums). This can bring great social aspects to your website. For some of the sites, we changed the module to represent a company and it was great.

6. : This adds a lot of umph to the site. It is Mainly used to showcase featured content at a prominent place on the frontpage of the site. Demos and tutorials are excellent.

7. : Essential for developers and themers It can generate SQL query summaries, create data for your test site, show the theme information on your site. Don't develop without this module.

8. Pathauto : It automatically generates path aliases for various kinds of content (nodes, categories, users) without requiring the user to manually specify the path alias. Helps get rid of default / ugly paths (node/3 ... )

9. jQuery : jQuery depends on jQuery UI and jQuery update in drupal 6. And here is a good overview of jQuery.This architectural diagram might also be a useful reference. jQuery modules provides all the nice functionality jQuery library.

10. : This module provides several useful extensions to the login system of drupal. What I found really useful was to allow email based logins, email confirmation during registration and auto logout. Happy drupaling :)

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web