Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough
Apr 25 2009
Apr 25

This is part five of my series, The DX Files: Improving Drupal Developer Experience. Today I am simply posting, verbatim, a message that a potential new Drupal developer sent me. The point here is just to understand how Drupal's DX influences the way it is perceived and the kind of people that want to work with it.

Hi Barry,

I am a new Drupal user/aspiring developer. I stumbled across your presentation "Developer Experience: Because Coders are People Too" on the DrupalCon 2008 Szeged site just last week. I watched the video and just nodded my head in agreement the whole time. Thank you! This validated my own opinions ­ there is something wrong (or at least unusual) about Drupal; it's not just me not getting it.

Anonymous arrays (the poor man's OOP) and un-typed arguments seem like a terrible idea to me. I almost react viscerally against them. I can understand why some have rejected Drupal. Your friend who will not work on Drupal again ­ I am almost there myself.

A little about my background: I have a Berkeley CS degree, worked for HP for 5+ years doing software development and now work at a small company that I cofounded a couple of years ago. I have lots of OOP experience. I have worked on a few 50,000 line+ projects in C++ and Java over the years. To do these medium to largish size projects, it has been my experience that one must do "defensive" programming. Part of this is knowing that the compiler is your friend. One should write code so that mistakes can be caught by the compiler. In Drupal, there are string literals everywhere and anonymous arrays ­ basically ticking time bombs waiting to blow off limbs. I find this frightening. I would never do this in C++ or Java.

I am just starting with Drupal. Part of why I liked your presentation so much is now I realize these are issues ­ something that I never saw acknowledged elsewhere ­ so I can hopefully move on and actually start making progress with Drupal without wondering why I wasn't getting it.

I am reminded of Winston Churchill's famous quotation: "It has been said that democracy is the worst form of government except all the others that have been tried." To me, Drupal is "The worst CMS except all the others that have been tried."

Contributed modules like Views, CCK, Voting API, etc and themes make Drupal more than a CMS. It is a terrific platform. Much easier/better than rolling your own. Java doesn't have a viable competitor, that I am aware of.

So basically I want to know how can I learn more and how can I help? I want to make Drupal sites, because of its huge wealth of contributed modules and the ability to extend the system. I am extremely frustrated by many parts of Drupal. I basically agreed with everything you put in your series of blog posts about DX.

What, as a noob, can I do?

I would like to contribute some how.

Thanks,
Nick

Nov 05 2008
Nov 05

This is part four of my series, The DX Files: Improving Drupal Developer Experience. I started this series with fairly simplistic suggestions. They proved not very popular and some of them I agree were of questionable benefit due to PHP’s nature. I was pleased to discover, however, that they nevertheless had quite an impact on raising the visibility of “Developer Experience” within the Drupal community. I am therefore ready to move on to some of the more complex DX issues in Drupal.

The most important DX change Drupal needs to make is switching from a form-driven model to an API-driven model. There are many parts to such a change. Today’s topic: static caching.

A simple example

Much static caching in Drupal core and contrib is based on the assumption of a “normal” page request in which only a fairly controlled set of operations can occur during a single execution. Consider this fairly typical pattern (from D7 HEAD as of today):

<?php
function taxonomy_get_term_data($tid, $reset = FALSE) {
  static
$terms = array();
  if (!isset(
$terms[$tid]) || $reset) {
   
$terms[$tid] = db_query('SELECT * FROM {term_data} WHERE tid = :tid', array(':tid' => $tid))->fetchObject();
  }
  return
$terms[$tid];
}
?>

This works great so long as no term ever changes during the course of execution. As soon as some code calls other taxonomy “API” functions, the interface can fail. Suppose we have a term whose tid is SOME_TID and whose name is “my name”. Consider:

<?php
 
// load SOME_TID and print its name
 
$term = taxonomy_get_term_data(SOME_TID);
  print
$term->name . "\n"; // change the name and save the term
 
$term->name = 'some new name';
 
taxonomy_save_term($term); // load the term again and print the new (?) name
 
$term = taxonomy_get_term_data(SOME_TID);
  print
$term->name . "\n";
?>

This program outputs “my name” twice, not “my name” and then “some new name.” Bottom line: The taxonomy_save_term() has no effect during a single page request!

It is horrible DX to have “API” functions that behave this way. Sadly, not just the taxonomy module has this problem; the pattern is widespread within Drupal.

The $reset non-solution

The typical solution within Drupal is an optional “clear the cache” argument like $reset for taxonomy_get_term_data(). When $reset is TRUE, the static cache is erased. This would be fine if it were always used by all the taxonomy_*() functions to maintain the API’s specification (in that case, the “taxonomy class” would simply be doing internal housekeeping on “private member variables”), but it isn’t. Callers using the taxonomy “API” need to know to pass the $reset argument themselves.

The problem gets even worse when you consider all the myriad contrib modules that use the same technique. It seems everybody has their own static cache and you need to call the right magic “clear the cache” function to get anything done. The API docs rarely spell this out and it is a PITA.

A real-world example of where this can bite badly: Try writing hook_update_N() functions that create or change CCK types, Views, taxonomy terms, etc. Disaster! Why should I have to figure out to call views_invalidate_cache(), content_clear_type_cache(), etc. etc.?

Static caching-induced bugs

The technique can also lead to bugs/security vulnerabilities when the static cache, conceived to operate in a normal page request, is used in a longer-running program.

Case in point: drupal_validate_form(). Here’s an excerpt:

<?php
function drupal_validate_form($form_id, $form, &$form_state) {
  static
$validated_forms = array();

  if (isset(

$validated_forms[$form_id])) {
    return;
  }
_form_validate($form, $form_id);
 
$validated_forms[$form_id] = TRUE;
}
?>

drupal_validate_form() caches the fact that a form successfully validated. This means that you cannot call the same form twice via drupal_execute() in one execution safely. The second and all successive calls will not call the form’s validate function. Instead, they will always invoke the submit handler.

Practice Statics

Sometimes the operation being cached is not even that expensive or called that often. It’s just being cached “for practice.” Here’s an excerpt from node.module:

<?php
function _node_revision_access($node, $op = 'view') {
  static
$access = array();
  if (!isset(
$access[$node->vid])) {
   
// compute and cache the access for $node and $op here
 
}
  return
$access[$node->vid];
}
?>

This is another example of static caching introducing a potential security hole. The $access static cache is based on $node->vid but not on $op. Suppose an attacker has permission to view a node but not delete it. If the attacker can cause _node_revision_access($node, 'view') to be invoked before _node_revision_access($node, 'delete), he can perform an unauthorized action.

This would be an important security hole in Drupal except it turns out that _node_revision_access() is never called more than once in a single page request. But… wait a minute! If the function is only ever called once per page request, why does it use a static cache at all? It shouldn’t.

This is just ridiculous. The code is unnecessarily complex in order to optimize a code path whose performance was obviously never measured and as a result introduces what would be a security hole but only if the static cache was ever actually used!

Solving the problem

So, what should we do? My suggestions:

  1. Eliminate unnecessary static caching completely. Unless and until you have empirical data proving that a particular static cache makes an important improvement in real-world usage scenarios, just don’t use one. If you already have a static cache without such data, remove it. The drupal_validate_form() and _node_revision_access() examples fall into this category.

  2. Maintain API integrity. If you must implement a static cache inside an API function, fine, but make sure that the API maintains its documented behavior within a page request. This means you have to make sure to update every “read” static cache in your code every time a “write” function is performed. Do NOT leave this to your callers. In the taxnomony example above, taxonomy_save_term() should invalidate or update the taxonomy_get_term_data() cache entry for the affected term.

http://drupal.org/node/254491 proposes a centralized static cache API for Drupal. I haven’t reviewed it recently, but it will probably help. However, even if it works really well, remember that not doing static caching at all will lead to simpler, less error-prone code. If you do not have a solid reason for using a static cache, just skip it until you do.

Aug 12 2008
Aug 12

This is part three of my series, The DX Files: Improving Drupal Developer Experience. This time, I’m suggesting changing some of Drupal’s most basic data structures and APIs by replacing anonymous arrays with well-defined data structures. I fully expect lots of disagreement.

Many of Drupal’s APIs (Form API, Schema API, etc.) use PHP arrays to represent complex structured data. For example, here is a Form API data structure:

<?php
$form
['author'] = array(
 
'#type' => 'fieldset',
 
'#access' => user_access('administer nodes'),
 
'#title' => t('Authoring information'),
 
'#collapsible' => TRUE,
 
'#collapsed' => TRUE,
 
'#weight' => 20,
);
$form['author']['name'] = array(
 
'#type' => 'textfield',
 
'#title' => t('Authored by'),
 
'#maxlength' => 60,
 
'#autocomplete_path' => 'user/autocomplete',
 
'#default_value' => $node->name ? $node->name : '',
 
'#weight' => -1,
 
'#description' => t('Leave blank for anonymous.');
);
?>

Some of the downsides to this representation include:

  • Developer IDEs cannot provide auto-completion or a similar form of assistance while the code is being written.
  • Invalid form properties (those that begin with “#”) cannot be identified at compile- or run-time.
  • It is awkward to associate default values or other automatic behaviors with array structures.
  • Functions that operate on specific kinds of form elements, such as textfield_validate(), are not assured they are being passed an “array of the right type.”

An alternative representation uses typed data structures, specifically a PHP class but without any methods (basically, what C calls a struct). For example:

<?php
$form
= new Form();$form->elements['author'] = $author = new FieldsetElement();
$author->access = user_access('administer nodes');
$author->title = t('Authoring information'),
$author->collapsible = TRUE;
$author->collapsed = TRUE;
$author->weight = 20;$author->elements['name'] = $name = new TextfieldElement();
$name->title = t('Authored by');
$name->maxlength = 60;
$name->autocomplete_path = 'user/autocomplete';
$name->default_value = $node->name ? $node->name : '';
$name->weight = -1;
$name->description = t('Leave blank for anonymous.');
?>

The second version of the code is almost identical to the first except for the change in representation; it is no harder to write and the conversion can even mostly be handled by search-and-replace. However, the object representation addresses (or can address) all the problems with the array representation and provide a variety of other benefits.

As an example, class representation solves the problem of functions not being sure about the kind of data they are passed. In the file that defines text field elements, we might have code like:

<?php
class TextfieldElement extends FormElement {
 
// Maximum length; NULL means no limit;
 
public $maxlength = NULL; // AJAX path for auto-completition; NULL means no auto-complete.
 
public $autocomplete_path = NULL;
}

function

textfield_validate(TextfieldElement $element) {
 
// $element is guaranteed to be a TextfieldElement
}
?>

Now, textfield_validate() is guaranteed to be passed a TextfieldElement.

Note: Yes, we could also change functions like textfield_validate() into methods of class TextfieldElement. Gotta start somewhere, though. Baby steps!

Jul 02 2008
Jul 02

This is part two of my series, The DX Files: Improving Drupal Developer Experience.

Many Drupal APIs accept a boolean argument (TRUE or FALSE) to determine some behavior. I believe that practice should be banned in all but exceptional cases, instead using a defined constant with a descriptive name.

Here is a perfect example from Drupal core:

<?php
    $output
= node_view($node, FALSE, TRUE);
?>

Now, quick! Who can tell me what passing FALSE as the second argument and TRUE as the third argument means? Unless you are a Drupal guru, you almost certainly have no idea. Compare the previous line of code with this one:

<?php
   $output
= node_view($node, NODE_FULL_VIEW, NODE_IS_PAGE_CONTENT);
?>

Now it is much more clear what is going on. We're displaying the full view of a node and it is the page's primary content. Perhaps you do not know exactly what those constants mean ("A 'full view' as compared to what?", you might ask) but you are certainly better off than if you just see "TRUE" or "FALSE". If you later encountered

<?php
   $output
= node_view($node, NODE_TEASER_VIEW, NODE_IS_NOT_PAGE_CONTENT);
?>

everything would become pretty clear(*).

The end result of this change would be an API that is easier to learn and code that is easier to read, understand, and maintain.

* Disclaimer: I'll grant that it is far from obvious exactly what NODE_IS_PAGE_CONTENT actually causes to happen and in fact in a recent query on #drupal, several senior Drupal developers (myself included) couldn't remember, but it is still way better than "TRUE". This is an unrelated issue with node_view() that ought to be improved.

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web