Oct 21 2010

Version Control API is central to Drupal's migration from CVS to git. It's also the single thing that's taken up the most time in the work we've done to date, and there's still a fair bit left to do. But we're now at a point where we need to step back and take a high-level look at the direction it'll finally take, so I thought I'd use where we are as an opportunity to explain the goals and architecture of the module, both historically and looking to the future. Apologies in advance for any of the history I get wrong - I'm sure I'll do it, so please feel free to correct me.

In The Beginning

Version Control API was originally written as a 2007 Google Summer of Code project by Jakob Petsovits (aka jpetso). From the outset, VCAPI was intended to replace Project*'s tight coupling with CVS (via the cvslog module) so that Drupal could get off CVS and on to a different version control system. VCAPI tried to build a system & datastructure similar enough to cvslog that moving over wouldn't be too painful, but at the same time was VCS-agnostic. We could decide later which VCS would fill the gap. (Technically, it would even have been possible for different projects to use a different VCS - though we ultimately decided against that because of the added social and technical complexity.)

Given that VCAPI was intended from the beginning to replace cvslog, it's hardly surprising that they both do essentially the same thing: store representations of VCS repository data in Drupal's database, such that that data is readily accessible for direct use by Drupal. They also map Drupal's users to user data in repositories, thereby allowing for the management of repository ACLs directly in Drupal. (cvslog also integrates directly with Project*, while VCAPI opted to separate that into versioncontrol_project). They then provide output that any drupal.org user would be familiar with - the project maintainers block, the commit activity information in users' profiles, the commit stream, etc. Whereas cvslog was only concerned with integrating with CVS, VCAPI attempted to solve these problems (particularly storing repository data) in an abstracted fashion such that the data from any source control system could be adequately represented in a unified set of Drupal database tables. VCAPI would provide the datastructure, helper functions, hooks, etc., and then "backend" modules (such as the git backend) would implement that API in order to provide integration with a particular source control system.

A quick aside - any good engineer will see "storing representations of VCS repository data in Drupal's database" and trip a mental red flag. It's data duplication, which raises potentially knotty synchronization problems. So let me head that one off: extracting the data was especially necessary with CVS, as it was _far_ too slow and unscalable to make system calls directly against the repository in order to fulfill standard browser requests. And while git is MUCH faster than CVS, the data abstraction layer is still necessary. System calls are slow, and there's disk IO to think about; it's worth trying to avoid tripping those during normal web traffic. More importantly, generating an aggregate picture of versioncontrol-related activity within a given Drupal system, particularly one that has a lot of complex vcs/drupal user mapping and/or a lot of repositories, really requires a single, consistent datastore. Stitching together db- and repo-sourced data on the fly gets infeasible very quickly. Finally, putting the data into a database makes it possible for us to punt on caching, since Views/Drupalistas are accustomed to caching database queries/output.

Anyway, with all this in mind, jpetso made a herculean effort in writing the original 1.x branch of VCAPI. He came up with the original abstracted datastructures and general methodologies that allowed us to replicate the functionality of cvslog in an API that could be reimplemented by different VCSes. More about that history can be seen in g.d.o posts. And at its core, the system worked.

Unfortunately, there were also aspects of the system that were awkward and overengineered. Much of the original API was actually just a querybuilder; many of the abstracted concepts had become so abstract as to be unintuitive to new developers (e.g., there were no "branches" or "tags" in VCAPI - just the meta-concept of "labels"). The underlying problem, though, was an architectural predilection towards an 'API' that did backflips to abstract and accommodate all possible backend behaviors, then own all the UIs, rather than providing crucial shared functionality and readily overridable UIs that backends could extend as needed. You can't work with, let alone refactor, VCAPI without running into this last problem. The module was suffering from an identity crisis - is it an API for the backends? Or an API for third-party systems, like say Project*, which want to utilize the repository tracking features of VCAPI? The crisis was also evident in the querybuilder: the same system was used for building aggregate listings as for retrieving individual items, and optimized for neither.

Enter: OO

jpetso needed to start moving on to other things by 2008, and when he offered the project up for maintainership, I volunteered. After porting to Drupal 6, discussions began about how well-suited VCAPI & backends would be to object orientation. In particular, it could help to make the API less overbearing and release more control into the backends. And for GSoC 2009, marvil07 made exactly that his goal: porting VCAPI over to OO.

Note - there was other work going on throughout this time period by a variety of people, GSoC and otherwise. I do NOT mean to slight any of that work - it's just that those changes were less central to the evolution of the API itself, and therefore tangential to the focus here.

Prior to marvil07's work, VCAPI was an exemplary instance of Drupal's love for massive arrays. They were used to capture all the data being stored in the database, to send instructions to the querybuilders, as return values for all the various informational hooks implemented by backends...and just about everything else. marvil07's refactor revealed some of the real 'things' VCAPI deals with, in the form of discrete classes:

  • VersioncontrolRepository - Represents a 'repository' somewhere; at the bare minimum, this includes information like VCS backend, path to the repository root, and any additional information specified by the backend.
  • VersioncontrolItem - Represents a known versioned item - that is, a file or a directory - in a repository.
  • VersioncontrolBranch - Represents a known branch in a repository.
  • VersioncontrolTag - Represents a known tag in a repository.
  • VersioncontrolOperation - Represents, usually, a commit action in a repository. The 'operation' concept is one of the abstractions that can get confusing.

Each of these classes has two responsibilities - CUD (that's CRUD sans-R), and retrieving other related data (e.g., you could call VersioncontrolRepository::getItem() to retrieve a set of VersioncontrolItems, or VersioncontrolRepository::getLabels() to retrieve a set of VersioncontrolBranch or VersioncontrolTag objects). CUD was fairly well implemented on each of these classes by the time marvil07's original GSoC project was over. Related data retrieval was a bit more limited.

This set of classes also replaced awkward alters with inheritance as the new way for backends to interact with VCAPI: VersioncontrolGitRepository extending VersioncontrolRepository, VersioncontrolGitBranch extending VersioncontrolBranch, etc. Interfaces were also introduced to tell VCAPI that a particular backend's objects supported specific types of operations - generating repository URLs, for example. The crucial contribution of marvil07's GSoC project was developing this family of classes, which has remained largely unaltered. Unfortunately there wasn't really time to get to refactoring the logic, so much was simply cut from old 1.x procedural functions and moved into an analogous class method.
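A rough sketch of what that inheritance-plus-interfaces arrangement looks like in practice. Only the class names VersioncontrolRepository and VersioncontrolGitRepository come from the module; the properties, the interface, the method names and the URL layout below are all assumptions made up for illustration, not VCAPI's actual code.

```php
<?php
// Sketch only: class names follow the post, but every property, method and
// the interface below are hypothetical stand-ins, not VCAPI's real code.

// Hypothetical capability interface that a backend can opt into.
interface SketchRepositoryUrlGenerator {
  public function getCommitUrl($revision);
}

// Shared behavior lives in the API-level base class.
class VersioncontrolRepository {
  public $vcs = 'generic';
  public $root;

  public function __construct($root) {
    $this->root = $root;
  }

  public function describe() {
    return $this->vcs . ' repository at ' . $this->root;
  }
}

// The git backend extends the base class instead of altering arrays.
class VersioncontrolGitRepository extends VersioncontrolRepository implements SketchRepositoryUrlGenerator {
  public $vcs = 'git';

  public function getCommitUrl($revision) {
    // Assumed URL layout, purely illustrative.
    return 'http://example.com/' . basename($this->root) . '/commit/' . $revision;
  }
}

// API-side code asks whether the object supports the capability.
function sketch_commit_link(VersioncontrolRepository $repo, $revision) {
  if ($repo instanceof SketchRepositoryUrlGenerator) {
    return $repo->getCommitUrl($revision);
  }
  return NULL;
}

$git = new VersioncontrolGitRepository('/var/git/drupal.git');
$plain = new VersioncontrolRepository('/var/vcs/other');
```

The point of the interface check is that the API never needs to know which backend it is talking to; it only asks whether the object in hand can generate URLs.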

By the time we had reached the end of GSoC, I'd grown into the opinion that marvil07's work was an excellent first step. We still had largely the same 1.x logic, just moved into an object-oriented environment. API<->backend interaction via inheritance had helped the identity crisis, but not resolved it entirely. There was some more flexibility for the backends to control logic that had once been the sole domain of the API, but we were still swimming upstream - too many disparate hooks, too much logic in VCAPI that the backends couldn't touch. A good foundation, but far from finished.

The Great Git Migration

When the big discussion about switching VCSes happened in February 2010, we were still gradually fleshing out the skeleton that had been introduced during GSoC 2009. During the discussion, the question was quite rightly raised whether we should even bother with VCAPI, or if we should just use something else (or start from scratch), especially given the wide agreement on wanting "deep integration". (On using VCAPI at all, this bit of the thread is particularly enlightening.) I ended up arguing that VCAPI, while by no means perfect, had already done a pretty good job of tackling the not-inconsiderable datastructure and CRUD questions. Those problems would have to be solved anyway, so starting from scratch would have been a waste. Folks ultimately found that to be a convincing argument, and that's been one of the major principles guiding the migration work thus far.

Another guiding principle also emerged from the initial discussions - if we're going to build our own system, it must be developer-friendly & maintainable. For years, the cruft and complexity of Project* has limited contributions to a very small circle of overworked developers; allowing the migration work to produce similarly impenetrable code would be horribly shortsighted. Consequently, the architectural decisions we've made have been as much motivated by the long-term benefits of architecting a tight, intuitive system as the short-term benefits of just finishing the damn migration already. Let's run through some of the big architecture shifts made thus far:

  • One of the biggest weaknesses in VCAPI 1.x was the querybuilder. It was an awkward custom job that introduced a few thousand lines of code and was quite difficult to extend. So we replaced the whole thing using the DBTNG backport.
  • In tandem with the conversion to DBTNG, we did a partial backport of D7's entities. All of the classes from marvil07's original OO refactor (VersioncontrolRepository, VersioncontrolItem, etc.) are now instances of VersioncontrolEntity. Their loading is managed by a family of classes descended from VersioncontrolEntityController; all that can be seen in includes/controllers.inc. This is a great conceptual step forward - it makes a TON of sense to treat most of the objects VCAPI handles as entities.
  • We took another bite out of the identity crisis by definitively separating mass-loading for listings from targeted loading for data manipulation. Mass-listings are Views' responsibility, pure and simple. Only when you're actually _doing_ something with the API will objects get built from the complex Controller loaders.
  • We introduced a VersioncontrolBackend class, replacing the array returned from hook_versioncontrol_backend(). This class will increasingly replace procedural logic as a unified behavior object governing everything that VCAPI expects a backend to implement. To that end, the backend acts as a factory for turning data loaded by the VersioncontrolEntityController family into instantiated VersioncontrolEntity objects.
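The backend-as-factory idea from the last bullet can be sketched along these lines. VersioncontrolBackend, VersioncontrolEntity and the repository classes are named in the post; the $classes map, the buildEntity() method and the property layout are assumptions for illustration and may not match the module's real signatures.

```php
<?php
// Sketch only: the class names come from the post; the $classes map,
// buildEntity() and the property layout are hypothetical.

class VersioncontrolEntity {
  public $data;
  public $backend;

  public function __construct(array $data, VersioncontrolBackend $backend) {
    $this->data = $data;
    $this->backend = $backend;
  }
}

class VersioncontrolRepository extends VersioncontrolEntity {}
class VersioncontrolGitRepository extends VersioncontrolRepository {}

class VersioncontrolBackend {
  public $name = 'generic';
  // Maps an entity type to the class that should be instantiated for it.
  public $classes = array('repo' => 'VersioncontrolRepository');

  // Turn a raw row (as a controller might load it) into an entity object.
  public function buildEntity($type, array $row) {
    $class = $this->classes[$type];
    return new $class($row, $this);
  }
}

// A backend only has to swap in its own classes.
class VersioncontrolGitBackend extends VersioncontrolBackend {
  public $name = 'git';
  public $classes = array('repo' => 'VersioncontrolGitRepository');
}

$backend = new VersioncontrolGitBackend();
$repo = $backend->buildEntity('repo', array('repo_id' => 1, 'root' => '/var/git/drupal.git'));
```

With this shape, the controllers stay backend-agnostic: they load rows, and the backend object decides which concrete class each row becomes.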

In short, we totally rebuilt VCAPI's plumbing, and with quite an eye towards the future - using DBTNG and Entities will make the D7 port very manageable. And now we're in the final phase of work with VCAPI - fleshing out entity methods, tweaking the datastructure, and dealing with the UI. All the stuff motivating me to write this article, as a way to force myself to think through it all properly.

Looking Forward

First, let's do a quick revisit of VCAPI & backends' purpose. These proceed roughly in order from plumbing -> API -> UI.

  • Maintain a list of repositories known by the system.
  • Maintain a mapping between Drupal users and the users known to the repositories.
  • Maintain ACLs pertaining to those users & repositories, and make the data readily accessible to the hook scripts that actually enforce the ACLs.
  • Track the contents/activity of a repository into an abstracted, cross-vcs format.
  • Link repository activity with users.
  • Provide sane default behaviors that can then be easily adapted to a specific VCS' requirements by the backend module.
  • Provide sane API to third-party (non-backend) client code for using or extending VCAPI's data.
  • Provide overridable & retool-able UIs for administrative functionality.
  • Provide portable, overridable & retool-able UI elements for listing & statistical information, like commit activity streams.

Now, let's run through that list to see how 1.x stacks up:

  • Maintain repository list - check, but CRUD is awkward.
  • User mapping - check, but CRUD is awkward.
  • ACLs - check.
  • Repository content tracking - check, but confusing & awkward through over-abstraction.
  • Repo content<->user link - check.
  • Sane defaults + backend overridability - nope. 1.x worked mostly by overstuffing logic into the API, and allowed backends to interact by flipping toggles. The rest was done with confusing hooks.
  • Third-party utility - nope. Third-party code just has the same set of confusing hooks, and not a lot of helpful API.
  • Admin UI - sorta. Static UI, even hard-coding some assumptions about data sources (e.g., repository "authorization methods"), but with some control afforded to the backends.
  • Portable UI elements - sorta. Blocks were used, but because there was no Views 2 when 1.x was written, there's just those hardcoded blocks. Moving to Views makes creating portable UI elements FAR easier.

Many of the problems in 1.x are helped, or even solved, by the architectural improvements I've been talking about throughout the article. Now let's break out our current work, the 2.x branch, into the same bullets. And forgive me, but I'm going to break narrative here and mention some details that I haven't previously explained. This IS supposed to be a list to help us actually finish up the work, after all :)

  • Maintain repository list - check. VersioncontrolRepository(Controller) has probably gotten more love than any other class. One major addition would be support for incorporating backend-specific repo interaction classes, along the lines of svnlib or glip. That would make VCAPI into an excellent platform for doing repository interactions that are way outside the original scope; just load up the repository object from VCAPI, then go to town.
  • User mapping - unfinished - VersioncontrolAccount is one of the classes that has barely been touched thus far.
  • ACLs - unchanged since 1.x, and in need of revisiting in light of all the other changes; best addressed at the same time we're revisiting VersioncontrolAccount.
  • Repository content tracking - almost there. We're going to undo a conflation made in 1.x; see these two issues. VersioncontrolOperation will go away in favor of VersioncontrolCommit, and we'll introduce a separate system for tracking activity (i.e., network operations) that is clearly separated from tracking repository contents.
  • Repo content<->user link - check. Despite the need for cleanup on VersioncontrolAccount, I believe this linkage is 100%.
  • Sane defaults + backend overridability - check, thanks to the move to good OO patterns.
  • Third-party utility - getting there. The advent of the OO API makes navigating VCAPI's internal datastructures much easier, but we still need to think about where & how we allow for alteration. Y'know, where we put our alter hooks.
  • Admin UI - not yet. We've backtracked from 1.x a bit, taking out some of the more hardcoded UI elements and are fixing to replace them with more flexible pieces. For the most part, that means building lots of Views, e.g., this issue. As with everything else in VCAPI, some of the difficulty comes in offering a dual-level API - one to the backends, the other to third parties.
  • Portable UI elements - zero. We're not going to provide a single block via hook_block() if we can at all avoid it. Views-driven all the way. Complicated, though, because the 'dual-level API' problems mentioned under Admin UI very much apply.

What's now emerging in 2.x is a layered, intelligible API that is thoroughly backend-manipulable, while still presenting third-party code with a consistent, usable interface. And with a repo interaction wrapper like I described above, VCAPI would be a launching point for the "deep integration" we all want. We're not there yet, but we're getting close. There's a general, central push to get a LOT more test coverage (especially testing sample data & standard use cases), without which we'll just never _really_ be sure how well the monstrosity works. There are still some crufty areas - "source item" tracking, "authorization method" for repository account creation - that we need to decide whether we discard, leave, or improve. And we need to come up with a consistent pattern for implementing dual-level Views: every backend needs to be able to generate a list of repository committers or an activity stream, for example, but each backend may be a bit different. So VCAPI provides a sane default, which can then be optionally replaced by a backend-'decorated' version.

I'm hoping this article helps put the VCAPI & family segment of Drupal's git migration in perspective. With any luck, it also gives enough of a sense of the problems we're grappling with that more folks might want to hop in and help us move everything along. Input on these plans is MORE than welcome.

Jun 11 2010

It's official - the Drupal Association has selected me to be the 'Git Migration Lead.' I'm tremendously excited, and can't wait to knuckle down and get this migration DONE. I'll be launching full-tilt into the list of issues that stand between us and git goodness, but before I do, I want to take a minute to clarify how I understand and will be approaching this position.

It's not the DA's role to determine the direction of drupal.org, let alone Drupal itself. Rather, the DA exists to support and facilitate efforts that the community has already decided are worth pursuing. At least, that's how I understand it. Consequently, my role as git lead is primarily about ensuring this migration happens to the satisfaction of the community - not merely my own satisfaction. It helps that we've already got a well-established todo list, but that also requires I be open to input throughout the process. And that's the plan. In fact, I can't think of any part of this project that I don't plan on conducting in public, through a combination of the g.d.o group, in the issue queues, on twitter (I've started a new account just for this), over the dev list, and occasionally on this blog. There will be no shortage of means by which you can get information, give feedback, or - please! - help out (if nothing else, my contact form works).

I think publicizing this process is crucial because it's the best way to make sure we have the energy and participation necessary to ensure it actually happens. And at the end of the day, that's the crux of my responsibility. So I'll be doing a mix of cheerleading, organizing volunteer energy, and when necessary, coding - whatever it takes to ensure that the migration is always moving forward. Which is exactly why the DA created this paid position: historically, a collective desire for big infra changes hasn't been enough. Someone's ass needs to be contractually on the line.

Of course, my position is temporary, and will only last through the initial migration (Phase 2). At that point, we're back to all-volunteer energy for further git improvements. So I have another goal for the migration process: we need to grow the group of people familiar with and responsible for our project infrastructure. My hope is that we can take all the interest and excitement over switching to git and cultivate that wider group. So make no mistake, if I get my hooks into you over the next few months, I won't be letting go when the DA stops signing my checks :) And besides, the reality is that those who participate most during phase 2 will have the most clout during phase 3.

Anyway - we all know how long this move from CVS has been coming. Now that it's here, let's not make our community wait a day longer than it has to :)

Feb 10 2010

Last week, Matt Farina tossed me a question about the best approach to introspecting code in PHP, particularly in relation to whether or not the situation was a good candidate for using PHP's Reflection API. The original (now outdated) patch he gave me as an example had the following block of code in it:

$interfaces = class_implements($class);
if (isset($interfaces['JSPreprocessingInterface'])) {
  $instance = new $class;
}
else {
  throw new Exception(t('Class %class does not implement interface %interface', array('%class' => $class, '%interface' => 'JSPreprocessingInterface')));
}

I've used Reflection happily in the past. I've even advocated for it in situations where I later realized it was the totally wrong tool for the job. But more importantly, I'd accepted as 'common knowledge' that Reflection was slow. Dog-slow, even. But Matt's question was specific enough that it got me wondering just how big the gap ACTUALLY was between the code he'd shown me, and the Reflection-based equivalent. The results surprised me. To the point where I ended up writing a PHP microbenching framework, and digging in quite a bit deeper.

My hope is that these findings can help us make more educated judgments about things - like Reflection, or even OO in general - that are sometimes unfairly getting the boot for being performance dogs. But let's start with just the essential question Matt originally posed, and I'll break out the whole framework a bit later.

FYI, my final and definitive round of benchmarks was performed on a P4 3.4GHz with HyperThreading riding the 32-bit RAM cap (~3.4GB), running PHP 5.2.11-pl1-gentoo, with Suhosin and APC. With Linux kernels, I strongly prefer single core machines for microbenching; I'm told that time calls on 2.6-line kernels get scheduled badly, and introduce a lot of jiggle into the results.

Is Reflection Really That Slow?

NO! In this case, a direct comparison between reflection methods and their procedural counterparts reveals them to be neck and neck. Where Reflection incurs additional cost is the initial object creation. Here's the exact code that was benchmarked, and the time for each step:

function _do_proc_interfaces() {
  class_implements('RecursiveDirectoryIterator'); // [email protected]
}

function _do_refl_interfaces() {
  $refl = new ReflectionClass('RecursiveDirectoryIterator'); // [email protected]
  $refl->getInterfaceNames(); // [email protected]
}

The comparison between these two functions isn't 100% exact, as ReflectionClass::getInterfaceNames() generates an indexed array of interfaces, whereas class_implements() generates an associative array where both keys and values are the interface names. That may account for the small disparity.
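That difference in array shape is easy to check directly. A minimal sketch using only built-in PHP functions; the variable names are just for illustration:

```php
<?php
// Compare the array shapes returned by the two introspection routes.
$procedural = class_implements('RecursiveDirectoryIterator');
$refl = new ReflectionClass('RecursiveDirectoryIterator');
$reflective = $refl->getInterfaceNames();

// class_implements(): interface names are both keys and values,
// so a membership test can use isset() on the key.
$keyed_lookup = isset($procedural['RecursiveIterator']);

// getInterfaceNames(): a plain numerically indexed list,
// so a membership test has to search the values.
$list_lookup = in_array('RecursiveIterator', $reflective);
```

In practice the keyed form is what makes the isset() check in the original patch work; with Reflection the equivalent membership test is implementsInterface().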

While it wasn't part of Matt's original question, curiosity prompted me to test method_exists() against ReflectionClass::hasMethod(), as it's the only other really direct comparison that can be made. The results were very similar:

function _do_proc_methodexists() {
  method_exists('RecursiveDirectoryIterator', 'next'); // [email protected] iterations
}

function _do_refl_methodexists() {
  $refl = new ReflectionClass('RecursiveDirectoryIterator'); // [email protected] iterations
  $refl->hasMethod('next'); // [email protected] iterations
}

These direct comparisons are interesting, but they're simply not the best answer to Matt's specific question. Although the procedural logic can be mirrored with Reflection, Reflection also offers a single method call that gets the same answer the procedural code needs several steps to reach:

// Original procedural approach in patch: [email protected] iterations
function do_procedural_bench($args) {
  $interfaces = class_implements($args['class']);
  if (isset($interfaces['blah blah'])) {
    // do stuff
  }
}

// Approach to patch using Reflection: [email protected] iterations
function do_reflection_bench($args) {
  $refl = new ReflectionClass($args['class']);
  if ($refl->implementsInterface('blah blah')) {
    // do stuff
  }
}

This logic achieves the same goal more directly, and so is more appropriate for comparison. It's also a nice example of how the Reflection system makes up for some of its initial object instantiation costs by providing a more robust set of tools. Now, the above numbers don't exactly sing great praises for Reflection, but given all the finger-wagging I'd heard, I was expecting Reflection to do quite a bit worse. As it is, Reflection is generally on par with its procedural equivalents; the big difference is in object instantiation. It's hard to say much more about these results, though, without a better basis for comparison. So let's do that.

More Useful Results

Benchmarking results are only as good as the context they're situated in. So, when I cast around in search of a baseline for comparison, I was delighted to find a suitable candidate in something we do an awful lot: call userspace functions! That is:

// Define an empty function in userspace
function foo() {}
// Call that function
foo();

Because foo() has an empty function body, the time we're concerned with here is _only_ the cost of making the call to the userspace function. Note that adding parameters to foo()'s signature has a negligible effect on call time. So let's recast those earlier results as numbers of userspace function calls:

  1. Checking interfaces
    • class_implements(): 3.6 function calls
    • ReflectionClass::getInterfaceNames(): 3.7 function calls
  2. Checking methods
    • method_exists(): 2.0 function calls
    • ReflectionClass::hasMethod(): 2.7 function calls
  3. Logic from Matt's original patch
    • Approach from original patch: 4.5 function calls
    • Approach using reflection: 8.7 function calls (3.6 if ReflectionClass object instantiation time is ignored)

These numbers should provide a good, practical basis for comparison; let 'em percolate.

Let's sum up: as an introspection tool, Reflection is roughly as fast as its procedural equivalents. The internal implementations seem to be just as efficient; the extra cost appears to come mainly from the overhead of method calls and object creation. Though creating a ReflectionClass object is fairly cheap as object instantiation goes, the cost is still non-negligible.

My interpretation of these results: Given that Reflection offers more tools for robust introspection and is considerably more self-documenting than the procedural/associative arrays approach (see slide 8 of http://www.slideshare.net/tobias382/new-spl-features-in-php-53), I personally will be defaulting to using Reflection in the future. And, if using the additional introspective capabilities of a system like Reflection early on Drupal's critical path (bootstrap, routing, etc.) means we can make a more modular, selectively-loaded system, then their use is absolutely justified. At the end of the day, Reflection should be an acceptable choice even for the performance-conscious.

...With an important caveat: The thing to avoid is the runaway creation of huge numbers of objects. Many reflection methods (ReflectionClass::getInterfaces(), for example) create a whole mess of new objects. This IS expensive, although my benchmarks indicate each additional object instantiation is roughly 1/3 to 1/2 the cost of instantiating ReflectionClass directly. So be sensible about when those methods are used.
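To make the caveat concrete, here is a small sketch contrasting a string-returning method with its object-returning sibling. It uses only built-in Reflection calls; the counting loop is just there to show what comes back, not to measure anything.

```php
<?php
$refl = new ReflectionClass('RecursiveDirectoryIterator');

// Returns plain strings: no further Reflection objects are created.
$names = $refl->getInterfaceNames();

// Returns one ReflectionClass object per interface, keyed by interface name.
$objects = $refl->getInterfaces();

// Count how many Reflection objects the second call handed back.
$created = 0;
foreach ($objects as $interface) {
  if ($interface instanceof ReflectionClass) {
    $created++;
  }
}
```

If all you need is the names, the first call answers the question without building one object per interface.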

My Little Framework

To do all this benchmarking, I wrote a small framework that does four crucial things:

  1. Allows the function to be benchmarked to be specified externally
  2. Runs two loops for each benchmarking run - an inner loop containing the actual function to be benchmarked, which is iterated a configurable number of times, and an outer loop that creates a sample set (of configurable size) with each entry being the result of the inner loop
  3. Processes results, calculating standard deviation & coefficient of variation; additional mean result values are also calculated by factoring out both a configurable time offset, as well as the time offset incurred by processing overhead for the framework itself (the internal offset is calculated on the fly)
  4. Repeats a benchmarking run if the result set's coefficient of variation > a configurable target value
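The four steps above can be sketched roughly as follows. This is not the framework's actual code: every function name, parameter and default here is hypothetical, the standard deviation is the population form, and the closure syntax assumes PHP 5.3 or later (the benchmarks in this post were run on 5.2).

```php
<?php
// Sketch of the two-loop idea; all names and defaults are hypothetical.

// Inner loop: average time per call over $iterations calls.
function sketch_sample($callback, $iterations) {
  $start = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    $callback();
  }
  return (microtime(TRUE) - $start) / $iterations;
}

// Mean, population standard deviation and coefficient of variation.
function sketch_stats(array $samples) {
  $n = count($samples);
  $mean = array_sum($samples) / $n;
  $sq = 0.0;
  foreach ($samples as $s) {
    $sq += ($s - $mean) * ($s - $mean);
  }
  $stddev = sqrt($sq / $n);
  $cv = $mean > 0 ? $stddev / $mean : 0.0;
  return array('mean' => $mean, 'stddev' => $stddev, 'cv' => $cv);
}

// Outer loop: build a sample set, subtract the harness overhead, and
// repeat the whole run while the results are too noisy.
function sketch_bench($callback, $iterations = 10000, $sample_size = 20, $target_cv = 0.05, $max_runs = 5) {
  // An empty callback measures the cost of the harness itself.
  $noop = function () {};
  $stats = array();
  for ($run = 1; $run <= $max_runs; $run++) {
    $samples = array();
    $overhead = array();
    for ($s = 0; $s < $sample_size; $s++) {
      $samples[] = sketch_sample($callback, $iterations);
      $overhead[] = sketch_sample($noop, $iterations);
    }
    $stats = sketch_stats($samples);
    $base = sketch_stats($overhead);
    $stats['adjusted_mean'] = $stats['mean'] - $base['mean'];
    $stats['runs'] = $run;
    if ($stats['cv'] <= $target_cv) {
      break;
    }
  }
  return $stats;
}
```

Something like sketch_bench('foo') would then give a per-call figure for an empty userspace function, which is the unit the earlier results are expressed in.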

Since I had the framework already together, I ran some more tests in addition to the ones above, mostly focusing on object instantiation costs. The results are in this Google Doc. In addition to the results from the Reflection Comparisons tab (which are from the first part of the blog post), there's also data on the costs for most other Reflection types with a wide range of arguments under Reflection Instanciation. Under the Object Instanciation tab, there is data on the instantiation time for a small variety of classes; the range of times they require is quite interesting.

Some oddities

Though I put forward static calls as a baseline before, if you look at the framework, you'll notice that it uses a dynamic call. Interestingly, dynamic function calls work almost exactly as fast:

// Define an empty function in userspace
function foo() {}
// Call our foo() userspace function dynamically
$func = 'foo';
$func();

I glossed over this earlier because, within the confines of the framework, these two have almost exactly the same execution time (variations are totally within error ranges), whether or not an opcode cache is active. This strikes me as strange, as there's no way dynamic function calls can be known at compile-time...not that that's the only relevant consideration. But I don't know the internals of PHP, let alone APC, well enough to grok how that all works. So for the purposes of these benchmarks, I assumed the two to be interchangeable when gathering results. However, because I don't trust those results to be accurate without confirmation from someone with greater expertise, I'd rather people not make that assumption when writing real code.

Also, there is one case where Reflection differs notably from its procedural counterparts: object instantiation. While the other methods were generally on par, the cost of $refl->newInstance() vs. new $class() consistently differed by approx [email protected], or around 3 function calls (see the results for _do_refl_instanciate() vs. _do_proc_instanciate() under the Reflection Comparisons data). I suspect this is a result of the difference between a method call and a language construct, as the gap is similar to the one between a static function call and call_user_func().

Sep 30 2008

User Panels shoots pretty high: it aims to provide a consistent, easy platform for the handling of user-centric information display. It's not about new storage mechanisms or anything like that - just about marshaling all that user content together in a sane, easy-to-use way. I'm hoping that this blog post can be a semi-official shout-out to the drupal community - an RFC, I guess.

user_panels comes out of a basic observation about Drupal: virtually every Drupal site has a different conceptualization of who users are, what their role is within the site, how they should be interacting with one another, etc. This observation helps explain why it's been so difficult to settle the question of how 'users' ought to be handled - most folks, myself absolutely included, are coming from the perspective of a particular use case. Even with deliberate effort, it's been pretty tough to get into a genericized head-space with respect to users. There are, I think, a couple reasons for this.

IMPORTANT NOTE: much of this probably sounds like a proposal for core. It's not. While I'd personally like to see a solution along these lines integrated into core, this solution requires Panels, which would mean getting the Panels engine into core - and that's a whole other bag of chips. This idea CAN be implemented entirely in contrib, and that's what my focus is on here.

The 'User-Centric' Challenge

The first problem is tied to data storage: almost all of the profile solutions that have arisen store user data in nodes. Specifically, there's bio and nodeprofile for D5, which have merged (and an enormous kudos to everyone involved in that effort!) into content profile for D6. Now, many debates have been had about the appropriateness of nodes as a data storage mechanism, and let me be clear that, while it's an important debate, it's not the topic at hand. There's also core's profile.module...but that's its own whole can of worms that needn't be opened right now.

With the node-based solutions, the problem is at render-time: if you're storing a whole bunch of data about users in their corresponding node, then you've got to pick a render-time strategy for teasing out the particular subsets of data you want and arranging them on the page. That means either doing it in the theme layer, or handing it off to another module first. Pushing the responsibility directly to the theme layer is just wrongheaded, in my opinion - it means that for any site implementing user-centric data pages, there's a task to be done which sits uncomfortably between a typical Drupal dev's and themer's toolsets. Handing it off to another module first is the better option, because that module can make data-organizational level decisions, then present a consistent package to the theme layer. As far as I'm aware, Advanced Profile Kit is the only module that's really directly focused on that kind of logic for users. (Note that Michelle and I have been talking about this general idea for a while, and the long-term plan is to deprecate APK in favor of user_panels, which she and I would co-maintain.)

But Users != Profiles, which raises the question: where does MySite fit into the above discussion? It doesn't, really. MySite doesn't use nodes as a data storage mechanism, and it's not about building user profiles. It's more analogous to something like an iGoogle homepage - which in turn entails that it provide its own rendering logic for a DND interface. But it's still very much within the 'user-centric' scope. The disjointedness of these connections points to what I believe to be the second major problem with Drupal's user handling: there are different conceptual axes along which any given user-centric page can be organized, and we're not always clear on which one we're talking about. Specifically, I see there being three axes: the user profile (bio/nodeprofile/content_profile, APK), the user homepage (mysite), and the user account page, which the core user module currently provides. Three, because of basic structural differences at the access level:

  • The account page is strictly user-facing; menu callback-level access is private.
  • The homepage is typically user-facing, with the potential for exceptions; menu callback-level access is semi-private.
  • The profile is public-facing, with the potential for restricting access to sub-components; menu callback-level access is public.

Note: There are some common exceptions to these access settings, but I'm not aware of any that can't be handled easily.
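
To make the access distinction concrete, the three axes could be modeled roughly as follows. This is a plain-PHP sketch, not Drupal or user_panels code: the array keys, the user_axis_access() function, and the simplified user arrays are all hypothetical, and the 'semi-private' rule shown here (owner plus an explicit allow-list) is only one possible reading of "with the potential for exceptions".

```php
<?php
// Hypothetical description of the three axes and their callback-level access.
$axes = array(
  'account'  => array('access' => 'private'),
  'homepage' => array('access' => 'semi-private'),
  'profile'  => array('access' => 'public'),
);

/**
 * Decide whether $viewer may reach $owner's page on the given axis.
 *
 * $viewer and $owner are simplified arrays with a 'uid' key. $allowed_uids
 * lists extra viewers the owner has let in (used only for semi-private).
 */
function user_axis_access(array $axes, $axis, array $viewer, array $owner, array $allowed_uids = array()) {
  $is_owner = ($viewer['uid'] === $owner['uid']);
  switch ($axes[$axis]['access']) {
    case 'public':
      return true;
    case 'semi-private':
      return $is_owner || in_array($viewer['uid'], $allowed_uids, true);
    case 'private':
    default:
      return $is_owner;
  }
}

$alice = array('uid' => 1);
$bob = array('uid' => 2);

var_dump(user_axis_access($axes, 'account', $bob, $alice));            // bool(false)
var_dump(user_axis_access($axes, 'homepage', $bob, $alice, array(2))); // bool(true)
var_dump(user_axis_access($axes, 'profile', $bob, $alice));            // bool(true)
```

The point of the sketch is only that the axes differ in who gets past the callback at all; restricting sub-components of a public profile would be a separate, finer-grained check.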

I don't know if this division has been explicitly articulated anywhere else, but its basic tenets strike me as being implicit in almost all of the discussions about Drupal's user handling. It's the conceptual underpinning over which many such discussions break down, because folks tend to (quite understandably) build a conceptual model of users based on the use case they've worked/are working from. Breakdown tends to occur over this problem: a piece of content that clearly belongs on the private account section/axis for Site A equally clearly belongs in the public profile section/axis for Site B. I think that core's existing system of providing user categories probably gets the gold star for best recognizing this reality, as it opens up the potential for implementing the user as a dynamic platform, viewable/interactable through many different lenses. Unfortunately, the core system crashes and burns on implementation.

A Platform: User Panels

There are a number of problems that a user platform has to solve if it's going to do better than core, more than just the ones I've described above. But they're a decent starting point, so I'll tackle the two major issues - node data retrieval & display, and handling of different user 'axes' - to begin with. A quick excerpt from earlier in the article:

...if you're storing a whole bunch of data about users in their corresponding node, then you've got to pick a render-time strategy for teasing out the particular subsets of data you want and arranging them on the page...Handing it off to another module first is the better option, because that module can make data-organizational level decisions, then present a consistent package to the theme layer.

That.Is.Panels. Minus the fact that Panels is not even remotely restricted to node data, it's a passable description of what Panels does: it grabs a specific bit of data and arranges it with respect to all the other pieces of data, all the while interacting with and presenting a consistent package to the theme layer. Problem 1, check.

The second issue is a little more complex, as it has to do with the way that Panels' context system works. But it's also the essence of user_panels-as-platform. I don't want to digress into the depths of the Panels engine, though, so I'll start with the final vision. PLEASE note that this description simplifies a number of concepts for clarity & brevity:

  • Modules such as nodeprofile, bio, content_profile, mysite, etc., would provide the content they create and store as pane types to be used by the Panels engine. (Things provided by core can be packaged into the user_panels module itself.)
  • Through an administrative GUI, site admins can choose which (if any) of the different axes - private, semi-private, and public - get to use which of the various pane types provided by those modules.
  • Site admins can choose [system] paths at which each of these axes should reside, as well as whether or not to enable the semi-private or public axes at all.
  • Site admins can also set up how each of the displays for the axes should look, and set the override mode for each axis: either 'blueprints' or panels_page-style.

I'm hoping that the only particularly difficult thing to grok in that bullet list is the 'override mode'. The mechanics are pretty abstract and arcane, but in application, it's really pretty simple: imagine that we're overriding node/% with panels, and we're not doing any funky stuff with different displays for different node types. In this case, panels_page does overrides by using a single display for ALL those callbacks. That means there'll be exactly one row in the {panels_display} table, with one configuration, and EVERY single page request of the form node/% will call up the data from that row. Even if you've got 10 million nodes, they're all rendered through that one display.

In Blueprints mode, however, having 10 million nodes would mean that you also have 10 million displays. The difference is significant because it means that for each of those nodes, the node's owner is able to control how his/her node looks without affecting how any other nodes look. All the site admin does is create a 'blueprint' that provides all new nodes with a pre-configured display, that the owner can then change at will. In other words, everyone gets to control the appearance of their own node - or for our case, their profile, or homepage, etc. This is the paradigm under which og_panels operates.
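
The difference between the two override modes can be sketched in a few lines. This is an illustration only: the in-memory $displays array stands in for the {panels_display} table, and display_for_shared() and display_for_blueprint() are hypothetical names, not functions from Panels, panels_page, or og_panels.

```php
<?php
// Stand-in for the {panels_display} table: display id => configuration.
// Display 1 doubles as the shared display and as the blueprint.
$displays = array(
  1 => array('layout' => 'twocol', 'panes' => array('bio', 'recent_posts')),
);

// panels_page-style: every object resolves to the same single display row.
function display_for_shared(array $displays, $object_id) {
  return 1; // always the one shared display, regardless of $object_id
}

// Blueprints-style: each object gets its own display, copied from the
// blueprint the first time it is needed. $map tracks object id => display id.
function display_for_blueprint(array &$displays, array &$map, $object_id, $blueprint_id = 1) {
  if (!isset($map[$object_id])) {
    $new_id = max(array_keys($displays)) + 1;
    $displays[$new_id] = $displays[$blueprint_id]; // PHP arrays copy by value
    $map[$object_id] = $new_id;
  }
  return $map[$object_id];
}

$map = array();
$a = display_for_blueprint($displays, $map, 101);
$b = display_for_blueprint($displays, $map, 102);

// The owner of object 101 rearranges their own display...
$displays[$a]['layout'] = 'threecol';

// ...and neither the blueprint nor object 102's display is affected.
echo $displays[$a]['layout'], "\n"; // threecol
echo $displays[$b]['layout'], "\n"; // twocol
echo $displays[1]['layout'], "\n";  // twocol
echo count($displays), "\n";        // 3
```

With the shared approach the table stays at one row no matter how many objects exist; with blueprints it grows by one row per object, which is the cost of letting each owner customize independently.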

Hopefully I'll have time to write up a little more about this paradigmatic difference, and potentially some efforts towards abstracting the process of writing a blueprints-based system (panels_page-style overrides are fairly straightforward by comparison), but that's all a separate blog post. For our purposes here, the bottom line is: Site admins can decide whether all user_panels are identical (created by the site admin), or if the users should be able to modify them.

Most of what needs to be done to make this a reality isn't actually that hard. We'd need to stitch together an admin interface, and pull in pieces of code that have already been written and tested in og_blueprints and panels_page. Abstracting the blueprints paradigm would be nice, too, but it isn't strictly necessary and can be done later. The only part of this whole idea that I think would be difficult is the very first bullet point in the list - writing the Panels integration for each of those modules. That's the part that'll depend on interest in this idea by the rest of the community.
