Jan 04 2008
Jan 04

This Saturday (January 5th), I'm going to be hosting a Drupal Dojo on getting Drupal to communicate with desktop applications, entitled Drupal and the Desktop. We'll be using Mono and C# to make an application to read and write stories on a Drupal website. It'll be pretty neat, so be there!

I'll make all resources available after the presentation.

Dec 30 2007
Dec 30

As I was setting up a new Drupal site, I decided to try out the hilariously named phpass module. (I cannot read that as “PH Pass” to save my life. It always comes out as… something else.)

The good news about this module is that it builds upon a PHP project called, um, phpass to add better password hashing to Drupal. The traditional way to handle passwords is to ask the user for one, compute a hash function on it, and store the hashed version. Unix systems use the Unix crypt utility to make the hash. Some newer and more naive systems, like core Drupal, use MD5 hashing, presumably because it’s newer (and, therefore, niftier by definition) and also because it’s faster.
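For a concrete picture, the traditional fast-hash scheme described above looks roughly like this in PHP (a minimal sketch of the idea, not Drupal's exact code):

// at signup: hash the password once and store only the hash
$stored_hash = md5($password);

// at login: hash the attempt and compare it with the stored hash
$ok = (md5($attempt) === $stored_hash);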

Unfortunately, it’s a bad idea to use a fast hash function to hash your passwords, because the speed makes the brute-force attacks that much more efficient. The Right Thing to do, if we trust the professionals at Matasano Chargen, is to use an adaptive hashing scheme like bcrypt, which is tricky and slow, and can be made slower and trickier as computers get faster and faster.

Unfortunately, neither my Mac nor my Ubuntu deployment box supports CRYPT_BLOWFISH, the encryption scheme that’s needed for all-out bcrypt support. So I am using the phpass fallback scheme for now. I could try to install CRYPT_BLOWFISH using the Suhosin PHP hardening extension, but would need to test this carefully to make sure I don’t break Drupal in the process.
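If you want to check whether your own PHP supports it, something like the following should tell you (a quick sketch; $salt_22_chars stands for a hypothetical 22-character salt from the alphabet [./0-9A-Za-z], and the cost parameter 10 is arbitrary):

// CRYPT_BLOWFISH is 1 when PHP's crypt() supports bcrypt-style hashes
if (CRYPT_BLOWFISH == 1) {
  // '$2a$' . cost . '$' . salt; a higher cost makes the hash slower
  $hash = crypt($password, '$2a$10$' . $salt_22_chars);
  print $hash;
}
else {
  print "No CRYPT_BLOWFISH support here.\n";
}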

In the meantime, I got halfway into the Suhosin downloading process before I decided to put it off until tomorrow. Part I of the process was to finally install Gnu Privacy Guard, which I have always resisted because it seemed to be a usability horrorshow with few actual uses. I only know two people who really seem to use GPG-signed mail, let alone GPG-encrypted mail. But it turns out that there’s now a handy set of instructions for installing GPG on a Mac using the MacGPG project. And I might even get GPG working with Mail once the GPGMail utility finishes being ported to Mac OS 10.5.

Dec 29 2007
Dec 29

Volunteering for an open source project has been a learning experience.

I have experienced the kindness of strangers.
I have been thanked for helping others.

I have had the privilege to meet a wide variety of people scattered around this world. This shows me that people have an intense drive to communicate and build.

I have also experienced the cruelty of those protected by their keyboards. This has been the most difficult experience to face. You do something routine, send an offline message you hope will clear things up, and then you and others who have spent countless unpaid hours get accused of lying, deception, and other random falsehoods by someone who makes money from and benefits from your contributions and those of countless others.

This is the most difficult part of volunteering in an online community: realizing that those with an axe to grind, an agenda, or a fear will lash out and slander you in an attempt to justify their fears and their desire for some goal or some power. It is even more difficult when the accusations come from someone you have helped and trusted in the past, but who has not been active for the last year and suddenly does this.

It is particularly difficult when people use rhetorical devices to cloak their attack. When challenged, they accuse others of falsehoods and consistently ignore questions, answering instead with oblique misdirections. It is much like the tactics made famous by Karl Rove.

It makes continuing to contribute difficult, but perhaps that is the goal: to drive people out, poison collaboration, distort, and cause confusion. That seems too paranoid. Perhaps that is the goal. I don’t know; it’s hard dealing with people’s fears and cruelty.

I am going to try to ignore the attacks and the cruelty, but I am not sure I will succeed. I don't know that I have that much patience and understanding.

Dec 28 2007
Dec 28

Over the past year I’ve devoted myself to Drupal development, and a new site made in Drupal has been long overdue.

This site is freshly built in Drupal 6 (currently in Release Candidate 1). It took about three hours to build (the majority of that was the design work) - Drupal 6 is even faster to set up than Drupal 5.

I plan to bring in the best of my music and comics from previous incarnations of my website, and also use this site as part random blog and part Drupal-related work portfolio.

Dec 28 2007
Dec 28

Over the past two years, Drupal's wiki capabilities have expanded exponentially. Yet we still get support requests on the forums: "How can I make a wiki with Drupal?" Well, here is a detailed plan that gives wiki functionality to Drupal. This tutorial assumes you're starting with an installed version of Drupal 5.x, and that you're familiar with installing modules.

Absolute Wiki Essentials:

  1. Step 1: Allow for categorization of wiki pages.

    There is often this request: I want to be able to categorize my wiki pages into a hierarchy. Well, with Drupal core's book module, you can do just that! First, enable the book module. Then, go to the admin/content/types page to view your content types. Delete any content types you don't want. Then, rename the "Book page" content type to "Wiki page" or something similar. Also, in the "workflow" fieldset, make sure to check the "create new revision" checkbox, so that by default every edit of a page creates a new revision (see the sketch below).
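    If you prefer to set that default in code, something like this works (a sketch using Drupal's variable system; node_options_<type> holds the workflow defaults, and the machine name of the renamed type is assumed to still be book):

    // make every edit of a book/wiki page create a new revision by default
    variable_set('node_options_book', array('status', 'revision'));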

  2. Step 2: Install the wikitools module.

    The wikitools module is an essential for all Drupal wikis. From deletion protection, to move protection, to ensuring that there is only one page for each title, wikitools is the future of Drupal wikis. Install it, and then configure the settings on the admin page to fit into your idea of how your wiki should work.

  3. Step 3: Create a wiki-style filter.

    There are several ways of doing this. I recommend the pearwiki filter module, as it allows for much flexibility. However, also consider the freelinking module, which is easier to install. If you use the freelinking module, you will have to check the "Hijack freelinking module" checkbox on the wikitools install page. Then, configure your input format with the appropriate filters.

  4. Step 4: Configure permissions.

    At the admin/user/access page, configure user permissions. Here are some recommended settings:

    Of course, feel free to alter these settings as appropriate to your site.

Cool Drupal wiki tricks

What's next?

Now that you've read this article, do you have any more ideas? Perhaps you have another cool wiki Drupal tip to share? Or maybe you want to implement some wiki-style Drupal features, but aren't sure how. Or else you're trying out some of the things I suggested, but having trouble? Feel free to post a comment below; I'd love to hear what you have to say.

Dec 22 2007
Dec 22

Drupal 5 scales really well

This surprised me. Having run into many of the common Drupal scalability problems on smaller sites with shared hosting, I expected much worse. In the process of migrating National Novel Writing Month to Drupal 5, I discovered that there are a number of contrib modules, patches, hacks, and techniques that can be applied to allow Drupal 5 to scale to handle a medium-traffic, high authenticated-to-anonymous-ratio web site like ours, as long as you can live without modules that use the node_access table.

I don't know of any CMS that scales well right out of the box. The fact that someone with my (relatively low) skill set, working alone, was able to resolve major scalability issues in about a month really speaks to the quality of the Drupal codebase and, especially, the community.

Starting points

  • At the start of October we had 2 webservers running apache and a single MySQL server.
  • httpd.conf and my.cnf were reasonably well tuned
  • Drupal's .htaccess was moved into an Apache conf file so that we could set AllowOverride None.
  • Boost was installed and configured so that cached static pages would be served from the filesystem instead of the database.

The first wall - node_access

Every year NaNoWriMo gets a small traffic peak on October 1 when the new site launches, and an enormous traffic peak on November 1. October brings an order of magnitude more traffic than we get for the rest of the year, and November 1 is an order of magnitude higher than that (which quickly ramps down to around double the October traffic level for the rest of November).

After the 10/1 uptick, it became clear that at ~300 logged-in users the site would slow dramatically. This meant a slow site for most of the daytime hours in the US.

The webservers were underutilized, and the DB server's CPU was pinned. Looking at the slow query log, we saw lots of node access queries.

I was aware that node access isn't known to scale gracefully. We were using three node_access modules - forum_access, og, and nodeaccess. Since we also use node profile, and have over 100,000 entries in the user table, that meant a lot of rows in the node_access table (by the end of the event we had ~100K forum nodes, so disabling node profile wouldn't have helped in the long run).

It was easy for us to live without nodeaccess, and without the node_access features of og. However, forum_access was necessary for us. Fortunately, forum_access uses ACL for much of its access control - including its taxonomy query rewrite - only using node_access for access to individual nodes and to rewrite queries for other modules like search.

We removed all of the node_access code from forum_access, replacing it with hook_nodeapi for individual nodes (sketched below), and patching node_search to respect forum_access's ACL entries when searching. I'm still working on getting the node_access-free version of forum_access to work without patching core, and will submit patches when it's ready.
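The shape of the hook_nodeapi() replacement is roughly this (a simplified sketch, not the actual NaNoWriMo patch; _forum_access_user_allowed() is a hypothetical helper wrapping the ACL lookup):

// deny direct views of forum nodes the current user may not see,
// without touching the node_access table
function forum_access_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  global $user;
  if ($op == 'view' && $node->type == 'forum' && !_forum_access_user_allowed($node, $user)) {
    drupal_access_denied();
    exit;
  }
}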

Once we disabled node_access, things became much, much faster, and we were able to get to ~600 logged-in users without punishing slowness.

The second wall - LOWER

As November 1 approached, traffic slowly grew until the site was slow again for most of the US daylight hours. We found a lot of small changes that helped keep things reasonably acceptable:

  • Thanks to an unsolicited offer of help from Merlinofchaos and additional support from Robert Douglass, we were able to successfully set up the advcache and memcache modules to cache data for logged in users. This resulted in a noticeable increase in speed.
  • Indexes were added to the privatemsg author field and the buddylist buddy field which cleared up some slow queries.
  • Sessions, history, and users tables were switched to InnoDB (our users update their profiles constantly) to try to avoid locking issues.
  • Search and tracker were disabled, as we were seeing many of their queries in the slow log.
  • Watchdog was disabled by commenting out its SQL query, as we didn't feel its benefit was worth the database usage.
  • We wrote an alternate set of forum_cache routines for advcache (we couldn't use advcache's standard forum_patch due to forum_access)
  • We patched boost to cache URLs with query strings

Things were faster, but they were still slow. It was a faster slow. Finally, a couple of slow queries caught my attention:


SELECT * FROM users WHERE mail = '[email protected]'
SELECT name FROM users WHERE LOWER(name) LIKE LOWER('username')

The first query, issued by the 'Request new password' feature, was resolved by adding an additional index on the users.mail field.

The second query was coming from user.module (when checking the uniqueness of usernames on account signup) and a few other places. Although name is indexed, if you ask for LOWER(name), the index doesn't help. We have a very large user table and lots of sign-ups in October.

I patched user.module to remove the LOWER (it's unnecessary for MySQL and only exists for PostgreSQL compatibility), but later discovered a thread with some better options for fixing this: http://drupal.org/node/181625.
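For reference, the two fixes amount to something like this (a sketch in Drupal's db_query() style rather than the literal patches; the index name is arbitrary):

// add an index so the password-reset lookup on users.mail can use it
db_query("ALTER TABLE {users} ADD INDEX mail (mail)");

// the uniqueness check without LOWER(), so MySQL can use the existing
// index on name (MySQL's default collation is already case-insensitive)
$result = db_query("SELECT name FROM {users} WHERE name = '%s'", $name);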

Removing LOWER resulted in a dramatic speed increase. Our queries-per-second on the database server jumped and the CPU was never pinned at 100% anymore. Finally, as traffic increased and November 1 drew closer, the webservers were the bottleneck instead of the database. We were able to support ~1500 logged-in users without slowness, and as many as 2000 without punishing slowness.

November

NaNoWriMo has a long and impressive history of crashing hard on November 1. It's really impossible for an organization with our meagre resources to afford the server and human resources necessary to cope with a massive traffic spike that happens one day a year. We didn't crash hard this year, but the web servers were swamped enough that some people got timeouts.

We did crash on 11/3, but this was due to a bug. I had written some bad date-handling code that led to an infinite loop, and eventually an out-of-memory error for PHP. When many, many Apache children ran out of memory at once, Apache was unable to recover.

We added a third webserver and put all the webservers behind a squid reverse-proxy. This, combined with the decline in traffic after 11/1, put an end to our performance issues.

Another slow query caught our attention in the days after 11/1. It was the sess_gc() query from session.inc. Our sessions table was enormous and the query to delete old sessions was taking a long time. We reduced the gc_maxlifetime setting from the default 55 hours to 2 hours (we use Persistent Login, so anyone who wants to remain logged in after 2 hours of idle can use PL) and noticed another small improvement in response time.
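That setting lives in settings.php (a sketch; 7200 seconds = 2 hours, and Drupal's stock settings.php sets this value much higher):

// settings.php: garbage-collect sessions idle for more than 2 hours
ini_set('session.gc_maxlifetime', 7200);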

Towards the end of the month I wrote some caching for our custom user search module that allowed us to reenable part of the site's search functionality.

Going forward

Our next steps will be to reenable search and tracker.

  • For search we're hoping to be able to write some caching that won't violate forum access.
  • We'll also look at views_fastsearch.
  • Finally, we'll apply one of the patches from the discussion at http://drupal.org/node/105639 to optimize the tracker module.

MySQL replication is another technique that would have saved us a lot of pain this year. We expect to switch to a replicated set-up for next year's event, and perhaps upgrade to 64-bit servers. I believe that this will allow us to remain responsive even with the November 1 traffic spike.

Notes

Advcache

Boost

  • Boost's .htaccess disables browser caching of images and other static files. It's worth patching this on a high-traffic site. http://drupal.org/node/185075
  • Boost doesn't cache pages with query strings by default (for example, node/123?page=1). Caching these helps a lot on a high-traffic forums site, where many nodes have many comments. http://drupal.org/node/182687
  • Boost doesn't play nicely with Persistent Login out of the box. http://drupal.org/node/186716
  • Boost's cron hook, as with any cron hook that deletes files on the webserver, doesn't work well if there are lots of files cached, and has trouble if you have multiple webservers that don't share a filesystem. It's worth writing your own cron job to delete expired Boost cache files.

Memcache

  • You should set up a separate bin for each cache table; otherwise one module's 'cache_clear_all' will affect everything cached (see the sketch after this list).
  • Memcache handles serialization differently than the core cache.inc. Memcache has a patch to handle changes to core for this, but any contrib module that caches will probably need to be patched to work correctly with Memcache - including Advcache, although there are plans to make Advcache aware of Memcache out of the box.
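The bin mapping is done in settings.php, something like this (a sketch based on the memcache module's $conf settings; the server addresses and bin names here are illustrative):

// settings.php: give the page cache its own memcached instance so that
// clearing one bin doesn't flush the others
$conf['memcache_servers'] = array('localhost:11211' => 'default',
                                  'localhost:11212' => 'page');
$conf['memcache_bins'] = array('cache' => 'default',
                               'cache_page' => 'page');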

MySQL

  • The MySQL query_cache doesn't do a lot for us. The tables that would benefit from the query cache are so large and get updated so often that, if we were to increase query_cache_size enough to hold a reasonable amount of data, we would actually suffer worse performance. I plan to benchmark with the query cache disabled, as I think it might be a liability for an active site with constant updates to the node and user tables. http://www.mysqlperformanceblog.com/2007/03/23/beware-large-query_cache-...

Squid Reverse-proxy

  • We don't use a shared filesystem for our web servers. We replicate files with unison. In this case it's helpful to use the sourcehash option to cache_peer so that each user remains on the same webserver for their whole session - otherwise users who upload files might not see the file on the subsequent request unless they wait for the next sync to occur.
  • Squid respects the cache headers sent by the upstream web server, so it's worth thinking about which files should be cached for how long, and editing your confs appropriately.

Dec 22 2007
chx
Dec 22

There are many useful editors and debuggers available for churning out Drupal code. However, most popular IDEs are written in some non-native language, so they are so resource-intensive, and because of that so slow, that I just can't use them. This includes Eclipse, Zend and Komodo. So, this article will not be about these. Nor will I write about the virtues of vi or emacs -- I use much simpler tools.

To begin with, I run Kubuntu Linux. Why that? Mac OS X does not recognize the WiFi in my laptop (and it's not the kind of laptop where you can just switch miniPCI cards). Next up would maybe be Windows, but I want to work instead of wrestling with an operating system that never does exactly what I want -- it sometimes does less, sometimes more, and sometimes simply does not let me do certain things. I happen to have quite some experience with Linux on servers, so running Linux on the desktop is not that hard either. Which Linux, though? I very much like the APT package manager, and out of the distros built around it, I like Ubuntu most (and I am not alone...). The choice between KDE and Gnome was easy; Linus Torvalds' famous letter puts this into words better than I could. So we arrive at Kubuntu Linux. In general, the OS behaves rather well with my laptop (aside from a small issue with brightness), and all the issues that were a nightmare for years (WPA for wifi...) got really smooth and simple this year. So Kubuntu Linux is now a pleasure to use. Aside from the operating system, I only deal with source code, so I need something with which I can edit, view, control and search.

I almost always work remotely, so easily accessing files remotely is very important to me. It so happens that KDE has an excellent "io slave" subsystem. It means I can open a dir like fish://user@host/home/user/ and it'll SSH in to the host and show the dir as if it were local. There is a webdav kioslave, too. I am using the native KDE editor, KATE. KATE has nice syntax highlighting and a rather primitive, string-based autocomplete, but that's often enough -- curiously, given how Drupal uses many indirect ways to call a function, a string-based autocomplete is often better than one based on actual PHP syntax. It also has a script debugger (see my blog post) which I do not use that much; my debugging needs are nicely covered by the occasional print, var_export etc. statement. Quite primitive, I know, but it always works -- by not getting accustomed to a debugger, I can easily work in a terminal window over SSH with whatever editor I have access to on whatever website I need to work on at the moment (I prefer nano in a terminal). Primitive tools have their own uses :)

My browser of choice is Opera; it's just faster and does not eat up resources. I might change to Firefox 3, but the jury is still out -- they claim to have fixed the memory leaks, we shall see. However, I already use FF as an HTTP debugger with the Firebug and Web Development Toolbar plugins.

I use bzr as the source control system for core. Again, I do not use many of its features, but as there is a big company (Canonical) behind it, the mirror (https://launchpad.net/drupal/main/) not only works rather well, but I can expect it to be up for a long time -- it has been up for more than a year now. It's very easy to copy the mirror to the local machine -- and now it's very fast too; in the past it took something like ten minutes, now it's about one minute to get the 8000-something revisions. The same bzr branch command lets me create as many branches as I want. I keep around a handy and simple alias (B='bzr diff --diff-options -up') which rolls my patches. If I just want to work on something quick, I do not create a separate branch; rather I revert, change, roll the patch, and then repeat. I only use branches when I have some patch which takes a lot of time. (This is why you sometimes see accidentally merged patches from me.)

Next up is search. The basic Unix tools are find, grep (and occasionally, cut). Another very useful tool is cscope. I keep a cscope.files in the Drupal directory listing all the .module, .inc, .php, .engine and .theme files, and then cscope can quickly show me where a function is defined and where it is used.

In the miscellaneous section, the first mention goes to the trusty companion of the well-known SSH utility: ssh-agent, which can store your private key in memory, totally avoiding typing in any passwords. Even less known is ssh-copy-id, which copies your pubkey to a server in one quick step.

Another important utility is kcachegrind, which tells me about the slow parts of my code by visualizing the output of the xdebug profiler.

Dec 20 2007
Dec 20

Well, as a consultant I mostly build websites, but because it is Drupal, this is not average brochure-website stuff. I actively build modules, and improve existing modules as well. Currently I am a project maintainer on drupal.org for the quiz module, which means that I do some active development on the module, as well as review other people's code submissions to the module.

The quiz module has been something I have been working on for about six months now, and I am happy to say that a lot of work has gone into it; it is near a 2.0 launch with many improvements over the first quiz module.

I won't go into details here, but it's really exciting that we can bring some of the features that make Drupal so great into the quiz module, to create a really robust and usable system for testing and quizzing people. A must for any social website.

Along with developing modules, I also build websites on the Drupal framework. I have had a lot of experience optimizing these websites for search engines, and building websites using Drupal modules as building blocks, putting them all together with a little magic and theming glue.

That said, my latest project has been a real eye-opener. I had the experience of integrating CiviCRM with Drupal, and configuring the webserver from scratch so that we could send mail using CiviCRM. It is great that I was able to pull this all together into a website that is really functional on top of that. This website just does so much at once, it's really mind-boggling. But with my help it should be an easy, intuitive experience with few bumps along the way.

So the state of things in the what-I-do-with-Drupal world is quite vast, actually. And I always keep up to date with all the latest offerings and what-have-yous inside the Drupal community, and trends in web design, just so I can really offer the absolute best websites with the newest bleeding-edge functionality.

Dec 19 2007
Dec 19
css has vastly improved the quality of html markup on the web. however, given its complexity, it has some astounding deficiencies.

one of the biggest problems is the lack of constants. how many times have you wanted to code something like this? light_grey = #CCC. instead you are forced to repeat #CCC in your css. this quickly creates difficult-to-maintain and difficult-to-read code.

an elegant solution to the problem is to use a general purpose preprocessor like m4. m4 gives you a full range of preprocessing capability, from simple constants to sophisticated macros.

traditional css

consider the following example css:

.codeblock code { font:95% monospace; color: #444;}
div.codeblock {
   padding: 10px;
   border: 1px solid #888;
}

applying some m4 macros to this code not only makes the css more maintainable, it also makes it more readable by increasing its semantic quality.

below, we add constants: mid_grey, dark_grey and std_padding.

the same css with m4

.codeblock code { font:95% monospace; color: dark_grey;}
div.codeblock {
   padding: std_padding;
   border: 1px solid mid_grey;
}

trying it out

if you'd like to give this a try, m4 is usually available as a standard package. e.g. on debian-style linux flavors, do:

# apt-get install m4 m4-doc

now copy the following code into a file called example.css:

changequote(^,^)dnl                change quotes to something safe
changecom(^/*^, ^*/^)dnl           change comments to css style

define(dark_grey, ^#444^)dnl       define a dark grey color
define(mid_grey, ^#888^)dnl        define a middle grey color
define(std_padding, ^10px^)dnl     define the standard padding

.codeblock code { font:95% monospace; color: dark_grey;}
div.codeblock {
   padding: std_padding;
   border: 1px solid mid_grey;
}

and now run the preprocessor:

$ m4 example.css

you should see output like:

.codeblock code { font:95% monospace; color: #444;}
div.codeblock {
   padding: 10px;
   border: 1px solid #888;
}

notes on the example:
  • dnl tells m4 to "discard to next line" i.e. ignore everything after it. useful for comments.
  • i use changequote and changecom to change quoting and commenting characters to be more css compliant than the defaults.

using include statements

in practice, you'll often want to use your definitions in several css files. to do this, place your definitions into an external file. in our example, we split our code into two files, definitions.m4 and example.css as follows:

definitions.m4

define(dark_grey, ^#444^)dnl       define a dark grey color
define(mid_grey, ^#888^)dnl        define a middle grey color
define(std_padding, ^10px^)dnl     define the standard padding

example.css

changequote(^,^)dnl                change quotes to something safe
changecom(^/*^, ^*/^)dnl           change comments to css style
include(^definitions.m4^)dnl       include the definitions file

.codeblock code {font:95% monospace; color: dark_grey;}
div.codeblock {
   padding: std_padding;
   border: 1px solid mid_grey;
}

note: my choice of filenames and extensions (definitions.m4, example.css) is arbitrary.

once you've split the files, you can run the preprocessor as before:

$ m4 example.css

further thoughts

this article describes a tiny subset of the power of the m4 language. for more information, take a look at the gnu manual.

one thing that i don't discuss here is integrating a preprocessor into your development / build environment. more on that later.
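as a teaser, one lazy way to wire it in is to let php run m4 for you at request time (a quick sketch, not production advice; css.php and style.css.m4 are made-up names):

<?php
// css.php: serve style.css.m4 through the m4 preprocessor
header('Content-Type: text/css');
passthru('m4 ' . escapeshellarg(dirname(__FILE__) . '/style.css.m4'));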

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Dec 18 2007
Dec 18
#!/bin/bash

# guardian - a script to watch over application system dependencies, restarting things
#            as necessary:  http://www.johnandcailin.com/john
#
#            this script assumes that at, logger, sed and wget are available on the path.
#            it assumes that it has permissions to kill and restart daemons including
#            mysql and apache.
#           
#            Version: 1.0:    Created
#                     1.1:    Updated logfileCheck() not to assume that files are rotated
#                             on restart.

checkInterval=10                         # MINUTES to wait between checks

# some general settings
batchMode=false                          # was this invoked by a batch job
terminateGuardian=false                  # should the guardian be terminated

# setting for logging (syslog)
loggerArgs=""                            # what extra arguments to the logger to use
loggerTag="guardian"                     # the tag for our log statements

# the at queue to use. use "g" for guardian. this queue must not be used by another
# application for this user.
atQueue="g"

# the name of the file containing the checks to run
checkFile="./checks"

# function to print a usage message and bail
usageAndBail()
{
   cat << EOT
Usage: guardian [OPTION]...
Run a guardian to watch over processes. Currently this supports apache and mysql. Other
processes can be added by simple modifications to the script. Invoking the guardian will run
an instance of this script every n minutes until the guardian is shut down with the -t option.
Attempting to re-invoke a running guardian has no effect.

All activity (debug, warning, critical) is logged to the local0 facility on syslog.

The checks are listed in a checkfile, for example:

   #check type, daemonName, executableName, checkSource, checkParameters
   logfileCheck, apache2,       apache2,        /var/log/apache2/mainlog, "segmentation fault"

This checkfile specifies a periodic check of apache's mainlog for a string containing
"segmentation fault", restarting the apache2 process if it fails.

This script should be run on each host running the service(s) to be watched.

  -i        set the check interval to MINUTES
  -c        use the specified check file
  -b        batch mode. don't write to stderr ever
  -t        terminate the guardian
  -h        print this help

Examples:
To run a guardian every 10 minutes using checks in "./myCheckFile"
$ guardian -c ./myCheckFile -i 10

EOT

   exit 1;
}

# parse the command line arguments (i, c, b, t and h; i and c each take a param)
while getopts i:c:hbt o
do     case "$o" in
        i)     checkInterval="$OPTARG";;
        c)     checkFile="$OPTARG";;
        h)     usageAndBail;;
        t)     terminateGuardian=true;;
        b)     batchMode=true;;        # never manually pass in this argument
        [?])   usageAndBail
       esac
done

# only output logging to standard error when running from the command line
if test ${batchMode} = "false"
then
   loggerArgs="-s"
fi

# setup logging subsystem. using syslog via logger
logCritical="logger -t ${loggerTag} ${loggerArgs} -p local0.crit"
logWarning="logger -t ${loggerTag} ${loggerArgs} -p local0.warning"
logDebug="logger -t ${loggerTag} ${loggerArgs} -p local0.debug"

# delete all outstanding at jobs
deleteAllAtJobs ()
{
   for job in `atq -q ${atQueue} | cut -f1`
   do
      atrm ${job}
   done
}

# are we to terminate the guardian?
if test ${terminateGuardian} = "true"
then
   deleteAllAtJobs

   ${logDebug} "TERMINATING on user request"
   exit 0
fi

# check to see if a guardian job is already scheduled, return 0 if they are, 1 if not.
isGuardianAlreadyRunning ()
{
   # if there are one or more jobs running in our 'at' queue, then we are running
   numJobs=`atq -q ${atQueue} | wc -l`
   if test ${numJobs} -ge 1
   then
      return 0
   else
      return 1
   fi
}

# make sure that there isn't already an instance of the guardian running
# only do this for user initiated invocations.
if test ${batchMode} = "false"
then
   if isGuardianAlreadyRunning
   then
      ${logDebug} "guardian invoked but already running. doing nothing."
      exit 0
   fi
fi

# get the nth comma separated token from the line, trimming whitespace
# usage getToken line tokenNum
getToken ()
{
   line=$1
   tokenNum=$2

   # get the nth comma separated token from the line, removing whitespace
   token=`echo ${line} | cut -f${tokenNum} -d, | sed 's/^[ \t]*//;s/[ \t]*$//'`
}

# check http. get a page and look for a string in the result.
# usage: httpCheck sourceUrl checkString
httpCheck ()
{
   sourceUrl=$1
   checkString=$2

   wget -O - --quiet ${sourceUrl} | egrep -i "${checkString}" > /dev/null 2>&1
   httpCheckResult=$?
   if test ${httpCheckResult} -eq 0
   then
      ${logDebug} "PASS: found \"${checkString}\" in ${sourceUrl}"
   else
      ${logWarning} "FAIL: could NOT LOCATE \"${checkString}\" in ${sourceUrl}"
   fi

   return ${httpCheckResult}
}

# check to make sure that mysql is running
# usage: mysqlCheck connectString query
mysqlCheck ()
{
   connectString=$1
   query=$2

   # get the connect params from the connectString
   userAndPassword=`echo ${connectString} | sed "s/.*\/\/\(.*\)@.*/\1/"`
   mysqlUser=`echo ${userAndPassword} | cut -f1 -d:`
   mysqlPassword=`echo ${userAndPassword} | cut -f2 -d:`
   mysqlHost=`echo ${connectString} | sed "s/.*@\(.*\)\/.*/\1/"`
   mySqlDatabase=`echo ${connectString} | sed "s/.*@\(.*\)/\1/" | cut -f2 -d\/`

   mysql -e "${query}" --user=${mysqlUser} --host=${mysqlHost} --password=${mysqlPassword} --database=${mySqlDatabase} > /dev/null 2>&1
   mysqlCheckResult=$?
   if test ${mysqlCheckResult} -eq 0
   then
      ${logDebug} "PASS: executed \"${query}\" in ${mysqlHost}"
   else
      ${logWarning} "FAIL: could NOT EXECUTE \"${query}\" in database ${mySqlDatabase} on ${mysqlHost}"
   fi

   return ${mysqlCheckResult}
}

# check to make sure that a logfile is clean of critical errors
# usage: logfileCheck logFile errorString
logfileCheck ()
{
   logFile=$1
   errorString=$2
   logfileCheckResult=0
   marker="__guardian marker__"
   mark="${marker}: `date`"

   # make sure that the logfile exists
   test -r ${logFile} || { ${logCritical} "logfile (${logFile}) is not readable. CRITICAL GUARDIAN ERROR."; exit 1; }

   # see if we have a marker in the log file
   grep "${marker}" ${logFile} > /dev/null 2>&1
   if test $? -eq 1
   then
      # there is no marker, therefore we haven't seen this logfile before. add the
      # marker and consider this check passed
      echo ${mark} >> ${logFile}
      ${logDebug} "PASS: new logfile"
      return 0
   fi

   # pull out the "active" section of the logfile, i.e. the section between the
# last run of the guardian and now, i.e. between the marker and the end of the file

   # get the last marker line number
   lastMarkerLineNumber=`grep -n "${marker}" ${logFile} | cut -f1 -d: | tail -1`

   # grab the active section
   activeSection=`cat ${logFile} | sed -n "${lastMarkerLineNumber},$ p"`

   # check for the regexes in the logFile's active section
   echo ${activeSection} | egrep -i "${errorString}" > /dev/null 2>&1
   if test $? -eq 1
   then
      ${logDebug} "PASS: logfile (${logFile}) clean: line ${lastMarkerLineNumber} to EOF"
   else
      ${logWarning} "FAIL: logfile (${logFile}) CONTAINS CRITICAL ERRORS"
      logfileCheckResult=1
   fi

   # mark the newly checked section of the file
   echo ${mark} >> ${logFile}

   return ${logfileCheckResult}
}

# restart daemon, not taking no for an answer
# usage: restartDaemon executableName initdName
restartDaemon ()
{
   executableName=$1
   initdName=$2
   restartScript="/etc/init.d/${initdName}"

   # make sure that the daemon executable is there
   test -x ${restartScript} || { ${logCritical} "restart script (${restartScript}) is not executable. CRITICAL GUARDIAN ERROR."; exit 1; }

   # try a polite stop
   ${restartScript} stop > /dev/null

   # get medieval on its ass
   pkill -x ${executableName} ; sleep 2 ; pkill -9 -x ${executableName} ; sleep 2

   # restart the daemon
   ${restartScript} start > /dev/null

   if test $? -ne 0
   then
      ${logCritical} "failed to restart daemon (${executableName}): CRITICAL GUARDIAN ERROR."
      exit 1
   else
      ${logDebug} "daemon (${executableName}) restarted."
   fi
}

#
# things look good, let's do our checks and then schedule a new one
#

# make sure that the checkFile exists
test -r ${checkFile} || { ${logCritical} "checkfile (${checkFile}) is not readable. CRITICAL GUARDIAN ERROR."; exit 1; }

# loop through each of the daemons that need to be managed
for daemon in `cat ${checkFile} | egrep -v "^#.*" | cut -f2 -d, |  sed 's/^[ \t]*//;s/[ \t]*$//' | sort -u`
do
   # execute all the checks for the daemon in question
   cat ${checkFile} | egrep -v "^#.*" | while read line
   do
      getToken "${line}" 2 ; daemonName=${token}

      if test ${daemonName} = ${daemon}
      then
         # get the check definition
         getToken "${line}" 1 ; checkType=${token}
         getToken "${line}" 3 ; executableName=${token}
         getToken "${line}" 4 ; checkSource=${token}
         getToken "${line}" 5 ; checkParams=${token}

         # remove quotes
         checkSourceQuoteless=`echo ${checkSource} | sed "s/\"//g"`
         checkParamsQuoteless=`echo ${checkParams} | sed "s/\"//g"`

         # call the appropriate handler for the check
         ${checkType} "${checkSourceQuoteless}" "${checkParamsQuoteless}"

         if test $? -ne 0
         then
            ${logCritical} "CRITICAL PROBLEMS with deamon (${daemonName}), RESTARTING."
            restartDaemon ${executableName} ${daemonName}
         fi
      fi
   done
done

# delete all at jobs (race conditions)
deleteAllAtJobs

# schedule a new instance of this sucker
${logDebug} "scheduling another check to run in ${checkInterval} minutes"
at -q ${atQueue} now + ${checkInterval} minutes > /dev/null 2>&1 << EOT
$0 $* -b
EOT

Dec 17 2007
Dec 17

I am almost sorry to see winter break drawing near. For the past two weeks, I've had the privilege of introducing an amazing group of kids to open source software. Inspired by GHOP, Google's pilot Highly Open Participation contest, I've put together an extracurricular computer club for interested students at nearby Sandridge Elementary. We meet twice per week after school for an hour and a half. I came into this with the slim hope that the school's new administration would let me shepherd a couple of students through GHOP. The enthusiasm of Mr. Hollingsworth (Sandridge's principal) and Dr. Sawyer (Sandridge's superintendent) took me by surprise, and became a catalyst for the growth of a program that I hope will someday serve as a model for other schools.

Of the club's eight active participants (not counting occasional attendees), seven are trying their hands at GHOP projects, alongside high school students from around the world. Most have chosen to work on projects for Drupal, which makes for good crossover activities with our two under-13s who have taken charge of creating the club's web site.

So far, the biggest challenge for me is leading a group like this in an essentially unwired community. Students generally only have computer access at school, and few parents have regular access to email. Simple things like password resets can take days, and my students are at a disadvantage for things like GHOP. I must give them all credit, though -- despite never before being exposed to basics like FTP, they are all jumping in at the deep end and making amazing progress.

I have been lucky enough to round up some great speakers and corporate donations. We are still looking for a speaker in a graphic design field (preferably someone familiar with GIMP, Inkscape, Blender, or other open source tools), and donations of hardware (especially laptops, thin clients, servers, USB keys, blank CDs, and a tablet). If you can help with any of this, please drop me a line.

The open source world is as much a true meritocracy as I think I'll ever see. No one cares who you are, where you are from, or what you have -- they just want to see your code. Anyone can do anything.

Dec 12 2007
dag
Dec 12

Today I experienced a problem with Drupal's one-time URL behaviour when resetting your password. I am a fan of the way Drupal does this, which I think is much better than the way most sites handle password resets.

However, in this particular case Drupal's solution failed to work. What happens is that a transparent proxy first makes its own connection to the URL (that I got by mail), and only then is a second request made whose response actually goes back to the browser. In a log, it looks like this:


65.160.238.180 - - [12/Dec/2007:16:09:23 +0100] "GET /user/reset/1/1197472122/1a021e957a8040149660bcec8d77e3e5 HTTP/1.1" 200 2810 "-" "Mozilla/4.0"
125.24.198.3 - - [12/Dec/2007:16:09:24 +0100] "GET /user/reset/1/1197472122/1a021e957a8040149660bcec8d77e3e5 HTTP/1.1" 200 1278 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20071130 CentOS/1.5.0.12-7.el5.centos Firefox/1.5.0.12"

NOTICE: the source address is also different. I suspect that this is a separate content filter, but being ignorant about how the setup is done at this company, I can only guess why this happens.

And the result, of course, is that the one-time URL fails the second time, for the actual user. It would be nice if the one-time URL somehow kept working for, e.g., 10 seconds after the first hit instead, although that is much harder to implement.
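A crude way to get such a grace period might look like this (a hypothetical, untested sketch against Drupal's user_pass_reset() flow, remembering the first hit in a variable):

// inside user_pass_reset(), instead of invalidating the link on first use:
$key = 'pass_reset_used_' . $uid . '_' . $timestamp;
$first_hit = variable_get($key, 0);
if (!$first_hit) {
  variable_set($key, time());   // first use: remember when it happened
}
elseif (time() - $first_hit > 10) {
  drupal_access_denied();       // grace period over: reject further re-use
}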

Dec 11 2007
Dec 11

the blessing and curse of cck is the ability to quickly create very complex node types within drupal. it doesn't take very long before the input form for a complex node type becomes unmanageably long, requiring your user to do a lot of scrolling to get to the bottom of the form. the obvious solution is to break your form into multiple pages, but there is no easy way to do this. there are two proposed solutions: the cck wizard module and a drupal handbook entry. however, the well-intentioned cck wizard module doesn't seem to work, and the example code in the drupal handbook becomes tedious to repeat for each content type. to fill the void, i bring you cck witch.

cck witch is based on the same premise as the handbook entry: the most natural way to divide a cck form into pages is to use field groups. from there, however, cck witch diverges, taking a relatively lazy yet effective approach to the problem of multi-page forms: on every page we render the entire form, but then simply hide the fields and errors that do not belong to the current step. it also offers an additional feature: when the form is complete and the node is rendered, an individual edit link is provided for each step - allowing the user to update the information for a particular page in the form without having to step through the entire wizard again.

if you've now read enough to be curious to see the goods, then please, be my guest and skip straight to the live demo.

the demo

in the demo, you will walk through a three-step multi-page cck form that invites you to specify your dream house. before proceeding to the next step in the form, the user must complete the required fields on the previous steps. on all steps other than the first, the user may go back and edit their data for the previous step.

when the form is complete and the node is viewed, we add an edit link inside each field group. clicking this link allows the user to edit only the fields within that group, rather than requiring the user to step through the entire form again.

disclaimer

be warned, this is a pre-alpha release. also, this wizardly wonder is not meant for the drupal novitiate. before using it you must:

  • patch drupal core, adding 4 lines to the form_set_error function.
  • override two form related theme functions
  • follow a simple set of conventions when configuring your cck content type

manual

step zero - download cck witch

get your copy of this pre-alpha release here

step one - patch drupal core

in forms.inc, replace form_set_error with the following function. this exposes an option to remove errors from the list. it also stops drupal from adding form errors to the drupal message list. do not perform this step without also performing step two. if you do, all form errors will mysteriously vanish.

function form_set_error($name = NULL, $message = '', $remove = FALSE) {
  static $form = array();
 
  if(!$remove) {
    // Set a form error
    if (isset($name) && !isset($form[$name])) {
      $form[$name] = $message;
    }
  }
  else {
    // Remove a form error
    if (isset($name) && isset($form[$name])) {
      unset($form[$name]); 
    }
  }
 
  return $form;
}
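with the patch in place, cck witch can clear the errors raised by fields that are hidden on the current step, e.g. (the field name here is illustrative):

// drop a validation error for a field that isn't on this page
form_set_error('field_dream_house_color', '', TRUE);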

step two - override two form theme functions

next, you need to override the theme function for a form element to display the form error messages inline in your form, instead of in a big blob of messages at the top. this is a nice thing to do regardless of whether or not you want multi-page cck forms. do this by overriding the theme_form_element method and then adding the following in the location of your choice (right at the bottom, immediately before the closing div, will do fine):

  if ($element['#parents']) {
    $form_element_error = form_get_error($element);
  }

  if ($form_element_error && $element['#type'] != 'radio') {
    $output .= ' <div class="form-error">' . $form_element_error . "</div>\n";
  }

and, if you want all the buttons at the bottom of the form to line up nicely, override the theme_node_form method with the following:

function theme_node_form($form) {
  $output = "\n<div class=\"node-form \">\n";
  $output .= "  <div class=\"standard\">\n";
  $output .= drupal_render($form);
  $output .= "  </div>\n";
  $output .= "</div>\n";
  return $output;
}

step three - configure your cck content type

when configuring your cck content type, create one group per page. you must name the groups "step 1", "step 2", etc. also, you must visit the display fields tab and submit the form there. you don't have to change anything, just submit the form. (this is clearly a cck bug, but we'll just work around it for now.)


step four - configure cck witch

finally, visit admin -> content -> multi-page form settings and set the number of pages for each cck content type. the cck witch module will only interact with those content types where the number of pages is greater than one.

future improvements

  • currently, cck witch presumes that your content type does not include a body field. complex cck node types rarely do. handling the body field is easy, it's just not obvious to me which page the body should appear on.
  • if there are other, non-cck options on your form (for example, the administrative meta tags or menu settings) these currently appear on all pages of the form. you can set them whenever you please. possibly, these should all be moved to an implied final page in the flow?
Dec 10 2007
Dec 10

This will be my first blog entry. In my blog I plan to talk about Drupal and technology. Drupal, for those of you who don't know, is a Content Management System, or CMS. It allows you to easily create automated websites where users can interact with one another, and where website owners and users can post data without necessarily knowing HTML, PHP or any other web language.

I happened upon the Drupal CMS a while back, after I had gotten frustrated using the disorganized behemoth known as XOOPS. I had seen that this was the CMS which the getfirefox campaign ran on, and I thought that the feature set was quite appealing as well.

I have now been designing Drupal websites since version 4.7. What's great about these websites is that they vary greatly in scope and do very different things. I have created a movie website which allows users to rate movies and share lists of movies with one another; websites which enable users to sign up to take tests and, based on those tests, check out equipment; blog websites, such as this one; project management websites, such as one I built to manage a television show; and deeply integrated organizational websites which manage the everyday functions of a non-profit organization.

Without Drupal none of this would have been possible, so thank you, Drupal, for allowing me to create such an array of great websites.

Now, aside from all this gloating, I'd like to share some of my expertise, some of my experiences, and some of the challenges that have come up during my work with the Drupal platform. Because Drupal really is more than a content management system: it is a platform which can deeply integrate with almost anything you throw at it.

My name is Joshua Ellinger, but my pseudonym goes back to a nickname I had about 10 years ago. I didn't pick it, but it stuck. Then, about 3 years ago, I started an electronic music group of the same name and decided to buy this domain name when I got a free name at this host. Well, the name stuck, again, and even though electronic music is much more of an occasional hobby than a source of income, the domain is here to stay. So I occasionally get a lot of hits from MySpace, and that's great. Extra web traffic for me.


Dec 07 2007
Dec 07

To patch your Drupal installation, follow UPGRADING.txt up to and including:

  • for Drupal 6: 5. Disable all custom and contributed modules.
  • for Drupal 7: 2. Go to Administration > Configuration > Development > Maintenance mode...

Now go on using these commands:

  • cd DRUPAL-ROOT
  • Dry run for testing without modifying anything: patch -p1 --dry-run < PATCHFILE
  • Do the real patching: patch -p1 < PATCHFILE

Your Drupal installation is now upgraded. Proceed with UPGRADING.txt from:

  • for Drupal 6: 9. Verify the new configuration file to make sure it has correct information.
  • for Drupal 7: 5. Re-apply any modifications to files such as .htaccess or robots.txt.

Note: The most important step after upgrading is to run update.php as described in UPGRADING.txt above.

Warning

If you get errors like Reversed (or previously applied) patch detected or 1 out of 2 hunks FAILED while running the patch dry run (second command above), immediately interrupt patching and upgrade following the steps explained in UPGRADING.txt.

Use these patch files at your own risk. I don't guarantee the proper function of the patch files on Drupal installations other than my own.

Note: If the patch process gets interrupted and leaves a mix of patched and unpatched files, you may re-run it, ignoring already patched files, after eliminating the reason for the interruption:

  • patch -p1 -N < PATCHFILE

You may safely remove reject files created during that process:

  • find . -name "*.rej" | xargs rm

Drupal 7

To verify the integrity of the patch files, use these MD5 hashes:

MD5 (drupal-7.0-to-7.19.patch) = f71c39629e5fa3eb66a4b6c8adfa1b9d
MD5 (drupal-7.1-to-7.19.patch) = 947920d69672d2ffe33a25d40b61220a
MD5 (drupal-7.2-to-7.19.patch) = c39bcdf6633fcb31fb5d42588bf39e01
MD5 (drupal-7.3-to-7.19.patch) = a2a705bed1bf6a6d393bf3f14d9f9ee5
MD5 (drupal-7.4-to-7.19.patch) = 6008ebe577590a929ce252d763e6420e
MD5 (drupal-7.5-to-7.19.patch) = 6c7f60b2e5e16fb41207d1076215dc93
MD5 (drupal-7.6-to-7.19.patch) = 378230fb18c9e016bab23d4da91e255a
MD5 (drupal-7.7-to-7.19.patch) = 225b77751d4bf5e2f84cbee237bc42f6
MD5 (drupal-7.8-to-7.19.patch) = 4f2699f3e12cb8c2b2fb0e3395045bbc
MD5 (drupal-7.9-to-7.19.patch) = f3c10602d44a20a1095680ac3da503d6
MD5 (drupal-7.10-to-7.19.patch) = 7c89fec2e2ba751d942a7e08ef110104
MD5 (drupal-7.11-to-7.19.patch) = d0aecbd0111cc9b47827d48df3756019
MD5 (drupal-7.12-to-7.19.patch) = fe13eadd224b21b563d6307fd2d0a049
MD5 (drupal-7.13-to-7.19.patch) = e25aeebbf04cc1e57dad0f1fb38ab3e1
MD5 (drupal-7.14-to-7.19.patch) = 648bf12bbc425af3b28a4cc27b56a04a
MD5 (drupal-7.15-to-7.19.patch) = e8ef7662cb9408283c2c777cfbcfa3cd
MD5 (drupal-7.16-to-7.19.patch) = 735297fe64f89ed8d3a83c401d662fcc
MD5 (drupal-7.17-to-7.19.patch) = 0624db2ef8f9053481d2386cec2f86f3
MD5 (drupal-7.18-to-7.19.patch) = ca6220b6ffc6042972860ed7f6cd2348

Drupal 6

To verify the integrity of the patch files, use these MD5 hashes:

MD5 (drupal-6.0-to-6.28.patch) = 71454c94fb277f67e5f583197e529ae5
MD5 (drupal-6.1-to-6.28.patch) = 03983fde932b427ff42feb8a37ec7f8c
MD5 (drupal-6.2-to-6.28.patch) = 1f6fe2c8e3a06f04dc7fed509ca09d65
MD5 (drupal-6.3-to-6.28.patch) = b84157d7f039532e0625b3ee475cdab5
MD5 (drupal-6.4-to-6.28.patch) = 2940f854d0d426d368dfc87ab8aae51a
MD5 (drupal-6.5-to-6.28.patch) = cc66ab5088a31efd0cbc30cc5410e269
MD5 (drupal-6.6-to-6.28.patch) = 1a3512977fcfbaa5441b6fef77b6daa4
MD5 (drupal-6.7-to-6.28.patch) = 1ec01a6a9ba330e7ce688b9cd3fc241f
MD5 (drupal-6.8-to-6.28.patch) = 67873c920f69c7ffe8e0c3dab2bf8cf8
MD5 (drupal-6.9-to-6.28.patch) = e5516f900e90a58b60eb0102c72f0f83
MD5 (drupal-6.10-to-6.28.patch) = 224958120c7ea7f4a0b5c1c7af0de02a
MD5 (drupal-6.11-to-6.28.patch) = 2820dd6fbab3349d59a3046ab5d8d852
MD5 (drupal-6.12-to-6.28.patch) = 1bfd1dfe72dd6c7aa3716d5cc179e6e0
MD5 (drupal-6.13-to-6.28.patch) = b25cf8f848c39ab1f3b107642a928eb0
MD5 (drupal-6.14-to-6.28.patch) = 7bf9fedc6ef6cd4c6199f3219ca71ddb
MD5 (drupal-6.15-to-6.28.patch) = 309ad484f2821d59fc366bc24aeb7f14
MD5 (drupal-6.16-to-6.28.patch) = d625f3212a540a3c583ca27682f82f46
MD5 (drupal-6.17-to-6.28.patch) = 6a2610f7b6b93db37a149dcfbbd14932
MD5 (drupal-6.18-to-6.28.patch) = 898f9f387421f61ae4563e061fde0dc2
MD5 (drupal-6.19-to-6.28.patch) = 17bb24f097f6f128565a46c9daa0cc07
MD5 (drupal-6.20-to-6.28.patch) = 82f31b62d7dd07b70d3b20625cf64dbc
MD5 (drupal-6.21-to-6.28.patch) = dcfcbaf76ec3a13b1ff4bd4a648b451e
MD5 (drupal-6.22-to-6.28.patch) = d82ae95e6090819d281cdd29f039b46d
MD5 (drupal-6.23-to-6.28.patch) = c2d6a4d178a69edf1229a82d3c510cb6
MD5 (drupal-6.24-to-6.28.patch) = f30eaafbf031a92c1ae66cc15fc76250
MD5 (drupal-6.25-to-6.28.patch) = ac79292f71075185dc5f6b35d50d52ed
MD5 (drupal-6.26-to-6.28.patch) = 1557f233510a1c4694abe2ba6e084da3
MD5 (drupal-6.27-to-6.28.patch) = bf1078bd6231f896ae41f8e117f6ca34

Drupal 5

To verify the integrity of the patch files, use these MD5 hashes:

  MD5 (drupal-5.0-to-5.23.patch) = 13235f0c50caf2f0366403563053fbba
  MD5 (drupal-5.1-to-5.23.patch) = e2d5fc4ec6da1f1db2f83204eef03160
  MD5 (drupal-5.2-to-5.23.patch) = 13da34e36fb58f422c86c1574e26719b
  MD5 (drupal-5.3-to-5.23.patch) = dd826e692ab5e9e50ce55feac0b82673
  MD5 (drupal-5.4-to-5.23.patch) = e44a1f00549c5d39bbe359772db4ec9d
  MD5 (drupal-5.5-to-5.23.patch) = 4f06344f52f3c476e458f01c4925e987
  MD5 (drupal-5.6-to-5.23.patch) = bd31200144a9b716e4a1cad1930796f1
  MD5 (drupal-5.7-to-5.23.patch) = 598e037a8840d79509ea5c247dff975e
  MD5 (drupal-5.8-to-5.23.patch) = 2a53dedeb3b00c679ccb0dae44379789
  MD5 (drupal-5.9-to-5.23.patch) = 8dd63096cf7c5dd73e968f770f56301b
  MD5 (drupal-5.10-to-5.23.patch) = a6127a53d945659efde17a31e8037b79
  MD5 (drupal-5.11-to-5.23.patch) = 841eabce62cac99f98e77de733aeb7c6
  MD5 (drupal-5.12-to-5.23.patch) = ddb82f96ad7915e34111df4706237c11
  MD5 (drupal-5.13-to-5.23.patch) = 41bf265e25a1d6c9324e4f6c7b5ff067
  MD5 (drupal-5.14-to-5.23.patch) = 7c48dca7dd10533fe65c895d33c7be56
  MD5 (drupal-5.15-to-5.23.patch) = ae1a31e80c3b24dfa1710adecbd1cce9
  MD5 (drupal-5.16-to-5.23.patch) = 72e25e1c680b75cbc1f8b303c4d97cba
  MD5 (drupal-5.17-to-5.23.patch) = 0fee19e0808ec863284618ce8f506d6c
  MD5 (drupal-5.18-to-5.23.patch) = 997f35d8372277203e5129e9bc684f81
  MD5 (drupal-5.19-to-5.23.patch) = 6189d7c3c3139647dfd519b727d8f12f
  MD5 (drupal-5.20-to-5.23.patch) = 33d48157e036411fd336a5d9023c8644
  MD5 (drupal-5.21-to-5.23.patch) = 86cb8be7e01f576177765d670332e4fb
  MD5 (drupal-5.22-to-5.23.patch) = 94488c667c2c68d48438d81129e3edca

Upgrade major releases of Drupal

Experimental

Below are patch files to upgrade between major releases of Drupal. Please try them with caution.

To verify the integrity of the patch files, use these MD5 hashes:

MD5 (drupal-6.28-to-7.19.patch) = d2af47458336563b9b6ecc284f4d887b
MD5 (drupal-5.23-to-6.28.patch) = 6c9caae0399f5f47922da6825059b5ed

Dec 04 2007
jh
Dec 04

Planet is some RSS aggregator written in Python we're using over at Planet Inkscape. But maybe I should start at the beginning.

Lots of weird requests which resulted in 404s showed up in the log. Things like:

XX at http:/kaioa.com
XX at
XX
XXathttp:/kaioa.com

Where "XX" stands for the node id. E.g. "36" for this blog post.

After investigating it for a bit I found the shocking reason behind this. Well, not really that shocking... it's more on the silly side, really. ;)

My RSS 2.0 feeds look like this:

[...]
<link>http://kaioa.com/node/36</link>
[...]
<guid isPermaLink="false">36 at http://kaioa.com</guid>
[...]

The RSS 2.0 feeds from Planet look like this, however:

[...]
<guid>http://kaioa.com/36 at http://kaioa.com</guid>
<link>http://kaioa.com/node/36</link>
[...]

As you can see, the isPermaLink attribute is missing. If it's missing, it defaults to true, which in turn causes other readers/aggregators to treat that guid as a URL. Ironically, Planet does interpret that attribute for itself, but strips it from its own feeds.
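
Presumably, a fixed Planet would simply carry the attribute through, so its output would mirror the original feed:

[...]
<guid isPermaLink="false">36 at http://kaioa.com</guid>
<link>http://kaioa.com/node/36</link>
[...]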

isPermaLink="false" is used by Drupal and WordPress. However, it only negatively affects Drupal's feeds, because WordPress' feeds happen to use guids which are identical to link. But that isn't a given and may change at some point in the future (well, it's unlikely).

Either way it's totally Planet's fault. I tried to track down the issue, but Planet's source is pretty hard to follow. Additionally "rss20.xml.tmpl" and the template stuff in general lack support for the isPermaLink attribute, which means that fixing it won't be that easy.

If you're wondering why I'm blogging about this instead of posting it on Planet's bug tracker... well, they don't have one. D'oh. I already contacted one of the authors, but so far I got no reply.

Nov 29 2007
Nov 29

using the term "content management system" to describe the drupal cms understates its full potential. i prefer to consider drupal a web-application development-system, particularly suitable for content-heavy projects.

what are the fantastic four?

drupal's application development potential is provided in large part by a set of "core" modules that dovetail to provide an application platform that other modules and applications build on. these modules have become a de-facto standard: drupal's fantastic four. our superheroes are cck, views, panels and cck field types and widgets. if you are considering using drupal to build a website of any sophistication, you can't overlook these. note that cck field types and widgets isn't a real module, but rather a set of related modules.

flying with the four

getting a feel for how these modules work and interact isn't trivial, so i'll give you a brief introduction to the super-powers of each of them, and then take you step-by-step through an example, with enough detail that you can easily get it working on your system. or, if you want to see a professional implementation built on the same principles, check out the zicasso photo competition.

meet our heros

the content construction kit, or as it's more commonly referred to, cck, provides point-and-click attribute extensibility to drupal's content-types. for example, if your site is about photography, you could define a type of page on your site called "photograph" and then add typed attributes to it: shutter-speed (integer), flash (boolean) etc. cck then automagically creates forms for you (or your users) to create and edit these types of pages, providing suitable validation, gui controls etc.

the cck fieldtype modules each define a new type of field that can be used in your cck content types. one example is the imagefield module, allowing your cck types to have fields of type image. this allows your "photograph" page to contain the actual photograph itself. there are many more types that you can find in the cck modules download area.

the views module allows simple point and click definition of lists of drupal nodes, including your cck nodes. you can control not only what is in the list, but how the list is displayed, including sorting, pagination etc. these lists can be conveniently displayed as blocks, full blown pages or even rss feeds. for example, you could define a list of photographs that had been highly rated by users on your photography site.

the panels module allows you to create pages divided into sections, each section containing a node, block, view or any custom content. so without any knowledge of html or css you can create complicated and powerful layouts. for example, you could create a page with two views, one showing a list of recently submitted photographs and one showing a list of highly ranked photographs. this module is currently undergoing a huge facelift and panels2 is in alpha at the time of writing.

an example

to illustrate how the fantastic four can be put to good use, let's continue with our photography theme and create a simple photo-competition application. this application (shown to the right) allows the creation of a simple photo competition entry using a form. the main page shows two lists, one of recent entries and one of "featured" entries. the application also has a detail page for each photograph where anonymous users can leave comments.

step one - install the modules

i'm going to assume that you've got a basic drupal install up-and-running. if you haven't, please refer to one of my previous blogs, easy-peasy-lemon-squeezy drupal installation on linux. once you've done this, you should install six modules: cck, views, panels2, imagefield, email field and imagecache. on linux, you can do this as follows. cd to your drupal directory (the one containing cron.php etc.), create the directory sites/all/modules if necessary, and download the modules:

# wget http://ftp.drupal.org/files/projects/panels-5.x-2.0-alpha14.tar.gz \
http://ftp.drupal.org/files/projects/views-5.x-1.6.tar.gz \
http://ftp.drupal.org/files/projects/cck-5.x-1.6-1.tar.gz \
http://ftp.drupal.org/files/projects/imagefield-5.x-1.1.tar.gz \
http://ftp.drupal.org/files/projects/imagecache-5.x-1.3.tar.gz \
http://ftp.drupal.org/files/projects/email-5.x-1.x-dev.tar.gz

then unzip them and set the permissions properly:

# for file in *.gz; do tar xvfz $file; done
# chown -R www-data.www-data *

now go to the administrative interface, http://example.com/drupal/admin/build/modules, and enable the modules in question.

finally, go to http://example.com/drupal/admin/user/access and grant access to the panels and views module features to the roles you are using, e.g. "access all views" to "authenticated user" and "administer views" to your "developer" or "admin" roles. also grant "post comments without approval", "post comments" and "access comments" to the anonymous user.

note we're using the alpha panels version, panels2. it's not quite ready for prime time, but it's hard to resist. it kicks ass.

step two - create a new content type

now it's time to create a new content type. navigate to the content types page at http://example.com/drupal/admin/content/types, and create the "photo competition entry" as shown below.

now let's add two new custom fields to our photo competition type: email and photograph. these fields make use of the new cck field type modules we just installed.

create the email field as follows:

create the photograph field as follows:

now go to http://example.com/drupal/admin/user/access and allow anonymous users to "create photo_entry content" and "edit own photo_entry content"

step three - setting our themes

because i'm bored with garland, let's change the default theme to "minnelli" at http://example.com/drupal/admin/build/themes, and change the administration theme at http://example.com/drupal/admin/settings/admin back to garland.

step four - create some content

now that we've defined our new content type, we can go ahead and create some new content. navigate to http://[...]/node/add/photo-entry and fill out a few entries. you can see your new create form in action, complete with validation (shown to the right).

it's best to do this as the anonymous user to see the usual user experience. it's convenient to stay logged in as admin and use another browser e.g. internet explorer (bleah) for your regular (anonymous) user.

step five - configure imagecache

the imagecache module allows you to define an arbitrarily large number of image transformations (presets) including scaling, resizing and cropping. let's define two transformations: one, preview, to create a 200px wide scaled-down preview. the second transformation, thumbnail, is slightly more complex, and creates a square image, 120px by 120px, that is a scaled, centered crop of the original. rockin.

create the thumbnail preset as follows:

create the preview preset as follows:

you should now be able to test your presets with the content you created e.g. if you uploaded an image called myImage.jpg, you can view your transformed images at:
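
(assuming the default files path, imagecache's standard url pattern would give you something like:)

http://example.com/drupal/files/imagecache/thumbnail/myImage.jpg
http://example.com/drupal/files/imagecache/preview/myImage.jpg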

step six - create our views

the views module allows you to create lists of nodes. we're going to create two views:
  1. recent_photo_entries, a list of the five most recently submitted entries. the list shows a thumbnail of the image and the email address of the creator.
  2. featured_images, a list of the two most recently commented on images. this list shows a preview of the image, the image title and the email address of the creator.

create the recent view as follows:

create the featured view as follows:

step seven - create the panel page

the last step is to create the panel page to host our content and views. go to http://example.com/drupal/admin/panels/panel-page and create a new "two column stacked" layout, as shown below:

put custom content in the top panel, your recent view in the left panel and the featured view in the right panel. for the views, be careful to select a "view type" of block.

the following image shows the custom content you should create in the top panel:

the final image shows the configuration screen for the recent view (left panel). the right panel is very similar:

finally go to the "site information" administrative section: http://example.com/drupal/admin/settings/site-information and set your new panel as the home page i.e. put "photo-competition" in the default front page box.

you are done and your site should look something like:

further work

there is a lot you could do to enhance this example, for instance:
  • installing the jrating or fivestar module and allowing users to vote on photographs using a nice javascript control.
  • creating a view that implements an rss feed for photo competition entries.
  • using css to style your views and nodes.

check out a professional drupal photo competition based on these same principles at zicasso.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 27 2007
Nov 27

A little while ago, I ran into a problem with a Drupal site I was working on. When the client administrator visited a page, they'd have the option to View or Edit the content. The problem was that they wanted "Edit" to display something other than "Edit".

Although this seems like an easy thing to do, it turned into a huge search and a waste of two hours of my time. I got fed up and violated rule #1 of Drupal: I opened up its files and changed "Edit" to the new string. This turned into a huge mess later on when I updated the system to Drupal 5.3. The right way to do this is to create a new mock-English language, and then translate individual strings to whatever you want. Even this seems like overkill for such a small change, so there's a better way of doing it in Drupal 6...

Drupal 6 comes with the ability to override any string using settings.php. All you have to type in is:

$conf['locale_custom_strings_en']['My String'] = 'My New String';

This would, evidently, replace all instances of "My String" with "My New String" in the English language. This is a very handy tool, but it was missing a user interface. That's where String Overrides came into play.
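
Before any UI existed, my "Edit" problem could presumably have been solved with a couple of lines like these in settings.php (the replacement wording here is hypothetical):

$conf['locale_custom_strings_en']['Edit'] = 'Modify';
$conf['locale_custom_strings_en']['View'] = 'Read';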

String Overrides provides a user interface to easily create replacement strings for existing content. The idea came from not only my horrible "Edit" experience, but also the Lullabots mentioning this new functionality in Drupal 6.

Drupal is extremely expandable, flexible and customizable, so there is absolutely no reason why you shouldn't abide by Rule #1: Do not hack core.

Nov 23 2007
Nov 23

We interviewed Drupal core developer and Lullabot team member Angela Byron about the upcoming O'Reilly Drupal book titled 'Practical Drupal'.

You can visit Angela Byron's drupal.org profile by clicking here.

In this interview, you will learn about the upcoming Drupal book, and how to learn Drupal most effectively.

How did you become a member of the Drupal community? How did you get to know Drupal?

I was a Google Summer of Code student for Drupal back in 2005, and developed the Quiz module for Drupal 4.7. I hadn't even installed Drupal before that (I had only vaguely heard of it because of the SpreadFirefox.com project), so I needed to jump up a pretty steep learning curve very, very quickly -- my project had to be completed in only 2 months, which meant I needed to not only understand Drupal but also its APIs, how the hook system worked, what the heck CVS was, the whole shebang.

Hands-down, the only way this was even possible was for me to dig in and get involved in the community. It sounds a bit backwards -- how can I get involved in the community when I don't know anything yet? -- but it really worked for me.

  • I would idle in #drupal-support and on the forums, reading the questions people posted, and then try and figure them out. This exposed me to a variety of Drupal modules, administration areas, and problems I was likely to encounter, right away.
  • I also hung out on the Drupal issue queue and in #drupal, taking the same approach of looking for things I could possibly help with and digging in and trying to figure them out. Like with support questions, this exposed me to much of the Drupal API and internals very quickly.
  • Every time I came across something that wasn't documented and I had to figure out myself, I'd write it up in a handbook page. This both cemented the knowledge in my head (since I had to explain it well enough that other people would get it), and made it so that I'd never have to figure that stuff out again. ;)

These things together accomplished what was the most important thing for launching me on my way up the learning curve: they established me as a contributor to the project, rather than a user. This meant that people would spend a lot more time helping me when I had a question, because they knew that the knowledge imparted would end up funneled back into the project in some way. The contribution aspect also made learning Drupal a lot more fun (almost addictive), as I felt that with every new thing I learned, I gained more power to improve things, and I was also making lots of new friends along the way. :)

What major roles are you taking in the community at the moment?

Let's see... I code and test/review patches, I develop modules and themes, I do community outreach kinds of stuff, some graphic design and usability stuff, training and developer/user support, I'm on the site admin team, the documentation team, the security team, and the Drupal Association Board of Directors. Basically, if there's a way to be involved in Drupal, I'm doing it, or at least trying. :)

Of those, the two biggest general roles I guess would be quality assurance for Drupal core (I'm that annoying person who finds bugs in perfectly good patches, and chimes in about missing documentation or lack of coding standards ;)), and organizing various efforts that help get new contributors involved, such as Drupal's participation in Google Summer of Code.

What topics are to be dealt with in your upcoming book and from what aspects? Who will be the authors?

Previous Drupal books have dealt with core, either from a super beginner standpoint or from a super developer standpoint. Practical Drupal will be aiming at the middle segment: people who are already somewhat familiar with Drupal (though there's the token chapter for those who aren't) and want to know how to extend it with the rich library of contributed modules. It's a hands-on, recipe-driven book, showcasing various contributed modules in each chapter, like CCK, Views, and Organic Groups, and how to combine them in order to solve “real world” problems.

Almost all of the Lullabot team is co-authoring the book: Nate Haug, Addison Berry, James Walker, Jeff Robbins, Jeff Eaton, and myself, along with Robert Douglass and Matt Westgate acting as technical authors. We each have expertise in different parts of Drupal and the goal is to combine that collective experience together in one place.

What will be the level of difficulty of the book? Will it be appropriate for beginners too or only for advanced people?

This book is mainly geared towards beginner-to-intermediate Drupal folks, but there are some developer tips and tricks, too. The subject matter is of interest to pretty much everyone though, since the book intends to answer the question, “What modules should I use to do X?” which everyone from absolute newbies to super hackers need to know.

How many pages will the book have?

We're shooting for around 500. Big enough to fend off intruders with a good thwap to the head, while small enough to carry around in a backpack without a great deal of aches and pains. :)

When will it be published? Is there a possibility to order it in advance?

Our final deadline is summer of 2008, though we're hoping to get the book finished sooner than that. I believe it'll go to print a month or two after O'Reilly receives the final manuscript. I'm not sure if it's possible to order in advance, but I'd suggest keeping an eye on The Lullabot Blog where we'll be posting updates as we know more.

Are you planning to update the book 'Pro Drupal Development' for Drupal 6?

That's actually not my book, that's Matt Westgate's book. ;) But I spoke to him and he said that it's still a bit up in the air whether or not there'll be a Drupal 6 version.

If someone starts learning Drupal now, what methods and sources would you recommend? How long does it take to get to a level where one can take on easier Drupal tasks? And what sources would you recommend to an advanced developer who wants to get to a higher level?

For developers with a PHP background, I would definitely recommend Pro Drupal Development. This book does a tremendous job of imparting architectural things that are very hard to grasp otherwise. The api.drupal.org site is also invaluable.

For new users, there's the new “Getting Started” guide in the Drupal handbooks which is a nice, concise collection of all the stuff you need to know to start understanding how Drupal works. The handbook in general has some great information in it, though sometimes you have to hunt for it a bit.

As far as a time line for learning all this goes, it's really up to the individual, what previous experience they have, and what they are trying to do with Drupal. I think most people spend a few weeks being really frustrated before they get a nice “ah-HA!” moment and start understanding it and getting excited.

But I can guarantee that whatever your personal time line for learning is, getting involved in the community will shorten it dramatically. See question #1 for tips. ;)

Have video recordings been made of the Lullabot trainings held earlier? If not, are you planning to produce such recordings? The number of participants in a course is limited, but anyone could get access to a recording.

We've had video cameras at our workshops before, but the thing is that watching 60+ hours' worth of video from one single vantage point at the back of the room is not quite as fun, nor as educational, as you'd ideally like to think. :) Training DVDs that are more condensed versions of stuff that the workshops cover are on our radar, however.

What is your favourite new feature in Drupal 6?

Wow, this one is hard. But I guess I'd have to say the new Schema API. This both opens the door for contributed modules to be used with multiple database platforms with minimal effort by the maintainer (no more messy code in install hooks that checks if the database type is pgsql and then runs a different CREATE TABLE statement.. yuck!). And because we now have meta data about tables, we were able to document the entire Drupal 6 database schema right in core, which means we can auto-generate documentation which will greatly increase developer understanding of the internal workings of Drupal going forward.

Thank you very much for the interview! I hope you will remain a member of the Drupal community for a long time.

Thank you! And yep, I'm not planning on going anywhere until they get sick of me. ;)

Nov 22 2007
Nov 22

The press release announces it and the announcement over at my company's website confirms it: starting January 1st, I'm going to be an employee of Raincity Studios. On Tuesday RCS acquired Bryght, the Vancouver-based Drupal hosting and hosted-service company I've been working for since 2004. My role, currently as community support guy, will change slightly; we'll work out the details in the following days and weeks. I'm excited and nervous at the same time, both usual for me when change like this happens. To say this came at the right time for me, however, is an understatement.

It maybe took a little longer than it needed to, but it really hit me how impressive their design chops were when Mark Yuasa offered to redesign my blog back in 2005. 2005? Seems longer ago. I remember meeting him at BCIT and going through what theme I wanted for the look. Since it was around March, cherry blossom trees had started losing their petals, and the smell and sight of pink petals all over the streets of the Lower Mainland filled my senses. It was with a little trepidation (pink not being the manliest colour) that I asked him to use that as the concept, and his two original concepts floored me. One, while beautiful, was a little too white for what I had in mind, but the other, an overwhelmingly pink design, made the choice obvious. I was impressed with his holistic approach (he asked me to write down what music I liked as part of the design consideration), his attention to detail, and his flexibility with the changes I requested. I've since reverted to a default theme for the site (changing the colours to match the previous look), and I've committed to releasing the theme to the Drupal community.

Raincity is cataloging the social web's reaction to their announcement, and if you visit that link, you'll get to see me a little more than halfway down at the Cambie Pub, during my first week officially working for Bryght, in September 2004. That's a fairly iconic photo of me, so much so that Karen has taken to calling me a "support cowboy". Think I can get away with calling myself that? Probably not: I'm the strong, silent type, and besides, I don't like horses that much. I'm still going to celebrate by buying another cowboy shirt.

Nov 19 2007
Nov 19

I've made a complete migration from Drupal 5 to Drupal 6 with my website. You'll see a lot of changes on here and overall I think it's a fantastic change and will last a lot longer in the end.

The primary motivation for me to do this was that I didn't like how dependent my site was on contributed modules. When the beta versions of Drupal 6 came out, I wanted to make the upgrade, but couldn't, since I was using a hilarious number of contributed modules. These contributed modules, of course, had not made the update to 6. Now I'm just using Drupal core, which will make it much easier to update to future versions of Drupal, as contributions will always lag behind.

The majority of the features in Drupal 6 are in the administration and its API, so you guys won't really see anything new by visiting this site. The things you will notice, however, are new looks for the articles and projects sections, the gallery now aggregates from Flickr, and the links page aggregates from Delicious. As more contributed modules get ported to Drupal 6, more and more things will appear. I'm really liking this release so far.

Nov 17 2007
Nov 17

So, I really wanted to get into the world of testing patches in order to help get Drupal 6 to the next beta stage, but I was in unknown territory. The root of the problem was the procedure needed to apply a patch to a file or group of files. See, I am using a Windows machine, and by default the Linux bash commands are not available. The patch procedures were showing “patch filename < patchname” and this was not going to work on my machine.

Luckily I found http://drupal.org/node/60179, which talked about just this problem. I read through this page and decided to try out the Cygwin solution. This basically emulates a Linux environment on my Windows machine. This will be my solution until I am able to gather enough cash for my MacBook Pro! I went to the http://www.cygwin.com/ website and clicked on the "install or update now" link to put the setup.exe file on my desktop.

I followed the steps on http://drupal.org/node/32875 and everything was pretty easy. The gist of what happened is that at the root of my Windows drive there is now a cygwin folder that houses a Linux directory structure. The install created a shortcut to a “terminal window” that accesses this directory structure in a Linux-emulated terminal. I did a CVS checkout of Drupal HEAD into c:/cygwin/var/www/drupal using Tortoise CVS (http://www.tortoisecvs.org/) and now I have a fresh copy of the Drupal HEAD version.

For fun I edited the UPGRADE.txt file and then used the right-click TortoiseCVS menu to create a CVS patch file in the same directory. I then un-did the UPGRADE.txt changes to get the file back to its original state. Remember, my changes are in the patch file. Then I used the Cygwin terminal shortcut to traverse to /var/www/drupal and was able to run the Linux patch command “patch UPGRADE.txt < UPGRADE.txt.patch”. Worked like a charm! Now I am going to test some patches and see if I can help out the Drupal cause just a bit more.
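
Condensed into commands, the round trip looks something like this (paths as per the Cygwin layout above; the -R flag reverts a previously applied patch):

# from the Cygwin terminal
cd /var/www/drupal
patch UPGRADE.txt < UPGRADE.txt.patch      # apply the patch
patch -R UPGRADE.txt < UPGRADE.txt.patch   # revert it again if needed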

Nov 15 2007
Nov 15

if you've setup a clustered drupal deployment (see scaling drupal step three - using heartbeat to implement a redundant load balancer), a good next-step, is to scale your database tier.

in this article i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:

  scalability: good
  redundancy: fair
  ease of setup: poor

servers

in this example, i use:

  web server: drupal-lb1.mydomain.com (192.168.1.24)
  data server: drupal-data-server1.mydomain.com (192.168.1.26)
  data server: drupal-data-server2.mydomain.com (192.168.1.27)
  data server: drupal-data-server3.mydomain.com (192.168.1.28)
  mysql load balancer: mysql-balance-1.mydomain.com (192.168.1.94)

first steps first - optimizing your database and application

the first step to scaling your database tier should include identifying problem queries (those taking most of the resources), and optimizing them. optimizing may mean reducing the volume of the queries by modifying your application, or increasing their performance using standard database optimization techniques such as building appropriate indexes. the devel module is a great way to find problem queries and functions.

another important consideration is the optimization of the database itself, by enabling and tuning the query cache, and adjusting database parameters such as the maximum number of connections. using appropriate hardware for your database is also a huge factor in database performance, especially the disk io system. a large raid 1+0 array, for example, may do wonders for your throughput, especially combined with a generous amount of system memory available for disk caching. for more on mysql optimization, take a look at the great o'reilly book by jeremy zawodny and derek balling, high performance mysql.
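
as a minimal sketch, the usual suspects live in /etc/mysql/my.cnf; the values below are illustrative assumptions, not recommendations:

# enable and size the query cache -- example values only, tune for your workload
query_cache_type  = 1
query_cache_size  = 64M

# cap concurrent connections at something your hardware can actually serve
max_connections   = 200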

when it's time to scale out rather than up

you can only (and should only) go so far scaling up. at some point you need to scale out. ideally, you want a database clustering solution that lets you do exactly that: add nodes to your database tier, completely transparently to your application, giving you linear scalability gains with each additional node. mysql cluster promises exactly this. it doesn't offer full transparency, however, due to limitations introduced by the ndb storage engine required by mysql cluster. having said that, the technology looks extremely promising and i'd be interested to hear if anyone has got a drupal application running successfully on this platform. you can read more on the mysql cluster website or in the mysql clustering book by alex davies and harrison fisk.

less glamorous alternatives to mysql cluster

without the magic of mysql cluster, we've still got some, admittedly less glamorous, alternatives. one is to use a traditional mysql database cluster, where all writes go to a single master and reads are distributed across several read-only nodes. the master updates the read-only nodes using replication.

an alternative is to segment read and write requests by role, thereby partitioning the data into segments, each one resident on a dedicated database.

these two approaches are illustrated below:

there are some significant pitfalls to both approaches:

  • the traditional clustering approach introduces a replication lag i.e. it takes a non-trivial amount of time, especially under load, for writes to make it back to the read-only nodes. this may not be problematic for very specific applications, but it is problematic in the general case.
  • the traditional clustering approach scales only reads, not writes, since each write has to be made to each node.
  • in traditional clustering the total effective size of your memory cache is the size of a single node (since the same data is cached on each node), whereas with segmentation it's the sum of the nodes.
  • in traditional clustering each node has the same hardware optimization pattern, whereas with segmentation, it can be customized according to the role each node is playing.
  • the segmentation approach reduces the redundancy of the system, since theoretically a failure of any of the nodes takes your "database" offline. in practice, you may have segments that are non-essential e.g. logging. you can, of course, cluster your segments, but this reintroduces the replication lag issue.
  • the segmentation approach relies on a thorough understanding of the application, and the relative projected load on each segment, to do properly.
  • the segmentation approach is fundamentally limited, since there are a limited number of segments for a typical application.

more thoughts on database segmentation

from one perspective, the use of memcache is a database segmentation technique i.e. it takes part of the load on the database (from caching) and segments it into a specialized, and optionally distributed, caching "database". there is a detailed step-by-step guide on lullabot to doing this on debian etch, and a drupal module.

you can continue this approach in other areas of your database, dedicating several databases to different roles. for example, if one of the functions of your database is to serve as a log, why not segment all log activity onto a dedicated database? clearly, it's important that your segments are distinct i.e. that applications don't need joins or transactions between segments. you may have auxiliary applications that do need complex joins between segments e.g. reporting. this can be easily solved by warehousing the data back into a single database that serves specifically this auxiliary application.

while i'm not suggesting that the next step in your scaling exercise should necessarily be segmentation (that clearly depends on your application and preferences), we're going to explore the idea anyway. it's my blog after all :)

what segmentation technologies to use?

there are several open source tools that you can use to build a segmentation infrastructure. sqlrelay is a popular database-agnostic proxying tool that can be used for this purpose. mysql proxy is, as the name suggests, a mysql specific proxying tool.

in this article i focus on mysql proxy. sqlrelay (partly due to its more general-purpose nature) is somewhat difficult to configure, and inherently less flexible than mysql proxy. mysql proxy, on the other hand, is quick to set up and use. it has a simple, elegant and flexible architecture that allows for a full range of proxying applications, from trivial to uber-complex.

more on mysql proxy

jan kneschke's brainchild, mysql proxy is a lightweight daemon that sits between your client application (apache/mod_php/drupal in our case) and the database. the proxy allows you to perform just about any transformation on the traffic, including segmentation. the proxy allows you to hook into 3 actions: connect, query and result. you can do whatever you want in these steps, manipulating data and performing actions using lua scripts. lua is a fully featured scripting language, designed for high performance, clearly a key consideration in this application. don't worry too much about having to pick up yet another scripting language. lua is easy to learn, powerful and intuitive.

even if you don't intend to segment your databases, you might consider a proxy configuration for other reasons including logging, filtering, redundancy, timing and analysis and query modification. for example, using mysql proxy to implement a hot standby database (replicated) would be trivial.

the mysql site states clearly (as of 09Nov2007): "MySQL Proxy is currently an Alpha release and should not be used within production environments". Feeling lucky?

a word of warning

the techniques described below, including the overall method and the use of mysql proxy, are intended to stimulate discussion. they are not intended to represent a valid production configuration. i've explored this technique purely in an experimental manner. in my example below i segment cache queries to a specific database. i don't mean to imply that this is a better alternative to memcache. it isn't. anyway, i'd love to hear your thoughts on the general approach.

don't panic, you don't really need this many servers

before you get yourself into a panic over the number of boxes i've drawn in the diagram, please bear in mind that this is a canonical network. in reality you could use the same physical hardware for both load balancers, or, even better, you could use xen to create this canonical layout and, over time, deploy virtual servers on physical hardware as load necessitates.

down to business - set up and test a basic mysql proxy

o.k., enough of the chatter. let's get down to business and set up a mysql proxy server. first, download and install the latest version of mysql proxy from http://dev.mysql.com/downloads/mysql-proxy/index.html.

tar xvfz mysql-proxy-0.6.0-linux-debian3.1-x86.tar.gz

make sure that your mysql load balancer can access the database on your data server i.e. on your data server, run mysql and enter:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE
TEMPORARY TABLES, LOCK TABLES
ON drupaldb.*
to drupal@'192.168.1.94' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

check that your load balancer can access the database on your data server i.e. on your load balancer do:

# mysql -e "select * from users limit 1" --host=192.168.1.26 --user=drupal --password=password drupaldb

now do a quick test of the proxy, run the proxy server, pointing to your drupal database server:

./mysql-proxy --proxy-backend-addresses=192.168.1.26 &

and test the proxy:

echo "select * from users" |  mysql --host=127.0.0.1 --port=4040 --user=drupal --password=password drupaldb

now change your drupal install to point at the load balancer rather than your data server directly i.e. edit settings.php on your webserver(s) and point your drupal install at the mysql load balancer instead of the database server:

$db_url = 'mysql://drupal:password@192.168.1.94:4040/drupaldb';

asking mysql proxy to segment your database traffic

the best way to segment a drupal database depends on many factors, including the modules you use and the custom extensions that you have. it's beyond the scope of this exercise to discuss segmentation specifics but, as an example, i've segmented the database into 3 segments: a cache server, a log server and a general server (everything else).

to get started segmenting, create two additional database instances (drupal-data-server2, drupal-data-server3), each with a copy of the data from drupal-data-server1. make sure that you GRANT the mysql load balancer permission to access each database as described above.

you'll now want to start up your proxy server, pointing to these instances. below, i give an example of a bash script that does this. it starts up the cluster and executes several sql statements, each one bound for a different member of the cluster, to ensure that the whole cluster has started properly. note that you'd also want to build something similar as a health check, to ensure that the nodes keep functioning properly, stopping the cluster (proxy) as soon as a problem is detected.

here's the source for runProxy.sh:

:
BASE_DIR=/home/john
BIN_DIR=${BASE_DIR}/mysql-proxy/sbin

# kill the server if it's running
pkill -f mysql-proxy

# make sure any old proxy instance is dead before firing up the new one
sleep 1

# run the proxy server in the background
${BIN_DIR}/mysql-proxy \
--proxy-backend-addresses=192.168.1.26:3306 \
--proxy-backend-addresses=192.168.1.27:3306 \
--proxy-backend-addresses=192.168.1.28:3306 \
--proxy-lua-script=${BASE_DIR}/databaseSegment.lua &

# give the server a chance to start
sleep 1

# prime the pumps!
# execute some sql statements to make sure that the proxy is running properly
# i.e. that it can establish a connection to the range of servers in question
# and bail if anything fails
for sqlStatement in \
   "select cid FROM cache limit 1" \
   "select nid FROM history limit 1" \
   "select name FROM variable limit 1"
do
   echo "testing query: ${sqlStatement}"
   echo ${sqlStatement} |  mysql --host=127.0.0.1 --port=4040 \
       --user=drupal --password=password drupaldb || { echo "${sqlStatement}: failed (is that server up?)"; exit 1; }
done

you'll notice that this script references databaseSegment.lua; this is a lua script that uses a little regex magic to map queries to servers. again, the actual queries being mapped serve as examples to illustrate the point, but you'll get the idea. jan has a nice r/w splitting example that can be easily modified to create databaseSegment.lua.

most of the complexity in jan's code is around load balancing (least connections) and connection pooling within the proxy itself. jan points out (and i agree) that this functionality should be made available in a generic load-balancing lua module. i really like the idea of having this in lua scripts to allow others to easily extend it, for example by adding a round-robin alternative. keep an eye on his blog for developments. anyway, for now, let's modify his example and add some defines and a method to do the mapping:

local CACHE_SERVER = 1
local LOG_SERVER = 2
local GENERAL_SERVER = 3

-- select a server to use based on the query text, this will return one of
-- CACHE_SERVER, LOG_SERVER or GENERAL_SERVER
function choose_server(query_text)
   local cache_server_strings = { "FROM cache", "UPDATE cache",
                                  "INTO cache", "LOCK TABLES cache"}
   local log_server_strings =   { "FROM history", "UPDATE history",
                                  "INTO history" , "LOCK TABLES history",
                                  "FROM watchdog", "UPDATE watchdog",
                                  "INTO watchdog", "LOCK TABLES watchdog" }

   local server_table = { [CACHE_SERVER] = cache_server_strings,
                          [LOG_SERVER] = log_server_strings }

   -- default to the general server
   local server_to_use = GENERAL_SERVER

   -- find a server registered for this query_text in the server_table
   for i=1, #server_table do
      for j=1, #server_table[i] do
         if string.find(query_text, server_table[i][j])
         then
            server_to_use = i
            break
         end
      end
   end

   return server_to_use
end

and then call this in read_query:

-- pick a server to use
proxy.connection.backend_ndx = choose_server(query_text)
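
for context, here's a minimal sketch of the surrounding read_query hook, along the lines of jan's example (based on my reading of the mysql proxy 0.6 lua api, so treat it as an assumption rather than gospel):

-- route each incoming query packet to the backend chosen by choose_server()
function read_query(packet)
   -- only inspect actual queries, not other packet types
   if string.byte(packet) == proxy.COM_QUERY then
      local query_text = packet:sub(2)

      -- pick a server to use
      proxy.connection.backend_ndx = choose_server(query_text)
   end
end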

test your application

now test your application. a good way to see the queries hitting your database servers is to (temporarily) enable full logging on each of them and watch the log. edit /etc/mysql/my.cnf and set:

# Be aware that this log type is a performance killer.
log             = /var/log/mysql/mysql.log

and then:

# tail -f /var/log/mysql/mysql.log

further work

to develop this idea further:
  • someone with better drupal knowledge than me could define a good segmentation structure for a typical drupal application, with the query fragments associated with each segment.
  • additionally, the scripts could handle exceptional situations better e.g. a regular health check for the proxy.
  • clearly we've introduced another single-point-of-failure in the database load balancer. the earlier discussion of heartbeat applies here.
  • it would be wonderful to bypass all this nonsense and get drupal running on a mysql cluster. i'd love to hear if you've tried it and how it went.

references and documentation

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 13 2007
Nov 13

UPDATE: for the drupal 6 version, please go here.

if your career as a developer has included a stay in the j2ee world, then when you arrived at drupal one of your initial questions was "where's the log file?". eventually, someone told you about the watchdog table. you decided to try that for about five minutes, and then were reduced to using a combination of <pre> and print_r to scrawl debug data across your web browser.

when you tired of that, you learned a little php, did a little web research and discovered the PEAR log package and debug_backtrace(). the former is comfortably reminiscent of good old log4j and the latter finally gave you the stacktrace you'd been yearning for. still, separately, neither gave you quite what you were looking for: a log file in which every entry includes the filename and line number from which the log message originated. put them together though, and you've got log4drupal.

log4drupal is a simple api that writes messages to a log file. each message is tagged with a particular log priority level (debug, info, warn, error or emergency) and you may also set the overall log threshold for your system. only messages with a priority level above your system threshold are actually printed to your log file. the system threshold may be changed at any time, using the log4drupal administrative interface. you may also specify whether or not a full stack trace is included with every message. by default, a stack trace is included for messages with a priority of error and above. the administrative options are illustrated below:

log4drupal admin screen

now, on to the examples. suppose you had the following ridiculous block of code.

  $i = 0;
  while($i <= $user->profile_age) {
    log_debug("The user at least $i years old");
    $i++;
  }
 
  log_info("The user is $user->profile_age years old");
 
  if($user->profile_age < 2) {
    log_warn("User may be too young");
  }

  if($user->profile_age == 1) {
    log_error("Security violation, user much too young!");
  }

if your log threshold is set to debug then all messages will be shown in your log file as follows :

[20:23:23 11/12/07] [debug] [example.module:47] The user at least 0 years old
[20:23:23 11/12/07] [debug] [example.module:47] The user at least 1 years old
[20:23:23 11/12/07] [info] [example.module:51] The user is 1 years old
[20:23:23 11/12/07] [warning] [example.module:54] User may be too young
[20:23:23 11/12/07] [error] [example.module:57] Security violation, user much too young!
  at /var/www/drupal/sites/all/modules/example/example.module:57
  at /var/www/drupal/sites/all/modules/example/example.module:71
  at /var/www/drupal/includes/module.inc:406
  at /var/www/drupal/modules/node/node.module:692
  at /var/www/drupal/modules/node/node.module:779
  at /var/www/drupal/modules/node/node.module:2462
  at /var/www/drupal/includes/menu.inc:418
  at /var/www/drupal/index.php:15

if your log threshold is set to warning then only the warning and error messages will be shown.

[20:27:52 11/12/07] [warning] [example.module:54] User may be too young
[20:27:52 11/12/07] [error] [example.module:57] Security violation, user much too young!
  at /var/www/drupal/sites/all/modules/example/example.module:57
  at /var/www/drupal/sites/all/modules/example/example.module:71
  at /var/www/drupal/includes/module.inc:406
  at /var/www/drupal/modules/node/node.module:692
  at /var/www/drupal/modules/node/node.module:779
  at /var/www/drupal/modules/node/node.module:2462
  at /var/www/drupal/includes/menu.inc:418
  at /var/www/drupal/index.php:15

you may download and test a copy of log4drupal here. suggestions for improvement or additional features are welcome. future improvements i've been thinking about include :

  1. integration with watchdog
  2. automatic recursive printing of any complex type messages

it's worth noting that all logging comes with a performance cost. i haven't done any serious calculations yet, but here is some ballpark data: on an unloaded server, with an average page load time of around 1.5 seconds, it takes about 0.3 milliseconds to print out one message, and about 0.008 milliseconds to skip a message that is below your current system threshold.

if people are interested, i'll add this as a module to drupal.org

thanks to a former colleague, alex levine, for the original inspiration.

Nov 11 2007
Nov 11

i got some good feedback on my dedicated data server step towards scaling. kris buytaert, in his everything is a freaking dns problem blog, points out that nfs creates an unnecessary choke point. he may very well have a point.

having said that, i have run the suggested configuration in a multi-web-server, high-traffic production setting for 6 months without a glitch, and feedback on his blog gives examples of other large sites doing the same thing. for even larger configurations, or if you just prefer, you might consider another method of synchronizing files between your web servers.

kris suggests rsync as a solution, and although luc stroobant points out the delete problem, i still think it's a good, simple solution. see the diagram above.

the delete problem is that you can't simply use the --delete flag on rsync, since in an x->y synchronization, a file deleted on node x is indistinguishable from a file newly added on node y.

i speculate that you can partly mitigate this issue with some careful scripting, using a source-of-truth file server to which you first pull only additions from the source nodes, and then make another run over the nodes with the delete flag (to remove any newly deleted files from your source of truth). unfortunately you can't do the delete run on a live site (due to timing problems if additions happen after your first pass and before your --delete pass), but you can do it as a regularly scheduled maintenance task when your directories are not in flux.

i include a bash script below to illustrate the point. i haven't tested this script, or the theory in general. so if you plan to use it, be careful.

you could call this script from cron on your data server, say every 5 minutes for a smallish deployment. even though this causes up to a 5 minute delay in file propagation, the use of sticky sessions ensures that users will see files that they create immediately, even if there is a slight delay for others. additionally, you could schedule it with the -d flag during system downtime.
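
for example, a crontab entry along these lines would do it (the script path here is an assumption):

# addition-only sync every 5 minutes; keep a log for later inspection
*/5 * * * * /usr/local/bin/synchronizeFiles >> /var/log/synchronizeFiles.log 2>&1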

the viability of this approach depends on many factors including how quickly an uploaded file must be available for everyone and how many files you have to synchronize. this clearly depends on your application.

synchronizeFiles -- a bash script to keep your drupal web server's files directory synchronized

#!/bin/bash

# synchronizeFiles -- a bash script to keep your drupal web server's files directory
#                     synchronized - http://www.johnandcailin.com

# bail if anything fails
set -e

# don't synchronize deletes by default
syncDeletes=false

sourceServers="192.168.1.24 192.168.1.25"
sourceDir="/var/www/drupal/files"
sourceUser="www-data"
targetDir="/var/drupalFiles"

# function to print a usage message and bail
usageAndBail()
{
   echo "Usage syncronizeFiles [OPTION]"
   echo "     -d       synchronize deletes too (ONLY use when directory contents are static)"
   exit 1;
}

# process command line args
while getopts hd o
do     case "$o" in
        d)     syncDeletes=true;;
        h)     usageAndBail;;
        [?])   usageAndBail;;
       esac
done

# do initial addition-only synchronization run from sourceServers to targetServer
for sourceServer in ${sourceServers}
do
   echo "bi directionally syncing files between ${sourceServer} and local"

   # pull any new files to the target
   rsync -a ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}/..

   # push any new files back to the source
   rsync -a ${targetDir} ${sourceUser}@${sourceServer}:${sourceDir}/..
done

# synchronize deletes (only use if directory contents are static)
if test ${syncDeletes} = "true"
then
   for sourceServer in ${sourceServers}
   do
      echo "DELETE syncing files from ${sourceServer} to ${targetDir}"

      # pull any new files to the target, deleting from the source of truth if necessary
      rsync -a --delete ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}
   done
fi

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 10 2007
Nov 10

if you felt a waft of cold air when you read the recent highly critical drupal security announcement on arbitrary code execution using install.php, you were right. your bum was hanging squarely out of the window, and you should probably consider beefing up your security.

drupal's default exposure of files like install.php and cron.php presents inherent security risks, for both denial-of-service and intrusion. combine this with critical administrative functionality available to the world, protected only by user-defined passwords broadcast over the internet in clear text, and you've got potential for some real problems.

fortunately, there are some easy and practical things you can do to tighten things up.

step one: block the outside world from your sensitive pages

one easy way to tighten up your security is to simply block access to your sensitive pages from anyone outside your local network. this can be done using apache's mod_rewrite. for example, you could block access to any administrative page by adding the following to your .htaccess file in your drupal directory (the one containing sites, scripts, modules etc.). the example only allows access from IPs in the range 192.*.*.* or 200.*.*.*:

<IfModule mod_rewrite.c>
  RewriteEngine on

  # Allow only internal access to admin
  RewriteCond %{REMOTE_ADDR} !^(192|200)\..*$
  RewriteRule   ^admin/.*  - [F]
  [...]
</IfModule>

step two: tunnel into your server for administrative access

now that you've locked yourself out of your server for remote administrative access, you'd better figure out how to get back in. SOCKS proxy and ssh tunneling to the rescue! assuming that your server is running an ssh server, set up an ssh tunnel (from the machine you are browsing on) to your server as follows:

ssh -D 9999 user@yourserver.com

now go to your favorite browser and proxy your traffic through a local ssh SOCKS proxy e.g. on firefox 2.0 on windoze do the following:
  1. select the tools->options (edit->preferences on linux) menu
  2. go to the "connections" section of the "network" tab, click "settings"
  3. set the SOCKS host to localhost port 9999
now simply navigate to your site and administer away, safe in the knowledge that not only is your site's soft underbelly restricted to local users, but all your traffic (including your precious admin password) is encrypted in transit.

your bum should be feeling warmer already.

some more rules

some other rules that you might want to consider include (RewriteCond omitted for brevity):

# allow only internal access to node editing
RewriteRule   ^node/.*/edit.*  - [F]

# allow only internal access to sensitive pages
RewriteRule   ^update.php  - [F]
RewriteRule   ^cron.php  - [F]
RewriteRule   ^install.php  - [F]

debugging

can't get your rewrite rules to work? shock! ... consider adding this to your vhost configuration (e.g. /etc/apache2/sites-available/default) to see what (the hell) is going on.

RewriteLog /var/log/apache2/vhost.rewrite.txt
RewriteLogLevel 3

thanks

thanks to curtis (madman) hilger and paul (windows is not your friend) lathrop for help with this.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 09 2007
chx
Nov 09

Let's see a brief list of the most important features of Drupal as accessible from a browser, i.e. not by creating and editing program files.

  • Node system. A node is the fundamental piece of Drupal; it holds content.
  • Nodes can have revisions. It's possible to track the time and the author of every change, along with a log message about the change. It's possible to revert to earlier revisions. There is a contributed module which shows the difference between two revisions.
  • Content is organized by a full, hierarchical taxonomy system. One taxonomy term can be applied to many nodes and one node can belong to many taxonomy terms. Taxonomy terms can form a tree (or an even more complex structure where a term can have multiple parents) and several such trees can exist; we call them vocabularies. Every term provides an RSS feed of the nodes belonging to it.
  • Can aggregate RSS feeds.
  • Search engine friendliness. It's not just that the system does not use ? in most paths; the webmaster can also set a visitor- and search-engine-friendly alias for every page.
  • Distributed authentication. Drupal systems can trust each other, and with contributed modules you can authenticate against LDAP, OpenID etc.
  • Role- and permission-based user management. Each role can contain any number of permissions; a user can be in any number of roles and gets the sum of the permissions belonging to those roles.

I already mentioned contributed modules, and named two extremely important ones in the history and some more in the features section. More than a thousand of those can be found at drupal.org/project/Modules. It's impossible to list all of them; a few more examples from the most popular modules: image, event, gallery, ecommerce and calendar (I guess the names make it obvious what these do). One more important module is i18n -- while Drupal core supports the translation of the interface and there are many translation packs, you need the i18n module for user-supplied content translation. Drupal 6 will make big inroads into this area.

Another download category is themes. Everything that Drupal outputs can be customized by themes. Again, there is a huge number of themes downloadable from drupal.org/project/Themes. I would like to draw attention to Friends Electric and Bluebreeze.

Out of these modules and themes rise a number of popular, high traffic Drupal-based sites. Again, just a few examples:

Nov 09 2007
chx
Nov 09

The beginning of Drupal history is very well documented on Drupal.org itself; I will augment the beginning of the story.

In 1999, a University of Antwerp student named Dries Buytaert was quite interested in wireless networking; he was maintaining the relevant FAQ for Linux. Wireless networking was so new (802.11b was standardized in October 1999) that the FAQ contained “Why would I want a wireless LAN?”, to which the longish answer closed with “Not to mention the fact it will make the geek in you go nuts”. In 2000, he put this knowledge to practical use: he and Hans Snijder shared Hans' ADSL connection among eight students in their dorm. The community needed a website to share information about the status of the network, about dinner... When Dries moved out after graduation, the website moved onto the Internet. It was to be named “dorp.org” after the Dutch word for “village”, but Dries made a typo, so the website became “drop.org”. The focus, of course, changed -- you obviously would not be reading this if they had only talked about dinner. No, the group began to talk about new web technologies, such as moderation, syndication, rating, and distributed authentication. To continue the Dutch-English play on words, when the software behind the site was released in January 2001, it was named Drupal, as that's the English pronunciation of the Dutch translation of drop (druppel). It's very important to note the motivation for this software: it was a technology playground for a community led by a hardcore geek who already had quite some experience, from his Linux years, of what could become of an open source software written by a community. Commercial gain of any sort was not a goal and there was no pre-determined set of features.

Writing the chronicle gets harder and harder as the years pass because so many contributors joined the community, and a lot of people would deserve to have their stories known. And yet, we want to keep this article somewhat short, so I will jump many years, to May 2004.

First, Zack Rosen and Neil Drumm founded CivicSpace (formerly known as Hack4Dean and then DeanSpace). The importance of DeanSpace/CivicSpace is awareness -- while Drupal is no doubt the best already, it's almost unknown to the world at this time. If we need to name one thing that changed this, then DS/CS is it. (I can't resist mentioning that in the same month, at the other end of the world, in a small rural town in Hungary, Karoly Negyesi, who would become the most active developer of Drupal for many years to come, heard about Drupal for the first time...) The summer of 2004 sees the foundation of Bryght in Vancouver. Bryght is one of the first Drupal consultancy companies and their team participates very actively in the community. Drupal is now poised for world domination -- James Walker, one of the Bryght founders, registered drupal-world-domination.com on November 1, 2004.

On October 18, 2004, Drupal 4.5 was released -- the changes are bigger than ever: the menu becomes editable, custom profile fields are introduced, attachments become possible, multiple input formats are supported, and the UI is translatable through the administration interface and via .po files. From an end-user standpoint, no release until Drupal 5.0 would bring changes as big as this one.

Let's jump again, to the first developer meeting in Antwerp in February 2005. Screen names got faces, friendships were born, and the idea of the security team started, to be realized two months later. Big, serious websites began to appear using Drupal, and Drupal 4.6.0 was released in April. This was the last release for a very, very long time -- it would take more than a year for another Drupal to appear. Meanwhile, there were no less than three more DrupalCons: one in Portland in August 2005, one in Amsterdam in October 2005 and one in Vancouver in February 2006. Later on, there would only be two DrupalCons a year -- Brussels in September 2006, Sunnyvale in March 2007 and Barcelona in September 2007.

In the summer of 2005, Google held the first Summer of Code, in which Drupal got 11 slots. Of the 11 students, Fabiano Parolin Sant'Ana and Angela Byron are still active (and, somewhat, Steven Wittens). Angie (aka webchick) has become one of the most important contributors to Drupal; ever since, we participate in SoC in the (vain) hope of scoring another win like her. Also, we got some unit testing during SoC. The summer of 2005 saw both the CCK (by Jon Van Dyk and Jonathan Chaffer) and Views (by Earl Miles) modules committed into Drupal.org CVS. While Drupal core itself is a great community tool on one hand, and on the other a clean, lean, extensible framework that lets you code pretty much any website you want, these two modules let you create extremely complex websites without much coding: CCK lets you define custom content types and Views lets you create complex listings -- both with just a few clicks.

Since then, the most important change in Drupal -- from an end user's point of view -- was the adoption of the jQuery JavaScript library in Drupal 5.0. True to the spirit of Drupal, this library is small, modular, fast and does things right :) This greatly helped the usability of Drupal. In November 2007, Packt Publishing gave Drupal its best CMS award.

Nov 07 2007
Nov 07

don't get me wrong, i'm a happy customer of the drupal hovertip module. everything worked out of the box, and i've enjoyed using it to cram even more pictures into my website. however, the included default css leaves a little to be desired for the following reasons :

  1. it's too specific. it assigns a very particular look and feel to your tooltips, complete with background colors, fixed widths and font sizes. sure, in theory, you can override all that in your theme css. but if css specificity is not your thing, you're going to be tearing your hair out trying to figure out how to do it.
  2. the ui element chosen to indicate "hover here" is non-standard. the "hover here" directive is admittedly fairly new, but the emerging standard seems to be the dashed-underline (certainly not the italic font used in the drupal hovertip module).
  3. the clicktip css does not work on ie6. the link to close the clicktip has mysteriously gone missing.

you can download a more generic, flexible version of the necessary hovertip module css that solves all these issues here. here are some examples of how to use it.
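for instance, to address the second issue above, the dashed-underline convention can be expressed with a rule along these lines (a sketch of the idea, not the exact contents of the downloadable file; the color is illustrative):

/* mark hovertip triggers with the emerging-standard dashed underline */
span[hovertip] {
  border-bottom: 1px dashed #999;
  cursor: help;
}

note that ie6 ignores attribute selectors, so it will simply skip the underline rather than break.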

hovertips

a hovertip causes a floating div to appear, just below and to the right of your cursor. the floating div can contain anything you like.

the simplest example of a hovertip might contain some plain text. in this example, and all that follow, the supporting html is shown in a code block. notice that you need to assign the floating div a background color. the default background color is transparent.


the simplest example of a hovertip might contain some <span hovertip="text">plain text</span>. 

<div class="hovertip" id="text" style="background-color:#DDD;">
<p>These are some explanatory words</p>
</div>

a more entertaining hovertip might reveal a picture.

a more entertaining hovertip might reveal a  <span hovertip="picture">picture</span>.

<div class="hovertip" id="picture" style="background-color:#FFF;" >
<img src="http://gallery.johnandcailin.com/d/9144-2/ava+ladybug+109.JPG">
</div>

and finally, a hovertip may also contain a link

<p>and finally, a hovertip may also contain a <span hovertip="link">link</span></p>

<div class="hovertip" id="link" style="background-color:#DDD">
Visit our <a href="http://www.johnandcailin.com/tech">tech blog</a>
</div>

clicktips

a clicktip causes a previously invisible div to suddenly reveal itself. the clicktip div comes with a close link that makes the clicktip disappear again.

here is a clicktip that contains some text


<p>here is a clicktip that contains some <span clicktip="text">text</span></p>

<div class="clicktip" id="text" style="background-color:#DDD;padding:5px;">
<p>These are some explanatory words</p>
</div>

Nov 05 2007
Nov 05

OpenSocial is a new API to build a bridge between many different social networks. This means it will help integrate different websites, including Facebook, Hi5, LinkedIn and many more. This is really exciting and I'm really looking forward to helping out with the Drupal integration.

Watch the introduction video.

Nov 02 2007
Nov 02

Small non-profits are in a bind. Despite being small, they often need skilled technical services, be it for computers or for a website. They are often staffed by volunteers who may have limited time or technical skills, yet they need access to people with those skills and the time to help them.

Generic history

In small groups, the first website exists because someone figured out how to get a basic website up, and they go on from there. Traditionally, if someone wants an update, they need 'Bob' or 'Jane', the web person, to hand-edit and update the site's content. In smaller groups, the person who first put up the site and is responsible for it is often busy, so things get delayed, then missed, etc.... There is a bottleneck. Real life emergencies, drifting interests and volunteer burnout all affect the life of a site. Neglect builds up and deterioration sets in.

If a group gets lucky and a skilled professional helps them out, then they can have some valued services. But what happens when it's a custom CMS? What happens when that developer moves on? Then the site's custom features no longer get updated, and the next person may not be familiar with the language or back end.... The site again suffers from neglect. The group fails to get its message out.

Back to present

Recently some friends took their group's (KHTI) site offline; they felt their content was so outdated that it was better to be offline than on.

I saw it was offline so I offered to help. They said they'd keep the offer in mind. I tossed together a quick demo site using Drupal.

It took me only an hour to create the demo site. I added a subdomain off my existing domain in DNS, leveraged my existing Drupal code base install with its multi-site capabilities, added an entry in Apache, ran through the Drupal install wizard and turned on a few already installed modules... That took 15 minutes. The rest of the time was spent digging up old content from archive.org and making a quick modification of old faithful Blue Marine, and then I sent them a link with my thoughts on what I could provide.

What Drupal allows me to offer them:

  • They own the content, and content can be updated from a web browser. No need to wait on a webmaster to update, change or modify content; the responsibility can be distributed to multiple people.
  • I will send full monthly backups to someone so that if I get hit by a bus, get distracted or wander off, they should be able to get their site with their content back up in a short amount of time. After all, the content is the most important part of a site.
  • A Content Management System that doesn't have a license cost and that is maintained by hundreds, if not thousands, of people, with documentation instead of custom code.

A few weeks later, they took me up on my offer. I spent some time discussing what they wanted and we settled on getting started with a basic site with their information and a FAQ. We will add more features later.

Modules used

  • CCK
  • Date
  • Help
  • Menu
  • Path
  • Statistics
  • Taxonomy
  • Tracker
  • Upload
  • BUEditor
  • Diff
  • SimpleMenu (for admin view only)
  • Update Status
  • Views
  • Views UI

They had a FAQ on their old site, which was several static, hand-edited HTML pages. I used CCK and Views to replicate and automate this somewhat. I created a new node type called FAQ:
Description: This content type is for Frequently Asked Questions.
Title field label: Question
Body field label: Answer
Work flow is Published with Create new revision checked.

I created a taxonomy of terms and made a term required for the FAQ content type. Now the non-profit's members can add to the FAQ at any time. I created several views, one for each category, aliased the pages and sent them a link to review. They were able to log in and correct some things right then, without needing to wait. I also created a role for the contributors so they can modify content but not the structure of the site.

I get to present this to the rest of the group this weekend and then get feedback on what features they would like added next. My guess is a way for people to submit land leads that can then have a publicly available follow-up. Maybe see if anyone is artistic and wants to suggest an updated theme too.

For a few hours' work the KHTI site is back online. Due to Drupal's flexibility, I can add content now and, if necessary, re-arrange, un-publish, add content/features and change the theme with very little effort. But for now, at least their site is back online.

Drupal is a great content management system. Its most powerful ability is that it allows you to give people control of their content and remove that old traditional bottleneck role of the webmaster.

Oct 29 2007
Oct 29
the authors of drupal have paid considerable attention to performance and scalability. consequently, even a default install running on modest hardware can easily handle the demands of a small website. my four-year-old pc in my garage, running a full lamp install, will happily serve up 50,000 page views in a day, providing solid end-user performance without breaking a sweat.

when the time comes for scalability: moving out of the garage

if you are lucky, eventually the time comes when you need to service more users than your system can handle. your initial steps should clearly focus on getting the most out of the built-in drupal optimization functionality, considering drupal performance modules, optimizing your php (including considering op-code caching) and working on database performance. John VanDyk and Matt Westgate have an excellent chapter on this subject in their new book, "pro drupal development".

once these steps are exhausted, inevitably you'll start looking at your hardware and network deployment.

a well designed deployment will not only increase your scalability, but will also enhance your redundancy by removing single points of failure. implemented properly, an unmodified drupal install can run on this new deployment, blissfully unaware of the clustering, routing and caching going on behind the scenes.

incremental steps towards scalability

in this article, i outline a step-by-step process for incrementally scaling your deployment, from a simple single-node drupal install running all components of the system, all the way to a load balanced, multi node system with database level optimization and clustering.

since you almost certainly don't want to jump straight from your single node system to the mother of all redundant clustered systems in one step, i've broken this down into 5 incremental steps, each one building on the last. each step along the way is a perfectly viable deployment.

tasty recipes

i give full step-by-step recipes for each deployment, that with a decent working knowledge of linux, should allow you to get a working system up and running. my examples are for apache2, mysql5 and drupal5 on debian etch, but may still be useful for other versions / flavors.

note that these aren't battle-hardened production configurations, but rather illustrative minimal configurations that you can take and iterate to serve your specific needs.

the 5 deployment configurations

the table below outlines the properties of each of the suggested configurations:

                                    step 0  step 1  step 2  step 3  step 4  step 5
  separate web and db               no      yes     yes     yes     yes     yes
  clustered web tier                no      no      yes     yes     yes     yes
  redundant load balancer           no      no      no      yes     yes     yes
  db optimization and segmentation  no      no      no      no      yes     yes
  clustered db                      no      no      no      no      no      yes
  scalability                       poor-   poor    fair    fair    good    great
  redundancy                        poor-   poor-   fair    good    fair    great
  setup ease                        great   good    good    fair    poor    poor-
in step 0, i outline how to install drupal, mysql and apache to get a basic drupal install up-and-running on a single node. i also go over some of the basic configuration steps that you'll probably want to follow, including cron scheduling (an example crontab entry is sketched after this list), enabling clean urls, setting up a virtual host etc.
in step 1, i go over a good first step to scaling drupal: creating a dedicated data server. by "dedicated data server" i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment.
in step 2, i go over how to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment.
in step 3, i discuss clustering your load balancer. one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the load balancer's scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy.
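for reference, the cron scheduling mentioned in step 0 usually amounts to a crontab entry along the lines below (the url and frequency are illustrative):

# run drupal's periodic tasks hourly by fetching cron.php
0 * * * * wget -O - -q -t 1 http://www.example.com/cron.php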

in steps 4 and 5, i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling.


the holy grail of drupal database scaling might very well be a drupal deployment on mysql cluster. if you've tried this, plan to try this or have opinions on the feasibility of an ndb "port" of drupal, i'd love to hear it.

Oct 29 2007
Oct 29

out of the box, the views module allows you to specify access to the view according to user role. this is a critical feature, but sometimes it's not enough. for example, sometimes you may want the view access to depend on the arguments to the view.

specifically, let's suppose that we have implemented facebook-style threaded mail, and we want to use a view to display all the messages in a thread. the thread id is an argument passed to the view. we only wish to allow the view to be accessed by one of the authors of the thread, or users with the 'administer messages' permission.

here's a three-step approach to resolving this dilemma:

step one. create a new access hook in the views module

right after

  // Administrator privileges
  if (user_access('access all views', $account)) {
    return TRUE;
  }

add

  // Call a hook that lets a module define access permissions for the view
  $access_func = "views_access_$view->name";
  if (function_exists($access_func)) {
    return $access_func($view);
  }

step two. implement your new hook

if your view is called message_thread, then create a function called views_access_message_thread($view).
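for example, assuming the thread id is the third element of the url (view/message/$arg, as described in step three below), and that thread authors live in a (hypothetical) message_thread_authors table, the function might look something like this sketch:

function views_access_message_thread($view) {
  global $user;

  // users who may administer messages can always see the view
  if (user_access('administer messages')) {
    return TRUE;
  }

  // the thread id is the argument passed to the view, e.g. view/message/42
  $thread_id = (int) arg(2);

  // allow access only to the authors of this thread; the
  // {message_thread_authors} table and its columns are hypothetical
  $count = db_result(db_query("SELECT COUNT(*) FROM {message_thread_authors} WHERE thread_id = %d AND uid = %d", $thread_id, $user->uid));

  return $count > 0;
}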

step three. force views to NOT cache the access control settings for this view

okay, this part is a little hokey. the easiest way to do this is to tell the views module that your view has inline arguments. when you are defining the URL for your view in the views settings, explicitly include the arguments, even if they occur at the end of the URL.

for example, if your page URL is view/message and you are passing the thread id as an argument, define the page URL as view/message/$arg.

if you don't perform this step, then the views module will evaluate the access control for view/message/10 for a user, cache that result, and use that result for a subsequent request to view/message/34.

Oct 29 2007
Oct 29

previously, we discussed implementing all of the node hooks for CCK content types except hook_access. unfortunately, there is no access op for hook_nodeapi. adding one to drupal core is the topic of much discussion on drupal.org. so far a resolution to the issue has failed to be included in drupal 5 and drupal 6, and it is now on deck for consideration in drupal 7.

this is a complicated issue, and the experts are debating with good cause. in the meantime though, if you need to move on, here's what you can do.

  • install this patch to node.module
  • you now have an access op exposed in hook_nodeapi

one reason that the debate is dragging on is that the drupal developers are concerned that access control is already too complicated, and this addition will simply make drupal access control incomprehensible. this is a valid point, and to use this patch properly you do need to understand the access control order of evaluation. when determining whether or not a user may view a node, here are the questions drupal asks:

  1. does the user have the 'administer nodes' permission? if yes, always return true
  2. does the user have the 'access content' permission? if no, always return false
  3. invoke the new hook_nodeapi methods.
    1. if no hook_nodeapi returned an explicit opinion on the matter, keep going.
    2. otherwise, if any hook_nodeapi returned true, return true; if only false opinions were returned, return false
  4. invoke the hook_access method, if any. (note, there may be only one of these!)
    1. if it returned no explicit opinion, keep going
    2. if an opinion was returned, return that
  5. now check what the node_access table has to say. if no opinion, keep going
  6. is the user the author of the node? if yes, return true
  7. give up, return false

phew, that's a complicated flow of execution. are there any easy guidelines we can draw from this? yes . . .

one downside of granting access control to hook_nodeapi is that there may now be multiple modules with an opinion on the matter. this forces the drupal core developers to make a choice as to what to do when there are multiple, conflicting answers. in this patch, they have chosen to allow positive responses to dominate over negative responses. i'm personally not convinced they will stick with this decision, so, in the meantime, if you're using this patch, try to stick to a convention in which you implement only one hook_nodeapi access control method per content type. in doing this, you're simply allowing your CCK content types to function like any other content type, rather than opening a huge kettle of access control worms.
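to make that concrete, here's a minimal sketch of such a single access control method, reusing the food content type from the earlier post. the 'access' op name, and the assumption that $a3 carries the operation and $a4 the account being checked, are based on my reading of the patch discussion -- verify against the patch you actually install. the 'review food' permission is hypothetical.

function food_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // the 'access' op is assumed to be exposed by the node.module patch;
  // $a3 is assumed to be the operation ('view', 'update', 'delete')
  // and $a4 the user account being checked
  if ($op == 'access' && $node->type == 'food') {
    // example policy: unpublished food nodes are visible only to users
    // with the (hypothetical) 'review food' permission
    if ($a3 == 'view' && !$node->status) {
      return user_access('review food', $a4);
    }
  }
  // returning nothing (NULL) expresses no opinion, so the order of
  // evaluation above simply continues
}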

Oct 26 2007
Oct 26

a common path followed by advanced drupal developers using cck is the following:

  1. create a content type using cck
  2. create a supporting custom module to handle advanced customizations. typically, the module is given the same name as the content type

in this custom module, developers then attempt to implement standard drupal hooks like hook_access and hook_submit. much confusion then arises as to why the drupal hook is not firing for the cck content type.

the reason is the following: hook_access, hook_insert, hook_submit, hook_update and hook_view only fire for the module that owns the content type. for cck content types, the module that owns the content type is content (e.g. cck), not your supporting custom module. therefore, drupal leaves your supporting custom module totally out of the loop!

so what's a developer supposed to do? for hook_insert, hook_submit, hook_update and hook_view, use hook_nodeapi instead. let's say your content type is called food, and your module is also called food. then, the hook_nodeapi method might look something like:

function food_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  switch ($op) {
    case 'view':
      if ($node->type == 'food') {
        food_view($node);
      }
      break;
    case 'validate':
      if ($node->type == 'food') {
        food_validate($node);
      }
      break;
  }
}

in which food_view and food_validate are just normal functions that you also implement in your module.

unfortunately, hook_access is an entirely different story, now addressed in a separate post.

Oct 25 2007
Oct 25

This is part 1 of the wrap-up for the Online News Association workshop on Citizen Media I spoke at last week in Toronto. See the introductory post for more information and links.

This will necessarily be a combination of what I said at the workshop and what I wanted to say. The principal lesson learned over the three-plus years at Urban Vancouver is that we found it hard to convince people to post to Urban Vancouver if they already have their own blog. Some do it, like Dave Olson, Stewart Marshall, Roland, myself, and others (yes, I'm aware of the poetry and real estate posts), but for the most part, people figure that if they already have a blog, there's no point in publishing it elsewhere. We syndicate most Vancouver-based blogs anyway using their RSS feeds, so it doesn't matter too much. The other lesson from Urban Vancouver is that editing is a full-time job for at least one person, currently done by 4 people who already have full-time jobs. The duties of Urban Vancouver include moderating comments and posts according to the terms of service; gardening the aggregator (adding, removing, updating feeds); responding to the emails we get, mostly mistakenly; and encouraging people to participate on the site. We've been happy with the high search engine ranking Urban Vancouver enjoys, and discussed SEO briefly during my session at the workshop. I suggested that writing for people, enabling comments, and having an RSS feed will get people to link to you (or even syndicate you) and therefore drive up your ranking.

An audience member suggested headlines as a determining factor: it's one thing to have a savvy and witty headline, but being briefly descriptive instead helps people get an immediate sense of an individual story's topic and helps people who are looking for such a thing in Google. I could have, but didn't, mention tags. At my session, and as a follow-up to a comment in someone else's session, I tried to work in the idea of Urban Vancouver's aggregator effectively being a new type of newswire (at least one blogger uses Urban Vancouver's RSS feed to end all RSS feeds as fodder for a regular column), but couldn't fit it in. I mentioned that it was okay to promote your wares (or others') on Urban Vancouver as long as it wasn't press release style, i.e. more conversational and less like a pitch. Also, copyright remaining with the original author both encourages people to post their stuff and limits the work we have to do: since we can't sublicense any of the works, we don't.

Along with Lisa, I don't think Urban Vancouver competes with sites like Metroblogging Vancouver, Beyond Robson, and neighbourhood-specific blogs like Kitsilano and Carrall Street, since we syndicate and directly link to their sites often. An audience member suggested that we don't "compete" because Urban Vancouver doesn't sell advertising—at least not yet—and therefore doesn't compete for the pool of ad dollars.

See also: "What If You Created A Community Site and Nobody Came?", my November 2006 article in which I talk about Urban Vancouver and community sites in general.

Oct 21 2007
Oct 21

if you've set up a clustered drupal deployment (see scaling drupal step two - sticky load balancing with apache mod_proxy), a good next step is to cluster your load balancer.

one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the load balancer's scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:

  scalability:    fair
  redundancy:     good
  ease of setup:  fair

servers

in this example, i use:

  web server             drupal-lb1.mydomain.com               192.168.1.24
  web server             drupal-lb2.mydomain.com               192.168.1.25
  data server            drupal-data-server1.mydomain.com      192.168.1.26
  load balancer          apache-balance-1.mydomain.com         192.168.1.34
  load balancer          apache-balance-2.mydomain.com         192.168.1.35
  load balancer cluster  apache-balance-cluster.mydomain.com   192.168.1.51

network diagram


install and setup heartbeat on load balancers

set up two load balancers, apache-balance-1 and apache-balance-2, as described in my previous blog. install heartbeat and its dependencies on both:

# apt-get install heartbeat-2

configure /etc/ha.d/ha.cf (see http://www.linux-ha.org/GettingStarted for more info) identically on each of the load balancers as follows:

logfile /var/log/ha-log
bcast eth0
keepalive 2
warntime 10
deadtime 30
initdead 120
udpport 694
auto_failback yes
node apache-balance-1
node apache-balance-2
uuidfrom nodename
respawn hacluster /usr/lib/heartbeat/ipfail

note:
  • the nodenames must be the output of uname -n
  • i had problems with my nodenames flapping; using uuidfrom fixed this.
configure /etc/ha.d/haresources. this must be identical on apache-balance-1 and apache-balance-2. really! this file should look like:

apache-balance-1 192.168.1.51 apache2

note:
  • apache-balance-1 here refers to the "preferred" host for the service
  • 192.168.1.51 is the vip that your load balancer will appear to be on
  • apache2 here refers specifically to the name of the script in the directory /etc/init.d
configure /etc/ha.d/authkeys on both load balancers. if you're paranoid, see more secure options in "configuring authkeys" here. this file should look like:

auth 2
2 crc

set authkeys permissions on both load balancers:

# chmod 600 /etc/ha.d/authkeys

configure apache to listen on the vip. edit /etc/apache2/ports.conf on both load balancers:

Listen 192.168.1.51:80

note: after this change, apache won't start on the load balancer in question, unless it has the vip. relax. that's as it should be.

theoretically you should configure each load balancer to stop apache2 from starting on boot. this allows the ha daemon to take full control of starting and stopping apache. in practice i didn't need to. you might want to.
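on debian, that would look something like:

# update-rc.d -f apache2 remove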

restart the ha daemons and test

restart the ha daemon on both load balancers and test:

# /etc/init.d/heartbeat restart

keep an eye on the apache and heartbeat logfiles on both servers to see what is going on when you shut either load balancer down.

# tail -f /var/log/apache2/access.log
# tail -f /var/log/ha-log

final word

this is a fairly simplistic configuration. there is more you can do on detecting abnormal situations and failing over. for more information, visit http://www.linux-ha.org

Oct 21 2007
Oct 21

if you've set up your drupal deployment with a separate database and web (drupal) server (see scaling drupal step one - a dedicated data server), a good next step is to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

one way to do this is to use a dedicated web server running apache2 and mod_proxy / mod_proxy_balancer to load balance your drupal servers.

deployment overview

this table summarizes the characteristics of this deployment choice:

  scalability:    fair
  redundancy:     fair
  ease of setup:  fair

servers

in this example, i use:

  web server     drupal-lb1.mydomain.com            192.168.1.24
  web server     drupal-lb2.mydomain.com            192.168.1.25
  data server    drupal-data-server1.mydomain.com   192.168.1.26
  load balancer  apache-balance-1.mydomain.com      192.168.1.34

network diagram


load balancer setup: install and enable apache and proxy_balancer

create a dedicated server for load balancing. install apache2 (apt-get install apache2) and then enable mod_proxy_balancer and mod_proxy_http with their dependencies:

# a2enmod proxy_balancer
# a2enmod proxy_http

enable mod_proxy in mods-available/proxy.conf. note that i'm leaving ProxyRequests off since we're only using the ProxyPass and ProxyPassReverse directives. this keeps the server secure from spammers trying to use your proxy to send email.

<IfModule mod_proxy.c>
        # set ProxyRequests off since we're only using the ProxyPass and ProxyPassReverse
        # directives. this keeps the server secure from
        # spammers trying to use your proxy to send email.

        ProxyRequests Off

        <Proxy *>
                AddDefaultCharset off
                Order deny,allow
                Allow from all
                #Allow from .example.com
        </Proxy>

        # Enable/disable the handling of HTTP/1.1 "Via:" headers.
        # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
        # Set to one of: Off | On | Full | Block

        ProxyVia On
</IfModule>

configure mod_proxy and mod_proxy_balancer

mod_proxy and mod_proxy_balancer serve as a very functional load balancer. however, mod_proxy_balancer makes slightly unfortunate assumptions about the format of the cookie that you'll use for sticky session handling. one way to work around this is to create your own session cookie (very easy with apache). the examples below describe how to do this.

first create a virtual host or use the default (/etc/apache2/sites-available/default) and add this configuration to it:

<Location /balancer-manager>
SetHandler balancer-manager

Order Deny,Allow
Deny from all
Allow from 192.168
</Location>

<Proxy balancer://mycluster>
  # cluster member 1
  BalancerMember http://drupal-lb1.mydomain.com:80 route=lb1

  # cluster member 2
  BalancerMember http://drupal-lb2.mydomain.com:80 route=lb2
</Proxy>

ProxyPass /balancer-manager !
ProxyPass / balancer://mycluster/ lbmethod=byrequests stickysession=BALANCEID
ProxyPassReverse / http://drupal-lb1.mydomain.com/
ProxyPassReverse / http://drupal-lb2.mydomain.com/

note:
  • i'm allowing access to the balancer manager (the web UI) from any IP matching 192.168.*.*
  • i'm load balancing between 2 servers (drupal-lb1.mydomain.com, drupal-lb2.mydomain.com) on port 80
  • i'm defining two routes for these servers called lb1 and lb2
  • i'm excluding (!) the balancer-manager directory from the ProxyPass to allow access to the manager ui on the load balancing server
  • i'm expecting a cookie called BALANCEID to be available to manage sticky sessions
  • this is a simplistic load balancing configuration. apache has many options to control timeouts, server loading, failover etc. -- too much to cover here, but read more in the apache documentation

configure the web (drupal) servers to write a session cookie

on each of the web (drupal) servers, add this code to your vhost configuration:

RewriteEngine On
RewriteRule .* - [CO=BALANCEID:balancer.lb1:.mydomain.com]

making sure to specify the correct route, e.g. lb1 on drupal-lb1.mydomain.com etc.

you also probably want to set up your cookie domain properly in drupal, i.e. modify drupal/sites/default/settings.php as follows:

# $cookie_domain = 'example.com';
$cookie_domain = 'mydomain.com';

important urls

useful urls for testing include the balancer manager ui, served at /balancer-manager on the load balancer (per the Location block above), and the load-balanced site root itself.

the balancer manager

the mod_proxy_balancer ui enables point-and-click update of balancer members.

the balancer manager allows you to dynamically change the balance factor of a particular member, change its route or put it in offline mode.

debugging

to debug your configuration it's useful to turn up apache's debugging level on your apache load balancer by adding this to your vhost configuration:

LogLevel debug

this will produce some very useful debugging output (/var/log/apache2/error.log) from the proxying and balancing code.

firefox's cookie viewer (tools->options->privacy->show cookies) is also useful to view and manipulate your cookies.

if you plan to experiment with bringing servers up and down to test them being added and removed from the cluster, you should consider setting the "connection pool worker retry timeout" to a value lower than the default 60s. you could set it to e.g. 10s by changing your configuration to the one below; a 10s timeout allows for quicker test cycles.

BalancerMember http://drupal-lb1.mydomain.com:80 route=lb1 retry=10
BalancerMember http://drupal-lb2.mydomain.com:80 route=lb2 retry=10

next steps

one single-point-of-failure in this deployment is the apache load balancer. consider clustering your load balancer with scaling drupal step three - using heartbeat to implement a redundant load balancer


Oct 19 2007
Oct 19

A new and very interesting project with Drupal in Munich renewed my interest in learning German, something I've had pending for too long. This time I decided to try harder and finally learn Hermann Hesse's tongue.

I already knew that finding the time and the right teacher would not be easy, so I started searching online. I had already tried a few options, including the quite helpful and funny free course from Deutsche Welle, but now I wanted something that could teach me German as fast as possible, and I was willing to pay for it.

That's how I found Rosetta Stone, a company offering software used by many, many people all over the world, including executives and employees from some very big businesses. The system Rosetta Stone uses is called Dynamic Immersion, and it works quite well, connecting you with the new language (they have thirty available) from the very first moment.

Using nice pictures, different voices and a series of interactive exercises, the student can easily pick up the basic concepts behind the language. I'm almost one week into my Rosetta Stone course and I can say I'm right on track. The project with Drupal went quite well, and at some point I could just program and design without even noticing all the text was in German.

I purchased the online version of Rosetta Stone, a little more than US$100 for a 3-month subscription, to avoid additional shipping and customs costs, and of course to save time. The only problem I've found so far, as a Linux user, is that the software requires Adobe Shockwave, available only for Windows and Mac OS; but that's an Adobe issue, actually.

Obviously this is not the end of the road for my German classes. I will take other courses in the future, probably Berlitz Online, which is much more expensive, and, obviously, I will keep listening to Rammstein and Tokio Hotel as much as possible.

Auf Wiedersehen!
