May 22 2012

Relational databases are able to store, with minimal fuss, pretty much any data entities you throw at them. For the more complex cases – particularly cases involving hierarchical data – they offer many-to-many relationships. Querying many-to-many relationships is usually quite easy: you perform a series of SQL joins in your query; and you retrieve a result set containing the combination of your joined tables, in denormalised form (i.e. with the data from some of your tables being duplicated in the result set).

A denormalised query result is quite adequate if you plan to process the result set further – as is very often the case, e.g. when the result set is subsequently prepared for output to HTML / XML, or when it is used to populate data structures (objects / arrays / dictionaries / etc.) in programming memory. But what if you want to export the result set directly to a flat format, such as a single CSV file? In this case, denormalised form is not ideal. It would be much better if we could aggregate all that many-to-many data into a single result set containing no duplicate data, and if we could do that within a single SQL query.

This article presents an example of how to write such a query in MySQL – that is, a query that's able to aggregate complex many-to-many relationships into a result set that can be exported directly to a single CSV file, with no additional processing necessary.

Example: a lil' Bio database

For this article, I've whipped up a simple little schema for a biographical database. The database contains, first and foremost, people. Each person has, as his/her core data: a person ID; a first name; a last name; and an e-mail address. Each person also optionally has some additional bio data, including: bio text; date of birth; and gender. Additionally, each person may have zero or more: profile pictures (with each picture consisting of a filepath, nothing else); web links (with each link consisting of a title and a URL); and tags (with each tag having a name, existing in a separate tags table, and being linked to people via a joining table). For the purposes of the example, we don't need anything more complex than that.

Here's the SQL to create the example schema:

CREATE TABLE person (
  pid int(10) unsigned NOT NULL AUTO_INCREMENT,
  firstname varchar(255) NOT NULL,
  lastname varchar(255) NOT NULL,
  email varchar(255) NOT NULL,
  PRIMARY KEY (pid),
  UNIQUE KEY email (email),
  UNIQUE KEY firstname_lastname (firstname(100), lastname(100))
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

CREATE TABLE tag (
  tid int(10) unsigned NOT NULL AUTO_INCREMENT,
  tagname varchar(255) NOT NULL,
  PRIMARY KEY (tid),
  UNIQUE KEY tagname (tagname)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

CREATE TABLE person_bio (
  pid int(10) unsigned NOT NULL,
  bio text NOT NULL,
  birthdate varchar(255) NOT NULL DEFAULT '',
  gender varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (pid),
  FULLTEXT KEY bio (bio)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE person_pic (
  pid int(10) unsigned NOT NULL,
  pic_filepath varchar(255) NOT NULL,
  PRIMARY KEY (pid, pic_filepath)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE person_link (
  pid int(10) unsigned NOT NULL,
  link_title varchar(255) NOT NULL DEFAULT '',
  link_url varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (pid, link_url),
  KEY link_title (link_title)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE person_tag (
  pid int(10) unsigned NOT NULL,
  tid int(10) unsigned NOT NULL,
  PRIMARY KEY (pid, tid)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

And here's the SQL to insert some sample data into the schema:

INSERT INTO person (firstname, lastname, email) VALUES ('Pete', 'Wilson', 'pete@example.com');
INSERT INTO person (firstname, lastname, email) VALUES ('Sarah', 'Smith', 'sarah@example.com');
INSERT INTO person (firstname, lastname, email) VALUES ('Jane', 'Burke', 'jane@example.com');

INSERT INTO tag (tagname) VALUES ('awesome');
INSERT INTO tag (tagname) VALUES ('fantabulous');
INSERT INTO tag (tagname) VALUES ('sensational');
INSERT INTO tag (tagname) VALUES ('mind-boggling');
INSERT INTO tag (tagname) VALUES ('dazzling');
INSERT INTO tag (tagname) VALUES ('terrific');

INSERT INTO person_bio (pid, bio, birthdate, gender) VALUES (1, 'Great dude, loves elephants and tricycles, is really into coriander.', '1965-04-24', 'male');
INSERT INTO person_bio (pid, bio, birthdate, gender) VALUES (2, 'Eccentric and eclectic collector of phoenix wings. Winner of the 2003 International Small Elbows Award.', '1982-07-20', 'female');
INSERT INTO person_bio (pid, bio, birthdate, gender) VALUES (3, 'Has purply-grey eyes. Prefers to only go out on Wednesdays.', '1990-11-06', 'female');

INSERT INTO person_pic (pid, pic_filepath) VALUES (1, 'files/person_pic/pete1.jpg');
INSERT INTO person_pic (pid, pic_filepath) VALUES (1, 'files/person_pic/pete2.jpg');
INSERT INTO person_pic (pid, pic_filepath) VALUES (1, 'files/person_pic/pete3.jpg');
INSERT INTO person_pic (pid, pic_filepath) VALUES (3, 'files/person_pic/jane_on_wednesday.jpg');

INSERT INTO person_link (pid, link_title, link_url) VALUES (2, 'The Great Blog of Sarah', 'http://www.omgphoenixwingsaresocool.com/');
INSERT INTO person_link (pid, link_title, link_url) VALUES (3, 'Catch Jane on Blablablabook', 'http://www.blablablabook.com/janepurplygrey');
INSERT INTO person_link (pid, link_title, link_url) VALUES (3, 'Jane ranting about Thursdays', 'http://www.janepurplygrey.com/thursdaysarelame/');

INSERT INTO person_tag (pid, tid) VALUES (1, 3);
INSERT INTO person_tag (pid, tid) VALUES (1, 4);
INSERT INTO person_tag (pid, tid) VALUES (1, 5);
INSERT INTO person_tag (pid, tid) VALUES (1, 6);
INSERT INTO person_tag (pid, tid) VALUES (2, 2);

Querying for direct CSV export

If we were building, for example, a simple web app to output a list of all the people in this database (along with all their biographical data), querying this database would be quite straightforward. Most likely, our first step would be to query the one-to-one data: i.e. query the main 'person' table, join on the 'bio' table, and loop through the results (in a server-side language, such as PHP). The easiest way to get at the rest of the data, in such a case, would be to then query each of the many-to-many relationships (i.e. user's pictures; user's links; user's tags) in separate SQL statements, and to execute each of those queries once for each user being processed.

In that scenario, we'd be writing four different SQL queries, and we'd be executing SQL numerous times: we'd execute the main query once, and we'd execute each of the three secondary queries, once for each user in the database. So, with the sample data provided here, we'd be executing SQL 1 + (3 x 3) = 10 times.
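
For illustration, the naive approach would look roughly like the sketch below; the main query runs once, while the three secondary queries run once per person, with :pid being a placeholder that the server-side code binds to the current person's ID:

-- One-to-one data, executed once:
SELECT p.pid, p.firstname, p.lastname, p.email, pb.bio, pb.birthdate, pb.gender
FROM person p
LEFT JOIN person_bio pb ON p.pid = pb.pid;

-- Many-to-many data, executed once per person:
SELECT pic_filepath FROM person_pic WHERE pid = :pid;
SELECT link_title, link_url FROM person_link WHERE pid = :pid;
SELECT t.tagname FROM person_tag pt JOIN tag t ON pt.tid = t.tid WHERE pt.pid = :pid;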

Alternatively, we could write a single query which joins together all of the three many-to-many relationships in one go, and our web app could then just loop through a single result set. However, this result set would potentially contain a lot of duplicate data, as well as a lot of NULL data. So, the web app's server-side code would require extra logic, in order to deal with this messy result set effectively.
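
Such an all-in-one join would look something like the following sketch. Note how each person's row gets repeated once for every combination of picture, link and tag, which is exactly the duplication problem described above:

SELECT p.pid, p.firstname, p.lastname, p.email,
       pb.bio, pb.birthdate, pb.gender,
       pp.pic_filepath, pl.link_title, pl.link_url, t.tagname
FROM person p
LEFT JOIN person_bio pb ON p.pid = pb.pid
LEFT JOIN person_pic pp ON p.pid = pp.pid
LEFT JOIN person_link pl ON p.pid = pl.pid
LEFT JOIN person_tag pt ON p.pid = pt.pid
LEFT JOIN tag t ON pt.tid = t.tid;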

In our case, neither of the above solutions is adequate. We can't afford to write four separate queries, and to perform 10 query executions. We don't want a single result set that contains duplicate data and/or excessive NULL data. We want a single query, that produces a single result set, containing one person per row, and with all the many-to-many data for each person aggregated into that person's single row.

Here's the magic SQL that can make our miracle happen:

SELECT        person_base.pid,
              person_base.firstname,
              person_base.lastname,
              person_base.email,
              IFNULL(person_base.bio, '') AS bio,
              IFNULL(person_base.birthdate, '') AS birthdate,
              IFNULL(person_base.gender, '') AS gender,
              IFNULL(pic_join.val, '') AS pics,
              IFNULL(link_join.val, '') AS links,
              IFNULL(tag_join.val, '') AS tags
FROM (
    SELECT        p.pid,
                  p.firstname,
                  p.lastname,
                  p.email,
                  IFNULL(pb.bio, '') AS bio,
                  IFNULL(pb.birthdate, '') AS birthdate,
                  IFNULL(pb.gender, '') AS gender
    FROM          person p
    LEFT JOIN     person_bio pb
    ON            p.pid = pb.pid
) AS person_base
LEFT JOIN (
    SELECT        join_tbl.pid,
                  IFNULL(
                      GROUP_CONCAT(
                          DISTINCT CAST(join_tbl.pic_filepath AS CHAR)
                          SEPARATOR ';;'
                      ),
                      ''
                  ) AS val
    FROM          person_pic join_tbl
    GROUP BY      join_tbl.pid
) AS pic_join
ON            person_base.pid = pic_join.pid
LEFT JOIN (
    SELECT        join_tbl.pid,
                  IFNULL(
                      GROUP_CONCAT(
                          DISTINCT CONCAT(
                              CAST(join_tbl.link_title AS CHAR),
                              '::',
                              CAST(join_tbl.link_url AS CHAR)
                          )
                          SEPARATOR ';;'
                      ),
                      ''
                  ) AS val
    FROM          person_link join_tbl
    GROUP BY      join_tbl.pid
) AS link_join
ON            person_base.pid = link_join.pid
LEFT JOIN (
    SELECT        join_tbl.pid,
                  IFNULL(
                      GROUP_CONCAT(
                          DISTINCT CAST(t.tagname AS CHAR)
                          SEPARATOR ';;'
                      ),
                      ''
                  ) AS val
    FROM          person_tag join_tbl
    LEFT JOIN     tag t
    ON            join_tbl.tid = t.tid
    GROUP BY      join_tbl.pid
) AS tag_join
ON            person_base.pid = tag_join.pid
ORDER BY      lastname ASC,
              firstname ASC;

If you run this in a MySQL admin tool that supports exporting query results directly to CSV (such as phpMyAdmin), then there's no more fancy work needed on your part. Just click 'Export -> CSV', and you'll have your results looking like this:

pid,firstname,lastname,email,bio,birthdate,gender,pics,links,tags
3,Jane,Burke,jane@example.com,Has purply-grey eyes. Prefers to only go out on Wednesdays.,1990-11-06,female,files/person_pic/jane_on_wednesday.jpg,Catch Jane on Blablablabook::http://www.blablablabook.com/janepurplygrey;;Jane ranting about Thursdays::http://www.janepurplygrey.com/thursdaysarelame/,
2,Sarah,Smith,sarah@example.com,Eccentric and eclectic collector of phoenix wings. Winner of the 2003 International Small Elbows Award.,1982-07-20,female,,The Great Blog of Sarah::http://www.omgphoenixwingsaresocool.com/,fantabulous
1,Pete,Wilson,pete@example.com,Great dude, loves elephants and tricycles, is really into coriander.,1965-04-24,male,files/person_pic/pete1.jpg;;files/person_pic/pete2.jpg;;files/person_pic/pete3.jpg,,sensational;;mind-boggling;;dazzling;;terrific
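
If your MySQL admin tool of choice can't export to CSV, you can also let the server write the file itself with SELECT ... INTO OUTFILE (this requires the FILE privilege, and the MySQL server process must be able to write to the target path; the path below is just an example). Here's the idea on a trivial query; the big aggregate query above can be exported the same way by adding the same clause after its select list:

SELECT pid, firstname, lastname, email
INTO OUTFILE '/tmp/people_export.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM person;

Note that INTO OUTFILE writes the file on the database server's host, and it doesn't produce a header row.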

The query explained

The most important feature of this query is that it takes advantage of MySQL's ability to perform subqueries. What we're actually doing is performing four separate queries: one query on the main person table (which joins to the person_bio table), and one on each of the three many-to-many elements of a person's bio. We then join these four queries, and select data from all of their result sets, in the parent query.

The magic function in this query is MySQL's GROUP_CONCAT() function. This basically allows us to join together the results of a particular field, using a delimiter string, much like the join() array-to-string function found in many programming languages (e.g. PHP's implode() function). In this example, I've used two semicolons (;;) as the delimiter string.
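
As a minimal illustration of GROUP_CONCAT() on its own, here's how it rolls up each person's pictures from the sample data (add an ORDER BY inside the GROUP_CONCAT() call if you need the values in a stable order):

SELECT pid,
       GROUP_CONCAT(pic_filepath SEPARATOR ';;') AS pics
FROM person_pic
GROUP BY pid;

For pid 1, this returns the three pete*.jpg filepaths joined into a single ';;'-separated value; for pid 3, just the one filepath.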

In the case of person_link in this example, each row has two fields ('link title' and 'link URL'), so I've concatenated the two fields together (separated by a double-colon (::) string) before letting GROUP_CONCAT() work its wonders.

The case of person_tag is also interesting, as it demonstrates performing an additional join within the many-to-many subquery, and returning data from that joined table (i.e. the tag name) as the result value. So, all up, each of the many-to-many relationships in this example is a slightly different scenario: person_pic is the basic case of a single field within the many-to-many data; person_link is the case of more than one field within the many-to-many data; and person_tag is the case of an additional one-to-many join, on top of the many-to-many join.

Final remarks

Note that although this query depends on several MySQL-specific features, most of those features are available in a fairly equivalent form, in most other major database systems. Subqueries vary quite little between the DBMSes that support them. And it's possible to achieve GROUP_CONCAT() functionality in PostgreSQL, in Oracle, and even in SQLite.
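
For the curious, here's roughly what the pics aggregation would look like on those systems (sketches only; availability depends on the version, e.g. string_agg() needs PostgreSQL 9.0 or later):

-- PostgreSQL: string_agg()
SELECT pid, string_agg(pic_filepath, ';;') AS pics
FROM person_pic GROUP BY pid;

-- Oracle: LISTAGG()
SELECT pid, LISTAGG(pic_filepath, ';;') WITHIN GROUP (ORDER BY pic_filepath) AS pics
FROM person_pic GROUP BY pid;

-- SQLite: its own group_concat()
SELECT pid, group_concat(pic_filepath, ';;') AS pics
FROM person_pic GROUP BY pid;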

It should also be noted that it would be possible to achieve the same result (i.e. the same end CSV output) using 10 SQL query executions and a whole lot of PHP (or other) glue code. However, taking that route would involve more code (spread over four queries and numerous lines of procedural glue code), and it would most likely suffer worse performance (although I make no guarantees as to the performance of my example query; I haven't benchmarked it with particularly large data sets).

This querying trick was originally written in order to export data from a Drupal MySQL database, to a flat CSV file. The many-to-many relationships were referring to field tables, as defined by Drupal's Field API. I made the variable names within the subqueries as generic as possible (e.g. join_tbl, val), because I needed to copy the subqueries numerous times (for each of the numerous field data tables I was dealing with), and I wanted to make as few changes as possible on each copy.

The trick is particularly well-suited to Drupal Field API data (known in Drupal 6 and earlier as 'CCK data'). However, I realised that it could come in useful with any database schema where a "flattening" of many-to-many fields is needed, in order to perform a CSV export with a single query. Let me know if you end up adopting this trick for schemas of your own.

Jan 06 2011

Congratulations to the Drupal community for getting version 7.0 released! This is a major milestone and an excellent reason to celebrate!

If you want to give Drupal 7 a try without having to install anything, I've now updated my Drupal 7 appliances on SuSE Studio to the latest release. The appliance is based on openSUSE Linux 11.3 and is available in two variants:

  • A text-mode only appliance to which you connect using your local web browser via the network.
  • A GUI version that starts up the Firefox browser in a minimalistic GNOME desktop to perform the installation locally. Remote network access is available, too.
The database backend is MySQL 5.1, with the InnoDB plugin and strict mode enabled. phpMyAdmin has been added to support web-based administration of the MySQL server. You can access it via http://localhost/phpMyAdmin/. I also added drush, the Drupal command line shell and scripting interface, as well as some additional packages (yast2-http-server, bind-utils, php5-ctype, patch). I also bumped up the appliance's version number to 7.0.0, to match the version number of the included Drupal release.
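
If you want to double-check those database settings once the appliance is up, you can ask the server itself, either from phpMyAdmin's SQL tab or the mysql command line client:

SHOW VARIABLES LIKE 'sql_mode';
SHOW VARIABLES LIKE 'innodb_strict_mode';
SHOW PLUGINS;  -- the InnoDB plugin should show up as ACTIVE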

The appliance is available in various formats:

  • A live raw disk image, ready to be written to a USB stick or flash drive
  • A live ISO image, to be burned on a CD or used in a virtual machine
  • A hard disk image, to be dumped on a hard disk drive
  • Various virtual disk formats, e.g. OVF, VMWare/VirtualBox/KVM and Xen
Please see the installation instructions provided on the download pages for details on how to use the various image formats.

So congratulations to the Drupal developer community for reaching this goal and thanks to SuSE/Novell/Attachmate for providing the infrastructure for creating such appliances. I also would like to especially thank Richard Bos for the testing and many suggestions for improvement of these appliances!

Nov 01 2010

It's that time of the year again — the nice folks at FOSDEM have granted us a developer room at their upcoming conference (February 5+6 2011 in Brussels, Belgium)!

As usual there were more applications than they were able to accommodate, so we are very grateful for this opportunity for collaboration. Titled "MySQL and Friends", our room next year will be H.2213 with a capacity of 100 seats. It will be at our disposal on Saturday 5th, from 13:00 till 19:00. Like last year, we would like to set up a schedule of talks related to the MySQL server and the various projects that surround it. Each talk will last 20 minutes, plus 5 minutes of Q&A and a 5 minute break for switching speakers, giving us 12 slots in total to fill with excellent tech talks. Take a look at this year's schedule for some examples! I hope we can assemble an even more exciting and interesting schedule for next year.

Quoting from my last year's call for papers:

We are looking for covering a wide range of topics that attract both MySQL DBAs as well as application developers that work with MySQL as their database of choice. Are you developing a new storage engine or other plugin? Do you want to share your experiences and best practices in administering or deploying MySQL servers? Did you develop a new method to scale a MySQL setup? Let us and the audience know about it! You can submit your talk proposal via this submission form.

The deadline for turning in your proposal is Sunday, 26th of December, 2010, after which there will be a voting and rating period to identify the most interesting and attractive topics.

Please check the FOSDEM 2011 information page on the MySQL Forge Wiki for more details and don't hesitate to contact me directly, if you have any questions or suggestions. I look forward to your proposals!

Oct 25 2010

Over the weekend I updated my Drupal 7 test appliance in SUSE Studio to the Drupal 7.0-beta2 release, which was published on Oct. 23rd. I also added phpMyAdmin, upon a user's request, to provide a web-based method to work with the MySQL instance, if needed.

In addition to the lightweight "headless" appliance (which can only be accessed and configured via a remote network connection), I've now also created a GUI-based version. This appliance starts a minimal GNOME desktop and a Mozilla Firefox browser, which in turn opens the Drupal installation page by default. I hope you will find this useful if you want to toy around and test Drupal 7 without having to go through the entire OS and LAMP stack configuration yourself. In fact, you can even test this appliance via the recently added test drive option from right out of your web browser!

The appliance is now also available in OVF format. SUSE Studio now also builds Amazon EC2 images, which don't seem to be available for download from the SUSE Gallery yet. I assume this is a recent addition to the continuously improving SUSE Studio functionality; hopefully these images will be made available soon.

Sep 17 2010

The Drupal community recently published another alpha release of the upcoming Drupal 7 version, to shake out the remaining bugs and to encourage more users to test it.

If you would like to give it a try, but you don't have a spare server handy, how about using a virtual machine instead? Using the fabulous SUSE Studio, I've created an appliance based on openSUSE 11.3, Drupal 7.0-alpha7 and MySQL 5.1 with the InnoDB plugin and strict mode enabled (both the SQL mode and the InnoDB strict mode). Using this configuration helps to ensure that Drupal works well with the current version of MySQL/InnoDB and does not use any "questionable" SQL statements. This might be especially interesting for additional modules - Drupal core has not revealed any problems under strict mode so far.

You can download disk images for VMware/VirtualBox/KVM or Xen from the SUSE Gallery (free login required). Just boot the appliance in your virtualization application of choice, choose your keyboard layout and step through the network configuration and time zone selection. Once the appliance has booted up and the login: prompt has appeared, point your web browser to the appliance's IP address to start the Drupal installation/configuration. MySQL has been pre-configured: there is an empty database named "drupal" and a user "drupal" with the same password to access it. You just need to enter this information in the Drupal database configuration dialogue during the installation. Everything else can be configured to your liking.
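
If you'd rather reproduce that database setup on a MySQL server of your own instead of using the appliance, the pre-configuration boils down to something like the following sketch (the weak 'drupal' password is of course only acceptable for a throw-away test VM):

CREATE DATABASE drupal CHARACTER SET utf8;
CREATE USER 'drupal'@'localhost' IDENTIFIED BY 'drupal';
GRANT ALL PRIVILEGES ON drupal.* TO 'drupal'@'localhost';
FLUSH PRIVILEGES;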

After you have finished the installation, you can toy around with a fresh Drupal 7 installation! Install additional modules, change the themes, add content. And make sure to report all bugs that you run into while doing so! Have fun.

Mar 19 2010

I have an interesting problem, on a data migration project I'm currently working on. I'm importing a large amount of legacy data into Drupal, using the awesome Migrate module (and friends). Migrate is a great tool for the job, but one of its limitations is that it requires the legacy database tables to have non-composite integer primary keys. Unfortunately, most of the tables I'm working with have primary keys that are either composite (i.e. the key is a combination of two or more columns), or non-integer (i.e. strings), or both.

Table with composite primary key.

The simplest solution to this problem would be to add an auto-incrementing integer primary key column to the legacy tables. This would provide the primary key information that Migrate needs in order to do its mapping of legacy IDs to Drupal IDs. But this solution has a serious drawback. In my project, I'm going to have to re-import the legacy data at regular intervals, by deleting and re-creating all the legacy tables. And every time I do this, the auto-incrementing primary keys that get generated could be different. Records may have been deleted upstream, or new records may have been added in between other old records. Auto-increment IDs would, therefore, correspond to different composite legacy primary keys each time I re-imported the data. This would effectively make Migrate's ID mapping tables corrupt.

A better solution is needed. A solution called hashing! Here's what I've come up with:

  1. Remove the legacy primary key index from the table.
  2. Create a new column on the table, of type BIGINT. A MySQL BIGINT field allocates 64 bits (8 bytes) of space for each value.
  3. If the primary key is composite, concatenate the columns of the primary key together (optionally separated by a delimiter).
  4. Calculate the SHA1 hash of the concatenated primary key string. An SHA1 hash consists of 40 hexadecimal digits. Since each hex digit stores 16 different values, each hex digit requires 4 bits of storage; therefore 40 hex digits require 160 bits of storage, which is 20 bytes.
  5. Convert the numeric hash to a string.
  6. Truncate the hash string down to the first 16 hex digits.
  7. Convert the hash string back into a number. Each hex digit requires 4 bits of storage; therefore 16 hex digits require 64 bits of storage, which is 8 bytes.
  8. Convert the number from hex (base 16) to decimal (base 10).
  9. Store the decimal number in your new BIGINT field. You'll find that the number is conveniently just small enough to fit into this 64-bit field.
  10. Now that the new BIGINT field is populated with unique values, upgrade it to a primary key field.
  11. Add an index that corresponds to the legacy primary key, just to maintain lookup performance (you could make it a unique key, but that's not really necessary).
Table with integer primary key.

The SQL statement that lets you achieve this in MySQL looks like this:

ALTER TABLE people DROP PRIMARY KEY;
ALTER TABLE people ADD id BIGINT UNSIGNED NOT NULL FIRST;
UPDATE people SET id = CONV(SUBSTRING(CAST(SHA(CONCAT(name, ',', city)) AS CHAR), 1, 16), 16, 10);
ALTER TABLE people ADD PRIMARY KEY(id);
ALTER TABLE people ADD INDEX (name, city);
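
To see what the hashing expression in the UPDATE actually does, you can run its building blocks individually; the values 'Pete' and 'Sydney' below are just hypothetical sample values for the name and city columns:

SELECT SHA(CONCAT('Pete', ',', 'Sydney')) AS sha1_full,                                -- 40 hex digits = 160 bits
       SUBSTRING(SHA(CONCAT('Pete', ',', 'Sydney')), 1, 16) AS sha1_first16,           -- first 16 hex digits = 64 bits
       CONV(SUBSTRING(SHA(CONCAT('Pete', ',', 'Sydney')), 1, 16), 16, 10) AS id_value; -- base 16 to base 10, fits a BIGINT UNSIGNED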

Note: you will also need to alter the relevant migrate_map_X tables in your database, and change the sourceid and destid fields in these tables to be of type BIGINT.
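
For a migration called 'people', that change would look something like the sketch below; the exact table and column names depend on your Migrate version and your migration definitions, so check your own migrate_map_* tables first:

ALTER TABLE migrate_map_people
  MODIFY sourceid BIGINT UNSIGNED NOT NULL,
  MODIFY destid BIGINT UNSIGNED;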

Hashing has a tremendous advantage over using auto-increment IDs. When you pass a given string to a hash function, it always yields the exact same hash value. Therefore, whenever you hash a given string-based primary key, it always yields the exact same integer value. And that's my problem solved: I get constant integer ID values each time I re-import my legacy data, so long as the legacy primary keys remain constant between imports.

Storing the 64-bit hash value in MySQL is straightforward enough. However, a word of caution once you continue on to the PHP level: PHP does not guarantee to have a 64-bit integer data type available. It should be present on all 64-bit machines running PHP. However, if you're still on a 32-bit processor, chances are that a 32-bit integer is the maximum integer size available to you in PHP. There's a trick where you can store an integer of up to 52 bits using PHP floats, but it's pretty dodgy, and having 64 bits guaranteed is far preferable. Thankfully, all my environments for my project (dev, staging, production) have 64-bit processors available, so I'm not too worried about this issue.

I also have yet to confirm 100% whether 16 out of 40 digits from an SHA1 hash is enough to guarantee unique IDs. In my current legacy data set, I've applied this technique to all my tables, and haven't encountered a single duplicate (I also experimented briefly with CRC32 checksums, and very quickly ran into duplicate ID issues). However, that doesn't prove anything — except that duplicate IDs are very unlikely. I'd love to hear from anyone who has hard probability figures about this: if I'm using 16 digits of a hash, what are the chances of a collision? I know that Git, for example, stores commit IDs as SHA1 hashes, and it lets you then specify commit IDs using only the first few digits of the hash (e.g. the first 7 digits is most common). However, Git makes no guarantee that a subset of the hash value is unique; and in the case of a collision, it will ask you to provide enough digits to yield a unique hash. But I've never had Git tell me that, as yet.
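
In the absence of hard probability figures, it's at least easy to check a freshly imported data set for collisions; if this query returns no rows, all the truncated hashes in the table are unique:

SELECT id, COUNT(*) AS occurrences
FROM people
GROUP BY id
HAVING COUNT(*) > 1;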

Nov 05 2009

This blog post is a by-product of my preparation work for an upcoming talk titled "Why you should be using a distributed version control system (DVCS) for your project" at SAPO Codebits in Lisbon (December 3-5, 2009). Publishing these thoughts prior to the conference serves two purposes: getting some peer review on my findings and acting as a teaser for the actual talk. So please let me know — did I cover the relevant aspects or did I miss anything? What's your take on DVCS vs. the centralized approach? Why do you prefer one over the other? I'm looking forward to your comments!

Even though several distributed alternatives have been available for some years now (with Bazaar, git and Mercurial being the most prominent representatives), many large and popular Open Source projects still use centralized systems like Subversion or even CVS to maintain their source code. While Subversion has eased some of the pains of CVS (e.g. better remote access, renaming/moving of files and directories, easy branching), the centralized approach by itself poses some disadvantages compared to distributed systems. So what are these? Let me give you a few examples of the limitations that a centralized system like Subversion has, and how these affect the possible workflows and development practices.

I highly recommend also reading Jon Arbash Meinel's Bazaar vs Subversion blog post for a more elaborate description of the limitations.

  • Most operations require interaction with the central repository, which usually is located on a remote server. Browsing the revision history of a file, creating a branch or a tag, comparing differences between two versions — all these activities involve communication via the network. Which means they are not available when you're offline and they could be slow, causing a slight disruption of your workflow. And if the central repository is down because of a network or hardware failure, every developer's work gets interrupted.
  • A developer can only checkpoint his work by committing his changes into the central repository, where it becomes immediately visible for everybody else working on that branch. It's not possible to keep track of your ongoing work by committing it locally first, in small steps, until the task is completed. This also means that any local work that is not supposed to be committed into the central repository can only be maintained as patches outside of version control, which makes it very cumbersome to maintain a larger number of modifications. This also affects external developers who want to join the project and work with the code. While they can easily obtain a checkout of the source tree, they are not able to put their own work into version control until they have been granted write access to the central repository. Until then, they have to maintain their work by submitting patches, which puts an additional burden on the project's maintainers, as they have to apply and merge these patches by hand.
  • Tags and branches of a project are created by copying entire directory structures around inside the repository. There are some recommendations and best practices on how to do that and how these directories should be arranged (e.g. by creating toplevel branches and tags directories), but there are several variants and it's not enforced by the system. This makes it difficult to work with projects that use a non-standard way for maintaining their branches and can be rather confusing (depending on the amount of branches and tags that exist).
  • While creating new branches is quick and atomic in Subversion, it's difficult to resolve conflicts when merging or reconciling changes from other branches. Recent versions of Subversion added support for keeping better track of merges, but this functionality is still not up to par with what the distributed tools provide. Merging between branches used to drop the revision history of the merged code, which made it difficult to keep track of the origins of individual changes. This often meant that developers avoided developing new functionality in separate branches and rather worked on the trunk instead. Working this way makes it much harder to keep the code in trunk in a stable state.

Having described some downsides of the centralized approach, I'd now like to mention some of the most notable aspects and highlight a few advantages of using a distributed version control system for maintaining an Open Source project. These are based on my own personal experiences from working with various distributed systems (I've used Bazaar, BitKeeper, Darcs, git, Mercurial and SVK) and from following many other OSS projects that either made the switch from centralized to distributed or have been using a distributed system from the very beginning. For example, MySQL was already using BitKeeper for almost 2 years when I joined the team in 2002. From there, we made the switch to Bazaar in 2008. mylvmbackup, my small MySQL backup project, is also maintained using Bazaar and hosted on Launchpad.

Let me begin with some simple and (by now) well-known technical aspects and benefits of distributed systems before I elaborate on what social and organizational consequences these have.

In contrast to having a central repository on a single server, each working copy in a distributed system is a full-blown backup of the other repositories, including the entire revision history. This provides additional security against data loss, and it makes it very easy to promote another repository to become the new master branch. Developers simply point their local repositories to this new location to pull and push all future changes from there, so this usually causes very little disruption.

Disconnected operations allow performing all tasks locally without having to connect to a remote server. Reviewing the history, looking at diffs between arbitrary revisions, applying tags, committing or reverting changes can all be done on the local repository. These operations take place on the same host and don't require establishing a network connection, which also means they are very fast. Changes can later be propagated using push or pull operations - these can be initiated from both sides at any given time. As Ian Clatworthy described it, a distributed VCS decouples the act of snapshotting from the act of publishing.

Because there is no need to configure or set up a dedicated server or separate repository with any of today's popular DVCSes, there is very little overhead and maintenance required to get started. There is no excuse for not putting your work into revision control, even if your project starts as a one-man show or you never intend to publish your code! Simply run "bzr|git|hg init" in an existing directory structure and you're ready to go!

As there is no technical reason to maintain a central repository, the definition of "the code trunk" changes from a technical requirement into a social/conventional one. Most projects still maintain one repository that is considered to be the master source tree. However, forking the code and creating branches of a project change from being an exception to being the norm. The challenge for the project team is to remain the canonical/relevant central hub of the development activities. The ease of forking also makes it much simpler to take over an abandoned project, while preserving the original history. As an example, take a look at the zfs-fuse project, which got both a new project lead and moved from Mercurial to git without losing the revision history or requiring any involvement by the original project maintainer.

Both branching and merging are "cheap" and encouraged operations. The role of a project maintainer changes from being a pure developer and committer to becoming the "merge-master". Selecting and merging changes from external branches into the main line of development becomes an important task of the project leads. Good merge-tracking support is a prerequisite for a distributed system and makes this a painless job. Also, the burden of merging can be shared among the maintainers and contributors. It does not matter on which side of a repository a merge is performed. Depending on the repository relationships and how changes are being propagated between them, some DVCSes like Bazaar or git actually provide several merge algorithms that one can choose from.

Having full commit rights on one's own branch empowers contributors. It encourages experimenting and lowers the barrier for participation. It also creates new ways of collaboration. Small teams of developers can create ad-hoc workgroups to share their modifications by pushing/pulling from a shared private branch or amongst their personal branches. However, it still requires the appropriate privileges to be able to push into the main development branch.

This also helps to improve the stability of the code base. Larger features or other intrusive changes can be developed in parallel to the mainline, kept separate but in sync with the trunk until they have evolved and stabilized sufficiently. With centralized systems, code has to be committed into the trunk first before regression tests can be run. With DVCSes, merging of code can be done in stages, using a "gatekeeper" to review/test all incoming pushes in a staging area before merging them with the mainline code base. This gatekeeper could be a human or an automated build/test system that performs the code propagation into the trunk based on certain criteria, e.g. "it still compiles", "all tests pass", "the new code adheres to the coding standards". While central systems only allow star topologies, a distributed system allows workflows where modifications follow arbitrary directed graphs.

Patches and contributions suffer less from bit rot. A static patch file posted to a mailing list or attached to a bug report may no longer apply cleanly by the time you look into it. The underlying code base has changed and evolved. Instead of posting a patch, a contributor using a DVCS simply provides a pointer to his public branch of the project, which he hopefully keeps in sync with the main line of development. From there, the contribution can be pulled and incorporated at any time. The history of every modification can be tracked in much more detail, as the author's name appears in the revision history (which is not necessarily the case when another developer applies a patch contributed by someone else).

A DVCS allows you to keep track of local changes in the same repository, while still being able to merge bug/security fixes from upstream. Example: your web site might be based on the popular Drupal CMS. While the actual development of Drupal still takes place in (gasp) CVS, it is possible to follow the development using Bazaar. This allows you to stay in sync with the ongoing development (e.g. receiving and applying security fixes for an otherwise stable branch) while keeping your local modifications under version control as well.

I've probably just scratched the surface on what benefits distributed version control systems provide with this blog post. Many of these aspects and their consequences are not fully analyzed and understood yet. In the meanwhile, more and more projects make the switch, gather experiences and establish best practices. If you're still using a centralized system, I strongly encourage you to start exploring the possibilities of distributed version control. And you don't actually have to "flip the switch" immediately — most of the existing systems happily interact with a central Subversion server as well, allowing you to benefit from some of the advantages without you having to convert your entire infrastructure immediately.

Here are some pointers for further reading on that particular subject:

Oct 29 2009

So you're a small startup company, ready to go live with your product, which you intend to distribute under an Open Source License. Congratulations, you made a wise decision! Your developers have been hacking away frantically, getting the code in good shape for the initial launch. Now it's time to look into what else needs to be built and setup, so you're ready to welcome the first members of your new community and to ensure they are coming back!

Keep the following saying in mind, which especially holds true in the Open Source world: "You never get a second chance to make a first impression!". While the most important thing is of course to have a compelling and useful product, this blog post is an attempt to highlight some other aspects about community building and providing the adequate infrastructure. This insight is based on my own experiences and my observations from talking with many people involved in OSS startups and projects.

First of all, realize that your community is diverse. They have different expectations, skills and needs. Pamper your early adopters. They are the multipliers that help you to spread the word, if they are convinced and excited about what you provide. Put some faith and trust in them and listen to their input. In the beginning, you might want to focus on your developer community and the tech-savvy early adopters, but this of course depends on the type of product you provide and on what your target audience looks like. In any case, make sure that you provide the necessary infrastructure to cater the respective needs of these different user bases.

Also remember that you can not overcommunicate with your community. Blog heavily, write documentation/FAQs/HOWTOs, build up Wiki content and structure, create screencasts. Don't rely on the community to create any of this in the early stages. But be prepared to embrace and support any activities, if they arise. Solicit input, provide opportunities and guidelines for participation!

While it's tempting to do: don't establish too many communication channels in the beginning. Keep it simple and reduce the different venues of communication to an absolute minimum at this point. A new forum with many different topics but no comments looks like an art gallery with a lot of rooms, but they are either empty or there's just a single picture hanging at the wall. Nobody wants to visit that, he'd feel lost in the void. At the early stage of a project, I think it's essential to keep the discussions in as few places as possible. This helps you to identify your key community contributors (the "regulars" aka the "alpha geeks") and to build up personal relationships with them (and among themselves).

Consider establishing a forum with only a few topics, and start with one or two mailing lists. Also make sure that these are actively being followed (e.g. by yourself or your developers) and that questions are being answered! I personally prefer mailing lists over forums, but I'm probably not representative. Ideally, there would be a unified communication hub that supports posting both via the web site (like a forum) and via email or NNTP (similar to Google Groups). This keeps the discussions in one central place (which eases searching for specific keywords/topics) and still allows users to choose their preferred means of communication. Unfortunately, I haven't really found any suitable platform for this approach yet — suggestions are welcome! And once your community grows and people start complaining about too many or off-topic discussions, you can think about further separation of the discussion topics.

Allow your users to submit and comment on issues and feature requests by providing a public bug/feature tracking system. Use this system for your release tracking and planning as well, to give your users a better insight into what they can expect from upcoming versions. Also, make it very clear to your users where bug reports and feature requests should be sent to! Should one use the Forums or the bug tracker for that? A mailing list or forum makes it easier for users to participate in these discussions, but makes it more difficult to keep track of them and to ensure they are being followed up on. For the sake of simplicity, I would actually suggest to remove any separate forums about these topics. Instead, educate your community early about which is the right tool and venue to use for such requests. This saves time and resources on your side and helps to build up an initial core of community members that can then educate others about "the ropes". Otherwise you end up with the burden of keeping track of every feature request or bug report that was posted somewhere, ensuring it has been added to the bug tracker...

If your community infrastructure consists of separate building blocks to provide the required functionality (e.g. forums, bug tracking, wiki), consider setting up a single-sign on (SSO) technology and establish a unified look and feel between these applications. Your users should not be required to log in with more than one username and password, and every application should share the same login and profile data. However, only require a login, if absolutely necessary! Many users feel alienated by having to enter their personal data, even if they only want to lurk around or browse through existing discussions or documentation. As an additional benefit, it helps you to quickly identify your "community stars" in the various sections of your site: Who reports the most bugs? Who is the most helpful person on our Forums? This information could also be published on your community site, giving users the opportunity to build up reputation and karma. Community infrastructure sites like Drupal or Joomla provide an excellent foundation to get you started, while offering enough room for improvement and additional functionality at a later point.

Lower the entrance barrier and make it as easy as possible for people to get started with your application. Don't just throw a source archive at them, hoping that someone else will take care of doing the binary builds. Put some effort into building and providing binary, ready-to-install packages for the most popular platforms that your target audience is likely to use. The three most important platforms to cover are Microsoft Windows, Mac OS X and Linux. While users of the latter usually have the required tools and experience in building stuff from source, Windows and Mac users are usually "spoiled" and don't want to be bothered with having to install a full-fledged development environment before they could eventually evaluate your application.

When it comes to Linux distributions, you should look into building distribution-specific packages. This heavily depends on the requirements for external libraries that your application is using, which might differ on the various flavours of Linux. Depending on the purpose of your application, you may either focus on the more desktop/developer-centric distributions like Mandriva, openSUSE, Ubuntu, or on the distributions commonly used in server environments, e.g. Debian, CentOS, Fedora, RHEL, SLES (Yes, I am aware that most distributions are multi-purpose and serve both tasks equally well, and it's of course possible to use each of them to get the job done — it's a matter of taste and preference). If possible, make use of existing build infrastructure like Fedora's Koji build system, Launchpad's Personal Package Archives (PPA) or the openSUSE Build Service (which even allows you to build RPMs and DEBs for non-SUSE distributions) to automate the building and provisioning of distribution-specific packages for a wide range of Linux flavours. If your application is slightly complicated to install or set up, consider providing a live demo server that people can access via the Internet to give it a try. Alternatively, create ready-to-run images for virtual machines like Parallels, VirtualBox or VMWare. Everything that makes it easier to access, install and test your software should be explored.

In closing, make community involvement a part of your company culture and make sure that you preserve enough time to take care of it. Community engagement has so many different aspects, you don't necessarily have to be a developer or a very technical person to get involved. I'm aware that doing community work can be seen as a distraction and definitely takes away time from other tasks. But community involvement should become a habit and a well-accepted part of everyone's job — this is much easier to establish while you're still small and growing.

Sep 14 2009

I had to move 264 databases from one Linux server (Fedora) to another (CentOS). Hard-copying /var/lib/mysql wasn't a good idea, because the MySQL versions on the two servers were different and I would have had to repair all the tables after the move, so I decided to write a very short and useful bash script.

It dumps all the databases, writing each database into a different file:

for I in `echo "show databases;" | mysql -u myuser --password="" | grep -v Database`; do mysqldump -u myuser --password="" "$I" > "$I.sql"; done

As you can see, you can put each .sql file into a different folder if you want, and it's also possible to compress each file on the fly using a pipe :)

Sep 10 2008

Tomorrow (Thursday, 11th of September) at 9:00 PST/16:00 UTC/17:00 GMT/18:00 CET, there will be a new free MySQL University session. MySQL University started as an internal training program for MySQL engineers, to share and spread knowledge about their areas of expertise, and has been available to the public for quite some time now. It covers a wide range of technical topics around the MySQL Server and usually takes place once per week.

For the first time, the presentation will not be performed by (former) MySQL employees/developers, but by two of our "Sun Classic" colleagues: Jyri Virkki (OpenSolaris Web Stack community lead) and Murthy Chintalapati (Sr Engineering Manager, Web Stack development) will talk about the OpenSolaris Web Stack:

OpenSolaris Web Stack is an OpenSolaris project and community building an integrated stack of popular open source web tier infrastructure technologies such as Apache HTTP server, MySQL, memcached, PHP and Ruby On Rails optimized for Solaris platform. This session introduces OpenSolaris Web Stack, its status and future development including addition of newer technologies such as lighttpd, Varnish etc., as well as the ease of use features for developers and deployers. We will also be discussing an experimental web stack IPS package repository and it could be leveraged to build and make available popular end user applications such as Drupal.

MySQL University sessions are free to attend - all you need is an IRC client (to post your questions and comments) and an audio player capable of playing back an OGG audio stream, so you can listen to what is being said. See the Instructions for Attendees on the MySQL University pages for more information on how to log in and attend. The audio stream will be recorded and published on the MySQL University pages for later consumption, in case you can't make it or want to listen to a previous session.

Sep 04 2008

One of the sessions at DrupalCon I attended was Larry Garfield's talk about "Drupal Databases: The Next Generation", which gave me a good insight into the current state of the Drupal database layer and how they plan to overhaul it for Drupal 7. The key points that I took away:

  • A new API based on PDO
  • Object-oriented, requiring PHP5
  • Support for using prepared statements
  • A unified access API
  • A query builder
  • More support for other database systems (currently Drupal supports MySQL and PostgreSQL only). In particular, they are keen on adding SQLite support, to ease local development.
  • Support for master-slave replication (by randomly distributing reads among the hosts)
  • Support for using different database types in parallel (e.g. using SQLite for read-only tables, MySQL for everything else)

The slides and a video of the presentation are available, if you want to check it out. There is a task list on the Drupal.org web site that keeps track of the ongoing activities.

Sep 03 2008

I had a nice chat with Kieran from Acquia at DrupalCon last week - we discussed how people running local Drupal user groups could expand their outreach into other communities, in particular into the MySQL User Groups. Scott Mattoon captured our conversation on video, which is now available on blip.tv:

The gist of what we talked about: if you are organizing a local Drupal User Group Meetup, check out http://mysql.meetup.com to find out if there is a local MySQL user group nearby. Chances are high that there is! And if not, you may at least find people in the area who would be interested in meeting about this subject. We also maintain a list of user groups on the MySQL Forge Wiki. Consider extending the invitation to your next meetup to these folks as well! It's very likely that someone would be interested in learning more about Drupal. The same applies to other user groups, e.g. from the PHP community.

I personally run a MySQL User Group here in Hamburg, and I usually extend my invitations to a number of channels and mailinglists, including the local PHP, Perl and Linux User Groups. Every once in a while, a new member from these communities shows up.

So this thing works the other way around, too: if you are the organizer of a MySQL Meetup, have you thought about looking at http://groups.drupal.org/ yet? Maybe you will find a Drupal User Group in your very own town that you could invite to learn more about MySQL and exchange contacts? If you are looking for more tips on how to run and expand your User Group, I've created a page with useful hints about this topic on the MySQL Forge Wiki. Your feedback and additions are very welcome!

Sep 01 2008

I'm back home from DrupalCon 2008 now - it has been a great event! I met a lot of nice people from the Drupal community and learned a lot about this CMS. I've been very busy uploading the remaining pictures from the event to my gallery - so here they are, for your viewing pleasure:

I also gave two talks and held a BoF there - the slides have now been attached to the session nodes, one of them (the HA session) even includes a video recording:

I've also uploaded some pictures from FrOSCon to my Gallery now, hope you enjoy them! The slides of my FrOSCon talks are now uploaded to the conference system as well:

Aug 28 2008

Hello and greetings from DrupalCon 2008 in Szeged, Hungary!

We (Thierry Manfé, Scott Mattoon and myself) are having a great time manning our booth and talking about Drupal, MySQL and Open [email protected] with the nice crowd of Drupal Users and Developers here. Sun is a gold sponsor of the event and we're giving a number of sessions as well.

Today I gave my first presentation about MySQL Backup and Security - Best practices - unfortunately I ran a tad bit out of time at the end... The slides have already been attached to the session page, so you can read up on the last few things I was going to talk about. Feel free to contact me, if you have further questions!

Tomorrow I'll be talking about High availability solutions for MySQL: An Overview and practical demo, which will also include a practical demonstration of a two-node Linux cluster, performed by Jakub Suchy. In the afternoon, I will also hold a BoF about bzr - the Bazaar source revision control system.

I've also uploaded some pictures from the event (and some impressions from the city) on my gallery (more will follow later). Enjoy!

Aug 05 2008

I just got informed that two of my session proposals for DrupalCon 2008 got accepted - I will be speaking about the following topics there:

The second talk will be held in cooperation with Jakub Suchy, who will take over the practical demo. Sun Microsystems is a Gold Sponsor of the event and I am glad that we can show some support for this truly amazing and vibrant community CMS. DrupalCon 2008 will take place from August 27th-30th in Szeged, Hungary. The list of proposed talks looks truly impressive! Among the keynote speakers will be Dries Buytaert and Rasmus Lerdorf. I look very much forward to this conference. If you have a chance, make sure to attend it!

Apr 28 2008

This article describes how to install the Drupal 6.2 CMS on MySQL 6.0, using the Falcon Storage Engine. The operating system is a default Ubuntu 8.04 "Hardy Heron" (x86) installation.

I will make a few assumptions here, in order to keep the instructions simple: a fresh OS install, no other MySQL databases or web services are running or have already been installed. Both MySQL and the web server are installed on the same host. You should be able to become root to install packages and to have access to the local file system and the system configuration.

This article will explain how to install and configure Apache/PHP, MySQL 6.0 and Drupal 6.2.

Prerequisites

Running Drupal requires a web server (e.g. Apache) and PHP. We will use the packages as shipped with the distribution and will then install a MySQL 6.0 preview binary from http://dev.mysql.com. Other web servers like lighttpd will work equally well, but this article focuses on using the Apache web server.

Fortunately the MySQL 5.0 client applications as shipped with Ubuntu Linux are compatible with the MySQL 6.0.x client/server protocol, so we only make use of the 6.0 server and will use the installed, pre-compiled client applications and libraries to connect to it - there is no need to recompile PHP or anything to get going!

First of all you have to make sure the following packages have been installed (e.g. by using a package management tool like the Adept Package Manager, synaptic, aptitude or apt-get):

  • apache2
  • libapache2-mod-php5
  • php5
  • php5-common
  • php5-mysql
  • php5-gd
  • mysql-client-5.0
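
Once these packages are installed you can, as a quick optional check, confirm that the client Ubuntu ships really is a 5.0.x build (which, as noted above, speaks the same client/server protocol as the 6.0 server):

$ mysql --version

It should report a 5.0.x client (5.0.51a on Ubuntu 8.04 at the time of writing).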

To enable the mod_rewrite Apache module (as recommended for Drupal), you need to enter the directory /etc/apache2/mods-enabled and create a symlink to the module loading instructions:

cd /etc/apache2/mods-enabled/
sudo ln -s ../mods-available/rewrite.load .

This ensures that mod_rewrite will be loaded when Apache starts up.
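
Alternatively, Ubuntu's apache2 package ships the a2enmod helper, which creates exactly this kind of symlink for you - an equivalent shortcut, not an additional step:

sudo a2enmod rewrite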

Additionally, you have to edit the file /etc/apache2/sites-available/default and make one change: in the directives for the Directory /var/www, change AllowOverride from "None" to "All". This makes sure that Drupal can enable the rewrite engine for nicer-looking (clean) URLs.
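
For orientation, on a stock Ubuntu 8.04 configuration the relevant section of /etc/apache2/sites-available/default should end up looking roughly like this after the change (only the AllowOverride line is modified; the other directives may differ slightly on your system):

<Directory /var/www/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride All
        Order allow,deny
        allow from all
</Directory>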

Now restart the Apache server to apply the changes:

sudo /etc/init.d/apache2 restart

To verify that Apache is up and running, try opening http://localhost/ in a browser on the same machine that runs the web server. You should get a simple page, stating that "It works!".

Installing the MySQL 6.0 Falcon preview

Now that the web server is up and running, we need to install a MySQL database server that the Drupal installation can use. Download mysql-6.0.5-alpha-pb87-linux-x86.tar.gz (or any newer package, if available) from http://downloads.mysql.com/forge/falcon_feature_preview/

Create a /etc/mysql/my.cnf file with the following content (replacing the existing file, if necessary):

[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
socket=/var/run/mysqld/mysqld.sock
default-storage-engine=falcon

The default-storage-engine option makes sure that every CREATE TABLE statement that does not specify an engine explicitly will use the Falcon storage engine. Now extract the binary tarball distribution into /usr/local and perform the following steps to finalize the installation/configuration:

$ sudo groupadd mysql
$ sudo useradd -g mysql mysql
$ cd /usr/local
$ sudo tar zxvf ~/mysql-6.0.5-alpha-pb87-linux-x86.tar.gz -C /usr/local
$ sudo ln -s mysql-6.0.5-alpha-pb87-linux-x86 mysql
$ cd mysql
$ sudo chown -R mysql .
$ sudo chgrp -R mysql .
$ sudo scripts/mysql_install_db --user=mysql
$ sudo chown -R root .
$ sudo chown -R mysql data
$ sudo ./bin/mysqld_safe --user=mysql &

The installation procedure is outlined in more detail in the reference manual at http://dev.mysql.com/doc/refman/6.0/en/installing-binary.html

If you want to enable the automatic startup of MySQL at system bootup time, you need to install an init script in /etc/init.d/ - follow the instructions as outlined in the reference manual. Note that the mysql.server script has been moved from the directory support-files to share/mysql for the binary tarball distributions and that the current 6.0 documentation has not been updated yet (I filed BUG#36382 about this).
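
As a rough sketch (assuming the tarball layout described above - adjust the paths if your version differs), installing and registering the init script on Ubuntu could look like this:

$ sudo cp /usr/local/mysql/share/mysql/mysql.server /etc/init.d/mysql
$ sudo chmod +x /etc/init.d/mysql
$ sudo update-rc.d mysql defaults

After that, MySQL will be started automatically at boot via the usual runlevel links.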

If the server is not already running (the mysqld_safe command at the end of the installation steps above starts it), you can start it using the mysqld_safe script:

$ sudo /usr/local/mysql/bin/mysqld_safe &

Next you should verify that you can connect to the server and that the Falcon storage engine is enabled:

$ mysqladmin version
mysqladmin  Ver 8.41 Distrib 5.0.51a, for debian-linux-gnu on i486
Copyright (C) 2000-2006 MySQL AB
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL license

Server version          6.0.5-alpha-pb87
Protocol version        10
Connection              Localhost via UNIX socket
UNIX socket             /var/run/mysqld/mysqld.sock
Uptime:                 43 min 6 sec

Threads: 1  Questions: 6  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.2

$ mysql -u root
root@localhost:(none) > SELECT * FROM information_schema.engines WHERE engine='Falcon';
+--------+---------+-----------------------+--------------+----+------------+
| ENGINE | SUPPORT | COMMENT               | TRANSACTIONS | XA | SAVEPOINTS |
+--------+---------+-----------------------+--------------+----+------------+
| Falcon | DEFAULT | Falcon storage engine | YES          | NO | YES        |
+--------+---------+-----------------------+--------------+----+------------+
1 row in set (0.12 sec)
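
If you want to double-check that the default-storage-engine setting from my.cnf is in effect, you can create a throwaway table without an explicit ENGINE clause and look it up in information_schema - a quick sanity check (falcon_test is just a hypothetical scratch database):

mysql> CREATE DATABASE falcon_test;
mysql> CREATE TABLE falcon_test.t1 (id INT PRIMARY KEY);
mysql> SELECT ENGINE /* should report 'Falcon' */ FROM information_schema.tables WHERE TABLE_SCHEMA='falcon_test' AND TABLE_NAME='t1';
mysql> DROP DATABASE falcon_test;

If the SELECT returns a single row reading "Falcon", the default took effect; the scratch database can then be dropped as shown.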

Before performing the actual Drupal installation, you need to create a Drupal database and a user account. I chose "drupal" as both the user name and the password - please use more sensible (and secure) values for your own setup!

root@localhost:(none) > CREATE DATABASE drupal;
Query OK, 1 row affected (0.01 sec)

root@localhost:(none) > GRANT ALL ON drupal.* to 'drupal'@'localhost' IDENTIFIED BY 'drupal';
Query OK, 0 rows affected (0.00 sec)
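
Before moving on, you can optionally verify that the new account is able to connect (using the example credentials from above):

$ mysql -u drupal -pdrupal -e "SHOW DATABASES;"

The output should include the drupal database.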

Now MySQL 6.0 is installed and ready!

Installing Drupal 6.2

Now that the web and database server have been set up and configured, it's time to perform the installation of our application, the Drupal content management system. Start by downloading drupal-6.2.tar.gz from http://drupal.org/ (by clicking on the Drupal 6.2 download link on the front page).

Remove the default start page /var/www/index.html and extract the contents of the Drupal tarball into this directory. Then change the ownership of these files to the user that Apache runs under (www-data by default):

$ sudo rm /var/www/index.html
$ sudo tar --strip-components=1 -zxvf drupal-6.2.tar.gz -C /var/www
$ sudo chown -R www-data /var/www

The Drupal package installation itself is now done; the remaining installation and configuration steps are performed by opening http://localhost/ in your browser (reload the page if you still see the "It works!" default page).

[Screenshot: the Drupal web installer]

Follow the instructions in the Drupal installation manual on how to perform the actual installation. In the "Database Configuration" dialogue, use the MySQL database username and password that you created earlier.

[Screenshot: the installer's Database Configuration dialogue]

Now the Drupal installer should perform its duty and you should see your fresh Drupal installation up and running!

[Screenshot: the freshly installed Drupal site]

Once the installation has finished, let's verify that we're really running on Falcon by running the following query in a MySQL command line client:

mysql> SELECT TABLE_NAME, ENGINE from information_schema.tables WHERE TABLE_SCHEMA='drupal';
+-------------------------+--------+
| TABLE_NAME              | ENGINE |
+-------------------------+--------+
| access                  | Falcon |
| actions                 | Falcon |
| actions_aid             | Falcon |
| authmap                 | Falcon |
| batch                   | Falcon |
| blocks                  | Falcon |
| blocks_roles            | Falcon |
| boxes                   | Falcon |
| cache                   | Falcon |
| cache_block             | Falcon |
| cache_filter            | Falcon |
| cache_form              | Falcon |
| cache_menu              | Falcon |
| cache_page              | Falcon |
| cache_update            | Falcon |
| comments                | Falcon |
| files                   | Falcon |
| filter_formats          | Falcon |
| filters                 | Falcon |
| flood                   | Falcon |
| history                 | Falcon |
| menu_custom             | Falcon |
| menu_links              | Falcon |
| menu_router             | Falcon |
| node                    | Falcon |
| node_access             | Falcon |
| node_comment_statistics | Falcon |
| node_counter            | Falcon |
| node_revisions          | Falcon |
| node_type               | Falcon |
| permission              | Falcon |
| role                    | Falcon |
| sessions                | Falcon |
| system                  | Falcon |
| term_data               | Falcon |
| term_hierarchy          | Falcon |
| term_node               | Falcon |
| term_relation           | Falcon |
| term_synonym            | Falcon |
| url_alias               | Falcon |
| users                   | Falcon |
| users_roles             | Falcon |
| variable                | Falcon |
| vocabulary              | Falcon |
| vocabulary_node_types   | Falcon |
| watchdog                | Falcon |
+-------------------------+--------+
46 rows in set (0.00 sec)

Looks like we were successful - all Drupal tables are using the Falcon storage engine! Congratulations.

From here on, you can configure and change Drupal to your heart's content. Note, however, that additional Drupal modules may contain code that is specific to the MyISAM or InnoDB storage engines, so your mileage may vary. In that case, it would be great to notify the module developers about these incompatibilities.
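
A query along the lines of the one above can help you spot any tables that a contributed module might have created with a different storage engine - an empty result set means everything is still on Falcon:

mysql> SELECT TABLE_NAME, ENGINE FROM information_schema.tables WHERE TABLE_SCHEMA='drupal' AND ENGINE <> 'Falcon';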

If you want to quickly populate a Drupal installation with content for testing, you can use the Devel module. I used it to create 500 users and 10,000 test pages on my demo installation. Even though this was done inside a virtual machine running on VirtualBox, the system was still very responsive and the creation of the content proceeded amazingly fast! I did not perform any serious benchmarks or load tests, though (they would not make much sense in a VM anyway).

Feb 14 2008
Feb 14

Yesterday, Drupal 6.0 was officially released - check out this screencast for a 29-minute tour of the new features in this release.

We'd like to congratulate the Drupal Developer Team and Community for reaching this milestone and are happy that the MySQL Server continues to serve well as the database backend for this awesome content management platform!

I had the pleasure of evaluating and reviewing a previous release of Drupal for the Open Source Content Management System Award from Packt Publishing and it has been one of my favourites.

Keep up the good work!
 
