May 05 2012
May 05

Over the years, I've accumulated a large collection of e-books and digital music albums, not to mention family pictures. Information overload is not a philosophical point of view, it's a real problem that forces me to devote time, effort and money to maintain that collection.

That's probably why so many media organizers exist. Because I believe that all applications should be delivered from the Web, and because no ready-made Web media organizer struck me as fulfilling my needs, I started to write my own using Drupal 6, dubbed Mediatheque. Here are the most important design goals I had in mind:

  • The main job of the organizer is to "ingest" media files by processing them to extract metadata, which is made available for searching and browsing.
  • I should be able to point the system to different "volumes" containing my media. These volumes are essentially folders that are present somewhere on the network.
  • No extra file space (beyond database growth) should be required to ingest media files.
  • The system should be able to store arbitrary metadata about the media.
  • I should be able to add new information handlers for media at any time - the system would silently re-process the files.
  • New version of media handlers should also be supported by re-processing the files.
  • The system should be smart about recognizing files across name changes and metadata updates.
  • The system should be robust in the face of plugin errors.
  • Media display should be dependent on its type.

Based on these goals, here are the significant implementation details of the current system:

  • Drupal Queue is used to process the files in the background. In fact, I am using 3 queues as a processing pipeline:

Starting with a volume root, the folders queue finds and enqueues the files, the files queue creates Drupal documents (nodes) and enqueues them for plugin processing, and the plugins queue applies the registered plugins to extract metadata.

  • Each processing step produces a log entry. This allows to track the errors that are produced during media processing. The log is a Drupal table that is integrated with Views via hook_views_data. Each log entry contains information about the processed document, the file hash, the plugin name and version, and the status code of the processing, allowing to detect the cases where re-processing should occur. Mediatheque log

  • Plugins are metadata extractors that are associated with MIME type patterns. For example, Mediatheque currently comes with an ID3 extractor that uses the getID3 library and an e-book metadata extractor that uses the Google Books API via Zend Gdata. Mediatheque plugins

  • Metadata extracted by the plugins is stored in the document node using my CCK Metadata module. This is a simple name/value CCK field and the document node includes this field with unlimited cardinality. The metadata pairs returned by each plugin are prefixed with a unique plugin prefix to be able to handle re-processing. CCK Metadata also allows custom formatting of specific metadata entries via hook_cck_metadata_fields:

 * Implementation of hook_cck_metadata_fields().
function mediatheque_cck_metadata_fields() {
  return array(
    'isbn:thumbnail' => array(
      'formatter' => 'mediatheque_formatter_thumbnail',

Eventually, this metadata system should reuse RDF instead of using a custom design. Mediatheque plugins

  • Finally, the main mediatheque view is created as a regular view with filterable metadata. The metadata name exposed filter is converted to a drop-down via my Views Hacks' Views Selective Exposed Filters module whose job is to restrict the available filter values to those found in the view results. Mediatheque plugins

Mediatheque is still very much a work in progress. However, many conceptual challenges have already been solved, and I would love to hear your feedback!

AttachmentSize 132.92 KB 53.99 KB 50.21 KB 87.44 KB 163.26 KB

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web