
Converting my old content to markdown


So I have just converted all the content on this blog to markdown. It was rather painful. I had really old content in here, going as far back as 2005, and I went through about 3 distinct markup filters over the years, most of which were irregular and changed according to the position of the sun, the drupal.org releases and the wind speed. Now it's all markdown. This involved patience, drush and 3 hours of wasted time.

Now, the fact that Markdown picked up speed has always been a little strange to me. The syntax isn't particularly complete, which leads to non-standard extensions like markdown-extra popping up, with the inevitable variations according to the language. GitHub, for example, has its own flavor of the famous markup. Finally, Drupal's filters are kind of klunky: the usual <url> markup doesn't work. So things are a little weird, but Markdown seems to be here to stay, or anyway it's the only markup I have seen supported reliably across multiple CMSes and sites. One has to wonder why we are still stuck with plain old HTML on Drupal.org...

The actual conversion

The conversion was rather annoying. I had to track down all those formats, which mostly meant converting a wiki-like syntax from the freelinking module to markdown. (It's actually more complicated than that, because there was also the simplewiki filter, but let's ignore that: there were only a few of those and I just did them by hand.)

In the end, I arrived at the following script:

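<?php
// Callback for preg_replace_callback(): turns a freelinking-style wiki link
// into a markdown link. $match[1] is the target, $match[2] the optional title.
function wiki2mdwn($match) {
  $orig = $match[0];
  if (count($match) > 2) {
    // [[target|title]] becomes [title](target)
    $mdwn = "[" . $match[2] . "](" . $match[1] . ")";
  } else {
    // [[target]] becomes [target](target)
    $mdwn = "[" . $match[1] . "](" . $match[1] . ")"; # hack: drupal fails on <url>
  }
  print "$orig\t=>\t$mdwn\n";
  return $mdwn;
}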

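// Nodes still in format 1 that contain wiki links. With LIMIT 1, each run
// handles a single node; drop the LIMIT to convert everything in one pass.
$q = db_query("select node.nid, format, FROM_UNIXTIME(created) AS c, body, teaser, node.title from node_revisions inner join node on node.vid = node_revisions.vid where format = 1 AND ( teaser like '%[[%' OR body like '%[[%' ) order by created LIMIT 1;");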

while ($row = db_fetch_object($q)) {
  print $row->nid . " | " . $row->format . " | " . $row->c . " | " . $row->title . "\n";
  $node = null;
  foreach (array('teaser', 'body') as $part) {
    print "checking $part... ";
    $newpart = preg_replace_callback('/\[\[([^|\]]*)(?:\|([^\]]*))?\]\]/', 'wiki2mdwn', $row->$part);
    if ($newpart != $row->$part) {
      print "replacement... ";
      if (is_null($node)) {
        $node = node_load($row->nid);
        print "node loaded... ";
      }
      $node->$part = $newpart;
    }
  }
  if (!is_null($node)) {
    node_save($node);
    print "node {$node->nid} saved... ";
  }
  print "\n";
}

// Same idea for comments; again, LIMIT 1 means one comment per run.
$q = db_query("SELECT nid, cid, FROM_UNIXTIME(timestamp), format, subject, comment FROM comments WHERE format = 1 AND comment LIKE '%[[%' ORDER BY cid LIMIT 1;");

while ($row = db_fetch_object($q)) {
  print "checking comment {$row->cid} in node {$row->nid} with subject {$row->subject}... ";
  $newcom = preg_replace_callback('/\[\[([^|\]]*)(?:\|([^\]]*))?\]\]/', 'wiki2mdwn', $row->comment);
  print "\nsaving... ";
  db_query("UPDATE comments SET comment = '%s' WHERE cid = %d", $newcom, $row->cid);
  print "comment {$row->cid} in node {$row->nid} saved.\n";
}

Yes. This is klunky and ugly. But it works. If you have more than, say, 200 nodes or comments to convert, I would strongly recommend doing this directly in SQL, but I was worried I would break stuff, so I preferred going through preg_replace_callback() rather than plain SQL.
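Since the worry here is breaking content, one way to check the conversion before touching the database is to run the same kind of callback on a few sample strings, outside Drupal. The following is only a sketch: the function names wiki2mdwn_preview() and convert_preview() and the sample strings are made up for illustration, and the pattern is my assumption of what the freelinking syntax ([[target]] and [[target|title]]) looks like.

```php
<?php
// Standalone dry run: no Drupal bootstrap, no database access.
// wiki2mdwn_preview() and convert_preview() are hypothetical helper names.

function wiki2mdwn_preview($match) {
  // $match[2] only exists when the link carries a "|title" part.
  $title = isset($match[2]) ? $match[2] : $match[1];
  return '[' . $title . '](' . $match[1] . ')';
}

function convert_preview($text) {
  // Assumed pattern: [[target]] or [[target|title]].
  $pattern = '/\[\[([^|\]]*)(?:\|([^\]]*))?\]\]/';
  return preg_replace_callback($pattern, 'wiki2mdwn_preview', $text);
}

// Made-up sample content, one case per link form.
$samples = array(
  'See [[http://example.com/|the example site]] for details.',
  'Bare link: [[http://example.com/]]',
  'No wiki links here.',
);

foreach ($samples as $sample) {
  print $sample . "\n  => " . convert_preview($sample) . "\n";
}
```

Saved as a plain file, this can be run with the php command line binary. Text without any [[...]] link should come out unchanged, which is a quick sanity check before letting the real script call node_save().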

Oh, and by the way, this is a drush script, for those who don't know about that (rather old) drush feature. :) To run this, you basically dump this in a file and run it:

drush @anarcat.koumbit.org wiki2mdwn.php

Notice how I use a drush alias there: this one is automatically created by the Aegir instance this site lives on. Time saver.
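For those not on Aegir, a site alias can also be written by hand. This is a minimal sketch of what such a definition might look like in a file like ~/.drush/aliases.drushrc.php; the root path and uri below are placeholders, not the real values for this site, and the alias Aegir generates carries more keys than this.

```php
<?php
// Hypothetical hand-written alias; Aegir generates its own automatically.
// 'root' and 'uri' are placeholder values, adjust them to your install.
$aliases['anarcat.koumbit.org'] = array(
  'root' => '/var/www/drupal',      // placeholder: path to the Drupal root
  'uri'  => 'anarcat.koumbit.org',  // placeholder: site URI within that root
);
```

With something like that in place, @anarcat.koumbit.org on the drush command line stands in for the matching --root and --uri options.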

So long and annoying, but at long last done!
