Importing mailman archives into Drupal

My client wanted to be able to search their list manager archives (they use mailman) with Solr. We already had a pretty major investment in Drupal, with about 80K PDF files. In the past, each of the different databases was managed by a separate dtSearch index. With the new Drupal system, we are now able to consolidate everything into one master index. With the ‘faceting’ that Solr/Drupal provides, it becomes very easy to drill down from a general request to the specifics.

Well, this article is going to get a bit specific on the why and how of the integration we did between mailman data and Drupal.

Mailman keeps its archives in a directory structure that provides, for each list, a single file <listname>.mbox and a directory <listname>. I selected the directory as my driver for getting all the files across. After I got everything written, some more research indicated that I might have been better off using the <listname>.mbox file, as this is ‘authoritative’ for each list that mailman handles. But I have working code now, so I will live with this decision for the time being.
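For anyone who does want to go the <listname>.mbox route instead, here is a minimal sketch of what reading that file could look like with the `mailbox` module from the Python standard library. This is not the code I actually wrote; the function name `iter_mbox_messages` is made up for illustration, and it simply takes the first text/plain part of each message as the body.

```python
import mailbox


def _decode_part(part):
    """Decode one message part to text, falling back to UTF-8 on odd charsets."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name in the message headers.
        return payload.decode("utf-8", errors="replace")


def iter_mbox_messages(mbox_path):
    """Yield (subject, body) pairs for every message in an mbox file.

    Hypothetical helper: the body is the first text/plain part found.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            subject = msg.get("Subject", "(no subject)")
            body = ""
            if msg.is_multipart():
                for part in msg.walk():
                    if part.get_content_type() == "text/plain":
                        body = _decode_part(part)
                        break
            else:
                body = _decode_part(msg)
            yield subject, body
    finally:
        box.close()
```

Each (subject, body) pair would map naturally onto the Title and Body of a Drupal node.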

The general process is as follows:

A) One Time Procedures

1) create a directory under sites/default/files. I called mine mailman. This is where all the list subdirectories will live.

2) create a Content Type in Drupal using just Title and Body.

3) install my Python script in the directory from step A.1 above.

4) make sure that a current release of drush is installed.

5) install my drush script in the sites/default directory.

B) Repeat procedures

1) rsync all of the lists that you want from the mailman server to the A.1 directory.

2) run the Python script. This will create a list of all the eligible archive files, and then call the drush script once for each file found.
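To give a rough idea of step B.2, here is a minimal sketch of the kind of driver the Python script could be. It is an illustration and not my actual script: the function names, the drush script name `import_mailman.php`, the use of `drush php-script` to run it, and the assumption that the eligible archive files are the ones ending in `.txt` are all placeholders to adjust for your own setup.

```python
import os
import subprocess


def find_archive_files(root, suffix=".txt"):
    """Return a sorted list of archive files under the mailman directory.

    Assumption: an 'eligible' archive file is any file ending in `suffix`.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk the list subdirectories in a stable order
        for name in sorted(filenames):
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return found


def build_drush_command(archive_path, drush_script="import_mailman.php"):
    """Build the drush call for one archive file (script name is hypothetical)."""
    return ["drush", "php-script", drush_script, archive_path]


def import_all(root, dry_run=True):
    """Call the drush script once per archive file found under `root`."""
    commands = [build_drush_command(path) for path in find_archive_files(root)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # only show what would be run
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Running `import_all("sites/default/files/mailman")` with the default `dry_run=True` only prints the commands, which is a handy check before letting it create nodes for real.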
