Jul 13 2017

When creating the Global Academy for Continuing Medical Education (GAME) site for Frontline, we had to tackle several complex content migration problems. The previous site had a lot of legacy content we had to bring over into the new system. By tackling each unique problem, we were able to migrate most of the content into the new Drupal 7 site.

Setting Up the New Site

The system Frontline used before the redesign was Typo3, along with a suite of individual, internally hosted ASP sites for conferences. Frontline had several kinds of content that displayed differently throughout the site. The complexity in handling the migration was that much of the content lived in WYSIWYG fields containing large amounts of custom HTML.

We decided to go with Drupal 7 for this project so we could more easily reuse code that had been created for the MDEdge.com site.

The GAME website redesign greatly improved the flow of the content and how it was displayed on the frontend, and part of that improvement was displaying specific pieces of content in different sections of the page. The burning question that plagued us when tackling this problem was “How are we going to extract the specific pieces of data and get them inserted into the correct fields in Drupal?”

Before we could get deep into the code, we had to do some planning and setup to make sure we were clear on how best to handle the different types of content. This also included hammering out the content model. Once we got to a spot where we could start migrating content, we decided to use the Migrate module. We grabbed the current site files, images, and database and put them in a central location outside of the current site that we could easily access. This would allow us to re-run these migrations even after the site launched (if we needed to)!

Migrating Articles

This content on the new site is connected to MDEdge.com via a REST API. One complication was that the content on GAME had been added to Typo3 manually and wasn't tagged for use with specific fields. The content type on the new Drupal site had a few fields for the data we were displaying, plus a field that stores the article ID from MDEdge.com. To get that ID for this migration, we matched the title of each news article in Typo3 to the title of the article on MDEdge.com. It wasn't a perfect solution, but it allowed us to do an initial migration of the data.

Conferences Migration

For GAME’s conferences, since there were not too many on the site, we decided to import the main conference data via a Google spreadsheet. The Google doc was a fairly simple spreadsheet that contained a column we used to identify each row in the migration, plus a column for each field that is in that conference’s content type. This worked out well because most of the content in the redesign was new for this content type. This approach allowed the client to start adding content before the content types or migrations were fully built.
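With the Migrate module, a spreadsheet like this is typically consumed as a CSV export. As a rough sketch (the class name, file path, columns, and field names here are all illustrative, not the project's actual code), the conference migration might be registered like this:

```php
// Sketch only: names are hypothetical. Assumes the Google spreadsheet was
// exported to CSV with a header row.
class ConferenceMigration extends Migration {
  public function __construct($arguments) {
    parent::__construct($arguments);

    // One column per field in the conference content type, plus the ID column.
    $columns = array(
      array('conference_id', 'Unique conference ID'),
      array('title', 'Conference title'),
      array('start_date', 'Start date'),
    );
    $this->source = new MigrateSourceCSV(
      '/path/to/conferences.csv', $columns, array('header_rows' => 1));

    $this->destination = new MigrateDestinationNode('conference');

    // The ID column is what lets the migration be safely re-run later.
    $this->map = new MigrateSQLMap($this->machineName,
      array('conference_id' => array(
        'type' => 'varchar', 'length' => 255, 'not null' => TRUE,
      )),
      MigrateDestinationNode::getKeySchema());

    $this->addFieldMapping('title', 'title');
    $this->addFieldMapping('field_conference_date', 'start_date');
  }
}
```

Because the map keys on the spreadsheet's ID column, edits to the spreadsheet can be pulled in by simply re-running the migration.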

Our spreadsheet handled the top level conference data, but it did not handle the pages attached to each conference. Page content was either stored in the Typo3 data or we needed to extract the HTML from the ASP sites.

Typo3 Categories to Drupal Taxonomies

To make sure we mapped the content in the migrations properly, we created another Google doc mapping file that connected the Typo3 categories to Drupal taxonomies. We set it up to support multiple taxonomy terms that could be mapped to one Typo3 category.
[NB: Here is some code that we used to help with the conversion: https://pastebin.com/aeUV81UX.]

Our mapping system worked out fantastically well. The only problem we encountered was that, since we allowed three taxonomy terms to be mapped to one Typo3 category, the client noticed cases where content with more than one Typo3 category ended up with too many taxonomy terms assigned. But this was a content-related issue, and it simply required them to revisit the mapping document and tweak it as necessary.

Slaying the Beast: Extracting, Importing, and Redirecting

One of the larger problems we tackled was how to get the HTML from the Typo3 system and the ASP conference sites into the new Drupal 7 setup.

The ASP conference sites were handled by grabbing the HTML for each of those pages and extracting the page title, body, and photos. The migration of the conference sites was challenging because we were dealing with different HTML for different sites and trying to get all those differences matched up in Drupal.

Grabbing the data from the Typo3 sites presented another challenge because we had to figure out where the different data was stored in the database. This was a uniquely interesting process because we had to determine which tables were connected to which other tables in order to figure out the content relationships in the database.


A few things we learned in this process:

  • We found all of the content on the current site was in these tables (which are connected to each other): pages, tt_content, tt_news, tt_news_cat_mm and link_cache.
  • After talking with the client, we were able to grab content based on certain Typo3 categories or the pages hierarchy relationship. This helped fill in some of the gaps where a direct relationship could not be made by looking at the database.
  • It was clear that getting 100% of the legacy content wasn’t going to be realistic, mainly because of the loose content relationships in Typo3. After talking to the client we agreed to not migrate content older than a certain date.
  • It was also clear that—given how much HTML was in the content—some manual cleanup was going to be required.

Once we were able to get to the main HTML for the content, we had to figure out how to extract the specific pieces we needed from that HTML.
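To make that concrete, here is a small standalone sketch (not the project's actual code; the markup and selectors are made up) of extracting pieces from legacy HTML with PHP's DOMDocument and DOMXPath:

```php
// Hypothetical legacy markup standing in for a scraped Typo3/ASP page.
$html = '<html><head><title>Sample Conference</title></head>'
      . '<body><div class="content"><p>Agenda details.</p></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(TRUE); // legacy markup is rarely well-formed
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$title = trim($xpath->evaluate('string(//title)'));
$body  = trim($xpath->evaluate('string(//div[@class="content"])'));

print $title . "\n"; // Sample Conference
print $body . "\n";  // Agenda details.
```

In practice each legacy site needed its own selectors, but the extraction step always reduced to this pattern.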

Once we had access to the data we needed, it was a matter of getting it into Drupal. The Migrate module made a lot of this fairly easy with how much functionality it provides out of the box. We ended up using the prepareRow() method a lot to grab specific pieces of content and assign them to Drupal fields.
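As an illustration of that pattern (the class, property, and field names here are hypothetical, not the project's code), a prepareRow() override might look like:

```php
class ConferencePageMigration extends Migration {
  public function prepareRow($row) {
    // Let Migrate skip rows it has already decided to ignore.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }
    // Split a specific piece out of the raw legacy HTML into its own
    // source property, so it can be mapped with addFieldMapping().
    if (preg_match('|<h1[^>]*>(.*?)</h1>|s', $row->raw_html, $matches)) {
      $row->page_heading = strip_tags($matches[1]);
    }
    return TRUE;
  }
}
```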

Handling Redirects

We wanted to handle as many of the redirects as we could automatically, so the client wouldn’t have to add thousands of redirects and to ensure existing links would continue to work after the new site launched. To do this we mapped the unique row in the Typo3 database to the unique ID we were storing in the custom migration.

As long as you are handling the unique IDs properly in your use of the Migration API, this is a great way to handle mapping what was migrated to the data in Drupal. You use the unique identifier stored for each migration row and grab the corresponding node ID to get the correct URL that should be loaded. Below are some sample queries we used to get access to the migrated nodes in the system. We used UNION queries because the content that was imported from the legacy system could be in any of these tables.

SELECT destid1 FROM migrate_map_cmeactivitynode WHERE sourceid1 IN (:sourceid)
UNION
SELECT destid1 FROM migrate_map_cmeactivitycontentnode WHERE sourceid1 IN (:sourceid)
UNION
SELECT destid1 FROM migrate_map_conferencepagetypo3node WHERE sourceid1 IN (:sourceid)
…
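Wrapped in a helper, a lookup along those lines might look like this (the function name is ours for illustration; the map table names come from the migrations above):

```php
// Sketch: given a legacy Typo3 ID, find the migrated node ID (or FALSE).
function game_legacy_nid($source_id) {
  $sql = "SELECT destid1 FROM {migrate_map_cmeactivitynode} WHERE sourceid1 = :id1
          UNION
          SELECT destid1 FROM {migrate_map_cmeactivitycontentnode} WHERE sourceid1 = :id2
          UNION
          SELECT destid1 FROM {migrate_map_conferencepagetypo3node} WHERE sourceid1 = :id3";
  $nid = db_query($sql, array(
    ':id1' => $source_id,
    ':id2' => $source_id,
    ':id3' => $source_id,
  ))->fetchField();
  return $nid ? (int) $nid : FALSE; // FALSE when nothing was migrated for this ID
}
```

The node ID returned here is what we used to build the redirect for each incoming legacy URL.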

Wrap Up

Migrating complex websites is rarely simple. One thing we learned on this project is that it is best to jump deep into migrations early in the project lifecycle, so the big roadblocks can be identified as early as possible. It also is best to give the client as much time as possible to work through any content cleanup issues that may be required.

We used a lot of Google spreadsheets to get needed information from the client. This made things much simpler on all fronts and allowed the client to start gathering needed content much sooner in the development process.

In a perfect world, all content would be easily migrated over without any problems, but this usually doesn’t happen. It can be difficult to know when you have taken a migration “far enough” and you are better off jumping onto other things. This is where communication with the full team early is vital to not having migration issues take over a project.

Web Chef Chris Roane

When not breaking down and solving complex problems as quickly as possible, Chris volunteers for a local theater called Arthouse Cinema & Pub.

Jul 24 2010

If you're like most people who build using Drupal, you want to build your site around nodes that have titles, descriptions, tags, comments, etc. It's no surprise, since those content-oriented sites are the ones that make the world of the web go round. But what about another type of site that focuses not on nodes but on data contained in the Drupal database? Can you do it? Yes. Should you do it? Only if that sort of thing excites you, or perhaps if a client requests it.

I'll give you a little background on where I'm coming from and what made me decide to play around with Drupal as a way to expose data sets. About a decade ago I was part of a group of people who built a first of its kind application to send and receive data from wholesale power markets in North America. The application was built on three tiers: a client, an application server, and a database server. The client and application server were written in Java and the database relied on Oracle technology. It was about as far away from open source as you could get. The product was successful though and we did hundreds of installs all across North America. What the product basically did for its owners was suck in data from a central server then expose that data to the user via tables and charts visible from the user interface. At some point last year I started wondering if I could do something similar with Drupal. So I started investigating, tried out different modules and realized that I could do even more with a Drupal install than we did with that product.

There are four tasks that need to be accomplished if you want to use Drupal as an effective data platform.

  1. Capture data to the Drupal database tables
  2. Set the data types (text, numeric, etc.)
  3. Expose the data to the user using tables and charts
  4. Offer simplified download format options

Capturing Data

I have been capturing data using the Table Wizard module. Table Wizard allows you to create a database table by uploading a delimited (tab- or comma-separated) file. This works pretty well. I recently uploaded a file with over 300,000 records, and it was in the database within a few minutes. After you upload your file, you click on the table name to "analyze" the table. Table Wizard will tell you what columns the table has and their data types, and identify the primary key. After you first upload your data, you will want to switch over to your database to change the data types to the appropriate ones and identify a primary key. Once you have done that, you can come back and re-analyze the table.

Setting the Data Types

This is probably the step that will be the least welcome to those who are not familiar with databases. If you want to do extensive work with data in Drupal, then you will want to become familiar enough with phpMyAdmin and MySQL data types to be able to navigate to your database and apply the appropriate settings for each type of data. You should also indicate which field contains the primary key for the table. The primary key is a unique value that identifies each record in a table. If you have a dataset that does not have a column with unique values, then consider adding a column with a unique numeric code in each row. You can name the column something like record_id and then add values like 100001, 100002, 100003 and so on.
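For example, in phpMyAdmin's SQL tab you could run something like the following (the table and column names are placeholders, not from any real site):

```sql
-- Add a numeric record_id and make it the primary key; MySQL numbers the
-- existing rows automatically.
ALTER TABLE my_dataset
  ADD COLUMN record_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Change a column that was imported as text but actually holds numbers.
ALTER TABLE my_dataset
  MODIFY price DECIMAL(10,2);
```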

Expose the Data Using Tables and Charts

If you have been a good boy or girl and added a single-field primary key, then the Table Wizard module will let you check a box labeled "Provide default view." This option automatically creates a view that you can expand upon as you see fit. Auto-created Table Wizard views will have the tag "tw" added to them. You can also create a view by navigating to /build/views/add and looking for the radio button in the 'View Type' section that starts with Database table. At this point you need to know a thing or two about the Views module to go further. There's not enough space to cover that here, so I'm going to skip over the details and point you to the key modules and settings.

The standard Views module will let you create a table from your data. You just have to choose a Page display and the Table style. You can then choose your fields, which will appear as columns within the table. Give the page a Path and perhaps add an Exposed Filter or two to allow users to narrow down the data they are looking at. If you want to get a little fancy, you can use Views Calc to create tables that include calculations like min, max, or average on a set of numeric data.

Charts are a little trickier, but they can be done with the assistance of Charts and Graphs and Views Charts. Charts and Graphs allows you to integrate a number of different free charting solutions (including Google Charts) into your site. Views Charts makes those features available as a Style (called "Charts") within the Views interface. The chart gets exposed on a page based on the Path that you define. You can also create a chart within the Block display type. Other charting modules exist for Drupal. All of them have some drawbacks, but so far the Charts and Graphs/Views Charts combo has worked the best for me.

Offering A Download Option

The ability to download data probably won't be a must-have feature for every site. After all, someone could probably just copy the table and paste it into their favorite spreadsheet program. It's a nice feature to offer, though, and you can do it pretty easily with the help of the Views Bonus Pack module. With Views Bonus Pack installed, you get the option to add a Feed display within the Views interface. With the Feed display selected, you then get the option of setting a Style that includes CSV and other popular formats like TXT, DOC, XLS, and XML. You can then attach that display to the Table display, and an image will appear at the bottom of the table page that generates a file download when clicked. Be sure to add the same filters to the CSV display that you have added to the Table display, so the downloaded file reflects what the user is seeing with any filters selected.

The usual caveats apply to the ideas and tips that I've offered here. At any time a better module could come along or an existing module could change and alter the process a bit. So tread lightly and do some exploration before you commit to building a data driven site using Drupal. In fact, even though there was a very recent release of Table Wizard the development is being deprecated in favor of the Data module. I'll continue to upload using Table Wizard for now since I think it works well but also plan on giving the Data module a shot in the near future. 

As always I'd love to hear your thoughts and tips on modules and workflows in the comments. If you have built or know of a good data access website built using Drupal feel free to share those links as well.


Jan 07 2009
krs

First of all, this only applies to Views 1 and Drupal 5; there's probably a similar function in Views 2, however.

The setup: A timestamp ($node->created) was being selected in a View. The only options for formatting it are Short, Long, Medium, Time Ago, or Custom. With Custom you can enter any string that can be interpreted by PHP's date() function, and have that format applied.

The problem: The site used a different format as its standard for displaying date and time, let's say "D F j, Y \a\t g:i A". We had created a function in template.php called site_format_date() that accepted a timestamp and returned the date nicely formatted that way. All for reusability! Now all the dates in the code (and template files, and theme function overrides) we write can call that function. And when Mrs. Editor decides that she really would like it to be something completely different, we can change it in one place and be done!
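A minimal, self-contained sketch of such a helper (the real template.php function presumably wrapped Drupal's format_date(); plain date() is used here so the snippet runs on its own):

```php
date_default_timezone_set('UTC'); // a bare script needs an explicit timezone

// Format a Unix timestamp in the site-wide standard format.
function site_format_date($timestamp) {
  return date('D F j, Y \a\t g:i A', $timestamp);
}

// 1231342200 is 2009-01-07 15:30:00 UTC:
print site_format_date(1231342200) . "\n"; // Wed January 7, 2009 at 3:30 PM
```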

The table problem: What a pain to have to go to each View that uses a time or date, and select the "custom" format, and enter into the textbox the format we want to use. Now when someone wants to change the site-wide format, we'll have to go back to each view and update all the format fields! Argh!

So what we're trying to do, is somehow add our own custom format as one of the options for displaying a timestamp, in addition to the basic ones Drupal and Views provides us. Unfortunately, those choices are created by a call to views_handler_field_dates(), which just returns a fixed array of those 5 choices, and there's no way to get your own in there.

hook_views_tables_alter() to the rescue! This hook lets you modify how the Views module understands the tables in the Drupal database. Most of the core tables are added by .inc files in the views module, and many contrib modules have their own views.inc file. Using this hook you have the opportunity to change anything you like about how views works with modules, and do it safely without hacking any modules.

The fields I wanted to affect (the created and changed timestamps) are node fields. You can see them defined in path/to/views/modules/views_node.inc. I'm trying to affect how the fields are handled (as opposed to filters, arguments, sorts, or anything else). By checking the .inc file I can see the keys I want to affect are simply called 'created' and 'changed'. Putting that all together, here is my complete hook function:

function mymodule_views_tables_alter(&$tabledata) {
  // 'mymodule' stands in for your module's short name.
  // Add our callback as an extra format option on both timestamp fields.
  $tabledata['node']['fields']['created']['handler']['my_custom_format_date'] = t('As My Custom Date');
  $tabledata['node']['fields']['changed']['handler']['my_custom_format_date'] = t('As My Custom Date');
}

Now I can choose "As My Custom Date" to format the timestamps! Views will automatically call my_custom_format_date() for me, and I just pass the data right along to our site-wide date formatter.

function my_custom_format_date($fieldinfo, $fielddata, $value, $data) {
  // Hand the timestamp to the site-wide formatter; fall back to the
  // standard Views "no date" output when the value is empty.
  return $value ? site_format_date($value) : theme('views_nodate');
}

If figuring out the syntax of the array to modify was a little confusing (views syntax? confusing? never!) try this little snippet first:

function mymodule_views_tables_alter(&$tabledata) {
  drupal_set_message(print_r($tabledata, TRUE));
}

and you'll get a very long printout of how every table interacts with the Views module. With some searching and a little luck, you should be able to find the section you want to modify. Lots of great Views 1 and Views 2 documentation is available, but it gets a little hairy! Unfortunately there's not a lot of documentation that I could find on this very powerful hook! I discovered it while poking through the views.module code, and found one Drupal issue at http://drupal.org/node/142383. (This was the issue that actually enabled the hook to work properly, as of about July 2007.)

Remind me to add some pictures to this post!

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web
