Jul 13 2017

When creating the Global Academy for continuing Medical Education (GAME) site for Frontline, we had to tackle several complex problems regarding content migrations. The previous site had a lot of legacy content that we had to bring over into the new system. By tackling each unique problem, we were able to migrate most of the content into the new Drupal 7 site.

Setting Up the New Site

The system Frontline used before the redesign was Typo3, along with a suite of individual, internally hosted ASP sites for conferences. Frontline had several kinds of content that displayed differently throughout the site. The complexity in handling the migration was that much of the content lived in WYSIWYG fields containing large amounts of custom HTML.

We decided to go with Drupal 7 for this project so we could more easily reuse code that had been created for the MDEdge.com site.

The GAME website redesign greatly improved the flow of the content and how it was displayed on the frontend, and part of that improvement was displaying specific pieces of content in different sections of the page. The burning question that plagued us when tackling this problem was “How are we going to extract the specific pieces of data and get them inserted into the correct fields in Drupal?”

Before we could get deep into the code, we had to do some planning and setup to make sure we were clear on how best to handle the different types of content. This also included hammering out the content model. Once we got to a spot where we could start migrating content, we decided to use the Migrate module. We grabbed the current site's files, images, and database and put them in a central location outside of the current site that we could easily access. This would allow us to re-run these migrations even after the site launched (if we needed to)!

Migrating Articles

This content on the new site is connected to MDEdge.com via a REST API. One complication is that the content on GAME was added manually to Typo3 and wasn't tagged for use with specific fields. The content type on the new Drupal site had a few fields for the data we were displaying, and a field that stores the article ID from MDEdge.com. To get that ID for this migration, we mapped the title of each news article in Typo3 to the title of the corresponding article on MDEdge.com. It wasn't a perfect solution, but it allowed us to do an initial migration of the data.
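The title-matching approach can be sketched as follows. This is a simplified illustration in Python rather than our actual PHP migration code, and the function names and sample data are hypothetical: both titles are normalized before comparison so that small punctuation and whitespace differences between the two systems don't prevent a match.

```python
import re

def normalize_title(title):
    """Lowercase, strip punctuation, and collapse whitespace so that
    near-identical titles from the two systems compare equal."""
    title = title.lower()
    title = re.sub(r"[^\w\s]", "", title)
    return re.sub(r"\s+", " ", title).strip()

def match_article_id(typo3_title, mdedge_articles):
    """Return the MDEdge article ID whose title matches the Typo3 title,
    or None if no match is found. `mdedge_articles` maps IDs to titles."""
    wanted = normalize_title(typo3_title)
    for article_id, title in mdedge_articles.items():
        if normalize_title(title) == wanted:
            return article_id
    return None
```

Exact-match-after-normalization is deliberately conservative: a missed match just means a row to fix by hand, whereas a wrong match would silently attach the wrong article ID.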

Conferences Migration

Because GAME had relatively few conferences on the site, we decided to import the main conference data via a Google spreadsheet. The spreadsheet was fairly simple: one column identified each row in the migration, and the remaining columns corresponded to the fields in the conference content type. This worked out well because most of the content in the redesign was new for this content type, and the approach allowed the client to start adding content before the content types or migrations were fully built.
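As a rough sketch of how a spreadsheet like this maps onto a migration source (in Python rather than the Migrate module's PHP, with hypothetical names and columns), the first column serves as the unique row ID and every other column becomes a field value:

```python
import csv
import io

def load_conference_rows(csv_text):
    """Parse exported spreadsheet text into {row_id: field_dict}.
    The first column is the migration's unique identifier; the remaining
    columns correspond one-to-one to fields on the conference content type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    id_column = reader.fieldnames[0]
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }
```

Keeping the unique ID in its own column is what later lets the migration be re-run safely: rows that already exist get updated instead of duplicated.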

Our spreadsheet handled the top level conference data, but it did not handle the pages attached to each conference. Page content was either stored in the Typo3 data or we needed to extract the HTML from the ASP sites.

Typo3 Categories to Drupal Taxonomies

To make sure we mapped the content in the migrations properly, we created another Google doc mapping file that connected the Typo3 categories to Drupal taxonomies. We set it up to support multiple taxonomy terms that could be mapped to one Typo3 category.
[NB: Here is some code that we used to help with the conversion: https://pastebin.com/aeUV81UX.]

Our mapping system worked out fantastically well. The only problem we encountered was that, because we allowed up to three taxonomy terms to be mapped to one Typo3 category, the client noticed cases where content with more than one Typo3 category ended up with too many taxonomy terms. But this was a content-related issue, and it simply required them to revisit the mapping document and tweak it as necessary.
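The mapping logic can be sketched roughly like this (a Python illustration with hypothetical names, not our production code). Note how content carrying several Typo3 categories accumulates the union of their mapped terms, which is exactly how the over-tagging the client noticed could arise:

```python
def map_categories(typo3_categories, mapping):
    """Collect the Drupal taxonomy terms for one piece of content.
    `mapping` maps each Typo3 category to a list of term names (up to
    three per category, as in our mapping spreadsheet). Terms are
    de-duplicated while preserving first-seen order."""
    terms = []
    for category in typo3_categories:
        for term in mapping.get(category, []):
            if term not in terms:
                terms.append(term)
    return terms
```

A piece of content tagged with two categories that each map to three terms can pick up six terms, so the mapping document needs a pass wherever categories overlap.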

Slaying the Beast: Extracting, Importing, and Redirecting

One of the larger problems we tackled was how to get the HTML from the Typo3 system and the ASP conference sites into the new Drupal 7 setup.

The ASP conference sites were handled by grabbing the HTML for each of those pages and extracting the page title, body, and photos. The migration of the conference sites was challenging because each site's HTML differed, and we had to get all those differences matched up in Drupal.

Grabbing the data from the Typo3 sites presented another challenge because we had to figure out where the different data was stored in the database. This was a uniquely interesting process because we had to determine which tables were connected to which other tables in order to figure out the content relationships in the database.


A few things we learned in this process:

  • We found all of the content on the current site was in these tables (which are connected to each other): pages, tt_content, tt_news, tt_news_cat_mm and link_cache.
  • After talking with the client, we were able to grab content based on certain Typo3 categories or the pages hierarchy relationship. This helped fill in some of the gaps where a direct relationship could not be made by looking at the database.
  • It was clear that getting 100% of the legacy content wasn’t going to be realistic, mainly because of the loose content relationships in Typo3. After talking to the client we agreed to not migrate content older than a certain date.
  • It was also clear that—given how much HTML was in the content—some manual cleanup was going to be required.

Once we were able to get to the main HTML for the content, we had to figure out how to extract the specific pieces we needed from that HTML.

Once we had access to the data we needed, it was a matter of getting it into Drupal. The Migrate module made a lot of this fairly easy with how much functionality it provides out of the box. We ended up using the prepareRow() method a lot to grab specific pieces of content and assign them to Drupal fields.
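As an illustration of the kind of extraction our prepareRow() implementations did (sketched here in Python with the standard library's HTML parser rather than our actual PHP; the class and function names are hypothetical), a small parser can pull the page title and image paths out of a saved legacy page:

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collect the <title> text and all <img src> values from legacy HTML."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_page(html):
    """Return (title, image paths) for one legacy page's HTML."""
    parser = PageExtractor()
    parser.feed(html)
    return parser.title.strip(), parser.images
```

Using a real HTML parser instead of regular expressions matters here: the legacy markup varied from site to site, and a parser tolerates those differences far better.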

Handling Redirects

We wanted to handle as many of the redirects as we could automatically, so the client wouldn’t have to add thousands of redirects and to ensure existing links would continue to work after the new site launched. To do this we mapped the unique row in the Typo3 database to the unique ID we were storing in the custom migration.

As long as you are handling the unique IDs properly in your use of the Migration API, this is a great way to handle mapping what was migrated to the data in Drupal. You use the unique identifier stored for each migration row and grab the corresponding node ID to get the correct URL that should be loaded. Below are some sample queries we used to get access to the migrated nodes in the system. We used UNION queries because the content that was imported from the legacy system could be in any of these tables.

SELECT destid1 FROM migrate_map_cmeactivitynode WHERE sourceid1 IN (:sourceid)
UNION
SELECT destid1 FROM migrate_map_cmeactivitycontentnode WHERE sourceid1 IN (:sourceid)
UNION
SELECT destid1 FROM migrate_map_conferencepagetypo3node WHERE sourceid1 IN (:sourceid)
…
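A redirect lookup along these lines can be sketched as follows (a Python/SQLite illustration with hypothetical names; the real site did this in Drupal's PHP against MySQL). Given a legacy source ID, it checks each migration's map table in turn, mirroring the UNION queries above:

```python
import sqlite3

# The Migrate module writes one migrate_map_* table per migration,
# with sourceid1 (legacy ID) and destid1 (Drupal node ID) columns.
MAP_TABLES = [
    "migrate_map_cmeactivitynode",
    "migrate_map_cmeactivitycontentnode",
    "migrate_map_conferencepagetypo3node",
]

def lookup_destination(conn, source_id):
    """Return the migrated node ID (destid1) for a legacy source ID,
    or None if the content was never migrated."""
    query = " UNION ".join(
        "SELECT destid1 FROM {} WHERE sourceid1 = ?".format(table)
        for table in MAP_TABLES
    )
    row = conn.execute(query, [source_id] * len(MAP_TABLES)).fetchone()
    return row[0] if row else None
```

With the node ID in hand, the redirect handler only has to load that node's path and issue the redirect, so thousands of legacy URLs keep working without manual entries.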

Wrap Up

Migrating complex websites is rarely simple. One thing we learned on this project is that it is best to dive deep into migrations early in the project lifecycle, so the big roadblocks can be identified as early as possible. It is also best to give the client as much time as possible to work through any content cleanup issues that may be required.

We used a lot of Google spreadsheets to get needed information from the client. This made things much simpler on all fronts and allowed the client to start gathering needed content much sooner in the development process.

In a perfect world, all content would be easily migrated over without any problems, but this usually doesn’t happen. It can be difficult to know when you have taken a migration “far enough” and you are better off jumping onto other things. This is where communication with the full team early is vital to not having migration issues take over a project.

Web Chef Chris Roane

When not breaking down and solving complex problems as quickly as possible, Chris volunteers for a local theater called Arthouse Cinema & Pub.

Jan 30 2017

Welcome to the final episode of Season 2 of Sharp Ideas! On this episode, Randy and Doug talk to Four Kitchens directors Elia Albarran, Todd Nienkerk, and Aaron Stanush, about keeping your team happy, working ethically with clients, and how to prepare your people for the future of work.

Broadcasting directly to you from wherever the web meets business and design. You can listen to us on SoundCloud (on the site or download the app!) or find us with your other favorite podcasts on the Stitcher app.


Douglas Bigham

Doug is a writer and ex-academic with a background in digital publics and social language use. He likes dark beer, bright colors, and he speaks a little Klingon.

Aug 23 2011

Part of the reason I love Drupal is that it treats everyone as a potential developer. The fact that the code is publicly available, that it is created collectively in the issue queues on Drupal.org, and the documentation is relatively easy to access are all parts of the equation. With some time and interest almost anyone can get involved and start building sites.

I believe that this is one of the biggest values of Drupal. A value we work to leverage for our clients. We are Drupal evangelists so teaching and encouraging folks to build sites is just part of our nature. We can't help it. We have structured this into our interactions with clients through our public and private trainings and try to build this into all of our client relationships.

That is why we are so pleased when our clients can take that next step from users to site builders. I previously outlined how New York's Downtown Community Television Center took this next step. We not only built their site but also trained them and included them in our development workflow. Now they are able to enhance and maintain their site with limited assistance from us.

Similarly we are proud to highlight our relationship with Tern Bicycles and to announce their new site: http://ternbicycles.com.

We began our relationship with Tern by building their former flagship site (see our writeup on Drupal.org: Dahon Folding Bicycles). Since the launch over a year ago, the folks at Tern have been very busy. They've launched several new products and a new line of bicycles.

During that time their team has also learned a great deal about building Drupal sites. We gave them guidance on setting up a development environment and on best practices. Using some of the great Drupal documentation, they built their Biologic Accessories line at http://www.thinkbiologic.com with limited assistance from us, and they did much of the site building and most of the theming for http://ternbicycles.com.

Below I discuss with Tern Bicycles' Terry Chen what it was like learning Drupal and moving from a user to a site builder and themer.

What was your technical background before using Drupal?

I was in charge of a large static site, which comprised over 1,500 pages. I am well versed in writing HTML and CSS.

Why did you select Drupal to use to build your sites?

The 1,500+ page site was manageable but slowly becoming unwieldy. The main goal of the new site was to make it easy to translate content and build sites for different countries.

We looked at the big content management systems on the market and actually decided to use Joomla!. We based our decision on the recommendation of a co-worker who had previous experience working with Joomla!, and on the numerous criticisms about how difficult Drupal was to learn. We feared that our less tech-savvy customers would not be able to use Drupal.

We knew that whatever system we selected would have to fulfill our present as well as our future requirements.  After we made our decision, I installed both systems on my computer and strangely enough I didn’t understand Joomla! but understood how to use Drupal pretty quickly.  I could also see how powerful and flexible Drupal was in comparison.  I knew right away that Drupal was the software for our organization. 

My co-workers didn’t completely agree with me but backed me up and every day we are grateful that we decided to use Drupal.

What were you able to build with Drupal?

Since 2009, we have moved all of our sites except one, both external and internal, to Drupal. We currently run six Drupal-powered sites for our company: two internal and four external.

What was the most challenging element of building a site with Drupal?

I am most familiar with Drupal 6. As mentioned before, Drupal is an extremely powerful program. However, sometimes it is so powerful that it becomes confusing what the best way is to build a site. We had to quickly learn which modules to use and how to use them effectively. There was also the uncertainty of not knowing whether a developer would stop maintaining a module. Zivtech quickly steered us down the correct path.

What has been the most rewarding part of working with Drupal?

When I get website requests, I can now say, “Yes, that’s possible”.  Our site has gone from looking nice to being a real tool. 
It has almost become too easy to build a site.  Last summer I built a full site in about 2 weeks.

How did Zivtech make it easier to build off of what we created for you?

I worked with Zivtech on two sites for our organization. For the first site, we were complete newbies to the world of Drupal. We needed a lot of hand-holding, and Zivtech taught me the best practices for building a site in Drupal.

I learned so much from that experience that for our latest project together, our new Tern Bicycles website, I told them that I could work on the design, theming and site configuration while they worked on the heavy coding on the backend. 

And this is why I love working with Zivtech.  Not only are they great teachers, but also the team is flexible enough to work in the way that we need.  In a short time, I went from being a passive member in the development process to being an active participant.  Zivtech has slowly dragged me into the world of git and Terminal.

Apr 14 2011

I recently had occasion to review the new website of a major bank's CRA/charity wing. As a web developer, I'm always curious how other sites are built. This one raised a number of red flags for me, so I'd like to write about it as a showcase. I have three questions on my mind:

  1. How do professional web shops get away with such poor quality work?
  2. How do clients know what to look for (and avoid)?
  3. With plenty of good web shops out there, why aren't big clients connecting with them?

I don't have the answers yet, but I do want to raise the questions. First, reviewing the site from my developer's perspective:

  • The page contents are loaded with Javascript. With Javascript turned off, there's a little bit of left nav, and the main area is essentially blank. This means the site is unreadable to screen readers (browsers for blind people), so the site is not 508 compliant. Maybe more importantly, it means the contents of the page are invisible to search engines. (See Google's cached copy of the homepage for example.)
  • The Javascript that pulls in the page contents is loading an XML file with AJAX (see line 72 of the homepage source). XML is meant for computers to talk to each other, not for human-readable websites, and AJAX is meant for interactive applications, not the main content area of every page. (I can only imagine the workflow for editing the content on the client side: does their CMS output XML? Do they manually edit XML? Or can the content never change without the original developers?)
  • The meta tags are all generic: The OpenGraph page title (used by Facebook) across the site is "ShareThis Homepage". (ShareThis has a "social" widget which I assume they copied the code from, but having those meta values is probably worse than having none at all.)
  • None of the titles are links, so even if Google could read the site, it would just see a lot of Read More's.
  • From a usability perspective, the 11px font size for most of the content is difficult to read.
  • The Initiatives by State map is built in Flash, which makes it unviewable on non-Android mobile devices. Flash is also unnecessary for maps now, given the slew of great HTML5-based mapping tools. Not to mention the odd usability quirks/bugs of the map's interface.

I could go on, but that's enough to make the point. So what's going on here? I've seen enough similar signs in other projects to feel confident in speculating about this one.

The vendor wasn't entirely incompetent - the hundreds of lines of Javascript code needed some technical proficiency to write - yet the site ignores so many core principles of good web development circa 2011. Whatever skills were applied here were misplaced. The "web" has to accommodate our phones, TVs, even our cars, with "mobile" browsers (broadly defined) expected to eclipse the desktop in the not-too-distant future. That means progressive enhancement and basic HTML quality are critical. Web users also have an infinity of sites to visit, so to justify the investment in yet another site, you need some basic Search Engine Optimization for people to find you. Building a site that is readable only to a narrow subset of desktop browsers constitutes an unfinished product in my book.

On the client side, any site with more than one page, that needs to be updated more than once in a blue moon, needs a content management system. I don't see the tell-tales of any common CMS here, and the way the contents are populated with AJAX suggests the CMS under the hood is weak or non-existent. Reinventing the wheel with entirely custom code for a site makes it difficult to maintain in the long run: developers with expertise in common frameworks/CMSs won't want to touch it, and whoever does will need a long ramp-up/head-scratching period to understand it. It's also unnecessary with so many tested tools available. So clients need to insist on a CMS, and if a vendor tries to talk them out of one, or claim it will be 3x the price, they need to find a better vendor. I work with Drupal and think it's the best fit for many sites (and free of license fees), but there are many good options.

The site doesn't say who built it, and searching for relevant keywords doesn't bring up any clearly proud vendors. Was it a web shop at all, or an ad agency that added some token web services to their roster? (General rule: avoid those vendors.) Clients need to see their sites not as another piece of throwaway marketing material, but as a long-term, audience-building investment. Thinking of websites as advertisements that only need to be viewed on Windows running Internet Explorer is missing the point.

I wonder, given the client (with $10 billion in profit in 2010), how much this site cost. It's not a brochure site, but it's not particularly complex either. The only really custom piece is the map, and the same style could probably be implemented with OpenLayers (or Google Maps with some compromise from the client on color requirements). Whatever they paid, I suspect they could have paid one of the top Drupal shops the same price to build a maintainable, standards-based, truly impressive website, for visitors, internal staff, and reviewing developers alike.

Then again, being such a large client means the vendor likely had to deal with all kinds of red tape. Maybe the really good web shops don't connect with that class of client because it's not worth the hassle. But surely the U.S. House of Representatives, in the process of moving to Drupal, has its own brand of red tape, and the vendor has project managers who can handle it.

Websites are complex beasts, and evaluating them from the client perspective is not the same as watching a proposed TV commercial. So how do clients without core competencies in web development know what to avoid? Googling it will only get them so far. But the burden is ultimately on them: we all consume products about which we lack core expertise, and big corporations (as consumers and clients themselves) need to figure out the same heuristics as everyone else. Trusting reputable vendors is one approach, but it's a vicious cycle if they're locked into one vendor (as companies with existing B2B relationships often are).

Diversifying the advice you get is critical. Big projects should have RFPs and a bidding process. (That helps enforce a realistic division of labor: little shops like mine don't respond to RFPs, but big shops that can afford that investment are happy to manage the project and outsource some development to suit their own core competencies.)

The bidding process could even involve the vendors defending their proposals in front of their competitors. Then the top-tier CMS shop can eliminate the static-HTML or Cold Fusion shop from the running before it's too late. There are no silver bullets - there's a potential to be fleeced in any market - but in most of them, consumers have figured out ways to spot red flags and protect themselves. Website clients need to educate themselves and catch up.
