Nov 01 2018

Content migration is a topic with a lot of facets. We’ve already covered some important migration information on our blog:

So far, readers of this series will have gotten lots of good process information, and learned how to move a Drupal 6 or 7 site into Drupal 8. This post, though, will cover what you do when your content is in some other data framework. If you haven’t read through the previous installments, I highly recommend you do so. We’ll be building on some of those concepts here.

Content Type Translation

One of the first steps of a Drupal to Drupal migration is setting up the content types in the destination site. But what do you do if you are moving to Drupal from another system? Well, you will need to do a little extra analysis in your discovery phase, but it’s very doable.

Most content management systems have at least some structure that is similar to Drupal’s node types, as well as a tag/classification/category system that is analogous to Drupal’s taxonomy. And it’s almost certain to have some sort of user account. So, the first part of your job is to figure out how all that works.

Is there only one ‘content type’, which is differentiated by some sort of tag (“Blog Post”, “Product Page”, etc.)? Well, then, each of those might be a different content type in Drupal. Are Editors and Writers stored in two different database tables? Well, you probably just discovered two different user roles, and will be putting both user types into Drupal users, but with different roles. Does your source site allow comments? That maps pretty closely to Drupal comments, but make sure that you actually want to migrate them before putting in the work! Drupal 8 Content Migration: A Guide For Marketers, one of the early posts in this series, can help you make that decision.

Most CMS systems will also have a set of meta-data that is pretty similar to Drupal’s: created, changed, author, status and so on. You should give some thought to how you will map those fields across as well. Note that author is often a reference to users, so you’ll need to consider migration order as well.
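To make this concrete, here is a hedged sketch of how that metadata might map in a node migration's process section. The field and migration names (author_id, example_user) are hypothetical, and migration_lookup is the core plugin name in recent Drupal 8 releases (it was called simply migration in older ones):

```yaml
process:
  # Timestamps usually map straight across when both systems store Unix time.
  created: created
  changed: changed
  # A published/unpublished flag often maps directly as well.
  status: status
  # Authors are references to users, so translate the source author ID into
  # the Drupal uid created by an already-run user migration.
  uid:
    plugin: migration_lookup
    migration: example_user
    source: author_id
```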

If your source data is not in a content management system (or you don’t have access to it), you may have to dig into the database directly. If you have received some or all of your content in XML, CSV, or other text-based formats, you may just have to open the files and read them to see what you are working with.

In short, your job here will be to distill the non-Drupal conventions of your source site into a set of Drupal-compatible entity types, and then build them.

Migration from CSV

CSV stands for “Comma-Separated Values”, and is a file format often used for transferring data in bulk. If you get some of your data from a client in a spreadsheet, it’s wise to export it to CSV. This format strips out all the MS Office or Google Sheets gobbledygook, and just gives you a straight block of data.

Currently, migrations of CSV files into Drupal use the Migrate Source CSV module. However, this module is being moved into core, and the contrib version will be deprecated. Check the Bring migrate_source_csv to core issue to see the current status, and adjust this information accordingly.

The Migrate Source CSV module has a great example and some good documentation, so I’ll just touch on the highlights here.

First, know that CSV isn’t super-well structured, so each entity type will need to be a separate file. If you have a spreadsheet with multiple tabs, you will need to export each separately, as well.

Second, connecting to it is somewhat different than connecting to a Drupal database. Let’s take a look at the data and source configuration from the default example linked above.


  id,first_name,last_name,email,country,ip_address,date_of_birth
  1,Justin,Dean,,Indonesia,,01/05/1955
  2,Joan,Jordan,,Thailand,,10/14/1958
  3,William,Ray,,Germany,,08/13/1962

migrate_source_csv/tests/modules/migrate_source_csv_test/config/install/migrate_plus.migration.migrate_csv.yml (Abbreviated)

  ...
  source:
    plugin: csv
    path: /artifacts/people.csv
    keys:
      - id
    header_row_count: 1
    column_names:
      -
        id: Identifier
      -
        first_name: 'First Name'
      -
        last_name: 'Last Name'
      -
        email: 'Email Address'
      -
        country: Country
      -
        ip_address: 'IP Address'
      -
        date_of_birth: 'Date of Birth'
  ...

Note first that this migration is using plugin: csv, instead of the d7_node or d7_taxonomy_term that we’ve seen previously. This plugin is in the Migrate Source CSV module, and handles reading the data from the CSV file.

  path: /artifacts/people.csv

The path config, as you can probably imagine, is the path to the file you’re migrating.  In this case, the file is contained within the module itself.

  keys:
    - id

The keys config is an array of columns that are the unique id of the data.
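If no single column is unique on its own, keys accepts multiple columns that together form a compound key. A sketch, with hypothetical column names:

```yaml
source:
  plugin: csv
  path: /artifacts/orders.csv
  # Rows are only unique per customer per order number.
  keys:
    - customer_id
    - order_number
```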

  header_row_count: 1
  column_names:
    -
      id: Identifier
    -
      first_name: 'First Name'
    -
      last_name: 'Last Name'
  ...

These two configurations interact in an interesting way. If your data has a row of headers at the top, you will need to let Drupal know about it by setting a header_row_count. When you do that, Drupal will parse the header row into field ids, then move the file to the next line for actual data parsing.

However, if you set the column_names configuration, Drupal will override the field ids created when it parsed the header row. By passing only select field ids, you can skip fields entirely without having to edit the actual data. It also allows you to specify a human-readable field name for the column of data, which can be handy for your reference, or if you’re using Drupal Migrate’s admin interface.

You really should set at least one of these for each CSV migration.
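As an aside, the module's documentation also describes keying column_names by the zero-based column index, which lets you cherry-pick a few columns out of a wide file; verify the exact syntax against the module version you're using. A sketch:

```yaml
source:
  plugin: csv
  path: /artifacts/people.csv
  keys:
    - id
  header_row_count: 1
  # Keep only columns 0 and 6; the rest of each row is ignored.
  column_names:
    0:
      id: Identifier
    6:
      date_of_birth: 'Date of Birth'
```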

The process configuration will treat these field ids exactly the same as a Drupal fieldname.

Process and Destination configuration for CSV files are pretty much the same as with a Drupal-to-Drupal import, and they are run with Drush exactly the same.
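To round out the example, a process and destination section for the people.csv data might look like the following sketch (the person content type and its field names are hypothetical):

```yaml
process:
  # concat is a core process plugin that joins multiple source values.
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: ' '
  field_email: email
  field_country: country
  type:
    plugin: default_value
    default_value: person
destination:
  plugin: 'entity:node'
```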

Migration from XML/RSS

XML is a common data storage format that presents data as tagged, nested elements. Many content management systems and databases have an ‘export as xml’ option. One advantage XML has over CSV is that you can put multiple data types into a single file. Of course, if you have lots of data, this advantage could turn into a disadvantage as the file size balloons! Weigh your choice carefully.

The Migrate Plus module has a data parser for XML, so if you’ve been following along with our series so far, you should already have this capability installed.

Much like CSV, you will have to connect to a file, rather than a database. RSS is a commonly used xml format, so we’ll walk through connecting to an RSS file for our example. I pulled some data from Phase2’s own blog RSS for our use, too. (Abbreviated)

  <?xml version="1.0" encoding="utf-8"?>
  <rss ... xml:base="">
    <channel>
      <title>Phase2 Ideas</title>
      <link></link>
      <description/>
      <language>en</language>
      <item>
        <title>The Top 5 Myths of Content Migration *plus one bonus fairytale</title>
        <link></link>
        <description>The Top 5 Myths of Content Migration ... </description>
        <pubDate>Wed, 08 Aug 2018 14:23:34 +0000</pubDate>
        <dc:creator>Bonnie Strong</dc:creator>
        <guid isPermaLink="false">1304 at</guid>
      </item>
    </channel>
  </rss>


  id: example_xml_articles
  label: 'Import articles'
  status: true
  source:
    plugin: url
    data_fetcher_plugin: http
    urls: ''
    data_parser_plugin: simple_xml
    item_selector: /rss/channel/item
    fields:
      -
        name: guid
        label: GUID
        selector: guid
      -
        name: title
        label: Title
        selector: title
      -
        name: pub_date
        label: 'Publication date'
        selector: pubDate
      -
        name: link
        label: 'Origin link'
        selector: link
      -
        name: summary
        label: Summary
        selector: description
    ids:
      guid:
        type: string
  destination:
    plugin: 'entity:node'
  process:
    title:
      plugin: get
      source: title
    field_remote_url: link
    body: summary
    created:
      plugin: format_date
      from_format: 'D, d M Y H:i:s O'
      to_format: 'U'
      source: pub_date
    status:
      plugin: default_value
      default_value: 1
    type:
      plugin: default_value
      default_value: article

The key bits here are in the source configuration.

  source:
    plugin: url
    data_fetcher_plugin: http
    urls: ''
    data_parser_plugin: simple_xml
    item_selector: /rss/channel/item

Much like CSV’s use of the csv plugin to read a file, XML does not use the d7_node or d7_taxonomy_term plugin to read the data. Instead, it pulls in a URL and reads the data it finds there. The data_fetcher_plugin takes one of two possible values: http or file. http is for a remote source, like an RSS feed, while file is for a local file. The urls config should be pretty obvious.

The data_parser_plugin specifies which PHP library to use to read and interpret the data. Possible parsers here include JSON, SOAP, XML, and SimpleXML. SimpleXML’s a great library, so we’re using that here.

Finally, item_selector defines where in the XML the items we’re importing can be found. If you look at our data example above, you’ll see that the actual nodes are in rss -> channel -> item. Each node would be an item.

  fields:
  ...
    -
      name: pub_date
      label: 'Publication date'
      selector: pubDate
  ...

Here you see one of the fields from the xml. The label is just a human-readable label for the field, while the selector is the field within the XML item we’re getting.

The name is what we’ll call a pseudo-field. A pseudo-field acts as temporary storage for data. When we get to the Process section, pseudo-fields are treated essentially as though they were fields in a database.

We’ve seen pseudo-fields before, when we were migrating taxonomy fields in Drupal 8 Migrations: Taxonomy and Nodes. We will see why they are important here in a minute, but there’s one more important thing in source.

  ids:
    guid:
      type: string

This snippet sets the guid as the unique ID of the article we’re importing. This guarantees us uniqueness, and is very important to specify.

Finally, we get to the process section.

  process:
  ...
    created:
      plugin: format_date
      from_format: 'D, d M Y H:i:s O'
      to_format: 'U'
      source: pub_date
  ...

So, here is where we’re using the pseudo-field we set up before. This takes the value from pubDate that we stored in the pseudo-field pub_date, does some formatting to it, and assigns it to the created field in Drupal. The rest of the fields are done in a similar fashion.
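The from_format string uses PHP's date() syntax, so adapting to a different source is usually a one-line change. For instance, if the feed supplied ISO 8601 dates instead of RSS-style ones, a sketch of the same mapping would be:

```yaml
created:
  plugin: format_date
  # Matches e.g. '2018-08-08T14:23:34+00:00'.
  from_format: 'Y-m-d\TH:i:sP'
  to_format: 'U'
  source: pub_date
```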

Destination is set up exactly like a Drupal-to-Drupal migration, and the whole thing is run with Drush the exact same way. Since RSS is a feed of real-time content, it would be easy to set up a cron job to run that drush command, add the --update flag, and have this migration go from one-time content import to being a regular update job that kept your site in sync with the source.

Migration from WordPress

A common migration path is from WordPress to Drupal. Phase2 recently made this move with our own site, and we have done it for clients as well. There are several ways to go about it, but our own migration used the WordPress Migrate module.

In your WordPress site, under Tools >> Export, you will find a tool to dump your site data into a customized xml format. You can also use the wp-cli tool to do it from the command line, if you like.

Once you have this file, it becomes your source for all the migrations. Here’s some good news: it’s an XML file, so working with it is very similar to working with RSS. The main difference is in how we specify our source connections.


  langcode: en
  status: true
  dependencies:
    enforced:
      module:
        - phase2_migrate
  id: example_wordpress_authors
  class: null
  field_plugin_method: null
  cck_plugin_method: null
  migration_tags:
    - example_wordpress
    - users
  migration_group: example_wordpress_group
  label: 'Import authors (users) from WordPress WXR file.'
  source:
    plugin: url
    data_fetcher_plugin: file
    data_parser_plugin: xml
    item_selector: '/rss/channel/wp:author'
    namespaces:
      wp: ''
      excerpt: ''
      content: ''
      wfw: ''
      dc: ''
    urls:
      - 'private://example_output.wordpress.2018-01-31.000.xml'
    fields:
      -
        name: author_login
        label: 'WordPress username'
        selector: 'wp:author_login'
      -
        name: author_email
        label: 'WordPress email address'
        selector: 'wp:author_email'
      -
        name: author_display_name
        label: 'WordPress display name (defaults to username)'
        selector: 'wp:author_display_name'
      -
        name: author_first_name
        label: 'WordPress author first name'
        selector: 'wp:author_first_name'
      -
        name: author_last_name
        label: 'WordPress author last name'
        selector: 'wp:author_last_name'
    ids:
      author_login:
        type: string
  process:
    name:
      plugin: get
      source: author_login
    mail:
      plugin: get
      source: author_email
    field_display_name:
      plugin: get
      source: author_display_name
    field_first_name:
      plugin: get
      source: author_first_name
    field_last_name:
      plugin: get
      source: author_last_name
    status:
      plugin: default_value
      default_value: 0
  destination:
    plugin: 'entity:user'
  migration_dependencies: null

If you’ve been following along in our series, a lot of this should look familiar.

  source:
    plugin: url
    data_fetcher_plugin: file
    data_parser_plugin: xml
    item_selector: '/rss/channel/wp:author'

This section works exactly like the XML RSS example above. Instead of http, we are using file for the data_fetcher_plugin, so it looks for a local file instead of making an HTTP request. Additionally, due to the difference in structure between an RSS feed and a WordPress WXR file, the item_selector is different, but it works the same way.

  namespaces:
    wp: ''
    excerpt: ''
    content: ''
    wfw: ''
    dc: ''

These namespace designations allow Drupal’s XML parser to understand the particular brand and format of the WordPress export.

  urls:
    - 'private://example_output.wordpress.2018-01-31.000.xml'

Finally, this is the path to your export file. Note that it is in the private filespace for Drupal, so you will need to have private file management configured in your Drupal site before you can use it.

  fields:
    -
      name: author_login
      label: 'WordPress username'
      selector: 'wp:author_login'

We’re also setting up pseudo-fields again, storing the value from wp:author_login in author_login.

Finally, we get to the process section.

  process:
    name:
      plugin: get
      source: author_login

So, here is where we’re using the pseudo-field we set up before. This takes the value from wp:author_login that we stored in author_login and assigns it to the name field in Drupal.

Configuration for the migration of the rest of the entities - categories, tags, posts, and pages - looks pretty much the same. The main difference is that the source will change slightly:

example_wordpress_migrate/config/install/migrate_plus.migration.example_wordpress_category.yml  (abbreviated)

  source:
  ...
    item_selector: '/rss/channel/wp:category'

example_wordpress_migrate/config/install/migrate_plus.migration.example_wordpress_tag.yml (abbreviated)

  source:
  ...
    item_selector: '/rss/channel/wp:tag'

example_wordpress_migrate/config/install/migrate_plus.migration.example_wordpress_post.yml (abbreviated)

  source:
  ...
    item_selector: '/rss/channel/item[wp:post_type="post"]'
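One wrinkle in the post migration: WordPress posts reference their authors (via dc:creator in the export), so the post migration needs to translate that login into the Drupal uid created by the authors migration. A sketch, assuming a pseudo-field named creator holds the dc:creator value, using the core migration_lookup plugin (called simply migration in older Drupal 8 releases):

```yaml
process:
  uid:
    plugin: migration_lookup
    migration: example_wordpress_authors
    source: creator
```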

And, just like our previous two examples, WordPress migrations can be run with Drush.

A cautionary tale

As we noted in Managing Your Drupal 8 Migration, it’s possible to write custom Process Plugins. Depending on your data structure, it may be necessary to write a couple to handle values in these fields. On the migration of Phase2’s site recently, after doing a baseline test migration of our content, we discovered a ton of malformed links and media entities. So, we wrote a process plugin that did a bunch of preg_replace to clean up links, file paths, and code formatting in our body content. This was chained with the default get plugin like so:

  process:
    body/value:
      -
        plugin: get
        source: content
      -
        plugin: p2body

The plugin itself is a pretty custom bit of work, so I’m not including it here. However, a post on custom plugins for migration is in the works, so stay tuned.

Useful Resources and References

If you’ve enjoyed this series so far, we think you might enjoy a live version, too! Please drop by our session proposal for Drupalcon Seattle, Moving Out, Moving In! Migrating Content to Drupal 8 and leave some positive comments.

Feb 13 2018

In this post, we’ll begin to talk about the development considerations of actual website code migration and other technological details. In these exercises, we’re assuming that you’re moving from Drupal 6 or 7 to Drupal 8. In a later post, I will examine ways to move other source formats into Drupal 8 - including CSV files, non-Drupal content management systems, or database dumps from weird or proprietary frameworks.

Migration: A Primer

Before we get too deep into the actual tech here, we should probably take a minute to define some terms and explain what’s actually happening under the hood when we run a migration, or the rest of this won’t make much sense.

When we run a migration, what happens is that the Web Server loads the content from the old site, converts it to a Drupal 8 format, and saves it in the new site.  Sounds simple, right?

Actually, it pretty much is that simple. At least, conceptually. So, try to keep those three steps in mind as we go through the hard stuff later. Everything we do is designed to make one of those three steps work.

Key Phrases

  • Migration: The process of moving content from one site to another. ‘A migration’ typically refers to all the content of a single content or entity type (in other words, one node type, one taxonomy, and so on).

  • Migration Group: A collection of Migrations with common traits

  • Source: The Drupal 6 or 7 database from which you’re drawing your content (or other weird source of data, if applicable)

  • Process: The stuff that Drupal code does to the data after it’s been loaded, in order to digest it into a format that Drupal 8 can work with

  • Destination: The Drupal 8 site

Interestingly, each of those key phrases above corresponds directly to a code file that’s required for migration. Each Migration has a configuration (.yml) file, and each is individually tailored for the content of that entity. As config files, each of these is pretty independent and not reusable. However, we can also assign them to Migration Groups. Groups are also configuration (.yml) files. They allow us to declare common configurations once, and reuse them in each migration that belongs to that group.

The Source Plugin code is responsible for doing queries to the Source database, retrieving the data, and formatting it into PHP objects that can be worked on. The Process Plugin takes that data, does stuff to it, and passes it to the next step. The Destination Plugin then saves it in Drupal 8 format.  Rinse, repeat.

On a Drupal-to-Drupal migration, around 75% of your time will be spent working in the Migration or Migration Group config, declaring the different Process Plugins to use. You may wind up writing one or more Process Plugins as part of your migration development, but a lot of really useful ones are included in Drupal core migration code and are documented here. A few more are included with Migrate Plus.

Drupal 8 core has Source Plugins for all standard Drupal 6 and Drupal 7 entity types (node, taxonomy, user, etc.). The only time you’ll ever need to write a Source plugin is for a migration from a source other than Drupal 6 or 7, and many of these are already available as Contrib modules.

Also included in Drupal core are Destination Plugins for all of the core entity types. Unless you’re using a custom entity in Drupal 8, and migrating data into that entity, you’ll probably never write a Destination Plugin.
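A destination section rarely needs more than the plugin line, though entity destinations do accept a few extra settings. For example, default_bundle (if I recall the config key correctly) sets the bundle for every saved entity, which is equivalent to a default_value process step on type:

```yaml
destination:
  plugin: 'entity:node'
  # Every migrated item is saved as this node type.
  default_bundle: article
```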

Development Foundations

There are a few key requirements you need to have in place before you can begin development.  First, and probably foremost, you need to have both your Drupal 6/7 and Drupal 8 sites - the former full of all your valuable content, and the latter empty of everything but structure.

An important note: though the completed migration will be run on your production server, you should be using development environments for this work. At Phase2, we use Outrigger to simplify and standardize our dev and production environments.

For migration purposes, we only actually need the Drupal 7 site’s database itself, in a place that’s accessible to the destination site.  I usually take an SQL dump from production, and install it as an additional database on the same server as the destination, to avoid network latency and complicated authentication requirements. Obviously, unless you freeze content for the duration of the migration development, you’ll have to repeat this process for final content migration on production.

I’d like to reiterate some advice from my last post: I strongly recommend sanitizing user accounts and email addresses on your development databases.  Use drush sql-sanitize and avoid any possibly embarrassing and unprofessional gaffes.

On your Drupal 8 site, you should already have completed the creation of the new content types, based on information you discovered and documented in your first steps.  This should also encompass the creation of taxonomy vocabularies, and any fields on your user entities.

In your Drupal 8 settings.php file, add a second database config array pointed at the Drupal 7 source database.


  $databases['migration_source_db']['default'] = array(
    'database' => 'example_source',
    'username' => 'username',
    'password' => 'password',
    'prefix' => '',
    'host' => 'db',
    'port' => '',
    'namespace' => 'Drupal\Core\Database\Driver\mysql',
    'driver' => 'mysql',
  );

Finally, you’ll need to add the migration module suite to your site.  The baseline for migrations is migrate, migrate_drupal, migrate_plus, and migrate_tools.  The Migrate and Migrate Drupal modules are core code. Migrate provides the basic functionality required to take content and put it into Drupal 8.  Migrate Drupal provides code that understands the structure of Drupal 6 and 7 content, and makes it much more straightforward to move content forward within the Drupal ecosystem.

Both Migrate Plus and Migrate Tools are contributed modules. Migrate Plus, as the name implies, adds some new features, most importantly migration groups. Migrate Tools provides the drush integration we will use to run and roll back migrations.

Drupal 8 core code also provides migrate_drupal_ui, but I recommend against using it. By using Migrate Tools, we can make use of drush, which is more efficient, can be incorporated into shell scripts, and has clearer error messages.

Framing the House

We’ve done the planning and laid the foundations, so now it’s time to start building this house!

We start with a new, custom module.  This can be pretty bare-bones, to start with.


  type: module
  name: 'Example Migrate'
  description: 'Example custom migrations'
  package: 'Example Migrate'
  core: '8.x'
  dependencies:
    - drupal:migrate
    - drupal:migrate_plus
    - drupal:migrate_tools
    - drupal:migrate_drupal

Within our module folder, we need a config/install directory. This is where all our config files will go.

Migration Groups

The first thing we should make is a general migration group. While it’s possible to put all the configuration into each and every migration you write, I’m a strong believer in DRY programming (Don’t Repeat Yourself).  Migrate Plus gives us the ability to put common configuration into a single file and use it for multiple migrations, so let’s take advantage of that power!

Note the filename we’re using here. This naming convention gives Migrate Plus the ability to find and parse this configuration, and marks it as a migration group.


  # The machine name of the group, by which it is referenced in individual migrations.
  id: example_general
  # A human-friendly label for the group.
  label: General Imports
  # More information about the group.
  description: Common configuration for simple migrations.
  # Short description of the type of source, e.g. "Drupal 6" or "WordPress".
  source_type: Drupal 7 Site
  # Here we add any default configuration settings to be shared among all
  # migrations in the group.
  shared_configuration:
    source:
      key: migration_source_db
  # We add dependencies just to make sure everything we need will be available
  dependencies:
    enforced:
      module:
        - example_migrate
        - migrate_drupal
        - migrate_tools

This is a very simple group that we will use for migrations of simple content. Most of the stuff in here is self-descriptive.  However, source is a critical config - it uses the key of the database configuration we added earlier to give Migrate access to that database.  We’ll examine a more complicated migration group another time.

User Migration

In Drupal, users pretty much have their fingers in every pie.  They are listed as authors on content, they are creators of files… you get the picture.  That’s why it’s usually the first migration to get run.

Note again the filename convention here, which allows Migrate Plus to find it, and marks it as a migration (as opposed to a group).


  # Migration for user accounts.
  id: example_user
  label: User Migration
  migration_group: example_general
  source:
    plugin: d7_user
  destination:
    plugin: entity:user
  process:
    mail:
      plugin: get
      source: mail
    status: status
    name:
      -
        plugin: get
        source: name
      -
        plugin: dedupe_entity
        entity_type: user
        field: name
    roles:
      plugin: static_map
      source: roles
      map:
        2: authenticated
        3: administrator
        4: author
        5: guest_author
        6: content_approver
    created: created
    changed: changed
  migration_dependencies:
    required: { }
  dependencies:
    enforced:
      module:
        - example_migrate

Wow! There’s lots of stuff going on here.  Let’s try and break it down a bit.

  id: example_user
  label: User Migration
  migration_group: example_general

The id designation is a standard machine name for this migration.  We will call this with drush to run the migration. Label is a standard human-readable name.  The migration_group should be obvious - it connects this migration to the group we designed above, which means we are now importing all the config in there.  Notably, that connects us to the D7 database.

  source:
    plugin: d7_user
  destination:
    plugin: entity:user

Here are two key items.  The source plugin defines where we are getting our data, and what format it’s going to come in.  In this case, we are using Drupal core’s d7_user plugin.

The destination plugin defines what we’re making out of that data, and the format it ends up in.  In this case, we’re using Drupal core’s entity:user plugin.

  process:
    mail:
      plugin: get
      source: mail
    status: status
    name:
      -
        plugin: get
        source: name
      -
        plugin: dedupe_entity
        entity_type: user
        field: name
    roles:
      plugin: static_map
      source: roles
      map:
        2: authenticated
        3: administrator
        4: author
        5: guest_author
        6: content_approver
    created: created
    changed: changed

Now we get into the real meat of a migration - the Process section. Each field you’re going to migrate has to be defined here. They are keyed by their field machine name in Drupal 8.  

Each field assigns a plugin parameter, which defines the Process Plugin to use on the data. Each of these process plugins will take a source parameter, and then possibly others.  The source parameter defines the field in the data array provided by the source plugin.  (Yeah, like I’ve said before, naming things clearly isn’t Drupal’s strong suit).

Our first example is mail. Here we are assigning it the get process plugin. This is the easiest process to understand, as it literally takes the data from the old site and gives it to the new site without transforming it in any way. Since email addresses don’t have any formatting changes or necessary transformations, we just move them.

In fact, the get process plugin is Drupal’s default, and our next example shows a shortcut to use it. The status field is getting its data from the old status field. Since get is our default, we don’t even need to specify the plugin, and the source is simply implied. See the documentation for more detail.
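In other words, these two process definitions behave identically:

```yaml
process:
  # Explicit form.
  status:
    plugin: get
    source: status
  # Shorthand form: get is the default plugin, and the source is implied.
  created: created
```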

Name is a slightly more complicated matter.  While usernames don’t change much in their format, we want to make absolutely sure that they are unique.  This leads us to Plugin Chaining, an interesting option that allows us to pass data from one plugin to another, before saving it. The YML array syntax, as demonstrated above, allows us to define more than one plugin for a single field.

We start off by defining the get plugin, which just gets the data from a source field. (You can’t use the default shortcut when you’re chaining, incidentally.)

We then pass it off to the next plugin in the chain, dedupe_entity. This plugin ensures that each record is absolutely certain to be unique.  It has the additional parameters entity_type and field. These define the entity type to check against for uniqueness, and the field in which to look on that entity. See the documentation for more detail.

Note that this usage of dedupe_entity does not specify a source parameter.  That’s because plugin chaining hands off the data from the first plugin in line to the next, becoming, in effect, the source.  It’s very similar to method chaining in jQuery or OOP PHP.  You can chain together as many process plugins as you need, though if you start getting up above four it might be time to re-evaluate what you’re doing, and possibly write a custom processor.

Our final example to examine is roles. User roles in Drupal 7 were keyed numerically, but in Drupal 8 they are based on machine names.  The static_map plugin takes the old numbers, and assigns them to a machine name, which becomes the new value.
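A static_map for roles might be sketched like this; the numeric IDs and machine names below are hypothetical, and yours will come from your Drupal 7 role table:

```yaml
process:
  roles:
    plugin: static_map
    source: roles
    map:
      # Hypothetical Drupal 7 rid => Drupal 8 machine name pairs.
      2: authenticated
      3: editor
      4: administrator
```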

The last two process items are changed and created. Like status, they are using the get process plugin, and being designated in the shortcut default syntax.

```yaml
migration_dependencies:
  required: {  }
dependencies:
  enforced:
    module:
      - example_migrate
```

The last two configs are pretty straightforward.  Migration Dependencies are used when a migration requires data from other migrations (we’ll get into that more another time). Dependencies are used when a migration requires a specific additional module to be enabled. In my opinion it’s pretty redundant with the dependencies declared in the module itself, so I don’t use it much.
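For contrast, a migration that does depend on another might declare it like this (the migration ID example_user is hypothetical):

```yaml
migration_dependencies:
  required:
    # Run the user migration first, so that author references resolve.
    - example_user
```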

In the next post, we’ll cover taxonomy migrations and simple node migrations. We’ll also share a really useful tool for migration development.  Thanks for reading!

Jan 18 2018
Jan 18

In my last post, we discussed why marketers might want to migrate their content to Drupal 8, and the strategy and planning required to get started. The spreadsheet we shared in that post is the foundation of a good migration, and it usually takes a couple of sprints of research, discussion, and documentation to compile. It’s also a process that’s applicable to any content migration, no matter the source or destination framework.

In this post, we will talk about what’s required from your internal teams to actually pull off a content migration to Drupal 8. In later posts, we’ll cover the actual technical details of making the migration happen.

Migration: A Definition

It’s probably worth taking some time here to clarify what, exactly, we’re talking about when we say ‘migration’. In this context, a migration is a (usually automated) transferring of existing content from an old web site to a new one. This also usually implies a systems upgrade, from an outdated version of your content management system to a current version.  In these exercises, we’re assuming that you’re moving from Drupal 6 or 7 to Drupal 8.

What kind of team is required?

There are several phases of migration, each of which requires a different skill set.  The first step is outlined in detail in my last post. The analysis done here is a joint effort, generally requiring input from a project manager and/or analyst, a marketing manager, and a developer.  

The project manager and analyst should be well versed in information architecture and content strategy. Further, it is really helpful if they have an understanding of the capabilities of the source and target systems, as this often informs what content is transferable, and how.

It’s also helpful if your team has a handle on the site’s traffic and usage. This usually falls to a senior content editor or marketing manager.  Also important is that they have the ability to decide what content is worth migrating, and in what form.

In the documentation phase of migration, the developer often has limited input, as this is the least-technical phase of the whole process. However, they should definitely have some level of oversight on the decisions being made, just to ensure technical feasibility.  That requires a good thorough understanding of the capabilities of the source and target systems.

One of the parties should also have the ability to make and export content types and fields. You can see Mike Potter’s excellent Guide to Configuration Management for more information on that.

Once development on the migration begins, it mostly becomes a developer task. Migrations are also a really great mentoring opportunity (we’re really big on this at Phase2).

Finally, someone on the team needs the ability to set up the source and target databases and files for use in all the environments (development, testing, production).


“How long will all this take?”  We hear this a lot.  And, of course, there’s no one set answer. Migration is a complicated task with a lot of testing and a lot of patience required. It’s pretty difficult to pin down, but here are some (really, really rough) guidelines for you to start from. Many of the tasks below may sound unfamiliar; they will be covered in detail in later posts.


| Node/User/Taxonomy migrations | 1-5 content types | 6-10 content types | 11+ content types |
| --- | --- | --- | --- |
| Initial analysis (“the spreadsheet”) | 16-24 hours | 32-40 hours | 48-56 hours |
| Content type creation & export | 16-40 hours | 40-80 hours | 8 hours/type |
| Configuration Grouping | 16-24 hours | 24-40 hours | 24-40 hours |
| Content migrations | 16-40 hours | 32-56 hours | 8 hours/type |
| | 24-32 hours | 40-56 hours | 8 hours/type |

| Additional Migrations | Estimate |
| --- | --- |
| Files & media migration | 32-56 hours |
| Other entity types | 16-40 hours per entity type |
| Migrations from non-Drupal sources | 16-40 hours per source type |

The numbers here are in “averaged person-hours” format - this would be what it would take for a single experienced developer to accomplish these tasks. Again, remember that these are really rough numbers and your mileage will vary.

You might note, reading the numbers closely, that most of the tasks are ‘front-loaded’.  Migration is definitely a case where the heavy work happens at the start, to get things established.  Adding additional content types becomes simpler with time - fields are often reused, or at least similar enough to each other to allow for some overlap of code and configuration.

Finally, these numbers are also based on content types of "average" complexity. By this I mean, somewhere between 5 and 15 un-customized content fields.  Content types with substantially more fields, or with fields that require a lot of handling on the data, will expand the complexity of the migration.  More complexity means more time.  This is an area where it's hard to provide any specific numbers even as a guideline, but your migration planning spreadsheet will likely give you an idea of how much extra work is necessary.  Use your best judgement and don't be afraid to give yourself some wiggle room in the overall total to cover these special cases.

Security and Safety Considerations

As with all web development, a key consideration in migrating content is security. The good news is that migration is usually a one-time occurrence. Once it’s done, all the modules and custom code you’ve written are disabled, so they don’t typically present any security holes. As long as your development and database servers are set up to industry standard, migration doesn’t present any additional challenges in and of itself.

That said, it’s important to remember that you are likely to be working with extremely sensitive data - user data almost always contains PII (Personally Identifiable Information). It is therefore important to make sure that user data - in the form of database dumps, XML files, or other stores - does not get passed around in emails or other insecure channels.

Depending on your business, you may also have the same concerns with actual content, or with image and video files. Be sensible, take proper precautions.  And make sure that your git repository is not public.

I also strongly recommend sanitizing user accounts and email addresses on your development databases.  There’s no feeling quite like accidentally sending a few thousand dummy emails to your unsuspecting and confused customers.  Use drush sql-sanitize and avoid any possibly embarrassing and unprofessional gaffes.

What’s next?

Well, we’ve covered all the project management aspects of migration - next up is some tech talk!  Stay tuned for my next post, which will cover the foundations of developing a migration.

Nov 07 2017
Nov 07

If you’re a marketer considering a move from Drupal 7 to Drupal 8, it’s important to understand the implications of content migration. You’ve worked hard to create a stable of content that speaks to your audience and achieves business goals, and it’s crucial that the migration of all this content does not disrupt your site’s user experience or alienate your visitors.  

Content migrations are, in all honesty, fickle, challenging, and labor-intensive. The code that’s produced for a migration is used once and discarded; the documentation that supports it is generally never seen again after the migration is done. So what’s the value in doing it at all?

Your data is important (Especially for SEO!) 

No matter what platform you’re working to migrate, your data is important. You’ve invested lots of time, money, and effort into producing content that speaks to your organization’s business needs.

Migrating your content smoothly and efficiently is crucial for your site’s SEO ranking. If you fail to migrate highly trafficked content, or to ensure that existing links direct readers to your content’s new home, you will see visitor numbers plummet. Once you fall behind in SEO, it’s difficult to climb back up to a top spot, so taking content migration seriously from the get-go is vital for your business’ visibility.

Also, if you work in healthcare or government, some or all of your content may be legally mandated to be both publicly available, and letter-for-letter accurate. You may also have to go through lengthy (read: expensive) legal reviews for every word of content on your sites to ensure compliance with an assortment of legal standards – HIPAA, Section 508 and WCAG accessibility, copyright and patent review, and more.

Some industries also mandate access to content and services for people with Limited English Proficiency, which usually involves an additional level of editorial content review.

At media organizations, it’s pretty simple – their content is their business!

In short, your content is a business investment – one that should be leveraged.

So Where do I start with a Drupal 8 migration?

Like with anything, you start at the beginning. In this case that’s choosing the right digital technology partner to help you with your migration. Here’s a handy guide to help you choose the right vendor and start your relationship off on the right foot.

Once you choose your digital partner, content migration should start at the very beginning of the engagement. Content migration is one of the building blocks of a good platform transition. It’s not something that can be left for later – trust us on this one. It’s complicated, takes a lot of developer hours, and typically affects both your content strategy and your design.

Done properly, the planning stages begin in the discovery phase of the project with your technology vendor, and work on migration usually continues well into the development phase, with an additional last-sprint push to get all the latest content moved over.

While there are lots of factors to consider, they boil down to two questions: What content are we migrating, and how are we doing it?

Which Content to Migrate

You may want to transition all of your content, but this is an area that does bear some discussion. We usually recommend a thorough content audit before embarking on any migration adventure. You can learn more about website content audits here. Since most migration happens at a code & database level, it’s possible to filter by virtually any facet of the content you like. The most common in our experience are date of creation, type of content, and categorization.

While it might be tempting to cut your site’s content down to the most recent few articles, Chris Anderson’s 2004 Wired article, “The Long Tail,” observes that a number of business models make good use of old, infrequently used content. The value of the Long Tail to your business is most certainly something that’s worth considering.

Obviously, the type of content to be migrated is pretty important as well. Most content management systems differentiate between different ‘content types’, each with their own uses and value. A good, thorough analysis of the content model, and the uses to which each of these types has been and will be put, is invaluable here. There are actually two reasons for that. First, the analysis can be used to determine what content will be migrated, and how. Later, it serves as the basis for creating those ‘content types’ in the destination site.

A typical analysis takes place in a spreadsheet (yay, spreadsheets!). Our planning sheet has multiple tabs but the critical one in the early stages is Content Types.


content types planning sheet

Here you see some key fields: Count, Migration, and Field Mapping Status.

Count is the number of items of each content type. This is often used to determine if it’s more trouble than it’s worth to do an automated content migration, as opposed to a simple cut & paste job. As a very general guideline, if there are more than 50 items of content in a content type, then that content should probably be migrated with automation. Of course, the number of fields in a content type can sway that as well. Once this determination is made, that info is stored in the Migration field.

The Field Mapping Status column is for the use of developers, and reflects the current effort to create the new content types, with all their fields. It’s a summary of the content-type-specific tabs in the spreadsheet. More detail on this is below.

Ultimately, the question of what content to migrate is a business question that should be answered in close consultation with your stakeholders.  Like all such conversations, this will be most productive if your decisions are made based on hard data.

How do we do it?

This is, of course, an enormous question. Once you’ve decided what content you are going to migrate, you begin by taking stock of the content types you are dealing with. That’s where the next tabs in the spreadsheet come in.

The first one you should tackle is the Global Field Mappings. Most content management systems define a set of default fields that are attached to all content types. In Drupal, for example, this includes title, created, updated, status, and body. Rather than waste effort documenting these on every content type, document them once and, through the magic of spreadsheet functions, print them out on the Content Type tabs.


global field mappings

Generally, you want to note Name, Machine Name, Field Type, and any additional Requirements or Notes on implementation on these spreadsheets.

It’s worth noting here that there are decisions to be made about what fields to migrate, just as you made decisions about what content types.  Some data will simply be irrelevant or redundant in the new system, and may safely be ignored.


migration planning sheet

In addition to content types, you also want to document any supporting data – most likely users and any categorization or taxonomy. For a smooth migration, you usually want to actually start the development with them.

The last step we’ll cover in this post is content type creation. Having analyzed the structure of the data in the old system, it’s time to begin to recreate that structure in the new platform. For Drupal, this means creating new content type bundles, and making choices about the field types. New platforms, or new versions of platforms, often bring changes to field types, and some content will have to be adapted into new containers along the way.  We’ll cover all that in a later post.
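In Drupal 8, each of those content types ends up as an exportable configuration file. As a rough sketch (the machine name and labels below are hypothetical), an exported type looks something like this:

```yaml
# Sketch of an exported content type definition (node.type.blog_post.yml);
# field definitions live in separate field.storage.* and field.field.* files.
langcode: en
status: true
dependencies: {  }
name: 'Blog Post'
type: blog_post
description: 'A dated article with a byline.'
new_revision: true
display_submitted: true
```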

Now, many systems have the ability to migrate content types, in addition to content. Personally, I recommend against using this capability. Unless your content model is extremely simple, the changes to a content type’s fields are usually pretty significant. You’re better off putting in some labor up front than trying to clean up a computer’s mess later.

In our next post, we’ll address the foundations of Drupal content migrations – Migration Groups, and Taxonomy and User Migrations. Stay tuned!

Feb 14 2013
Feb 14

Posted Feb 14, 2013 // 0 comments

With Mobile in the forefront of digital government initiatives, laying the foundation for a mobile solution for the Department of Energy (DOE) was a priority. With this in mind, we wanted to meet this challenge for DOE in a way that was efficient and affordable. We saw a unique opportunity to quickly and easily adapt the existing site to be flexible for all devices. This bypassed a long and possibly difficult redesign process, saving our friends at DOE time and money. Tasking a single developer to work with their existing assets and make them flexible, we were able to create a mobile solution straight from their existing website.

Our starting point included a solid foundation - a static, pixel-based, 12-column grid, with some javascript doing additional layout tweaks. We knew that with this as a base, we could create a responsive solution with what DOE already had. Instead of creating a whole new design, we were able to primarily work on the front-end with CSS and not have to add too much javascript or Drupal development to the process.

Our strategy here was to start with the basics - converting the pixel-based grid to a percentage-based grid - to achieve the broadest results in the shortest amount of time. This worked quite well. Our grid was 1000px wide, which made the math quite simple (a 240px column simply becomes 24%); where the numbers didn’t work out evenly, we made subtle tweaks to padding and widths to make it work.

Once the grid was made flexible, we started shrinking down the page in-browser, looking for points at which the design and layout broke down. When content got too narrow or the layout just didn’t work, we added additional style sheets at these points, which switched the layout and styling up a little bit to work better. This process is detailed at Web Designer Wall.

We also made some adaptations to the large highlighted “hero” images and image galleries, so that they would provide different size images at different screen sizes, using the Adaptive Image Styles module. 

These techniques brought us most of the way towards a fully mobile-friendly site. There are a few outstanding visual pain points; that’s where we have brought our design partners, HUGE Inc., into the process, asking them to provide additional insight and guidance to this agile solution.

The success of this project is as much in our relationship with the Department of Energy and our passion for innovation as it is in any engineering techniques. We wanted to give them the best, most efficient solution we could, and their trust in us allowed us to experiment to find it.

The payoff for this approach is that, in just 65 hours of development and project conception, we have come most of the way to a fully-realized mobile solution. We are looking forward to completing this project with the Department of Energy, addressing further mobile needs and refinements. Stay tuned for the deployment of this work in the very near future!

We're working to keep the site a model government site, not just in its overall presentation but also in how we cost-effectively manage and develop it. This move to a mobile solution without a complete redesign is a great example of what we're working toward.
-- Robert Roberts, Director of Digital Strategies, Department of Energy

Senior Developer Joshua Turton brings a wide variety of skills to his role at Phase2. A programmer comfortable working on both front-end and server-side technologies, he also brings a strong visual sensibility to his work. More than nine years ...

Nov 13 2012
Nov 13

Posted Nov 13, 2012 // 2 comments

Theming in Drupal has been complicated and difficult, particularly when approaching the problem of websites with multiple layouts.  We've all seen sites with dozens of tpls, with code written into templates, with mile-long preprocess functions covering multiple possibilities… It can be messy!

But it doesn't have to be this way. The addition of just a few modules, and a solid base theme, allow a site's layout to be configured without writing a single line of code. This was the topic of Omega: From Download to Layout in 45 Minutes, my presentation at BADCamp. This post will cover just one part of that, creating a layout for one of the wireframes without writing a single line of code.

Dev Site Setup

We'll make the assumption that you have a dev site set up, with the appropriate content types and some demo content to support this site, along with a few views and utility blocks.  You have also installed the Omega base theme and created a new sub-theme, titled Epsilon.


The site is laid out using 960gs, a popular grid system. For more information on 960gs, visit their site.
Here's a view of the section/content list page wireframe.


An alternate view of this layout clearly shows the columns widths used for layout on the grid.

The core of layouts in Omega lies in the Zone and region configuration tab. Regions should be familiar to us from block admin - that's been around for a couple of Drupal versions. Regions are where you put your content and blocks. But what are Zones and Sections?

Think of them as larger containers, arranged in a hierarchy. Sections are the largest, and can have one or more zones inside them. Zones are next, and can have one or more regions inside them. Finally come regions; this is where the grid is really laid out, as Omega makes them quite easy to size by columns. These containers serve to wrap your HTML, allowing for common styling and easy layout choices.

In this third view of our wireframe, we clearly see three different sections.

Header Section

Let's start by looking at the header section in detail.  The wireframe really only contains one zone, with two regions: logo and menu. Opening the Header Section configuration menu in Omega, on the Zone and region configuration tab, we see that this section actually has 4 zones, with a total of 6 regions in it – clearly too many for what we need, but a nice example of how versatile Omega is.

Omega allows us to move zones from one section to another, or even to disable them altogether.  We do this by opening each zone's fieldset, then the configuration fieldset within that, and setting Section to – None –.

In this case, we'll start by doing that for User Zone and Header Zone.

Next, we set the width of the Branding Region in the Branding Zone to 4 columns.  This is where we will put our logo, which if you recall was 4 columns wide.

After that, we go to the Menu Region in the Menu Zone, and set the width to 8 columns, as laid out in our wireframes. Weighting in Omega works just like it does anywhere else in Drupal – the higher the weight, the later the item renders in the process, so we set the weight to 5 to push the Menu Region after the Branding Region. And, again, Omega allows us to move a region from one zone to another, so we move the Menu Region to the Branding Zone. This will stack the Branding and Menu regions horizontally without any additional CSS.  Doing this leaves the Menu Zone without any regions, so as a last housekeeping item we set the Section for Menu Zone to – None –.

Content Section

The Content Section comes with three zones by default.  I'm going to leave the Preface and Postscript Zones alone for now, and work in the Content Zone of the Content Section.  Yes, there's a Content Section, and a Content Zone, and guess what?  There's a Content Region, too.  Naming things in Drupal is not an exact science.

We're going to focus our attention for now on the Content Zone.

In the Sidebar First region, I'm going to set the width to 4 columns, and set the weight to 5.  Again, a higher weight will push the item later in the rendering process - in this case, after the Content Region.

I'll set the width of the Content Region to 8 columns, and leave the weight alone.

Finally, I'll set the Zone of the Sidebar Second region to – None –, which will remove it from the Content Zone altogether.

Lastly, in the Footer Zone of the Footer Section, I'm going to set the width of the Footer First region to 8 columns, and the width of the Footer Second region to 4 columns. Since these two regions are already in the same zone, they will line up horizontally automatically, so there's no need to move them around from one region to another.

Save and view the page. Here's what a default Omega sub-theme looks like, out of the box.

And here's what Epsilon looks like now.

All that – and no coding!

Delta and Context

The problem of applying different theme settings – layouts – to different pages remains, of course.  That's where the delta and context modules come in.  For a more thorough explanation of how they work, see the slides from the full presentation at our Slideshare.



About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web
