A middle-format approach for complex and incremental Drupal 8 migrations

Supercharge your Drupal 8 migration process by externalizing data export and transformation.

Nuvole offers a training at DrupalCon Amsterdam: "Drupal 8 Migration as a process" - register until October 27.

When importing data into a Drupal 8 site, there is no real alternative to relying on core’s Migrate API and its contrib ecosystem. The Migrate API in Drupal 8 implements a rather classic Extract, Transform, Load (ETL) process, with the following "Drupal-lingo" twists:

  • The extract phase is called "source" and it uses a source plugin to read data from external systems, be it a Drupal 7 database, a CSV file, a REST web service, etc.
  • The transform phase is called "process" and it uses process plugins to process and transform data.
  • The load phase is called "destination" and it uses destination plugins to import data into specific Drupal 8 entity storages (e.g. nodes, taxonomy terms, etc.).

To recap, Drupal core implements the ETL process as follows:

  • Source plugins extract the data from the source.
  • Process plugins transform the data.
  • Destination plugins save the data as Drupal 8 entities.
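
The three phases can be illustrated outside Drupal with a minimal Python sketch. The function names (`extract`, `transform`, `load`), the sample CSV and the dict used as storage are hypothetical; they only mirror the source, process and destination roles and are not part of the Migrate API.

```python
import csv
import io

# Hypothetical sample input standing in for an external data source.
SAMPLE_CSV = "id,title\n1,  hello world \n2,second post\n"


def extract(raw_csv):
    """Source phase: read rows from an external system (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(row):
    """Process phase: clean and reshape a single row."""
    return {"id": int(row["id"]), "title": row["title"].strip().title()}


def load(records, storage):
    """Destination phase: save records into a storage (here, a plain dict)."""
    for record in records:
        storage[record["id"]] = record
    return storage


# Run the three phases back to back, as a single migration import would.
storage = load([transform(row) for row in extract(SAMPLE_CSV)], {})
print(storage)
```

Note that the three calls run in one go, which is exactly the property the next section discusses.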

Limitations of the standard Drupal 8 ETL process

In the process described above, the three steps are executed sequentially, in a single run, every time we run a migration import. In this scenario we lose one of the most valuable aspects of an ETL process: testing and validating data prior to import.
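
As a rough illustration of what validating data prior to import can mean once the transformed dataset exists on its own, the sketch below splits records into importable and rejected ones. The field names (`title`, `langcode`), the allowed language set and the rules themselves are assumptions made for the example, not rules from any real project.

```python
# Hypothetical set of languages the destination site is assumed to support.
SUPPORTED_LANGCODES = {"en", "fr", "es"}


def validate(record):
    """Return a list of human-readable problems found in one record."""
    errors = []
    if not record.get("title"):
        errors.append("missing title")
    if record.get("langcode") not in SUPPORTED_LANGCODES:
        errors.append(f"unsupported langcode: {record.get('langcode')!r}")
    return errors


def split_valid(records):
    """Separate importable records from those that need review."""
    valid, rejected = [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected


records = [
    {"id": 1, "title": "Annual report", "langcode": "en"},
    {"id": 2, "title": "", "langcode": "xx"},
]
valid, rejected = split_valid(records)
for record, errors in rejected:
    print(record["id"], errors)
```

Only the `valid` list would be handed over to the import step; the rejected records and their error messages can be reported back for correction.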

Also, a generic ETL process easily accommodates multiple data sources being consolidated into one coherent dataset. With the Drupal Migrate API this is only possible by chaining source plugins, and it can only run on the destination site. This is quite a limitation in complex enterprise scenarios, where displayed data is often the result of complex and convoluted transformations in the backend.
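
To make the consolidation idea concrete, here is a minimal sketch that merges records from two sources sharing a common key, outside of Drupal. The `consolidate` helper, the source names (`site_a`, `crm`) and the rule that the primary source wins on conflicts are all assumptions chosen for the example.

```python
def consolidate(primary, secondary, key="id"):
    """Merge two lists of records by key; primary values win on conflict."""
    merged = {}
    for record in secondary:
        merged[record[key]] = dict(record)
    for record in primary:
        # Overlay primary fields on top of whatever the secondary provided.
        merged.setdefault(record[key], {}).update(record)
    return [merged[k] for k in sorted(merged)]


# Hypothetical records from a legacy site and from an external system.
site_a = [{"id": 1, "title": "Report"}, {"id": 2, "title": "News"}]
crm = [
    {"id": 2, "title": "Old news", "region": "Asia"},
    {"id": 3, "region": "Africa"},
]

dataset = consolidate(site_a, crm)
print(dataset)
```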

A middle-format approach

At Nuvole we have adopted a so-called "middle-format approach". The process simply moves the Extract and Transform parts outside Drupal, so as to ease the production of an easy-to-import dataset in a well-known middle format (such as JSON API).
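
As an illustration of what a middle-format record might look like, the sketch below wraps a transformed record in a JSON:API-style document, with `type`, `id` and `attributes` under a top-level `data` member. The `to_jsonapi` helper, the `articles` resource type and the field names are hypothetical; the actual export tool may structure its output differently.

```python
import json


def to_jsonapi(record, resource_type):
    """Wrap a flat record in a JSON:API-style single-resource document."""
    attributes = {k: v for k, v in record.items() if k != "id"}
    return {
        "data": {
            "type": resource_type,
            # JSON:API requires resource ids to be strings.
            "id": str(record["id"]),
            "attributes": attributes,
        }
    }


document = to_jsonapi(
    {"id": 42, "title": "Annual report", "langcode": "en"}, "articles"
)
print(json.dumps(document, indent=2))
```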

This approach has proved very successful in complex scenarios and, while fully leveraging the standard Drupal 8 migration process, it also allows us to:

  1. Aggregate data from different sources (not only Drupal 7 databases)
  2. Test the data transformation process
  3. Test imported Drupal 8 data
  4. Easily automate the points above
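
Because the transformation lives outside Drupal, it can be covered by ordinary unit tests. The sketch below assumes a hypothetical `transform_article` function and assumes the legacy data stores creation time as a Unix timestamp; the test functions follow the plain-assert style that a runner such as pytest can collect.

```python
from datetime import datetime, timezone


def transform_article(legacy):
    """Hypothetical transformation: trim the title, convert the timestamp."""
    created = datetime.fromtimestamp(legacy["created"], tz=timezone.utc)
    return {
        "title": legacy["title"].strip(),
        "created": created.isoformat(),
    }


def test_title_is_trimmed():
    result = transform_article({"title": " Hello ", "created": 0})
    assert result["title"] == "Hello"


def test_timestamp_becomes_iso_8601():
    result = transform_article({"title": "x", "created": 0})
    assert result["created"] == "1970-01-01T00:00:00+00:00"


if __name__ == "__main__":
    # Allow running the checks directly, without a test runner.
    test_title_is_trimmed()
    test_timestamp_becomes_iso_8601()
```

Tests like these can run in a continuous integration pipeline, which is one way to automate the points listed above.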

The process looks like the following:

The approach outlined above brings the following benefits:

  • Exported data can be reviewed by the client and iteratively refined
  • Since Drupal 8 is not a precondition to export and transform data, data export and site building can run in parallel, making the whole migration process much more efficient
  • Exported data can be presented to the different stakeholders using a user-friendly UI, even before starting any Drupal 8 development
  • Since the data uses a well-known middle format, building the import process (as Drupal core’s Migrate plugins) is straightforward and maximizes code reusability

Middle-format migration at work

At Nuvole we have successfully used the middle-format approach to incrementally transform, review and import data over several years, on complex sites such as the World Food Programme main website. In that scenario we had to consolidate data coming from two Drupal 7 sites (plus a number of external data sources) in 13 different languages, over a two-year period.
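
The post does not describe how incremental runs were implemented. One common technique, shown here purely as an assumption, is to fingerprint each exported record and re-export only those whose fingerprint changed since the previous run. The `fingerprint` and `changed_records` helpers are hypothetical names.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record; key order does not affect the result."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(records, previous_hashes, key="id"):
    """Return records that are new or modified, plus the updated hash map."""
    changed, current = [], {}
    for record in records:
        digest = fingerprint(record)
        current[record[key]] = digest
        if previous_hashes.get(record[key]) != digest:
            changed.append(record)
    return changed, current


# First run: everything is new.
run_1 = [{"id": 1, "title": "Report"}, {"id": 2, "title": "News"}]
changed_1, hashes = changed_records(run_1, {})

# Second run: only record 2 was edited at the source.
run_2 = [{"id": 1, "title": "Report"}, {"id": 2, "title": "Latest news"}]
changed_2, hashes = changed_records(run_2, hashes)
print([record["id"] for record in changed_2])
```

Persisting the hash map between runs (for example in a JSON file) would let each export pick up where the previous one left off.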

We have recently open-sourced a simplified version of the tool we used to export and transform data; you can find its boilerplate code here.

We will run a hands-on session on how to use such a tool to plan and execute data exports in our upcoming DrupalCon Amsterdam training.
