Sep 13 2019

Throughout the series, we explored many migration topics. We started with an overview of the ETL process and workflows for managing migrations. Then, we presented example migrations for different entities: nodes, files, images, taxonomy terms, users, and paragraphs. Next, we shifted focus to migrations from different sources: CSV, JSON, XML, Google Sheet, Microsoft Excel, and LibreOffice Calc files. Later, we explored how to manage migrations as configuration, use groups to share configuration, and execute migrations from the user interface. Finally, we gave recommendations and provided tools for debugging migrations from the command line and the user interface. Although we covered a lot of ground, we only scratched the surface. The Migrate API is so flexible that its use cases are virtually endless. To wrap up the series, we present an introduction to a very popular topic: Drupal upgrades. Let’s get started.

Note: In this article, when we talk about Drupal 7, the same applies to Drupal 6.

What is a Drupal upgrade?

The information we presented in the series is generic enough that it applies to many types of Drupal migrations. There is one particular use case that stands out from the rest: Drupal upgrades. An upgrade is the process of taking your existing Drupal site and copying its configuration and content over to a new major version of Drupal. For example, going from Drupal 6 or 7 to Drupal 8. The following is an oversimplification of the workflow to perform the upgrade process:

  • Install a fresh Drupal 8 site.
  • Add credentials so that the new site can connect to Drupal 7’s database.
  • Use the Migrate API to generate migration definition files. They will copy over Drupal 7’s configuration and content. This step is only about generating the YAML files.
  • Execute those migrations to bring the configuration and content over to Drupal 8.

Preparing your migration

Any migration project requires a good plan of action, but this is particularly important for Drupal upgrades. You need to have a general sense of how the upgrade process works, what assumptions are made by the system, and what limitations exist. Read this article for more details on how to prepare a site for upgrading it to Drupal 8. Some highlights include:

  • Both sites need to be in the latest stable version of their corresponding branch. That means the latest release of Drupal 7 and 8 at the time of performing the upgrade process. This also applies to any contributed module.
  • Do not do any configuration of the Drupal 8 site until the upgrade process is completed. Any configuration you make will be overridden, and there is no need for it anyway. Part of the process includes recreating the old site’s configuration: content types, fields, taxonomy vocabularies, etc.
  • Do not create content on the Drupal 8 site until the upgrade process is completed. The upgrade process will keep the unique identifiers from the source site: `nid`, `uid`, `tid`, `fid`, etc. If you were to create content, the references among entities could be broken when the upgrade process overrides the unique identifiers. To prevent data loss, wait until the old site's content has been migrated to start adding content to the new site.
  • For the system to detect a module’s configuration to be upgraded automatically, it has to be enabled on both sites. This applies to contributed modules in Drupal 7 (e.g., link) that were moved to core in Drupal 8. It also applies to Drupal 7 modules (e.g., address field) that were superseded by a different one in Drupal 8 (e.g., address). In any of those cases, as long as the modules are enabled on both ends, their configuration and content will be migrated. This assumes that the Drupal 8 counterpart offers an automatic upgrade path.
  • Some modules do not offer automatic upgrade paths. The primary example is the Views module. This means that any view created in Drupal 7 needs to be manually recreated in Drupal 8.
  • The upgrade procedure is all about moving data, not logic in custom code. If you have custom modules, the custom code needs to be ported separately. If those modules store data in Drupal’s database, you can use the Migrate API to move it over to the new site.
  • Similarly, you will have to recreate the theme from scratch. Drupal 8 introduced Twig, which is significantly different from the PHPTemplate engine used by Drupal 7.

Customizing your migration

Note that the creation and execution of the migration files are separate steps. Upgrading to a major version of Drupal is often a good opportunity to introduce changes to the website. For example, you might want to change the content modeling, navigation, user permissions, etc. To accomplish that, you can modify the generated migration files to account for any scenario where the new site’s configuration diverges from the old one. Only when you are done with the customizations do you execute the migrations. Examples of things that could change include the following (a configuration sketch follows the list):

  • Combining or breaking apart content types.
  • Moving data about people from node entities to user entities, or vice versa.
  • Renaming content types, fields, taxonomy vocabularies and terms, etc.
  • Changing field types. For example, going from Address Field module in Drupal 7 to Address module in Drupal 8.
  • Merging multiple taxonomy vocabularies into one.
  • Changing how your content is structured. For example, going from a monolithic body field to paragraph entities.
  • Changing how your multimedia files are stored. For example, going from image fields to media entities.
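
As a hedged illustration of such a customization, the following snippet sketches how a generated node migration could be edited to map an old content type to a new one. The “story” source type and the “article” target bundle are hypothetical; the static_map plugin itself is provided by the core Migrate API.

# Hypothetical fragment of a generated node migration's process section.
# Nodes of the Drupal 7 "story" content type are imported into the
# Drupal 8 "article" bundle instead of keeping the original type.
process:
  type:
    plugin: static_map
    source: type
    map:
      story: article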

Performing the upgrade

There are two options to perform the upgrade. In both cases, the process is initiated from the Drupal 8 site. One way is using the Migrate Drupal UI core module to perform the upgrade from the browser’s user interface. When the module is enabled, go to `/upgrade` and provide the database credentials of the Drupal 7 site. Based on the installed modules on both sites, the system will give you a report of what can be automatically upgraded. Consider the limitations explained above. While the upgrade process is running, you will see a stream of messages about the operation. These messages are logged to the database so you can read them after the upgrade is completed. If your dataset is big or there are many expensive operations like password encryption, the process can take a long time to complete, or it can fail altogether.

The other way to perform the upgrade procedure is from the command line using Drush. This requires the Migrate Upgrade contributed module. When enabled, it adds Drush commands to import and roll back a full upgrade operation. You can provide database connection details of the old site via command line options. One benefit of using this approach is that you can create the migration files without running them. This lets you do customizations as explained above. When you are done, you can run the migrations following the same workflow used for manually created ones.
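
The following is a minimal sketch of that Drush-based workflow. It assumes the Migrate Upgrade, Migrate Plus, and Migrate Tools modules are used and that the Drupal 7 credentials are passed via the --legacy-db-url option; option names can vary between module versions, so verify them with drush help migrate:upgrade before running anything.

# Sketch only: verify module versions and option names on your own site.
composer require drupal/migrate_upgrade drupal/migrate_plus drupal/migrate_tools
drush pm:enable migrate_upgrade migrate_plus migrate_tools

# Generate the migration definitions without executing them.
drush migrate:upgrade --legacy-db-url=mysql://user:password@localhost/drupal7 --configure-only

# Review and customize the generated migrations, then run them.
drush migrate:status
drush migrate:import --all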

Known issues and limitations

Depending on whether you are upgrading from Drupal 6 or 7, there is a list of known issues you need to be aware of. Read this article for more information. One area that can be tricky is multilingual support. As of this writing, the upgrade path for multilingual sites is not complete. Limited support is available via the Migrate Drupal Multilingual core module. There are many things to consider when working with multilingual migrations. For example, are you using node or field translations? Do entities have revisions? Read this article for more information.

Upgrade paths for contributed modules

The automatic upgrade procedure only supports Drupal core modules. This includes modules that were added to core in Drupal 8. For any other contributed module, it is the maintainers’ decision to include an automatic upgrade path or not. For example, the Geofield module provides an upgrade path. It is also possible that a module in Drupal 8 offers an upgrade path from a different module in Drupal 7. For example, the Address module provides an upgrade path from the Address Field module. Drupal Commerce also provides some support via the Commerce Migrate module.

Not every module offers an automated upgrade path. In such cases, you can write custom plugins which ideally are contributed back to Drupal.org ;-) Or you can use the techniques learned in the series to transform your source data into the structures expected by Drupal 8. In both cases, having a broad understanding of the Migrate API will be very useful.

Upgrade strategies

There are multiple migration strategies. You might even consider manually recreating the content if there is only a handful of data to move. Or you might decide to use the Migrate API to upgrade part of the site automatically and do a manual copy of a different portion of it. You might want to execute a fully automated upgrade procedure and manually clean up edge cases afterward. Or you might want to customize the migrations to account for those edge cases already. Michael Anello created an insightful presentation on different migration strategies. Our tips for writing migrations apply as well.

Drupal upgrades tend to be fun, challenging projects. The more you know about the Migrate API, the easier it will be to complete the project. We enjoyed writing this overview of the Drupal Migrate API. We would love to work on a follow-up series focused on Drupal upgrades. If you or your organization could sponsor such an endeavor, please reach out to us via the site’s contact form.

What about upgrading to Drupal 9?

In March 2017, project lead Dries Buytaert announced a plan to make Drupal upgrades easier forever. This was reinforced during his keynote at DrupalCon Seattle 2019. You can watch the video recording at this link. In short, Drupal 9.0 will essentially be the last minor release of Drupal 8 minus deprecated APIs. This has very important implications:

  • When Drupal 9 is released, the Migrate API should be mostly the same as in Drupal 8. Therefore, anything that you learn today will be useful for Drupal 9 as well.
  • As long as your code does not use deprecated APIs, upgrading from Drupal 8 to Drupal 9 will be as easy as updating from Drupal 8.7 to 8.8.
  • Because of this, there is no need to wait for Drupal 9 to upgrade your Drupal 6 or 7 site. You can upgrade to Drupal 8 today.

Thank you!

And that concludes the #31DaysOfMigration series. For joining us in this learning experience, thank you very much! ¡Muchas gracias! Merci beaucoup! :-D We are also very grateful to Agaric.coop, Drupalize.Me, and Centarro.io for sponsoring this series.

What did you learn in today’s blog post? Did you know the upgrade process is able to copy content and configuration? Did you know that you can execute the upgrade procedure either from the user interface or the command line? Share your answers in the comments. Also, we would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Sep 13 2019

When one starts working with migrations, it is easy to be overwhelmed by so many modules providing migration functionality. Throughout the series, we presented many of them, trying to cover one module at a time. We did this to help the reader understand when a particular module is truly needed and why. But we only scratched the surface. Today’s article presents a list of migration-related Drupal 8 modules for quick reference. Let’s get started.

Core modules

At the time of this writing, Drupal core ships with four migration modules (a command sketch for enabling them follows the list):

  • Migrate: provides the base API for migrating data.
  • Migrate Drupal: offers functionality to migrate from other Drupal installations. It serves as the foundation for upgrades from Drupal 6 and 7. It also supports reading configuration entities from Drupal 8 sites.
  • Migrate Drupal UI: provides a user interface for upgrading a Drupal 6 or 7 site to Drupal 8.
  • Migrate Drupal Multilingual: an experimental module required for multilingual migrations. When the multilingual upgrade paths become stable, the module will be removed from Drupal core. See this article for more information.
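
If you plan to run an upgrade, the core modules can be enabled from the command line. This is only a sketch; it assumes Drush is available and uses the modules’ machine names.

# Enable the core migration modules used for Drupal 6/7 upgrades.
drush pm:enable migrate migrate_drupal migrate_drupal_ui

# Only needed for multilingual upgrades.
drush pm:enable migrate_drupal_multilingual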

Migration runners

Once the migration definition files have been created, there are many options to execute them:

Source plugins

The Migrate API offers many options to fetch data from:

Destination plugins

The Migrate API is mostly used to move data into Drupal, but it is possible to write to other destinations:

  • Migrate (core): provides classes for creating content and configuration entities. It also offers the `null` plugin which in itself does not write to anything. It is used in multilingual migrations for entity references.
  • Migrate Plus: provides the `table` plugin for migrating into tables not registered with Drupal Schema API.
  • CSV file: example destination plugin implementation to write CSV files. The module was created by J Franks for a DrupalCamp presentation. Check out the repository and video recording.

Development related

These modules can help with writing Drupal migrations:

Field and module related

Modules created by Tess Flynn (socketwench)

While doing the research for this article, we found many useful modules created by Tess Flynn (socketwench). She is a fantastic presenter who also has written about Drupal migrations, testing, and much more. Here are some of her modules:

Miscellaneous

  • Feeds Migrate: it aims to provide a user interface similar to the one from Drupal 7’s Feeds module, but working on top of Drupal 8’s Migrate API.
  • Migrate Override: allows flagging fields in a content entity so they can be manually changed by site editors without being overridden in a subsequent migration.
  • Migrate Status: checks if migrations are currently running.
  • Migrate QA: provides tools for validating content migrations. See this presentation for more details.

And many, many more modules!

What did you learn in today’s blog post? Did you find a new module that could be useful in current or future projects? Did we miss a module that has been very useful to you? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

Next: Introduction to Drupal 8 upgrades

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Sign up to be notified when Agaric gives a migration training:

Sep 12 2019

In recent articles, we have presented some recommendations and tools to debug Drupal migrations. Using a proper debugger is definitely the best way to debug Drupal, be it migrations or other subsystems. In today’s article, we are going to learn how to configure XDebug inside DrupalVM to connect to PHPStorm. First, via the command line using Drush commands. And then, via the user interface using a browser. Let’s get started.

Important: User interfaces tend to change. Screenshots and referenced on-screen text might differ in new versions of the different tools. They can also vary per operating system. This article uses menu items from Linux. Refer to the official DrupalVM documentation for detailed installation and configuration instructions. For this article, it is assumed that VirtualBox, Vagrant, and Ansible are already installed. If you need help with those, refer to the DrupalVM’s installation guide.

Getting DrupalVM

First, get a copy of DrupalVM by cloning the repository or downloading a ZIP or TAR.GZ file from the available releases. If you downloaded a compressed file, expand it to have access to the configuration files. Before creating the virtual machine, make a copy of default.config.yml into a new file named config.yml. The latter will be used by DrupalVM to configure the virtual machine (VM). In this file, make the following changes:

# config.yml file
# Based off default.config.yml
vagrant_hostname: migratedebug.test
vagrant_machine_name: migratedebug
# For dynamic IP assignment the 'vagrant-auto_network' plugin is required.
# Otherwise, use an IP address that has not been used by any other virtual machine.
vagrant_ip: 0.0.0.0

# All the other extra packages can remain enabled.
# Make sure the following three get installed by uncommenting them.
installed_extras:
  - drupalconsole
  - drush
  - xdebug

php_xdebug_default_enable: 1
php_xdebug_cli_disable: no

The vagrant_hostname is the URL you will enter in your browser’s address bar to access the Drupal installation. Set vagrant_ip to an IP that has not been taken by another virtual machine. If you are unsure, you can set the value to 0.0.0.0 and install the vagrant-auto_network plugin. The plugin will make sure that an available IP is assigned to the VM. In the installed_extras section, uncomment xdebug and drupalconsole. Drupal Console is not necessary for getting XDebug to work, but it offers many code introspection tools that are very useful for Drupal debugging in general. Finally, set php_xdebug_default_enable to 1 and php_xdebug_cli_disable to no. These last two settings are very important for being able to debug Drush commands.

Then, open a terminal and change directory to where the DrupalVM files are located. Keep the terminal open; we are going to execute various commands from there. Start the virtual machine by executing vagrant up. If you had already created the VM, you can still make changes to the config.yml file and then reprovision. If the virtual machine is running, execute the command: vagrant provision. Otherwise, you can start and reprovision the VM in a single command: vagrant up --provision. Finally, SSH into the VM by executing vagrant ssh.
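
For reference, this is the typical command sequence described above, plus a quick check (run inside the VM) to confirm that XDebug is loaded for the PHP command line:

# Run from the DrupalVM directory on the host machine.
vagrant up                 # create and provision the virtual machine
vagrant provision          # re-apply changes made to config.yml
vagrant up --provision     # start and reprovision in a single command
vagrant ssh                # open a shell inside the virtual machine

# Inside the virtual machine, verify that XDebug is enabled for the CLI.
php -m | grep -i xdebug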

By default, DrupalVM will use the Drupal composer template project to get a copy of Drupal. That means that you will be managing your module and theme dependencies using composer. When you SSH into the virtual machine, you will be in the /var/www/drupalvm/drupal/web directory. That is Drupal’s docroot. The composer file that manages the installation is actually one directory up. Normally, if you run a composer command from a directory that does not have a composer.json file, composer will try to find one up in the directory hierarchy. Feel free to manually go one directory up or rely on composer’s default behaviour to locate the file.

For good measure, let’s install some contributed modules. Inside the virtual machine, in Drupal’s docroot, execute the following command: composer require drupal/migrate_plus drupal/migrate_tools. You can also create a directory at /var/www/drupalvm/drupal/web/modules/custom and place in it the custom module we have been working on throughout the series. You can get it at https://github.com/dinarcon/ud_migrations.

To make sure things are working, let’s enable one of the example modules by executing: drush pm-enable ud_migrations_config_entity_lookup_entity_generate. This module comes with one migration: udm_config_entity_lookup_entity_generate_node. If you execute drush migrate:status, the example migration should be listed.

Configuring PHPStorm

With Drupal already installed and the virtual machine running, let’s configure PHPStorm. Start a new project pointing to the DrupalVM files. Feel free to follow your preferred approach to project creation. For reference, one way to do it is by going to "Files > Create New Project from Existing Files". In the dialog, select "Source files are in a local directory, no web server is configured yet." and click next. Look for the DrupalVM directory, click on it, click on “Project Root”, and then “Finish”. PHPStorm will begin indexing the files and detect that it is a Drupal project. It will prompt you to enable the Drupal coding standards, indicate which directory contains the installation path, and choose whether you want to set PHP include paths. All of that is optional but recommended, especially if you want to use this VM for long-term development.

Now the important part. Go to “Files > Settings > Language and Frameworks > PHP”. In the panel, there is a text box labeled “CLI Interpreter”. In the right end, there is a button with three dots like an ellipsis (...). The next step requires that the virtual machine is running because PHPStorm will try to connect to it. After verifying that it is the case, click the plus (+) button at the top left corner to add a CLI Interpreter. From the list that appears, select “From Docker, Vagrant, VM, Remote...”. In the “Configure Remote PHP Interpreter” dialog select “Vagrant”. PHPStorm will detect the SSH connection to connect to the virtual machine. Click “OK” to close the multiple dialog boxes. When you go back to the “Languages & Frameworks” dialog, you can set the “PHP language level” to match the same version from the Remote CLI Interpreter.

Remote PHP interpreter.

CLI interpreters.

You are almost ready to start debugging. There are a few things pending to do. First, let’s create a breakpoint in the import method of the MigrateExecutable class. You can go to “Navigate > Class” to search the project by class name, or click around in the Project structure until you find the class. It is located at ./drupal/web/core/modules/migrate/src/MigrateExecutable.php in the VM directory. You can add a breakpoint by clicking on the bar to the left of the code area. A red circle will appear, indicating that the breakpoint has been added.

Then, you need to instruct PHPStorm to listen for debugging connections. For this, click on “Run > Start Listening for PHP Debugging Connections”. Finally, you have to set some server mappings. For this you will need the IP address of the virtual machine. If you configured the VM to assign the IP dynamically, you can skip this step momentarily. PHPStorm will detect the incoming connection, create a server with the proper IP, and then you can set the path mappings.

Triggering the breakpoint

Let’s switch back to the terminal. If you are not inside the virtual machine, you can SSH into the VM executing vagrant ssh. Then, execute the following command (everything in one line):

XDEBUG_CONFIG="idekey=PHPSTORM" /var/www/drupalvm/drupal/vendor/bin/drush migrate:import udm_config_entity_lookup_entity_generate_node

For the breakpoint to be triggered, the following needs to happen:

  • You must execute Drush from the vendor directory. DrupalVM has a globally available Drush binary located at /usr/local/bin/drush. That is not the one to use. For debugging purposes, always execute Drush from the vendor directory.
  • The command needs to have the XDEBUG_CONFIG environment variable set to “idekey=PHPSTORM”. There are many ways to accomplish this, but prepending the variable as shown in the example is a valid way to do it (a convenience alias is sketched after this list).
  • Because the breakpoint was set in the import method, we need to execute an import command to stop at the breakpoint. The migration in the example Drush command is part of the example module that was enabled earlier.
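
As a convenience, you can define a shell alias inside the virtual machine so that every Drush invocation is ready for debugging. This is only a sketch and the alias name is arbitrary:

# Add to ~/.bashrc inside the VM to keep the alias across sessions.
alias drush-debug='XDEBUG_CONFIG="idekey=PHPSTORM" /var/www/drupalvm/drupal/vendor/bin/drush'

# Any Drush command can then trigger breakpoints, for example:
drush-debug migrate:import udm_config_entity_lookup_entity_generate_node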

When the command is executed, a dialog will appear in PHPStorm. In it, you will be asked to select a project or a file to debug. Accept what is selected by default for now.  By accepting the prompt, a new server will be configured using the proper IP of the virtual machine.  After doing so, go to “Files > Settings > Language and Frameworks > PHP > Servers”. You should see one already created. Make sure the “Use path mappings” option is selected. Then, look for the direct child of “Project files”. It should be the directory in your host computer where the VM files are located. In that row, set the “Absolute path on the server” column to  /var/www/drupalvm. You can delete any other path mapping. There should only be one from the previous prompt. Now, click “OK” in the dialog to accept the changes.

Incoming Drush connection.

Path mappings

Finally, run the Drush command from inside the virtual machine once more. This time the program execution should stop at the breakpoint. You can use the Debug panel to step over each line of code and see how the variables change over time. Feel free to add more breakpoints as needed. In the previous article, there are some suggestions about that. When you are done, let PHPStorm know that it should no longer listen for connections. For that, click on “Run > Stop Listening for PHP Debugging Connections”. And that is how you can debug Drush commands for Drupal migrations.

Path mappings.

Debugging from the user interface

If you also want to be able to debug from the user interface, generate the bookmarklets for XDebug at https://www.jetbrains.com/phpstorm/marklets/. The IDE Key should be PHPSTORM. When the bookmarklets are created, you can drag and drop them into your browser’s bookmarks toolbar. Then, you can click on them to start and stop a debugging session. The IDE needs to be listening for incoming debugging connections, as was the case for Drush commands.

PHPStorm bookmarklets generator

Note: There are browser extensions that let you start and stop debugging sessions. Check the extensions repository of your browser to see which options are available.

Finally, set breakpoints as needed and go to a page that would trigger them. If you are following along with the example, you can go to http://migratedebug.test/admin/structure/migrate/manage/default/migrations/udm_config_entity_lookup_entity_generate_node/execute. Once there, select the “Import” operation and click the “Execute” button. This should open a prompt in PHPStorm to select a project or a file to debug. Select the index.php located in Drupal’s docroot. After accepting the connection, a new server should be configured with the proper path mappings. At this point, you should hit the breakpoint again.

Incoming web connections.

Happy debugging! :-)

What did you learn in today’s blog post? Did you know how to debug Drush commands? Did you know how to trigger a debugging session from the browser? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Sep 07 2019

In the previous article, we began talking about debugging Drupal migrations. We gave some recommendations of things to do before diving deep into debugging. We also introduced the log process plugin. Today, we are going to show how to use the Migrate Devel module and the debug process plugin. Then we will give some guidelines on using a real debugger like XDebug. Next, we will share tips to help you become familiar with migration errors. Finally, we are going to briefly talk about the migrate:fields-source Drush command. Let’s get started.

Example configuration for debug process plugin

The migrate_devel module

The Migrate Devel module is very helpful for debugging migrations. It allows you to visualize the data as it is received from the source, the result of field transformation in the process pipeline, and values that are stored in the destination. It works by adding extra options to Drush commands. When these options are used, you will see more output in the terminal with details on how rows are being processed.

As of this writing, you will need to apply a patch to use this module. Migrate Devel was originally written for Drush 8, which is still supported but no longer recommended. Instead, you should use at least version 9 of Drush. Between versions 8 and 9 there were major changes in Drush internals, and commands need to be updated to work with the new version. Unfortunately, the Migrate Devel module is not fully compatible with Drush 9 yet. Most of the benefits listed in the project page have not been ported. For instance, automatically reverting the migrations and applying the changes to the migration files is not yet available. The partial support is still useful, and to get it you need to apply the patch from this issue. If you are using the Drush commands provided by Migrate Plus, you will also want to apply this patch. If you are using the Drupal composer template, you can add this to your composer.json to apply both patches:

"extra": {
  "patches": {
    "drupal/migrate_devel": {
      "drush 9 support": "https://www.drupal.org/files/issues/2018-10-08/migrate_devel-drush9-2938677-6.patch"
    },
    "drupal/migrate_tools": {
      "--limit option": "https://www.drupal.org/files/issues/2019-08-19/3024399-55.patch"
    }
  }
}

With the patches applied and the modules installed, you will get two new command line options for the migrate:import command: --migrate-debug and --migrate-debug-pre. The major difference between them is that the latter runs before the destination is saved. Therefore, --migrate-debug-pre does not provide debug information of the destination.

Using any of the flags will produce a lot of debug information for each row being processed. Many times, analyzing a subset of the records is enough to spot potential issues. The patch to Migrate Tools will allow you to use the --limit and --idlist options with the migrate:import command to limit the number of elements to process.

To demonstrate the output generated by the module, let’s use the image migration from the CSV source example. You can get the code at https://github.com/dinarcon/ud_migrations. The following snippets show how to execute the import command with the extra debugging options and the resulting output:

# Import only one element.
$ drush migrate:import udm_csv_source_image --migrate-debug --limit=1

# Use the row's unique identifier to limit which element to import.
$ drush migrate:import udm_csv_source_image --migrate-debug --idlist="P01"
$ drush migrate:import udm_csv_source_image --migrate-debug --limit=1
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   $Source                                    │
└──────────────────────────────────────────────────────────────────────────────┘
array (10) [
    'photo_id' => string (3) "P01"
    'photo_url' => string (74) "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
    'path' => string (76) "modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv"
    'ids' => array (1) [
        string (8) "photo_id"
    ]
    'header_offset' => NULL
    'fields' => array (2) [
        array (2) [
            'name' => string (8) "photo_id"
            'label' => string (8) "Photo ID"
        ]
        array (2) [
            'name' => string (9) "photo_url"
            'label' => string (9) "Photo URL"
        ]
    ]
    'delimiter' => string (1) ","
    'enclosure' => string (1) """
    'escape' => string (1) "\"
    'plugin' => string (3) "csv"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 $Destination                                 │
└──────────────────────────────────────────────────────────────────────────────┘
array (4) [
    'psf_destination_filename' => string (25) "picture-15-1421176712.jpg"
    'psf_destination_full_path' => string (25) "picture-15-1421176712.jpg"
    'psf_source_image_path' => string (74) "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
    'uri' => string (29) "./picture-15-1421176712_6.jpg"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│                             $DestinationIDValues                             │
└──────────────────────────────────────────────────────────────────────────────┘
array (1) [
    string (1) "3"
]
════════════════════════════════════════════════════════════════════════════════
Called from +56 /var/www/drupalvm/drupal/web/modules/contrib/migrate_devel/src/EventSubscriber/MigrationEventSubscriber.php
 [notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_csv_source_image'

In the terminal, you can see the data as it is passed along in the Migrate API. In the $Source, you can see how the source plugin was configured and the different columns for the row being processed. In the $Destination, you can see all the fields that were mapped in the process section and their values after executing all the process plugin transformations. In $DestinationIDValues, you can see the unique identifier of the destination entity that was created. This migration created an image, so the destination array has only one element: the file ID (fid). For paragraphs, which are revisioned entities, you will get two values: the id and the revision_id. The following snippet shows the $Destination and $DestinationIDValues sections for the paragraph migration in the same example module:

$ drush migrate:import udm_csv_source_paragraph --migrate-debug --limit=1
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   $Source                                    │
└──────────────────────────────────────────────────────────────────────────────┘
Omitted.
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 $Destination                                 │
└──────────────────────────────────────────────────────────────────────────────┘
array (3) [
    'field_ud_book_paragraph_title' => string (32) "The definitive guide to Drupal 7"
    'field_ud_book_paragraph_author' => string UTF-8 (24) "Benjamin Melançon et al."
    'type' => string (17) "ud_book_paragraph"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│                             $DestinationIDValues                             │
└──────────────────────────────────────────────────────────────────────────────┘
array (2) [
    'id' => string (1) "3"
    'revision_id' => string (1) "7"
]
════════════════════════════════════════════════════════════════════════════════
Called from +56 /var/www/drupalvm/drupal/web/modules/contrib/migrate_devel/src/EventSubscriber/MigrationEventSubscriber.php
 [notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_csv_source_paragraph'

The debug process plugin

The Migrate Devel module also provides a new process plugin called debug. The plugin works by printing the value it receives to the terminal. As Benji Fisher explains in this issue, the debug plugin offers the following advantages over the log plugin provided by the core Migrate API:

  • The use of print_r() handles both arrays and scalar values gracefully.
  • It is easy to differentiate debugging code that should be removed from logging plugin configuration that should stay.
  • It saves time as there is no need to run the migrate:messages command to read the logged values.

In short, you can use the debug plugin in place of log. There is a particular case where using debug is really useful. If used in the middle of a process plugin chain, you can see how the elements are transformed in each step. The following snippet shows an example of this setup and the output it produces:

field_tags:
  - plugin: skip_on_empty
    source: src_fruit_list
    method: process
    message: 'No fruit_list listed.'
  - plugin: debug
    label: 'Step 1: Value received from the source plugin: '
  - plugin: explode
    delimiter: ','
  - plugin: debug
    label: 'Step 2: Exploded taxonomy term names '
    multiple: true
  - plugin: callback
    callable: trim
  - plugin: debug
    label: 'Step 3: Trimmed taxonomy term names '
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
  - plugin: debug
    label: 'Step 4: Generated taxonomy term IDs '
$ drush migrate:import udm_config_entity_lookup_entity_generate_node --limit=1
Step 1: Value received from the source plugin: Apple, Pear, Banana
Step 2: Exploded taxonomy term names Array
(
    [0] => Apple
    [1] =>  Pear
    [2] =>  Banana
)
Step 3: Trimmed taxonomy term names Array
(
    [0] => Apple
    [1] => Pear
    [2] => Banana
)
Step 4: Generated taxonomy term IDs Array
(
    [0] => 2
    [1] => 3
    [2] => 7
)
 [notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_config_entity_lookup_entity_generate_node'

The process pipeline is part of the node migration from the entity_generate plugin example. In the code snippet, a debug step is added after each plugin in the chain. That way, you can verify that the transformations are happening as expected. In the last step you get an array of the taxonomy term IDs (tid) that will be associated with the field_tags field. Note that this plugin accepts two optional parameters:

  • label is a string to print before the debug output. It can be used to give context to what is being printed.
  • multiple is a boolean that when set to true signals the next plugin in the pipeline to process each element of an array individually. The functionality is similar to the multiple_values plugin provided by Migrate Plus.

Using the right tool for the job: a debugger

Many migration issues can be solved by following the recommendations from the previous article and the tools provided by Migrate Devel. But there are problems so complex that you need a full-blown debugger. The many layers of abstraction in Drupal, and the fact that multiple modules might be involved in a single migration, makes the use of debuggers very appealing. With them, you can step through each line of code across multiple files and see how each variable changes over time.

In the next article, we will explain how to configure XDebug to work with PHPStorm and DrupalVM. For now, let’s consider some good places to add breakpoints. In this article, Lucas Hedding recommends adding them in:

  • The import method of the MigrateExecutable class.
  • The processRow method of the MigrateExecutable class.
  • The process plugin if you know which one might be causing an issue. The transform method is a good place to set the breakpoint.

The use of a debugger is no guarantee that you will find the solution to your issue. It will depend on many factors, including your familiarity with the system and how deep the problem lies. Previous debugging experience, even if not directly related to migrations, will help a lot. Do not get discouraged if it takes you a long time to discover what is causing the problem or if you cannot find it at all. Each time, you will get a better understanding of the system.

Adam Globus-Hoenich, a migrate maintainer, once told me that the Migrate API "is impossible to understand for people that are not migrate maintainers." That was after spending about an hour together trying to debug an issue and failing to make it work. I mention this not with the intention to discourage you. But to illustrate that no single person knows everything about the Migrate API and even their maintainers can have a hard time debugging issues. Personally, I have spent countless hours in the debugger tracking how the data flows from the source to the destination entities. It is mind-blowing, and I barely understand what is going on. The community has come together to produce a fantastic piece of software. Anyone who uses the Migrate API is standing on the shoulders of giants.

If it is not broken, break it on purpose

One of the best ways to reduce the time you spend debugging an issue is having experience with a similar problem. A great way to learn is by finding a working example and breaking it on purpose. This will let you get familiar with the requirements and assumptions made by the system and the errors it produces.

Throughout the series, we have created many examples. We have made our best effort to explain how each example works, but we were not able to document every detail in the articles. In part to keep them within a reasonable length, but also because we do not fully comprehend the system. In any case, we highly encourage you to take the examples and break them in every imaginable way. Make one change at a time, see how the migration behaves, and note what errors are produced. These are some things to try:

  • Do not leave a space after a colon (:) when setting a configuration option. Example: id:this_is_going_to_be_fun (see the sketch after this list).
  • Change the indentation of plugin definitions.
  • Try to use a plugin provided by a contributed module that is not enabled.
  • Do not set a required plugin configuration option.
  • Leave out a full section like source, process, or destination.
  • Mix the upper and lowercase letters in configuration options, variables, pseudofields, etc.
  • Try to convert a migration managed as code to configuration; and vice versa.
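
For example, here is a minimal sketch of a valid field mapping followed by two intentionally broken variations. The field and source names are made up; try each variation separately and observe what the Migrate API reports:

# Valid mapping.
field_example:
  plugin: default_value
  default_value: 'A value'

# Broken on purpose: no space after the colon in the plugin key.
field_example:
  plugin:default_value
  default_value: 'A value'

# Broken on purpose: wrong indentation of the plugin key.
field_example:
plugin: default_value
  default_value: 'A value'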

The migrate:fields-source Drush command

Before wrapping up the discussion on debugging migrations, let’s quickly cover the migrate:fields-source Drush command. It lists all the fields available in the source that can be used later in the process section. Many source plugins require that you manually set the list of fields to fetch from the source. Because of this, the information provided by this command is redundant most of the time. However, it is particularly useful with CSV source migrations. The CSV plugin automatically includes all the columns in the file. Executing this command will let you know which columns are available. For example, running drush migrate:fields-source udm_csv_source_node produces the following output in the terminal:

$ drush migrate:fields-source udm_csv_source_node
 -------------- -------------
  Machine Name   Description
 -------------- -------------
  unique_id      unique_id
  name           name
  photo_file     photo_file
  book_ref       book_ref
 -------------- -------------

The migration is part of the CSV source example. By running the command you can see that the file contains four columns. The values under "Machine Name" are the ones you are going to use for field mappings in the process section. The Drush command has a --format option that lets you change the format of the output. Execute drush migrate:fields-source --help to get a list of valid formats.

What did you learn in today’s blog post? Have you ever used the migrate devel module for debugging purposes? What is your strategy when using a debugger like XDebug? Any debugging tips that have been useful to you? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Sep 05 2019

Throughout the series we have shown many examples. I do not recall any of them working on the first try. When working on Drupal migrations, it is often the case that things do not work right away. Today’s article is the first of a two-part series on debugging Drupal migrations. We start by giving some recommendations of things to do before diving deep into debugging. Then, we talk about migration messages and present the log process plugin. Let’s get started.

Example configuration for log process plugin.

Minimizing the surface for errors

The Migrate API is a very powerful ETL framework that interacts with many systems provided by Drupal core and contributed modules. This adds layers of abstraction that can make the debugging process more complicated compared to other systems. For instance, if something fails with a remote JSON migration, the error might be produced in the Migrate API, the Entity API, the Migrate Plus module, the Migrate Tools module, or even the Guzzle HTTP Client library that fetches the file. For a more concrete example, while working on a recent article, I stumbled upon an issue that involved three modules. The problem was that when trying to roll back a CSV migration from the user interface, an exception would be thrown, making the operation fail. This is related to an issue in the core Migrate API that manifests itself when rollback operations are initiated from the interface provided by Migrate Plus. That issue, in turn, triggers a condition in the Migrate Source CSV plugin that fails and throws the exception.

In general, you should aim to minimize the surface for errors. One way to do this is by starting the migration with the minimum possible setup. For example, if you are going to migrate nodes, start by configuring the source plugin, one field (the title), and the destination. When that works, keep migrating one field at a time. If the field has multiple subfields, you can even migrate one subfield at a time. Commit every progress to version control so you can go back to a working state if things go wrong. Read this article for more recommendations on writing migrations.

What to check first?

Debugging is a process that might involve many steps. There are a few things that you should check before diving too deep into trying to find the root of the problem. Let’s begin with making sure that changes to your migrations are properly detected by the system. One common question I see people ask is where to place the migration definition files. Should they go in the migrations or config/install directory of your custom module? The answer to this is whether you want to manage your migrations as code or configuration. Your choice will determine the workflow to follow for changes in the migration files to take effect. Migrations managed in code go in the migrations directory and require rebuilding caches for changes to take effect. On the other hand, migrations managed in configuration are placed in the config/install directory and require configuration synchronization for changes to take effect. So, make sure to follow the right workflow.
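
For reference, these are common ways to apply changes in each workflow. This is only a sketch; the module path is hypothetical and other approaches exist:

# Migrations managed as code (files in the module's migrations/ directory):
drush cache:rebuild

# Migrations managed as configuration (files in config/install):
# one option is a partial configuration import from the module's directory.
drush config:import --partial --source=modules/custom/my_module/config/install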

After verifying that your changes are being applied, the next thing to do is verify that the modules that provide your plugins are enabled and the plugins themselves are properly configured. Look for typos in the configuration options. Always refer to the official documentation to know which options are available and find the proper spelling of them. Other places to look are the code for the plugin definition or articles like the ones in this series documenting how to use them. Things to keep in mind include proper indentation of the configuration options. An extra whitespace or a wrong indentation level can break the migration: you can either get a fatal error or the migration can fail silently without producing the expected results. Something else to be mindful of is the version of the modules you are using, because the configuration options might change per version. For example, the newly released 8.x-3.x branch of Migrate Source CSV changed various configuration options as described in this change record. And the 8.x-5.x branch of Migrate Plus changed some configuration options for plugins related to DOM manipulation as described in this change record. Keeping an eye on the issue queue and change records for the different modules you use is always a good idea.

If the problem persists, look for reports of similar problems in the issue queue. Make sure to include closed issues as well in case your problem has been fixed or documented already. Remember that a problem in one module can affect a different module. Another place to ask questions is the #migrate channel in Drupal Slack. The support that is offered there is fantastic.

Migration messages

If nothing else has worked, it is time to investigate what is going wrong. In case the migration outputs an error or a stacktrace to the terminal, you can use that to search in the code base where the problem might originate. But if there is no output or if the output is not useful, the next thing to do is check the migration messages.

The Migrate API allows plugins to log messages to the database in case an error occurs. Not every plugin leverages this functionality, but it is always worth checking if a plugin in your migration wrote messages that could give you a hint of what went wrong. Some plugins like skip_on_empty and skip_row_if_not_set even expose a configuration option to specify messages to log. To check the migration messages use the following Drush command: drush migrate:messages [migration_id]. If you are managing migrations as configuration, the interface provided by Migrate Plus also exposes them.

Messages are logged separately per migration, even if you run multiple migrations at once. This could happen if you execute dependencies or use groups or tags. In those cases, errors might be produced in more than one migration. You will have to look at the messages for each of them individually.

Let’s consider the following example. In the source there is a field called src_decimal_number with values like 3.1415, 2.7182, and 1.4142. It is needed to separate the number into two components: the integer part (3) and the decimal part (1415). For this, we are going to use the extract process plugin. Errors will be purposely introduced to demonstrate the workflow to check messages and update migrations. The following example shows the process plugin configuration and the output produced by trying to import the migration:

# Source values: 3.1415, 2.7182, and 1.4142

psf_number_components:
  plugin: explode
  source: src_decimal_number
$ drush mim ud_migrations_debug
[notice] Processed 3 items (0 created, 0 updated, 3 failed, 0 ignored) - done with 'ud_migrations_debug'

In MigrateToolsCommands.php line 811:
ud_migrations_debug Migration - 3 failed.

The error produced in the console does not say much. Let’s see if any messages were logged using: drush migrate:messages ud_migrations_debug. In the previous example, the messages will look like this:

 ------------------- ------- --------------------
  Source IDs Hash    Level   Message
 ------------------- ------- --------------------
  7ad742e...732e755   1       delimiter is empty
  2d3ec2b...5e53703   1       delimiter is empty
  12a042f...1432a5f   1       delimiter is empty
 ------------------------------------------------

In this case, the migration messages are good enough to let us know what is wrong. The required delimiter configuration option was not set. When an error occurs, you usually need to perform at least three steps (a command sketch follows the list):

  • Rollback the migration. This will also clear the messages.
  • Make changes to the definition file and make sure they are applied. This will depend on whether you are managing the migrations as code or configuration.
  • Import the migration again.
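
Using the example migration from this section, the cycle could look like the following. The exact step to re-apply the definition depends on your workflow, as sketched earlier:

# 1) Rollback, which also clears the logged messages.
drush migrate:rollback ud_migrations_debug

# 2) Edit the migration definition and apply the change
#    (cache rebuild for code, configuration import for configuration).
drush cache:rebuild

# 3) Import again and check the messages if it fails.
drush migrate:import ud_migrations_debug
drush migrate:messages ud_migrations_debug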

Let’s say we performed these steps, but we got an error again. The following snippet shows the updated plugin configuration and the messages that were logged:

psf_number_components:
  plugin: explode
  source: src_decimal_number
  delimiter: '.'
 ------------------- ------- ------------------------------------
  Source IDs Hash    Level   Message
 ------------------- ------- ------------------------------------
  7ad742e...732e755   1       3.1415000000000002 is not a string
  2d3ec2b...5e53703   1       2.7181999999999999 is not a string
  12a042f...1432a5f   1       1.4141999999999999 is not a string
 ----------------------------------------------------------------

The new error occurs because the explode operation works on strings, but we are providing numbers. One way to fix this is to update the source to add quotes around the number so it is treated as a string. This is of course not ideal and many times not even possible. A better way to make it work is setting the strict option to false in the plugin configuration. This will make sure to cast the input value to a string before applying the explode operation. This demonstrates the importance of reading the plugin documentation to know which options are at your disposal. Of course, you can also have a look at the plugin code to see how it works.

Note: Sometimes the error produces a non-recoverable condition. The migration can be left in a status of "Importing" or "Reverting". Refer to this article to learn how to fix this condition.

The log process plugin

In the example, adding the extra configuration option will make the import operation finish without errors. But, how can you be sure the expected values are being produced? Not getting an error does not necessarily mean that the migration works as expected. It is possible that the transformations being applied do not yield the values we think or the format that Drupal expects. This is particularly true if you have complex process plugin chains. As a reminder, we want to separate a decimal number from the source like 3.1415 into its components: 3 and 1415.

The log process plugin can be used for checking the outcome of plugin transformations. This plugin offered by the core Migrate API does two things. First, it logs the value it receives to the messages table. Second, the value is returned unchanged so that it can be used in process chains. The following snippets show how to use the log plugin and what is stored in the messages table:

psf_number_components:
  - plugin: explode
    source: src_decimal_number
    delimiter: '.'
    strict: false
  - plugin: log
 ------------------- ------- --------
  Source IDs Hash    Level   Message
 ------------------- ------- --------
  7ad742e...732e755   1       3
  7ad742e...732e755   1       1415
  2d3ec2b...5e53703   1       2
  2d3ec2b...5e53703   1       7182
  12a042f...1432a5f   1       1
  12a042f...1432a5f   1       4142
 ------------------------------------

Because the explode plugin produces an array, each of the elements is logged individually. And sure enough, in the output you can see the numbers being separated as expected.

The log plugin can be used to verify that source values are being read properly and process plugin chains produce the expected results. Use it as part of your debugging strategy, but make sure to remove it when done with the verifications. It makes the migration run slower because it has to write to the database. The overhead is not needed once you verify things are working as expected.

In the next article, we are going to cover the Migrate Devel module, the debug process plugin, recommendations for using a proper debugger like XDebug, and the migrate:fields-source Drush command.

What did you learn in today’s blog post? What workflow do you follow to debug a migration issue? Have you ever used the log process plugin for debugging purposes? If so, how did it help to solve the issue? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

Next: How to debug Drupal migrations - Part 2

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Sep 04 2019

In recent posts we have explored the Migrate Plus and Migrate Tools modules. They extend the Migrate API to provide migrations defined as configuration entities, groups to share configuration among migrations, a user interface to execute migrations, among other things. Yet another benefit of using Migrate Plus is the option to leverage the many process plugins it provides. Today, we are going to learn about two of them: `entity_lookup` and `entity_generate`. We are going to compare them with the `migration_lookup` plugin, show how to configure them, and explain their compromises and limitations. Let’s get started.

What is the difference among the migration_lookup, entity_lookup, and entity_generate plugins?

In the article about migration dependencies we covered the `migration_lookup` plugin provided by the core Migrate API. It lets you maintain relationships among entities that are being imported. For example, if you are migrating a node that has associated users, taxonomy terms, images, paragraphs, etc. This plugin has a very important restriction: the related entities must come from another migration. But what can you do if you need to reference entities that already exist in the system? You might already have users in Drupal that you want to assign as node authors. In that case, the `migration_lookup` plugin cannot be used, but `entity_lookup` can do the job.

The `entity_lookup` plugin is provided by the Migrate Plus module. You can use it to query any entity in the system and get its unique identifier. This is often used to populate entity reference fields, but it can be used to set any field or property in the destination. For example, you can query existing users and assign the `uid` node property which indicates who created the node. If no entity is found, the plugin returns a `NULL` value, which you can use in combination with other plugins to provide a fallback behavior. The advantage of this plugin is that it does not require another migration. You can query any entity in the entire system.

The `entity_generate` plugin, also provided by the Migrate Plus module, is an extension of `entity_lookup`. If no entity is found, this plugin will automatically create one. For example, you might have a list of taxonomy terms to associate with a node. If some of the terms do not exist, you would like to create and relate them to the node.

Note: The `migration_lookup` offers a feature called stubbing that neither `entity_lookup` nor `entity_generate` provides. It allows you to create a placeholder entity that will be updated later in the migration process. For example, in a hierarchical taxonomy terms migration, it is possible that a term is migrated before its parent. In that case, a stub for the parent will be created and later updated with the real data.

Getting the example code

You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is `UD Config entity_lookup and entity_generate examples` whose machine name is `ud_migrations_config_entity_lookup_entity_generate`. It comes with one JSON migration: `udm_config_entity_lookup_entity_generate_node`. Read this article for details on migrating from JSON files. The following snippet shows a sample of the file:


{
  "data": {
    "udm_nodes": [
      {
        "unique_id": 1,
        "thoughtful_title": "Amazing recipe",
        "creative_author": "udm_user",
        "fruit_list": "Apple, Pear, Banana"
      },
      {...},
      {...},
      {...}
    ]
  }
}

Additionally, the example module creates three users upon installation: 'udm_user', 'udm_usuario', and 'udm_utilisateur'. They are deleted automatically when the module is uninstalled. They will be used to assign the node authors. The example will create nodes of type "Article" from the standard installation profile. You can execute the migration from the interface provided by Migrate Tools at `/admin/structure/migrate/manage/default/migrations`.

Using the entity_lookup plugin to assign the node author

Let’s start by assigning the node author. The following snippet shows how to configure the `entity_lookup` plugin to assign the node author:


uid:
  - plugin: entity_lookup
    entity_type: user
    value_key: name
    source: src_creative_author
  - plugin: default_value
    default_value: 1

The `uid` node property is used to assign the node author. It expects an integer value representing a user ID (`uid`). The source data contains usernames so we need to query the database to get the corresponding user IDs. The users that will be referenced were not imported using the Migrate API. They were already in the system. Therefore, `migration_lookup` cannot be used, but `entity_lookup` can.

The plugin is configured using three keys. `entity_type` is set to the machine name of the entity to query: `user` in this case. `value_key` is the name of the entity property to look up. In Drupal, the usernames are stored in a property called `name`. Finally, `source` specifies which field from the source contains the lookup value for the `name` entity property. For example, the first record has a `src_creative_author` value of `udm_user`. So, this plugin will instruct Drupal to search among all the users in the system for one whose `name` (username) is `udm_user`. If a match is found, the plugin will return the user ID. Because the `uid` node property expects a user ID, the return value of this plugin can be used directly to assign its value.

What happens if the plugin does not find an entity matching the conditions? It returns a `NULL` value. Then it is up to you to decide what to do. If you let the `NULL` value pass through, Drupal will apply its default behavior. In the case of the `uid` property, if the received value is not valid, the node creation will be attributed to the anonymous user (uid: 0). Alternatively, you can detect if `NULL` is returned and take some action. In the example, the second record specifies the "udm_not_found" user, which does not exist. To accommodate this, a process pipeline is defined to manually specify a user if `entity_lookup` did not find one. The `default_value` plugin is used to return `1` in that case. The number represents a user ID, not a username. Specifically, this is the user ID of the "super user" created when Drupal was first installed. If you need to assign a different user, but the user ID is unknown, you can create a pseudofield and use the `entity_lookup` plugin again to find its user ID. Then, use that pseudofield as the default value.
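
A minimal sketch of that approach is shown below. The pseudofield names and the `fallback_username` constant are hypothetical, and the core `null_coalesce` plugin is used here to pick the first non-empty lookup result. This is one possible way to wire it, not the configuration shipped with the example module:

psf_author_uid:
  plugin: entity_lookup
  entity_type: user
  value_key: name
  source: src_creative_author
psf_default_author_uid:
  plugin: entity_lookup
  entity_type: user
  value_key: name
  # Assumes a 'fallback_username' constant defined in the source section.
  source: constants/fallback_username
uid:
  # Returns the first non-NULL value: the looked-up author or the fallback.
  plugin: null_coalesce
  source:
    - '@psf_author_uid'
    - '@psf_default_author_uid'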

Important: User entities do not have bundles. Do not set the `bundle_key` or `bundle` configuration options of the `entity_lookup` plugin. Otherwise, you will get the following error: "The entity_lookup plugin found no bundle but destination entity requires one." Files do not have bundles either. For entities that have bundles like nodes and taxonomy terms, those options need to be set in the `entity_lookup` plugin.

Using the entity_generate plugin to assign and create taxonomy terms

Now, let’s migrate a comma separated list of taxonomy terms. An example value is `Apple, Pear, Banana`.  The following snippet shows how to configure the `entity_generate` plugin to look up taxonomy terms and create them on the fly if they do not exist:


field_tags:
  - plugin: skip_on_empty
    source: src_fruit_list
    method: process
    message: 'No src_fruit_list listed.'
  - plugin: explode
    delimiter: ','
  - plugin: callback
    callable: trim
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags

The terms will be assigned to the `field_tags` field using a process pipeline of four plugins:

  • `skip_on_empty` will skip the processing of this field if the record does not have a value in the `src_fruit_list` column.
  • `explode` will break the string of comma separated values into individual elements.
  • `callback` will use the `trim` PHP function to remove any whitespace from the start or end of the taxonomy term name.
  • `entity_generate` takes care of finding the taxonomy terms in the system and creating the ones that do not exist.

For a detailed explanation of the `skip_on_empty` and `explode` plugins see this article. For the `callback` plugin see this article. Let’s focus on the `entity_generate` plugin for now. The `field_tags` field expects an array of taxonomy terms IDs (`tid`). The source data contains term names so we need to query the database to get the corresponding term IDs. The taxonomy terms that will be referenced were not imported using the Migrate API. And they might not exist in the system yet. If that is the case, they should be created on the fly. Therefore, `migration_lookup` cannot be used, but `entity_generate` can.

The plugin is configured using five keys. `entity_type` is set to the machine name of the entity to query: `taxonomy_term` in this case. `value_key` is the name of the entity property to look up. In Drupal, the taxonomy term names are stored in a property called `name`. Usually, you would include a `source` that specifies which field from the source contains the lookup value for the `name` entity property. In this case, it is not necessary to define this configuration option. The lookup value will be passed from the previous plugin in the process pipeline: the trimmed version of the taxonomy term name.

If, and only if, the entity type has bundles, you must also define two more configuration options: `bundle_key` and `bundle`. Similar to `value_key` and `source`, these extra options become another condition in the query looking for the entities. `bundle_key` is the name of the entity property that stores which bundle the entity belongs to. `bundle` contains the value of the bundle used to restrict the search. The terminology is a bit confusing, but it boils down to the following. It is possible that the same value exists in multiple bundles of the same entity. So, you must pick one bundle where the lookup operation will be performed. In the case of the taxonomy term entity, the bundles are the vocabularies. The vocabulary a term belongs to is stored in the `vid` entity property. In the example, that is `tags`. Let’s consider an example term of "Apple". This plugin will instruct Drupal to search for a taxonomy term whose `name` (term name) is "Apple" and that belongs to the "tags" `vid` (vocabulary).

What happens if the plugin does not find an entity matching the conditions? It will create one on the fly! It will use the value from the source configuration or from the process pipeline. This value will be used to assign the `value_key` entity property for the newly created entity. The entity will be created in the proper bundle as specified by the `bundle_key` and `bundle` configuration options. In the example, the terms will be created in the `tags` vocabulary. It is important that values are trimmed to remove whitespace at the start and end of the name, which is what the `callback` plugin does earlier in the pipeline. Otherwise, if your source contains spaces after the commas that separate elements, you might end up with terms that seem duplicated like "Apple" and " Apple".

More configuration options

Both `entity_lookup` and `entity_generate` share the previous configuration options. Additionally, the following options are available:

  • `ignore_case` contains a boolean value to indicate whether the query should ignore case when comparing values. It defaults to true.
  • `access_check` contains a boolean value to indicate whether the system should check that the user has access to the entity. It defaults to true.
  • `values` and `default_values` apply only to the `entity_generate` plugin. You can use them to set additional fields on the entities it creates, as sketched below. An example configuration is also included in the code for the plugin.
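
The following is a minimal sketch of how the last step of the taxonomy term pipeline shown earlier could set extra fields on the terms it creates, loosely based on the example included in the plugin code. The `field_origin` field, the `src_country` source column, and the description text are hypothetical and not part of the example module:

  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
    # Fields on the newly created terms, mapped from columns of the source row.
    values:
      field_origin: src_country
    # Static values assigned to fields of the newly created terms.
    default_values:
      description: 'Term created automatically during the migration.'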

One interesting fact about these plugins is that none of the configuration options is required. The `source` can be skipped if the value comes from the process pipeline. The rest of the configuration options can be inferred by code introspection. This has some restrictions and assumptions. For example, if you are migrating nodes, the code introspection requires the `type` node property to be defined in the process section. If you do not set one because you define a `default_bundle` in the destination section, an error will be produced. Similarly, for entity reference fields it is assumed they point to one bundle only. Otherwise, the system cannot guess which bundle to look up and an error will be produced. Therefore, always set the `entity_type` and `value_key` configurations. And for entity types that have bundles, `bundle_key` and `bundle` must be set as well.

Note: There are various open issues contemplating changes to the configuration options. See this issue and the related ones to keep up to date with any future change.

Compromises and limitations

The `entity_lookup` and `entity_generate` plugins violate some ETL principles. For example, they query the destination system from the process section. And in the case of `entity_generate`, it even creates entities from the process section. Ideally, each phase of the ETL process is self-contained. That being said, there are valid use cases for these plugins, and they can save you time when their functionality is needed.

An important limitation of the `entity_generate` plugin is that it is not able to clean up after itself. That is, if you roll back the migration that calls this plugin, any created entity will remain in the system. This would leave data that is potentially invalid or otherwise never used in Drupal. Those values could leak into the user interface, for example in autocomplete fields. Ideally, rolling back a migration should delete any data that was created with it.

The recommended way to maintain relationships among entities in a migration project is to have multiple migrations. Then, you use the `migration_lookup` plugin to relate them. Throughout the series, several examples have been presented. For example, this article shows how to do taxonomy term migrations.

What did you learn in today’s blog post? Did you know how to configure these plugins for entities that do not have bundles? Did you know that reverting a migration does not delete entities created by the `entity_generate` plugin? Did you know you can assign fields in the generated entity? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Sep 02 2019
Sep 02

In previous posts we introduced the concept of defining migrations as configuration entities. These types of migrations can be executed from a user interface provided by the Migrate Tools module. In today’s article, we will present the workflow to import configuration entities and execute migrations from the user interface. Let’s get started.

Important: User interfaces tend to change. In fact, Migrate Tools recently launched a redesign of the user interface. Screenshots and referenced on-screen text might change over time. Some exposed settings might work, but give no feedback indicating a successful outcome. Some forms might only work with specific versions of modules. The goal is to demonstrate what is possible and to point out the limitations of doing so. In general, executing migrations from the command line offers a better experience.

Getting the example code

You can get the full code example for today’s example at https://github.com/dinarcon/ud_migrations The module to use is `UD configuration group migration (JSON source)` whose machine name is `ud_migrations_config_group_json_source`.  It comes with three migrations: `udm_config_group_json_source_paragraph`, `udm_config_group_json_source_image`, and  `udm_config_group_json_source_node`. Additionally, the demo module provides the `udm_config_group_json_source` group.

You could install the module as we have done with every other example in the series. Instructions can be found in this article. When the module is installed, you could execute the migrations using Drush commands as explained in this article. But we are going to use the user interface provided by Migrate Tools instead.

Importing configuration entities from the user interface

The first step is getting the configuration entities into Drupal’s active configuration. One way to do this is by using the "Single item" import from the "Configuration synchronization" interface. It can be found at `/admin/config/development/configuration/single/import`. Four configuration items will be imported in the following order:

  1. The `udm_config_group_json_source` using "Migration Group" as the configuration type.
  2. The `udm_config_group_json_source_image` using "Migration" as the configuration type.
  3. The `udm_config_group_json_source_paragraph` using "Migration" as the configuration type.
  4. The `udm_config_group_json_source_node` using "Migration" as the configuration type.

When importing configuration this way, you need to select the proper "Configuration type" from the dropdown. Otherwise, the system will make its best effort to produce a valid configuration entity from the YAML code you paste in the box. This can happen without a visual indication that you imported the wrong configuration type which can lead to hard to debug errors.

Note: If you do not see the "Migration" and "Migration group" configuration types, you need to enable the Migrate Plus module.

Another thing to pay close attention to is changes in formatting when pasting code from a different source. Be it GitHub, your IDE, or an online tutorial, verify that your YAML syntax is valid and whitespaces are preserved. A single extraneous whitespace can break the whole migration. For invalid syntax, the best case scenario is the system rejecting the YAML definition. Otherwise, you might encounter hard to debug errors.

In today’s example, we are going to copy the YAML definitions from GitHub. You can find the source files to copy here. When viewing a file, it is recommended to click the "Raw" button to get the content of the file in plain text. Then, copy and paste the whole file in the designated box of the "Single item" import interface. Make sure to import the four configuration entities in the order listed above using the proper type.

To verify that this worked, you can use the "Single item" export interface located at `/admin/config/development/configuration/single/export`. Select the "Configuration type" and the "Configuration name" for each element that was imported. You might see some extra keys in the export. As long as the ones you manually imported are properly set, you should be good to continue.

Note that we included `uuid` keys in all the example migrations. As explained in this article, that is not required. Drupal would automatically create a `uuid` when the configuration is imported. But defining one simplifies the update process. If you needed to update any of those configurations, you can directly visit the "Single item" import interface and paste the new configuration. Otherwise, you would have to export it first, copy the `uuid` value, and add it to the code to import.

Executing configuration entities from the user interface

With the migration related configuration entities in Drupal’s active configuration, you should be able to execute them from the user interface provided by Migrate Tools. It is available in  "Manage > Structure > Migration" at `/admin/structure/migrate`. You should see the `UD Config Group (JSON source)` that was imported in the previous step.

Note: If you do not see the "Migration" link in "Manage > Structure" interface, you need to enable the Migrate Tools module.

This user interface will list all the migration groups in the system. From there, you can get to the individual migrations. You can even inspect the migration definition from this interface. It is important to note that only migrations defined as configuration entities will appear in this interface. Migrations defined as code will not be listed.

For the `udm_config_group_json_source` group, click the "List migrations" button to display all the migrations in the group. Next, click the "Execute" button on the `udm_config_group_json_source_image` migration. Then, make sure "Import" is selected as the operation and click the "Execute" button. Drupal will perform the import operation for the image migration. A success status message will appear if things work as expected. You can also verify that images were imported by visiting the files listing page at `/admin/content/files`.

Repeat the same process for the `udm_config_group_json_source_paragraph` and `udm_config_group_json_source_node` migrations. The final result will be similar to the one from the JSON source example. For import operations, the system will check for migration dependencies and execute them in advance if defined. That means that in this example, you could run the import operation directly on the node migration. Drupal will automatically execute the images and paragraphs migrations. Note that migration dependencies are only executed automatically for import operations. Dependent migrations will not be rolled back automatically if the main migration is rolled back individually.

This example includes a paragraph migration. As explained in this article, if you roll back the node migration, any paragraph associated with the nodes will be deleted. The Migrate API will not be aware of this. To fix the issue and recover the paragraphs, you need to roll back the paragraph migration as well. Then you re-import the paragraph and node migrations. Doing this from the user interface involves more manual steps compared to running a single Drush command in the terminal.

Limitations of the user interface

Although you can import and execute migrations from the user interface, this workflow comes with many limitations. The following is a list of some of the things that you should consider:

  • Updating configuration in production environments is not recommended. This can be enforced using the Configuration Read-only mode module.
  • If the imported configuration entity did not contain a `uuid`, you need to export that configuration to get the auto generated value. This should be used in subsequent configuration import operations if updates to the YAML definition files are needed. Otherwise, you will get an error like "An entity with this machine name already exists but the import did not specify a UUID."
  • The following operations do not provide any user interface feedback if they succeed: "Rollback", "Stop", and "Reset". See this issue for more context.
  • As of this writing, most operations for CSV sources fail if using the 8.x-3.x branch of the Migrate Source CSV module. See this issue for more context.
  • As of this writing, the user interface for renaming columns in CSV sources produces a fatal error if using the 8.x-3.x branch of the Migrate Source CSV module. See this issue for more context.
  • As of this writing, it is not possible to execute all migrations in a group from the user interface in one operation. See this issue for more context.
  • The Migrate source UI module can be used to upload a file to be used as source in CSV migrations. As of this writing, a similar feature for JSON and XML files is not working. See this issue for more context.

To reiterate, it is not recommended to use the user interface to add configuration related to migrations and execute them. The extra layer of abstraction can make it harder to debug problems with migrations if they arise. If possible, execute your migrations using the commands provided by Migrate Tools. Finally, it is recommended to read this article to learn more about the difference between managing migrations as code or configuration.

What did you learn in today’s blog post? Did you know you can import migration related configuration entities from the user interface? Did you know that Migrate Tools provides a user interface for running migrations? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 29 2019
Aug 29

In the previous posts we talked about the option to manage migrations as configuration entities and some of the benefits this brings. Today, we are going to learn about another useful feature provided by the Migrate Plus module: migration groups. We are going to see how they can be used to execute migrations together and share configuration among them. Let’s get started.

Example definition of a migration group.

Understanding migration groups

The Migrate Plus module defines a new configuration entity called migration group. When the module is enabled, each migration can define one group it belongs to. This serves two purposes:

  1. It is possible to execute operations per group. For example, you can import or rollback all migrations in the same group with one Drush command provided by the Migrate Tools module.
  2. It is possible to declare shared configuration for all the migrations within a group. For example, if they use the same source file, the value can be set in a single place: the migration group.

To demonstrate how to leverage migration groups, we will convert the CSV source example to use migrations defined as configuration and groups. You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is UD configuration group migration (CSV source) whose machine name is ud_migrations_config_group_csv_source. It comes with three migrations: udm_config_group_csv_source_paragraph, udm_config_group_csv_source_image, and  udm_config_group_csv_source_node. Additionally, the demo module provides the udm_config_group_csv_source group.

Note: The Migrate Tools module provides a user interface for managing migrations defined as configuration. It is available under “Structure > Migrations” at /admin/structure/migrate. This user interface will be explained in a future entry. For today’s example, it is assumed that migrations are executed using the Drush commands provided by Migrate Tools. In the past we have used the Migrate Run module to execute migrations, but this module does not offer the ability to import or roll back migrations per group.

Creating a migration group

The migration groups are defined in YAML files using the following naming convention: migrate_plus.migration_group.[migration_group_id].yml. Because they are configuration entities, they need to be placed in the config/install directory of your module. Files placed in that directory following that pattern will be synced into Drupal’s active configuration when the module is installed for the first time (only). If you need to update the migration groups, you make the modifications to the files and then sync the configuration again. More details on this workflow can be found in this article. The following snippet shows an example migration group:

uuid: e88e28cc-94e4-4039-ae37-c1e3217fc0c4
id: udm_config_group_csv_source
label: 'UD Config Group (CSV source)'
description: 'A container for migrations about individuals and their favorite books. Learn more at https://understanddrupal.com/migrations.'
source_type: 'CSV resource'
shared_configuration: null

The uuid key is optional. If not set, the configuration management system will create one automatically and assign it to the migration group. Setting one simplifies the workflow for updating configuration entities as explained in this article. The id key is required. Its value is used to associate individual migrations to this particular group.

The label, description, and source_type keys are used to give details about the migration. Their values appear in the user interface provided by Migrate Tools. label is required and serves as the name of the group. description is optional and provides more information about the group. source_type is optional and gives context about the type of source you are migrating from. For example, "Drupal 7", "WordPress", "CSV file", etc.

To associate a migration with a group, set the migration_group key in the migration definition file. For example:

uuid: 97179435-ca90-434b-abe0-57188a73a0bf
id: udm_config_group_csv_source_node
label: 'UD configuration host node migration for migration group example (CSV source)'
migration_group: udm_config_group_csv_source
source: ...
process: ...
destination: ...
migration_dependencies: ...

Note that if you omit the migration_group key, it will default to a null value meaning the migration is not associated with any group. You will still be able to execute the migration from the command line, but it will not appear in the user interface provided by Migrate Tools. If you want the migration to be available in the user interface without creating a new group, you can set the migration_group key to default. This group is automatically created by Migrate Plus and can be used as a generic container for migrations.
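
For instance, a minimal sketch of a migration definition that uses the default group could look like this (the migration ID and label are hypothetical):

id: udm_example_standalone_migration
label: 'Example migration listed under the default group'
migration_group: default
source: ...
process: ...
destination: ...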

Organizing and executing migrations

Migration groups are used to organize migrations. Migration projects usually involve several types of elements to import. For example, book reports, events, subscriptions, user accounts, etc. Each of them might require multiple migrations to be completed. Let’s consider a book report migration. The "book report" content type has many entity reference fields: book cover (image), support documents (file), tags (taxonomy term), author (user), citations (paragraphs). In this case, you will have one primary node migration that depends on five migrations of multiple types. You can put all of them in the same group and execute them together.

It is very important not to confuse migration groups with migration dependencies. In the previous example, the primary book report node migration should still list all its dependencies in the migration_dependencies section of its definition file. Otherwise, there is no guarantee that the five migrations it depends on will be executed in advance. This could cause problems if the primary node migration is executed before images, files, taxonomy terms, users, or paragraphs have been imported into the system.
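
To make the distinction concrete, the following sketch shows how the primary node migration from that example could declare both its group and its dependencies. All migration IDs here are hypothetical:

id: udm_book_report_node
label: 'Book report nodes'
migration_group: udm_book_reports
migration_dependencies:
  required:
    - udm_book_report_image
    - udm_book_report_file
    - udm_book_report_term
    - udm_book_report_user
    - udm_book_report_paragraph
source: ...
process: ...
destination: ...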

It is possible to execute all migrations in a group by issuing a single Drush command with the --group flag. This is supported by the import and rollback commands exposed by Migrate Tools. For example, drush migrate:import --group='udm_config_group_csv_source'. Note that as of this writing, there is no way to run all migrations in a group in a single operation from the user interface. You could import the main migration and the system will make sure that if any explicit dependency is set, they will be run in advance. If the group contained more migrations than the ones listed as dependencies, those will not be imported. Moreover, migration dependencies are only executed automatically for import operations. Dependent migrations will not be rolled back automatically if the main migration is rolled back individually.

Note: This example assumes you are using Drush to execute the migrations. At the time of this writing, it is not possible to roll back a CSV migration from the user interface. See this issue in the Migrate Source CSV module for more context.

Sharing configuration with migration groups

Arguably, the major benefit of migration groups is the ability to share configuration among migrations. In the example, there are three migrations all reading from CSV files. Some configurations like the source plugin and header_offset can be shared. The following snippet shows an example of declaring shared configuration in the migration group for the CSV example:

uuid: e88e28cc-94e4-4039-ae37-c1e3217fc0c4
id: udm_config_group_csv_source
label: 'UD Config Group (CSV source)'
description: 'A container for migrations about individuals and their favorite books. Learn more at https://understanddrupal.com/migrations.'
source_type: 'CSV resource'
shared_configuration:
  dependencies:
    enforced:
      module:
        - ud_migrations_config_group_csv_source
  migration_tags:
    - UD Config Group (CSV Source)
    - UD Example
  source:
    plugin: csv
    # It is assumed that CSV files do not contain a headers row. This can be
    # overridden for migrations where that is not the case.
    header_offset: null

Any configuration that can be set in a regular migration definition file can be set under the shared_configuration key. When the migrate system loads the migration, it will read the migration group it belongs to, and pull any shared configuration that is defined. If both the migration and the group provide a value for the same key, the one defined in the migration definition file will override the one set in the migration group. If a key only exists in the group, it will be added to the migration when the definition file is loaded.

In the example, the dependencies, migration_tags, and source options are being set. They will apply to all migrations that belong to the udm_config_group_csv_source group. If you updated any of these values, the changes would propagate to every migration in the group. Remember that you would need to sync the migration group configuration for the update to take effect. You can do that with partial configuration imports as explained in this article.
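
For example, assuming the group definition lives in the demo module's config/install directory, a partial configuration import like the following sketch would push the updated group into the active configuration. The path follows the layout of the example repository:

# Re-import only the configuration shipped with the example module,
# including the migration group, after editing the YAML files.
$ drush config:import --partial --source="modules/custom/ud_migrations/ud_migrations_config_group_csv_source/config/install"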

Any configuration set in the group can be overridden in specific migrations. In the example, the header_offset is set to null which means the CSV files do not contain a header row. The node migration uses a CSV file that contains a header row so that configuration needs to be redeclared. The following snippet shows how to do it:

uuid: 97179435-ca90-434b-abe0-57188a73a0bf
id: udm_config_group_csv_source_node
label: 'UD configuration host node migration for migration group example (CSV source)'
# Any configuration defined in the migration group can be overridden here
# by re-declaring the configuration and assigning a value.
# dependencies inherited from migration group.
# migration_tags inherited from migration group.
migration_group: udm_config_group_csv_source
source:
  # plugin inherited from migration group.
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_people.csv
  ids: [unique_id]
  # This overrides the header_offset defined in the group. The referenced CSV
  # file includes headers in the first row. Thus, a value of 0 is used.
  header_offset: 0
process: ...
destination: ...
migration_dependencies: ...

Another example would be multiple migrations reading from a remote JSON. Let’s say that instead of fetching a remote file, you want to read a local file. The only file you would have to update is the migration group. Change the data_fetcher_plugin key to file and the urls array to the path to the local file. You can try this with the ud_migrations_config_group_json_source module from the demo repository.
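
A sketch of that change in the group definition could look like the following; the file path is hypothetical:

shared_configuration:
  source:
    # Read a local file instead of fetching a remote one.
    data_fetcher_plugin: file
    urls:
      - modules/custom/ud_migrations/ud_migrations_config_group_json_source/sources/udm_data.json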

What did you learn in today’s blog post? Did you know that migration groups can be used to share configuration among different migrations? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 29 2019
Aug 29

In the previous post we talked about migration groups provided by the Migrate Plus module. Today, we are going to compare them to migration tags. Along the way, we are going to dive deeper into how they work and provide tips to avoid problems when working with them. Let’s get started.

Example migration group definition containing migration tags.

What is the difference between migration tags and migration groups?

In the article on declaring migration dependencies we briefly touched on the topic of tags. Here is a summary of migration tags:

  • They are a feature provided by the core Migrate API.
  • Multiple tags can be assigned to a single migration.
  • They are defined in the migration definition file alone and do not require creating a separate file.
  • Both Migrate Tools and Migrate Run provide a flag to execute operations by tag.
  • They do not allow you to share configuration among migrations tagged with the same value.

Here is a summary of migration groups:

  • You need to install the Migrate Plus module to take advantage of them.
  • Only one group can be assigned to any migration.
  • You need to create a separate file to declare the group. This affects the readability of migrations as their configuration will be spread over two files.
  • Only the Migrate Tools module provides a flag to execute operations by group.
  • They offer the possibility to share configuration among members of the same group.
  • Any shared configuration could be overridden in the individual migration definition files.

What do migration tags and groups have in common?

The ability to put together multiple migrations under a single name. This name can be used to import or roll back all the associated migrations in one operation. This is true for the migrate:import and migrate:rollback Drush commands provided by Migrate Tools. What you have to do is use the --group or --tag flags, respectively. The following snippet shows an example of importing and rolling back the migrations by group and tag:

$ drush migrate:import --tag='UD Config Group (JSON Source)'
$ drush migrate:rollback --tag='UD Config Group (JSON Source)'

$ drush migrate:import --group='udm_config_group_json_source'
$ drush migrate:rollback --group='udm_config_group_json_source'

Note: You might get errors indicating that the "--tag" or "--group" options do not accept a value. See this issue if you experience that problem.

Neither migration tags nor groups replace migration dependencies. If there are explicit migration dependencies among the members of a tag or group, the Migrate API will determine the order in which the migrations need to be executed. But if no dependencies are explicitly set, there is no guarantee the migrations will be executed in any particular order. Keep this in mind if you have separate migrations for entities that you later use to populate entity reference fields. Also note that migration dependencies are only executed automatically for import operations. Dependent migrations will not be rolled back automatically if the main migration is rolled back individually.

Can groups only be used for migrations defined as configuration entities?

Technically speaking, no. It is possible to use groups for migrations defined as code. Notwithstanding, migration groups can only be created as configuration entities. You would have to rebuild caches and sync configuration for changes in the migrations and groups to take effect, respectively. This is error prone and can lead to hard to debug issues.

Also, things might get confusing when executing migrations. The user interface provided by Migrate Tools works exclusively with migrations defined as configuration entities. The Drush commands provided by the same module work for both types of migrations: code and configuration. The default and null values for the migration_group key are handled a bit differently between the user interface and the Drush commands. Moreover, the ability to execute operations per group using Drush commands is provided only by the Migrate Tools module. The Migrate Run module lacks this functionality.

Managing migrations as code or configuration should be a decision made at the start of the project. If you want to use migration groups, or some of the other benefits provided by migrations defined as configuration, stick to them from the very beginning. It is possible to change at any point and the transition is straightforward, but it should be avoided if possible. In any case, try not to mix both workflows in a single project.

Tip: It is recommended to read this article to learn more about the difference between managing migrations as code or configuration.

Setting migration tags inside migration groups

As seen in this article, it is possible to set migration tags as part of the shared configuration of a group. If you do this, it is not recommended to override the migration_tags key in individual migrations. The end result might not be what you expect. Consider the following snippets as an example:

# Migration group configuration entity definition.
# File: migrate_plus.migration_group.udm_config_group_json_source.yml
uuid: 78925705-a799-4749-99c9-a1725fb54def
id: udm_config_group_json_source
label: 'UD Config Group (JSON source)'
description: 'A container for migrations about individuals and their favorite books. Learn more at https://understanddrupal.com/migrations.'
source_type: 'JSON resource'
shared_configuration:
  migration_tags:
    - UD Config Group (JSON Source)
    - UD Example
# Migration configuration entity definition.
# File: migrate_plus.migration.udm_config_group_json_source_node.yml
uuid: def749e5-3ad7-480f-ba4d-9c7e17e3d789
id: udm_config_group_json_source_node
label: 'UD configuration host node migration for migration group example (JSON source)'
migration_tags:
  - UD Lorem Ipsum
migration_group: udm_config_group_json_source
source: ...
process: ...
destination: ...
migration_dependencies: ...

The group configuration declares two tags: UD Config Group (JSON Source) and UD Example. The migration configuration overrides the tags to a single value UD Lorem Ipsum. What would you expect the final value for the migration_tags key to be? Is it a combination of the three tags? Is it only the one key defined in the migration configuration?

The answer in this case is not very intuitive. The final migration will have two tags: UD Lorem Ipsum and UD Example. This has to do with how Migrate Plus merges the configuration from the group into the individual migrations. It uses the array_replace_recursive() PHP function which performs the merge operation based on array keys. In this example, UD Config Group (JSON Source) and UD Lorem Ipsum have the same index in the migration_tags array. Therefore, only one value is preserved in the final result.
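
In other words, after Migrate Plus performs the merge, the effective configuration loaded for the node migration contains the following tags:

# Effective merged value (illustration only, not a file you write by hand).
migration_tags:
  - UD Lorem Ipsum
  - UD Example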

The example uses the migration_tags key because it is the subject of this article, but the same applies to any configuration that has a nested structure. Some configurations are more critical to a migration than a tag or group, and debugging a problem like this can be tricky. If the end result might be ambiguous, it is better to avoid the situation in the first place. In general, nested structures should only be set in either the group or the migration definition file, but not both. Additionally, all the recommendations for writing migrations presented in this article also apply here.

What did you learn in today’s blog post? Did you know the difference between migration tags and groups? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 24 2019
Aug 24

In the last blog post we were introduced to managing migrations as configuration entities using Migrate Plus. Today, we will present some benefits and potential drawbacks of this approach. We will also show a recommended workflow for working with migrations as configuration. Let’s get started.

Example workflow for managing migration configuration entities.

What is the benefit of managing migrations as configuration?

At first sight, there does not seem to be a big difference between defining migrations as code or configuration. You can certainly do a lot without using Migrate Plus’ configuration entities. The series so far contains many examples of managing migrations as code. So, what are the benefits of adopting configuration entities?

The configuration management system is one of the major features that was introduced in Drupal 8. It provides the ability to export all your site’s configuration to files. These files can be added to version control and deployed to different environments. The system has evolved a lot in the last few years, and many workflows and best practices have been established to manage configuration. On top of Drupal core’s incremental improvements, a big ecosystem of contributed modules has sprung up around it. When you manage migrations via configuration, you can leverage those tools and workflows.

Here are a few use cases of what is possible:

  • When migrations are managed in code, you need file system access to make any changes. Using configuration entities allows site administrators to customize or change the migration via the user interface. This is not about rewriting all the migrations. That should happen during development and never on production environments. But it is possible to tweak certain options. For example, administrators could change the location to the file that is going to be migrated, be it a local file or on a remote server.
  • When writing migrations, it is very likely that you will work on a subset of the data that will eventually be used to get content into the production environment. Having migrations as configuration allows you to override part of the migration definition per environment. You could use the Configuration Split module to configure different source files or paths per environment. For example, you could link to a small sample of the data in development, a larger sample in staging, and the complete dataset in production.
  • It would be possible to provide extra configuration options via the user interface. In the article about adding HTTP authentication to fetch remote JSON and XML files, the credentials were hardcoded in the migration definition file. That is less than ideal and exposes sensitive information. An alternative would be to provide a configuration form in the administration interface for the credentials to be added. Then, the submitted values could be injected into the configuration for the migration. Again, you could make use of contrib modules like Configuration Split to make sure those credentials are never exported with the rest of your site’s configuration.
  • You could provide a user interface to upload migration source files. In fact, the Migrate source UI module does exactly this. It exposes an administration interface where you have a file field to upload a CSV file. In the same interface, you get a list of supported migrations in the system. This allows a site administrator to manually upload a file to run the migration against. Note: The module is supposed to work with JSON and XML migrations. It did not work during my tests. I opened this issue to follow up on this.

These are some examples, but many more possibilities are available. The point is that you have the whole configuration management ecosystem at your disposal. Do you have another example? Please share it in the comments.

Are there any drawbacks?

Managing migrations as configuration adds an extra layer of abstraction to the migration process. This adds a bit of complexity. For example:

  • Now you have to keep the uuid and id keys in sync. This might not seem like a big issue, but it is something to pay attention to.
  • When you work with migration groups (explained in the next article), your migration definition could live in more than one file.
  • The configuration management system has its own restrictions and workflows that you need to follow, particularly for updates.
  • You need to be extra careful with your YAML syntax, especially if syncing configuration via the user interface. It is possible to import invalid configuration without getting an error. It is not until the migration fails that you realize something is wrong.

Using configuration entities to define migrations certainly offers lots of benefits. But it requires being extra careful managing them.

Workflow for managing migrations as configuration entities

The configuration synchronization system has specific workflows to make changes to configuration entities. This imposes some restrictions on the way you make updates to the migration definitions. Explaining how to manage configuration could use another 31 days blog post series. ;-) For now, only a general overview will be presented. The general approach is similar to managing migrations as code. The main difference is what needs to be done for changes to the migration files to take effect.

You could use the “Configuration synchronization” administration interface at /admin/config/development/configuration. In it, you have the option to export or import a “full archive” containing all your site’s settings or a “single item” like a specific migration. This is one way to manage migrations as configuration entities, and it also lets you find their UUIDs if they were not set initially. This approach can be followed by site administrators without requiring file system access. Nevertheless, it is less than ideal and error-prone. This is not the recommended way to manage migration configuration entities.

Another option is to use Drush or Drupal Console to synchronize your site’s configuration via the command line. Similarly to the user interface approach, you can export and import your full site configuration or only single elements. The recommendation is to do partial configuration imports so that only the migrations you are actively working on are updated.

Ideally, your site’s architecture is completed before the migration starts. In practice, you often work on the migration while other parts of the site are being built. If you were to export and import the entire site’s configuration as you work on the migrations, you might inadvertently override unrelated pieces of configuration. For instance, this can lead to missing content types, changed field settings, and lots of frustration. That is why doing partial or single configuration imports is recommended. The following code snippet shows a basic workflow for managing migrations as configuration:

# 1) Run the migration.
$ drush migrate:import udm_config_json_source_node_local

# 2) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_config_json_source_node_local

# 3) Change the migration definition file in the "config/install" directory.

# 4a) Sync configuration by folder using Drush.
$ drush config:import --partial --source="modules/custom/ud_migrations/ud_migrations_config_json_source/config/install"

# 4b) Sync configuration by file using Drupal Console.
$ drupal config:import:single --file="modules/custom/ud_migrations/ud_migrations_config_json_source/config/install/migrate_plus.migration.udm_config_json_source_node_local.yml"

# 5) Run the migration again.
$ drush migrate:import udm_config_json_source_node_local

Note the use of the --partial and --source flags in the migration import command. Also, note that the path is relative to the current working directory from where the command is being issued. In this snippet, the value of the source flag is the directory holding your migrations. Be mindful if there are other non-migration related configurations in the same folder. If you need to be more granular, Drupal Console offers a command to import individual configuration files as shown in the previous snippet.

Note: Uninstalling and installing the module again will also apply any changes to your configuration. This might produce errors if the migration configuration entities are not removed automatically when the module is uninstalled. Read this article for details on how to do that.

What did you learn in today’s blog post? Did you know the benefits and trade-offs of managing migrations as configuration? Did you know what to do for changes in migration configuration entities to take effect? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 23 2019
Aug 23

Today, we are going to talk about how to manage migrations as configuration entities. This functionality is provided by the Migrate Plus module. First, we will explain the difference between managing migrations as code or configuration. Then, we will show how to convert existing migrations. Finally, we will talk about some important options to include in migration configuration entities. Let’s get started.

Example of migration defined as configuration entity.

Drupal migrations: code or configuration?

So far, we have been managing migrations as code. This is functionality provided out of the box. You write the migration definition file in YAML format. Then, you place it in the migrations directory of your module. If you need to update the migration, you make the modifications to the files and then rebuild caches. More details on the workflow for migrations managed in code can be found in this article.

Migrate Plus offers an alternative to this approach. It allows you to manage migrations as configuration entities. You still use YAML files to write the migration definition files, but their location and workflow is different. They need to be placed in a config/install directory. If you need to update the migration,  you make the modifications to the files and then sync the configuration again. More details on this workflow can be found in this article.

There is one thing worth emphasizing. When managing migrations as code you need access to the file system to update and deploy the changes to the file. This is usually done by developers.  When managing migrations as configuration, you can make updates via the user interface as long as you have permissions to sync the site’s configuration. This is usually done by site administrators. You might still have to modify files depending on how you manage your configuration. But the point is that file system access to update migrations is optional. Although not recommended, you can write, modify, and execute the migrations entirely via the user interface.

Transitioning to configuration entities

To demonstrate how to transition from code to configuration entities, we are going to convert the JSON migration example. You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is UD config JSON source migration whose machine name is udm_config_json_source. It comes with four migrations: udm_config_json_source_paragraph, udm_config_json_source_image, udm_config_json_source_node_local, and udm_config_json_source_node_remote.

The transition to configuration entities is a two step process. First, move the migration definition files from the migrations folder to a config/install folder. Second, rename the files so that they follow this pattern: migrate_plus.migration.[migration_id].yml. For example: migrate_plus.migration.udm_config_json_source_node_local.yml. And that’s it! Files placed in that directory following that pattern will be synced into Drupal’s active configuration when the module is installed for the first time (only). Note that changes to the files require a new synchronization operation for changes to take effect. Changing the files and rebuilding caches does not update the configuration as it was the case with migrations managed in code.
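
For example, for one of the migrations in the demo module, the move could look like the following commands run from the module's root folder. This assumes the file in the migrations folder was named after the migration ID:

$ mkdir -p config/install
$ git mv migrations/udm_config_json_source_node_local.yml \
    config/install/migrate_plus.migration.udm_config_json_source_node_local.yml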

If you have the Migrate Plus module enabled, it will detect the migrations and you will be able to execute them. You can continue using the Drush commands provided by the Migrate Run module. Alternatively, you can install the Migrate Tools module which provides Drush commands for running both types of migrations: code and configuration. Migrate Tools also offers a user interface for executing migrations. This user interface is only for migrations defined as configuration though. It is available at /admin/structure/migrate. For now, you can run the migrations using the following Drush command: drush migrate:import udm_config_json_source_node_local --execute-dependencies.

Note: For executing migrations in the command line, choose between Migrate Run or Migrate Tools. You pick one or the other, but not both as the commands provided by the two modules have the same name. Another thing to note is that the example uses Drush 9. There were major refactorings between versions 8 and 9 which included changes to the name of the commands.

UUIDs for migration configuration entities

When managing migrations as configuration, you can set extra options. Some are exposed by Migrate Plus while others come from Drupal’s configuration management system. Let’s see some examples.

The most important new option is defining a UUID for the migration definition file. This is optional, but adding one will greatly simplify the workflow to update migrations. The UUID is used to keep track of every piece of configuration in the system. When you add new configuration, Drupal will read the UUID value if provided and update that particular piece of configuration. Otherwise, it will create a UUID on the fly, attach it to the configuration definition, and then import it. That is why you want to set a UUID value manually. If changes need to be made, you want to update the same configuration, not create a new one. If no UUID was originally set, you can get the automatically created value by exporting the migration definition. The workflow for this is a bit complicated and error prone, so always include a UUID with your migrations. The following snippet shows an example UUID:

uuid: b744190e-3a48-45c7-97a4-093099ba0547
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'

The UUID is a string of 32 hexadecimal digits displayed in five groups separated by hyphens, following this pattern: 8-4-4-4-12. In Drupal, two or more pieces of configuration cannot share the same value. Drupal will check the UUID and the type of configuration in sync operations. In this case, the type is signaled by the migrate_plus.migration. prefix in the name of the migration definition file.

When using configuration entities, a single migration is identified by two different options. The uuid is used by Drupal’s configuration system and the id is used by the Migrate API. Always make sure that this combination is kept the same when updating the files and syncing the configuration. Otherwise, you might get hard-to-debug errors. Also, make sure you are importing the proper configuration type. The latter should not be something to worry about unless you use the user interface to export or import single configuration items.
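If you want to double-check the values already stored in the active configuration, one quick option is to read them with Drush. As a small sketch, drush config:get migrate_plus.migration.udm_config_json_source_node_local uuid should print the UUID assigned to that migration, and omitting the uuid key should print the whole configuration entity.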

Tip: If you do not have a UUID in advance for your migration, you can copy one from another piece of configuration and change some of the hexadecimal digits. Keep in mind that this could lead to a duplicated UUID if you happen to make the exact same changes to the UUID in two separate files. Another option is to use a tool to generate the values. Searching online for UUID generators will yield many tools for this.
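If you have Drush available, you can also ask Drupal itself for a value. As a small sketch that relies on the uuid service provided by Drupal core, running drush php:eval "print \Drupal::service('uuid')->generate();" should print a string in the expected format, which you can paste into your migration definition file.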

Automatically deleting migration configuration entities

By default, configuration remains in the system even if the module that added it gets uninstalled. This can cause problems if your migration depends on custom migration plugins provided by your module. It is possible to enforce that migration entities get removed when your custom module is uninstalled. To do this, you leverage the dependencies option provided by Drupal’s configuration management system. The following snippet shows how to do it:

uuid: b744190e-3a48-45c7-97a4-093099ba0547
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'
dependencies:
  enforced:
    module:
      - ud_migrations_config_json_source

You add the machine name of your module to the dependencies > enforced > module array. This adds an enforced dependency on your own module. The effect is that the migration will be removed from Drupal’s active configuration when your custom module is uninstalled. Note that the top-level dependencies array can have other keys in addition to enforced. For example: config and module. Learning more about them is left as an exercise for the curious reader.
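For reference only, the following hypothetical snippet sketches how those keys could appear side by side. The config and module entries shown here are made up for illustration; Drupal normally calculates non-enforced dependencies automatically when the configuration is saved:

dependencies:
  config:
    - field.storage.node.field_ud_image
  module:
    - paragraphs
  enforced:
    module:
      - ud_migrations_config_json_source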

It is important not to confuse the dependencies and migration_dependencies options. The former is provided by Drupal’s configuration management system and was just explained. The latter is provided by the Migrate API and is used to declare migrations that need to be imported in advance. Read this article to know more about this feature. The following snippet shows an example:

uuid: b744190e-3a48-45c7-97a4-093099ba0547
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'
dependencies:
  enforced:
    module:
      - ud_migrations_config_json_source
migration_dependencies:
  required:
    - udm_config_json_source_image
    - udm_config_json_source_paragraph
  optional: []

What did you learn in today’s blog post? Did you know that you can manage migrations in two ways: code or configuration? Did you know that file name and location as well as workflows need to be adjusted depending on which approach you follow? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 21 2019
Aug 21

Today we will learn how to migrate content from LibreOffice Calc and Microsoft Excel files into Drupal using the Migrate Spreadsheet module. We will give instructions on getting the module and its dependencies. Then, we will present how to configure the module for spreadsheets with or without a header row. There are two example migrations: images and paragraphs. Let’s get started.

Example configuration for Microsoft Excel and LibreOffice Calc migration.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD Google Sheets, Microsoft Excel, and LibreOffice Calc source migration whose machine name is ud_migrations_sheets_sources. It comes with four migrations: udm_google_sheets_source_node.yml, udm_libreoffice_calc_source_paragraph.yml, udm_microsoft_excel_source_image.yml, and udm_backup_csv_source_node.yml. The image migration uses a Microsoft Excel file as source. The paragraph migration uses a LibreOffice Calc file as source. The CSV migration is a backup in case the Google Sheet is not available. To execute the last one, you would need the Migrate Source CSV module.

You can get the Migrate Spreadsheet module using composer: composer require drupal/migrate_spreadsheet:^1.0. This module depends on the PHPOffice/PhpSpreadsheet library and many PHP extensions including ext-zip. Check this page for a full list of dependencies. If any required extension is missing, the installation will fail. If your Drupal site is not composer-based, you will not be able to use Migrate Spreadsheet unless you jump through a lot of hoops.

Understanding the example set up

This migration will reuse the same configuration from the introduction to paragraph migrations example. Refer to that article for details on the configuration. The destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain Microsoft Excel and LibreOffice Calc migrations. The end result will again be nodes containing an image and a paragraph with information about someone’s favorite book. The major difference is that we are going to read from different sources.

Note: You can literally swap migration sources without changing any other part of the migration.  This is a powerful feature of ETL frameworks like Drupal’s Migrate API. Although possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some machine names had to be changed to avoid conflicts with other examples in the demo repository.

Understanding the source document and plugin configuration

In any migration project, understanding the source is very important. For Microsoft Excel and LibreOffice Calc migrations, the primary thing to consider is whether or not the file contains a row of headers. Also, a workbook (file) might contain several worksheets (tabs). You can only migrate from one worksheet at a time. The example documents have two worksheets: UD Example Sheet and Do not peek in here. We are going to be working with the first one.

The spreadsheet source plugin exposes seven configuration options. The values to use might change depending on the presence of a header row, but all of them apply for both types of document. Here is a summary of the available configurations:

  • file is required. It stores the path to the document to process. You can use a relative path from the Drupal root, an absolute path, or stream wrappers.
  • worksheet is required. It contains the name of the one worksheet to process.
  • header_row is optional. This number indicates which row contains the headers. Contrary to CSV migrations, the row number is not zero-based. So, set this value to 1 if headers are on the first row, 2 if they are on the second, and so on.
  • origin is optional and defaults to A2. It indicates which non-header cell contains the first value you want to import. It assumes a grid layout and you only need to indicate the position of the top-left cell value.
  • columns is optional. It is the list of columns you want to make available for the migration. In case of files with a header row, use those header values in this list. Otherwise, use the default title for columns: A, B, C, etc. If this setting is missing, the plugin will return all columns. This is not ideal, especially for very large files containing more columns than needed for the migration.
  • row_index_column is optional. This is a special column that contains the row number for each record. It can be used as a unique identifier for the records in case your dataset does not provide a suitable value. Exposing this special column in the migration is up to you. If you do, you can come up with any name as long as it does not conflict with header row names set in the columns configuration. Important: this is an autogenerated column, not any of the columns that come with your dataset.
  • keys is optional and, if not set, it defaults to the value of row_index_column. It contains an array of column names that uniquely identify each record. For files with a header row, you can use the values set in the columns configuration. Otherwise, use default column titles like A, B, C, etc. In both cases, you can use the row_index_column column if it was set. Each value in the array will contain database storage details for the column.

Note that nowhere in the plugin configuration do you specify the file type. The same setup applies to both Microsoft Excel and LibreOffice Calc files. The library will take care of detecting and validating the proper type.

Migrating spreadsheet files with a header row

This example is for the paragraph migration and uses a LibreOffice Calc file. The following snippets show the UD Example Sheet worksheet and the configuration of the source plugin:

book_id, book_title, Book author
B10, The definitive guide to Drupal 7, Benjamin Melançon et al.
B20, Understanding Drupal Views, Carlos Dinarte
B30, Understanding Drupal Migrations, Mauricio Dinarte
source:
  plugin: spreadsheet
  file: modules/custom/ud_migrations/ud_migrations_sheets_sources/sources/udm_book_paragraph.ods
  worksheet: 'UD Example Sheet'
  header_row: 1
  origin: A2
  columns:
    - book_id
    - book_title
    - 'Book author'
  row_index_column: 'Document Row Index'
  keys:
    book_id:
      type: string

The name of the plugin is spreadsheet. Then you use the file configuration to indicate the path to the file. In this case, it is relative to the Drupal root. The UD Example Sheet is set as the worksheet to process. Because the first row of the file contains the headers, header_row is set to 1 and origin to A2.

Then you specify which columns to make available to the migration. In this case, we listed all of them, so this setting could have been left unassigned. It is better to get into the habit of being explicit about what you import. If the source file were to change and more columns were added, you would not have to update the migration definition file to prevent unneeded data from being fetched. The row_index_column is not actually used in the migration, but it is set to show all the configuration options in the example. The values will be 1, 2, 3, etc. Finally, keys is set to the column that serves as unique identifier for the records.

The rest of the migration is almost identical to the CSV example. Small changes were made to prevent machine name conflicts with other examples in the demo repository. For reference, the following snippet shows the process and destination sections for the LibreOffice Calc paragraph migration.

process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: 'Book author'
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph

Migrating spreadsheet files without a header row

Now let’s consider an example of a spreadsheet file that does not have a header row. This example is for the image migration and uses a Microsoft Excel file. The following snippets show the UD Example Sheet worksheet and the configuration of the source plugin:

P01, https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg
P02, https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg
P03, https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg
source:
  plugin: spreadsheet
  file: modules/custom/ud_migrations/ud_migrations_sheets_sources/sources/udm_photos.xlsx
  worksheet: 'UD Example Sheet'
  header_row: null
  origin: A1
  columns:
    - A
    - B
  row_index_column: null
  keys:
    A:
      type: string

The plugin, file, and worksheet configurations follow the same pattern as the paragraph migration. The difference for files with no header row is reflected in the other parameters. header_row is set to null to indicate the lack of headers and origin is set to A1. Because there are no column names to use, you have to use the ones provided by the spreadsheet. In this case, we want to use the first two columns: A and B. Contrary to CSV migrations, the spreadsheet plugin does not allow you to define aliases for unnamed columns. That means that you have to use A and B in the process section to refer to these columns.

row_index_column is set to null because it will not be used. And finally, in the keys section, we use the A column as the primary key. This might seem like an odd choice. Why use that value if you could use the row_index_column as the unique identifier for each row? If this were an isolated migration, that would be a valid option. But this migration is referenced from the node migration explained in the previous example. The lookup is made based on the values stored in the A column. If we used the index of the row as the unique identifier, we would have to update the other migration or the lookup would fail. In many cases, that is neither feasible nor desirable.

Except for the name of the columns, the rest of the migration is almost identical to the CSV example. Small changes were made to prevent machine name conflicts with other examples in the demo repository. For reference, the following snippet shows part of the process and destination section for the Microsoft Excel image migration.

process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: B # This is the photo URL column.
destination:
  plugin: 'entity:file'

Refer to this entry to know how to run migrations that depend on others. In this case, you can execute them all by running: drush migrate:import --tag='UD Sheets Source'. And that is how you can use Microsoft Excel and LibreOffice Calc files as the source of your migrations. This example is very interesting because each of the migrations uses a different source type. The node migration explained in the previous post uses a Google Sheet. This is a great example of how powerful and flexible the Migrate API is.

What did you learn in today’s blog post? Have you migrated from Microsoft Excel and LibreOffice Calc files before? If so, what challenges have you found? Did you know the source plugin configuration is not dependent on the file type? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 21 2019
Aug 21

Today we will learn how to migrate content from Google Sheets into Drupal using the Migrate Google Sheets module. We will give instructions on how to publish them in JSON format to be consumed by the migration. Then, we will talk about some assumptions made by the module to allow easier plugin configurations. Finally, we will present the source plugin configuration for Google Sheets migrations. Let’s get started.

Example configuration for Google Sheets migration

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD Google Sheets, Microsoft Excel, and LibreOffice Calc source migration whose machine name is ud_migrations_sheets_sources. It comes with four migrations: udm_google_sheets_source_node.yml, udm_libreoffice_calc_source_paragraph.yml, udm_microsoft_excel_source_image.yml, and udm_backup_csv_source_node.yml. The last one is a backup in case the Google Sheet is not available. To execute it, you would need the Migrate Source CSV module.

You can get the Migrate Google Sheets module and its dependency using composer: composer require 'drupal/migrate_google_sheets:^1.0'. It depends on Migrate Plus. Installing via composer will get you both modules. If your Drupal site is not composer-based, you can download them manually.

Understanding the example set up

This migration will reuse the same configuration from the introduction to paragraph migrations example. Refer to that article for details on the configuration. The destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain Google Sheets migrations. The end result will again be nodes containing an image and a paragraph with information about someone’s favorite book. The major difference is that we are going to read from different sources. In the next article, two of the migrations will be explained. They read from Microsoft Excel and LibreOffice Calc files.

Note: You can literally swap migration sources without changing any other part of the migration.  This is a powerful feature of ETL frameworks like Drupal’s Migrate API. Although possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some machine names had to be changed to avoid conflicts with other examples in the demo repository.

Migrating nodes from Google Sheets

In any migration project, understanding the source is very important. For Google Sheets, there are many details that need your attention. First, the module works on top of Migrate Plus and extends its JSON data parser. In fact, you have to publish your Google Sheet and consume it in JSON format. Second, you need to make the JSON export publicly available. Third, you must understand the JSON format provided by Google Sheets and the assumptions made by the module to configure your fields properly. Specific instructions for Google Sheets migrations will be provided. That being said, everything explained in the JSON migration example is applicable in this case too.

Publishing a Google Sheet in JSON format

Before starting the migration, you need the source from where you will extract the data. For this, create a Google Sheet document. The example will use this one:

https://docs.google.com/spreadsheets/d/1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk/edit#gid=0

The 1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk value is the workbook ID, which will be used later. Once you are done creating the document, you need to publish it so it can be consumed by the Migrate API. To do this, go to the File menu and then click on Publish to the web. A modal window will appear where you can configure the export. Note that it is possible to publish the Entire document or only some of the worksheets (tabs). The example document has two: UD Example Sheet and Do not peek in here. Make sure that all the worksheets that you need are published or export the entire document. Unless multiple urls are configured, a migration can only import from one worksheet at a time. If you fetch from multiple urls, they need to have homogeneous structures. When you click the Publish button, a new URL will be presented. In the example it is:

https://docs.google.com/spreadsheets/d/e/2PACX-1vTy2-CGzsoTBkmvYbolFh0UDWenwd9OCdel55j9Qa37g_earT1vA6y-6phC31Xkj8sTWF0o6mZTM90H/pubhtml

The previous URL will not be used. Publishing a document is a required step, but the URL that you get should be ignored. Note that you do not have to share the document. It is fine that the document is private to you as long as it is published. It is up to you if you want to make it available to Anyone with the link or Public on the web and potentially grant edit or comment access. The Share setting does not affect the migration. The final step is getting the JSON representation of the document. You need to assemble a URL with the following pattern:

http://spreadsheets.google.com/feeds/list/[workbook-id]/[worksheet-index]/public/values?alt=json

Replace [workbook-id] with the workbook ID mentioned at the beginning of this section, the one that is part of the regular document URL, not the published URL. The [worksheet-index] is an integer, starting at 1, that represents the order in which the worksheets appear in the document. Use 1 for the first, 2 for the second, and so on. This means that changing the order of the worksheets will affect your migration. At the very least, you will have to update the path to reflect the new index. In the example migration, the UD Example Sheet worksheet will be used. It appears first in the document, so its worksheet index is 1. Therefore, the exported JSON will be available at the following URL:

http://spreadsheets.google.com/feeds/list/1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk/1/public/values?alt=json

Understanding the published Google Sheet JSON export

Take a moment to read the JSON export and try to understand its structure. It contains much more data than what you need. The records to be imported can be retrieved using this XPath expression: /feed/entry. You would normally have to assign this value to the item_selector configuration of the Migrate Plus’ JSON data parser. But, because the value is the same for all Google Sheets, the module takes care of this automatically. You do not have to set that configuration in the source section. As for the data cells, have a look at the following code snippet to see how they appear on the export:

{
  "feed": {
    "entry": [
      {
        "gsx$uniqueid": {
          "$t": "1"
        },
        "gsx$name": {
          "$t": "One Uno Un"
        },
        "gsx$photo-file": {
          "$t": "P01"
        },
        "gsx$bookref": {
          "$t": "B10"
        }
      }
    ]
  }
}

Tip: Firefox includes a built-in JSON document viewer which helps a lot in understanding the structure of the document. If your browser does not include a similar tool out of the box, look for one in their extensions repository. You can also use a file formatter to pretty print the JSON output.

The following is a list of headers as they appear in the Google Sheet compared to how they appear in the JSON export:

  • unique_id appears like gsx$uniqueid.
  • name appears like gsx$name.
  • photo-file appears like gsx$photo-file.
  • Book Ref appears like gsx$bookref.

So, the header names from the Google Sheet get transformed in the JSON export. They get a prefix of gsx$ and the header name is transformed to all lowercase letters with spaces and most special characters removed. On top of this, the actual cell value that you will eventually import is in a $t property one level under the header name. Now, you should create a list of fields to migrate using XPath expressions as selectors. For example, for the Book Ref header, the selector would be gsx$bookref/$t. But that is not the way to configure the Google Sheets data parser. The module makes some assumptions to make the selector clearer. So, the gsx$ prefix and /$t hierarchy are assumed. For the selector, you only need to use the transformed name. In this case: uniqueid, name, photo-file, and bookref.

Configuring the Migrate Google Sheets source plugin

With the JSON export of the Google Sheet and the list of transformed header names, you can proceed to configure the plugin. It will be very similar to configuring a remote JSON migration. The following code snippet shows source configuration for the node migration:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: google_sheets
  urls: 'http://spreadsheets.google.com/feeds/list/1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk/1/public/values?alt=json'
  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: uniqueid
    - name: src_name
      label: 'Name'
      selector: name
    - name: src_photo_file
      label: 'Photo ID'
      selector: photo-file
    - name: src_book_ref
      label: 'Book paragraph ID'
      selector: bookref
  ids:
    src_unique_id:
      type: integer

You use the url plugin, the http fetcher, and the google_sheets parser. The latter is provided by the module. The urls configuration is set to the exported JSON link. The item_selector is not configured because the /feed/entry value is assumed. The fields are configured as in the JSON migration with the caveat of using the transformed header values for the selector. Finally, you need to set the ids key to a combination of fields that uniquely identify each record.

The rest of the migration is almost identical to the JSON example. Small changes were made to prevent machine name conflicts with other examples in the demo repository. For reference, the following snippet shows part of the process, destination, and dependencies section for the Google Sheets migration.

process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_microsoft_excel_source_image
    source: src_photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_microsoft_excel_source_image
    - udm_libreoffice_calc_source_paragraph
  optional: []

Note that the node migration depends on an image and paragraph migration. They are already available in the example. One uses a Microsoft Excel file as the source while the other a LibreOffice Calc document. Both of these migrations will be explained in the next article. Refer to this entry to know how to run migrations that depend on others. For example, you can run: drush migrate:import --tag='UD Sheets Source'.

What did you learn in today’s blog post? Have you migrated from Google Sheets before? If so, what challenges have you found? Did you know the procedure to export a sheet in JSON format? Did you know that the Migrate Google Sheets module is an extension of Migrate Plus? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

Aug 21 2019
Aug 21

Today we will learn how to migrate content from an XML file into Drupal using the Migrate Plus module. We will show how to configure the migration to read files from the local file system and remote locations. We will also talk about the difference between the two data parsers provided by the module. The example includes node, images, and paragraphs migrations. Let’s get started.

Example configuration of XML source migration

Note: Migrate Plus has many more features. For example, it contains source plugins to import from JSON files and SOAP endpoints. It provides many useful process plugins for DOM manipulation, string replacement, transliteration, etc. The module also lets you define migration plugins as configurations and create groups to share settings. It offers a custom event to modify the source data before processing begins. In today’s blog post, we are focusing on importing XML files. Other features will be covered in future entries.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD XML source migration whose machine name is ud_migrations_xml_source. It comes with four migrations: udm_xml_source_paragraph, udm_xml_source_image, udm_xml_source_node_local, and udm_xml_source_node_remote.

You can get the Migrate Plus module using composer: composer require 'drupal/migrate_plus:^5.0'. This will install the 8.x-5.x branch where new development will happen. This branch was created to introduce breaking changes in preparation for Drupal 9. As of this writing, the 8.x-4.x branch has feature parity with the newer branch. If your Drupal site is not composer-based, you can download the module manually.

Understanding the example set up

This migration will reuse the same configuration from the introduction to paragraph migrations example. Refer to that article for details on the configuration: the destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain XML migrations. The end result will again be nodes containing an image and a paragraph with information about someone’s favorite book. The major difference is that we are going to read from XML. In fact, three of the migrations will read from the same file. The following snippet shows a reduced version of the file to get a sense of its structure:


<?xml version="1.0" encoding="UTF-8" ?>
<data>
<udm_people>
<unique_id>1</unique_id>
<name>Michele Metts</name>
<photo_file>P01</photo_file>
<book_ref>B10</book_ref>
</udm_people>
<udm_people>
...
</udm_people>
<udm_people>
...
</udm_people>
<udm_book_paragraph>
<book_id>B10</book_id>
<book_details>
<title>The definite guide to Drupal 7</title>
<author>Benjamin Melançon et al.</author>
</book_details>
</udm_book_paragraph>
<udm_book_paragraph>
...
</udm_book_paragraph>
<udm_book_paragraph>
...
</udm_book_paragraph>
<udm_photos>
<photo_id>P01</photo_id>
<photo_url>https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg</photo_url>
<photo_dimensions>
<width>240</width>
<height>351</height>
</photo_dimensions>
</udm_photos>
<udm_photos>
...
</udm_photos>
<udm_photos>
...
</udm_photos>
</data>

Note: You can literally swap migration sources without changing any other part of the migration.  This is a powerful feature of ETL frameworks like Drupal’s Migrate API. Although possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some machine names had to be changed to avoid conflicts with other examples in the demo repository.

Migrating nodes from an XML file

In any migration project, understanding the source is very important. For XML migrations, there are two major considerations. First, where in the XML tree hierarchy lies the data that you want to import. It can be at the root of the file or several levels deep in the hierarchy. You use an XPath expression to select a set of nodes from the XML document. In this article, we use the term element when referring to an XML document node to distinguish it from a Drupal node. Second, when you get to the set of elements that you want to import, what child elements are going to be made available to the migration. It is possible that each element contains more data than needed. In XML imports, you have to manually include the child elements that will be required for the migration. The following code snippet shows part of the local XML file relevant to the node migration:


<?xml version="1.0" encoding="UTF-8" ?>
<data>
<udm_people>
<unique_id>1</unique_id>
<name>Michele Metts</name>
<photo_file>P01</photo_file>
<book_ref>B10</book_ref>
</udm_people>
<udm_people>
...
</udm_people>
<udm_people>
...
</udm_people>
</data>

The set of elements containing node data lies two levels deep in the hierarchy. Starting with data at the root and then descending one level to udm_people. Each of these elements contains four children:

  • unique_id is the unique identifier for each element returned by the data/udm_people hierarchy.
  • name is the name of a person. This will be used in the node title.
  • photo_file is the unique identifier of an image that was created in a separate migration.
  • book_ref is the unique identifier of a book paragraph that was created in a separate migration.

The following snippet shows the configuration to read a local XML file for the node migration:


source:
  plugin: url
  # This configuration is ignored by the 'xml' data parser plugin.
  # It only has effect when using the 'simple_xml' data parser plugin.
  data_fetcher_plugin: file
  # Set to 'xml' to use XMLReader https://www.php.net/manual/en/book.xmlreader.php
  # Set to 'simple_xml' to use SimpleXML https://www.php.net/manual/en/ref.simplexml.php
  data_parser_plugin: xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  # XPath expression. It is common that it starts with a slash (/).
  item_selector: /data/udm_people
  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_name
      label: 'Name'
      selector: name
    - name: src_photo_file
      label: 'Photo ID'
      selector: photo_file
    - name: src_book_ref
      label: 'Book paragraph ID'
      selector: book_ref
  ids:
    src_unique_id:
      type: integer

The name of the plugin is url. Because we are reading a local file, the data_fetcher_plugin  is set to file and the data_parser_plugin to xml. The urls configuration contains an array of file paths relative to the Drupal root. In the example we are reading from one file only, but you can read from multiple files at once. In that case, it is important that they have a homogeneous structure. The settings that follow will apply equally to all the files listed in urls.

Technical note: Migrate Plus provides two data parser plugins for XML files. xml uses XMLReader while simple_xml uses SimpleXML. The parser to use is configured in the data_parser_plugin configuration. Also note that when you use the xml parser, the data_fetcher_plugin setting is ignored. More details below.

The item_selector configuration indicates where in the XML file lies the set of elements to be migrated. Its value is an XPath expression used to traverse the file hierarchy. In this case, the value is /data/udm_people. Verify that your expression is valid and that it selects the elements you intend to import. It is common that it starts with a slash (/).

fields has to be set to an array. Each element represents a field that will be made available to the migration. The following options can be set:

  • name is required. This is how the field is going to be referenced in the migration. The name itself can be arbitrary. If it contains spaces, you need to put double quotation marks (") around it when referring to it in the migration.
  • label is optional. This is a description used when presenting details about the migration. For example, in the user interface provided by the Migrate Tools module. When defined, you do not use the label to refer to the field. Keep using the name.
  • selector is required. This is another XPath-like string to find the field to import. The value must be relative to the subtree specified by the item_selector configuration. In the example, the fields are direct children of the elements to migrate. Therefore, the XPath expression only includes the element name (e.g., unique_id). If you had nested elements, you could use a slash (/) character to go deeper in the hierarchy. This will be demonstrated in the image and paragraph migrations.

Finally, you specify an ids array of field names that uniquely identify each record. As already stated, the unique_id field serves that purpose. The following snippet shows part of the process, destination, and dependencies configuration of the node migration:

process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_xml_source_image
    source: src_photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_xml_source_image
    - udm_xml_source_paragraph
  optional: []

The source for setting the image reference is src_photo_file. Again, this is the name of the field, not the label nor the selector. The configuration of the migration lookup plugin and the dependencies point to two XML migrations that come with this example. One is for migrating images and the other for migrating paragraphs.

Migrating paragraphs from an XML file

Let’s consider an example where the elements to migrate have many levels of nesting. The following snippets show part of the local XML file and source plugin configuration for the paragraph migration:


<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <udm_book_paragraph>
    <book_id>B10</book_id>
    <book_details>
      <title>The Definitive Guide to Drupal 7</title>
      <author>Benjamin Melançon et al.</author>
    </book_details>
  </udm_book_paragraph>
  <udm_book_paragraph>
    ...
  </udm_book_paragraph>
  <udm_book_paragraph>
    ...
  </udm_book_paragraph>
</data>
source:
  plugin: url
  # This configuration is ignored by the 'xml' data parser plugin.
  # It only has effect when using the 'simple_xml' data parser plugin.
  data_fetcher_plugin: file
  # Set to 'xml' to use XMLReader https://www.php.net/manual/en/book.xmlreader.php
  # Set to 'simple_xml' to use SimpleXML https://www.php.net/manual/en/ref.simplexml.php
  data_parser_plugin: xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  # XPath expression. It is common that it starts with a slash (/).
  item_selector: /data/udm_book_paragraph
  fields:
    - name: src_book_id
      label: 'Book ID'
      selector: book_id
    - name: src_book_title
      label: 'Title'
      selector: book_details/title
    - name: src_book_author
      label: 'Author'
      selector: book_details/author
  ids:
    src_book_id:
      type: string

The plugin, data_fetcher_plugin, data_parser_plugin and urls configurations have the same values as in the node migration. The item_selector and ids configurations are slightly different to represent the path to paragraph elements and the unique identifier field, respectively.

The interesting part is the value of the fields configuration. Taking data/udm_book_paragraph as a starting point, the records with paragraph data have a nested structure. Particularly, the book_details element has two children: title and author. To refer to them, the selectors are book_details/title and book_details/author, respectively. Note that you can go as many levels deep in the hierarchy to find the value that should be assigned to the field. Each level in the hierarchy is separated by a slash (/).

In this example, the target is a single paragraph type. But a similar technique can be used to migrate multiple types. One way to structure the XML file is to give each record two children: paragraph_id would contain the unique identifier for the record, and paragraph_data would contain a child element to specify the paragraph type plus an arbitrary number of extra child elements with the data to be migrated. In the process section, you would iterate over the children to map the paragraph fields. A rough sketch of that layout is shown below.
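As a rough illustration (this layout is hypothetical and not part of the example repository; all element names are made up), such a file could look like this:

<data>
  <udm_mixed_paragraph>
    <paragraph_id>MP01</paragraph_id>
    <paragraph_data>
      <paragraph_type>ud_book_paragraph</paragraph_type>
      <title>Understanding Drupal Migrations</title>
      <author>Mauricio Dinarte</author>
    </paragraph_data>
  </udm_mixed_paragraph>
</data>

From there, the paragraph_type value could drive the mapping logic, for example via the static_map process plugin or by splitting the import into one migration per paragraph type.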

The following snippet shows part of the process configuration of the paragraph migration:

process:
  field_ud_book_paragraph_title: src_book_title
  field_ud_book_paragraph_author: src_book_author

Migrating images from an XML file

Let’s consider an example where the elements to migrate have more data than needed. The following snippets show part of the local XML file and source plugin configuration for the image migration:


<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <udm_photos>
    <photo_id>P01</photo_id>
    <photo_url>https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg</photo_url>
    <photo_dimensions>
      <width>240</width>
      <height>351</height>
    </photo_dimensions>
  </udm_photos>
  <udm_photos>
    ...
  </udm_photos>
  <udm_photos>
    ...
  </udm_photos>
</data>
source:
  plugin: url
  # This configuration is ignored by the 'xml' data parser plugin.
  # It only has effect when using the 'simple_xml' data parser plugin.
  data_fetcher_plugin: file
  # Set to 'xml' to use XMLReader https://www.php.net/manual/en/book.xmlreader.php
  # Set to 'simple_xml' to use SimpleXML https://www.php.net/manual/en/ref.simplexml.php
  data_parser_plugin: xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  # XPath expression. It is common that it starts with a slash (/).
  item_selector: /data/udm_photos
  fields:
    - name: src_photo_id
      label: 'Photo ID'
      selector: photo_id
    - name: src_photo_url
      label: 'Photo URL'
      selector: photo_url
  ids:
    src_photo_id:
      type: string

The following snippet shows part of the process configuration of the image migration:

process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: src_photo_url

The plugin, data_fetcher_plugin, data_parser_plugin and urls configurations have the same values as in the node migration. The item_selector and ids configurations are slightly different to represent the path to image elements and the unique identifier field, respectively.

The interesting part is the value of the fields configuration. Taking data/udm_photos as a starting point, the elements with image data have extra children that are not used in the migration. Particularly, the photo_dimensions element has two children representing the width and height of the image. To ignore this subtree, you simply omit it from the fields configuration. In case you wanted to use it, the selectors would be photo_dimensions/width and photo_dimensions/height, respectively.

XML file location

Important: What is described in this section only applies when you use either (1) the xml data parser or (2) the simple_xml parser with the file data fetcher.

When using the file data fetcher plugin, you have four options to indicate the location of the XML files in the urls configuration:

  • Use a relative path from the Drupal root. The path should not start with a slash (/). This is the approach used in this demo. For example, modules/custom/my_module/xml_files/example.xml.
  • Use an absolute path pointing to the XML location in the file system. The path should start with a slash (/). For example, /var/www/drupal/modules/custom/my_module/xml_files/example.xml.
  • Use a fully-qualified URL to any built-in wrapper like http, https, ftp, ftps, etc. For example, https://understanddrupal.com/xml-files/example.xml.
  • Use a custom stream wrapper.

Being able to use stream wrappers gives you many more options. For instance, you can read files from the public, private, and temporary file systems managed by Drupal, or from any location exposed through a custom or contributed stream wrapper. A small sketch is shown below.
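For example, assuming a copy of the source file had been placed in Drupal’s public file system at the hypothetical path below, the urls configuration could reference it through the public:// stream wrapper provided by Drupal core:

urls:
  - public://migration_sources/udm_data.xml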

Migrating remote XML files

Important: What is described in this section only applies when you use the http data fetcher plugin.

Migrate Plus provides another data fetcher plugin named http. Under the hood, it uses the Guzzle HTTP Client library. You can use it to fetch files using any protocol supported by curl like http, https, ftp, ftps, sftp, etc. In a future blog post we will explain this data fetcher in more detail. For now, the udm_xml_source_node_remote migration demonstrates a basic setup for this plugin. Note that only the data_fetcher_plugin, data_parser_plugin, and urls configurations are different from the local file example. The following snippet shows part of the configuration to read a remote XML file for the node migration:

source:
  plugin: url
  data_fetcher_plugin: http
  # 'simple_xml' is configured to be able to use the 'http' fetcher.
  data_parser_plugin: simple_xml
  urls:
    - https://sendeyo.com/up/d/478f835718
  item_selector: /data/udm_people
  fields: ...
  ids: ...

And that is how you can use XML files as the source of your migrations. Many more configurations are possible when you use the simple_xml parser with the http fetcher. For example, you can provide authentication information to get access to protected resources. You can also set custom HTTP headers. Examples will be presented in a future entry.

XMLReader vs SimpleXML in Drupal migrations

As noted in the module’s README file, the xml parser plugin uses the XMLReader interface to incrementally parse XML files. The reader acts as a cursor going forward on the document stream and stopping at each node on the way. This should be used for XML sources which are potentially very large. On the other hand, the simple_xml parser plugin uses the SimpleXML interface to fully parse XML files. This should be used for XML sources where you need to be able to use complex XPath expressions for your item selectors, or have to access elements outside of the current item element via XPath.

What did you learn in today’s blog post? Have you migrated from XML files before? If so, what challenges have you found? Did you know that you can read local and remote files? Did you know that the data_fetcher_plugin configuration is ignored when using the xml data parser? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series is made possible thanks to these generous sponsors. Contact us if your organization would like to support this documentation project, whether it is the migration series or other topics.

Next: Adding HTTP request headers and authentication to remote JSON and XML in Drupal migrations

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 19 2019
Aug 19

In the previous two blog posts, we learned to migrate data from JSON and XML files. We presented how to configure the migrations to fetch remote files. In today's blog post, we will learn how to add HTTP request headers and authentication to the request. For HTTP authentication, you need to choose among three options: Basic, Digest, and OAuth2. To provide this functionality, the Migrate API leverages the Guzzle HTTP Client library. Usage requirements and limitations will be presented. Let's begin.

Example config to add HTTP headers and authentication

Migrate Plus architecture for remote data fetching

The Migrate Plus module provides an extensible architecture for importing remote files. It makes use of different plugin types to fetch the file, add HTTP authentication to the request, and parse the response. The following is an overview of the different plugins and how they work together to allow code and configuration reuse.

Source plugin

The url source plugin is at the core of the implementation. Its purpose is to retrieve data from a list of URLs. Ingrained in the system is the goal to separate the file fetching from the file parsing. The url plugin will delegate both tasks to other plugin types provided by Migrate Plus.

Data fetcher plugins

For file fetching, you have two options. The first is file, a general-purpose fetcher for getting files from the local file system or via stream wrappers. This plugin has been explained in detail in the posts about JSON and XML migrations. Because it supports stream wrappers, it is very useful for fetching files from different locations and over different protocols. But it has two major downsides. First, it does not allow setting custom HTTP headers or authentication parameters. Second, this fetcher is completely ignored when used with the xml or soap data parsers (see below).

The second fetcher plugin is http. Under the hood, it uses the Guzzle HTTP Client library. This plugin allows you to define a headers configuration. You can set it to a list of HTTP headers to send along with the request. It also allows you to use authentication plugins (see below). The downside is that you cannot use stream wrappers. Only protocols supported by curl can be used: http, https, ftp, ftps, sftp, etc.

Data parser plugins

Data parsers are responsible for processing the files considering their type: JSON, XML, or SOAP. These plugins let you select a subtree within the file hierarchy that contains the elements to be imported. Each record might contain more data than what you need for the migration. So, you make a second selection to manually indicate which elements will be made available to the migration. Migrate Plus provides four data parsers, but only two of them use the data fetcher plugins. Here is a summary:

  • json can use any of the data fetchers. It offers an extra configuration option called include_raw_data. When set to true, in addition to all the fields manually defined, a new one is attached to the source with the name raw. This contains a copy of the full object currently being processed. A minimal sketch of this option is shown after this list.
  • simple_xml can use any data fetcher. It uses the SimpleXML class.
  • xml does not use any of the data fetchers. It uses the XMLReader class to directly fetch the file. Therefore, it is not possible to set HTTP headers or authentication.
  • soap does not use any data fetcher. It uses the SoapClient class to directly fetch the file. Therefore, it is not possible to set HTTP headers or authentication.
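As a minimal sketch of the include_raw_data option mentioned in the list above (the fields, ids, and other source configurations are omitted for brevity), the flag is set alongside the rest of the source section:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  include_raw_data: true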

The difference between xml and simple_xml was presented in the previous article.

Authentication plugins

These plugins add authentication headers to the request. If the credentials are valid, you will be able to fetch data from protected resources. They work exclusively with the http data fetcher. Therefore, you can use them only with the json and simple_xml data parsers. To do that, you set an authentication configuration whose value can be one of the following:

  • basic for HTTP Basic authentication.
  • digest for HTTP Digest authentication.
  • oauth2 for OAuth2 authentication over HTTP.

Below are examples for JSON and XML imports with HTTP headers and authentication configured. The code snippets do not contain real migrations. You can also find them in the ud_migrations_http_headers_authentication directory of the demo repository https://github.com/dinarcon/ud_migrations.

Important: The examples are shown for reference only. Do not store any sensitive data in plain text or commit it to the repository.

JSON and XML Drupal migrations with HTTP request headers and Basic authentication.

source:
  plugin: url
  data_fetcher_plugin: http
  # Choose one data parser.
  data_parser_plugin: json|simple_xml

  urls:
    - https://understanddrupal.com/files/data.json

  item_selector: /data/udm_root

  # This configuration is provided by the http data fetcher plugin.
  # Do not disclose any sensitive information in the headers.
  headers:
    Accept-Encoding: 'gzip, deflate, br'
    Accept-Language: 'en-US,en;q=0.5'
    Custom-Key: 'understand'
    Arbitrary-Header: 'drupal'

  # This configuration is provided by the basic authentication plugin.
  # Credentials should never be saved in plain text nor committed to the repo.
  authentication:
    plugin: basic
    username: totally
    password: insecure

  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_title
      label: 'Title'
      selector: title
  ids:
    src_unique_id:
      type: integer
process:
  title: src_title
destination:
  plugin: 'entity:node'
  default_bundle: page

JSON and XML Drupal migrations with HTTP request headers and Digest authentication.

source:
  plugin: url
  data_fetcher_plugin: http
  # Choose one data parser.
  data_parser_plugin: json|simple_xml

  urls:
    - https://understanddrupal.com/files/data.json

  item_selector: /data/udm_root

  # This configuration is provided by the http data fetcher plugin.
  # Do not disclose any sensitive information in the headers.
  headers:
    Accept: 'application/json; charset=utf-8'
    Accept-Encoding: 'gzip, deflate, br'
    Accept-Language: 'en-US,en;q=0.5'
    Custom-Key: 'understand'
    Arbitrary-Header: 'drupal'

  # This configuration is provided by the digest authentication plugin.
  # Credentials should never be saved in plain text nor committed to the repo.
  authentication:
    plugin: digest
    username: totally
    password: insecure

  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_title
      label: 'Title'
      selector: title
  ids:
    src_unique_id:
      type: integer
process:
  title: src_title
destination:
  plugin: 'entity:node'
  default_bundle: page

JSON and XML Drupal migrations with HTTP request headers and OAuth2 authentication.

source:
  plugin: url
  data_fetcher_plugin: http
  # Choose one data parser.
  data_parser_plugin: json|simple_xml

  urls:
    - https://understanddrupal.com/files/data.json

  item_selector: /data/udm_root

  # This configuration is provided by the http data fetcher plugin.
  # Do not disclose any sensitive information in the headers.
  headers:
    Accept: 'application/json; charset=utf-8'
    Accept-Encoding: 'gzip, deflate, br'
    Accept-Language: 'en-US,en;q=0.5'
    Custom-Key: 'understand'
    Arbitrary-Header: 'drupal'

  # This configuration is provided by the oauth2 authentication plugin.
  # Credentials should never be saved in plain text nor committed to the repo.
  authentication:
    plugin: oauth2
    grant_type: client_credentials
    base_uri: https://understanddrupal.com
    token_url: /oauth2/token
    client_id: some_client_id
    client_secret: totally_insecure_secret

  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_title
      label: 'Title'
      selector: title
  ids:
    src_unique_id:
      type: integer
process:
  title: src_title
destination:
  plugin: 'entity:node'
  default_bundle: page

What did you learn in today’s blog post? Did you know the configuration names for adding HTTP request headers and authentication to your JSON and XML requests? Did you know that this was limited to the parsers that make use of the http fetcher? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 18 2019
Aug 18

Today we will learn how to migrate content from a JSON file into Drupal using the Migrate Plus module. We will show how to configure the migration to read files from the local file system and remote locations. The example includes node, images, and paragraphs migrations. Let’s get started.

Example configuration of JSON source migration

Note: Migrate Plus has many more features. For example, it contains source plugins to import from XML files and SOAP endpoints. It provides many useful process plugins for DOM manipulation, string replacement, transliteration, etc. The module also lets you define migration plugins as configurations and create groups to share settings. It offers a custom event to modify the source data before processing begins. In today’s blog post, we are focusing on importing JSON files. Other features will be covered in future entries.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD JSON source migration whose machine name is ud_migrations_json_source. It comes with four migrations: udm_json_source_paragraph, udm_json_source_image, udm_json_source_node_local, and udm_json_source_node_remote.

You can get the Migrate Plus module using composer: composer require 'drupal/migrate_plus:^5.0'. This will install the 8.x-5.x branch where new development will happen. This branch was created to introduce breaking changes in preparation for Drupal 9. As of this writing, the 8.x-4.x branch has feature parity with the newer branch. If your Drupal site is not composer-based, you can download the module manually.

Understanding the example set up

This migration will reuse the same configuration from the introduction to paragraph migrations example. Refer to that article for details on the configuration: the destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain JSON migrations. The end result will again be nodes containing an image and a paragraph with information about someone’s favorite book. The major difference is that we are going to read from JSON. In fact, three of the migrations will read from the same file. The following snippet shows a reduced version of the file to get a sense of its structure:

{
  "data": {
    "udm_people": [
      {
        "unique_id": 1,
        "name": "Michele Metts",
        "photo_file": "P01",
        "book_ref": "B10"
      },
      {...},
      {...}
    ],
    "udm_book_paragraph": [
      {
        "book_id": "B10",
        "book_details": {
          "title": "The definite guide to Drupal 7",
          "author": "Benjamin Melançon et al."
        }
      },
      {...},
      {...}
    ],
    "udm_photos": [
      {
        "photo_id": "P01",
        "photo_url": "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg",
        "photo_dimensions": [240, 351]
      },
      {...},
      {...}
    ]
  }
}

Note: You can literally swap migration sources without changing any other part of the migration.  This is a powerful feature of ETL frameworks like Drupal’s Migrate API. Although possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some machine names had to be changed to avoid conflicts with other examples in the demo repository.

Migrating nodes from a JSON file

In any migration project, understanding the source is very important. For JSON migrations, there are two major considerations. First, where in the file hierarchy lies the data that you want to import. It can be at the root of the file or several levels deep in the hierarchy. Second, when you get to the array of records that you want to import, what fields are going to be made available to the migration. It is possible that each record contains more data than needed. For improved performance, it is recommended to manually include only the fields that will be required for the migration. The following code snippet shows part of the local JSON file relevant to the node migration:

{
  "data": {
    "udm_people": [
      {
        "unique_id": 1,
        "name": "Michele Metts",
        "photo_file": "P01",
        "book_ref": "B10"
      },
      {...},
      {...}
    ]
  }
}

The array of records containing node data lies two levels deep in the hierarchy: data sits at the root, and you descend one level to reach udm_people. Each element of this array is an object with four properties:

  • unique_id is the unique identifier for each record within the data/udm_people hierarchy.
  • name is the name of a person. This will be used in the node title.
  • photo_file is the unique identifier of an image that was created in a separate migration.
  • book_ref is the unique identifier of a book paragraph that was created in a separate migration.

The following snippet shows the configuration to read a local JSON file for the node migration:

source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json
  item_selector: data/udm_people
  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_name
      label: 'Name'
      selector: name
    - name: src_photo_file
      label: 'Photo ID'
      selector: photo_file
    - name: src_book_ref
      label: 'Book paragraph ID'
      selector: book_ref
  ids:
    src_unique_id:
      type: integer

The name of the plugin is url. Because we are reading a local file, the data_fetcher_plugin  is set to file and the data_parser_plugin to json. The urls configuration contains an array of file paths relative to the Drupal root. In the example, we are reading from one file only, but you can read from multiple files at once. In that case, it is important that they have a homogeneous structure. The settings that follow will apply equally to all the files listed in urls.
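
For example, a source section reading from more than one file could look like the following minimal sketch. The file names are hypothetical, and both files are assumed to share the same structure and item_selector:

source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  # Hypothetical file names; all listed files must share the same structure.
  urls:
    - modules/custom/my_module/sources/example_1.json
    - modules/custom/my_module/sources/example_2.json
  item_selector: data/udm_people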

The item_selector configuration indicates where in the JSON file lies the array of records to be migrated. Its value is an XPath-like string used to traverse the file hierarchy. In this case, the value is data/udm_people. Note that you separate each level in the hierarchy with a slash (/).

fields has to be set to an array. Each element represents a field that will be made available to the migration. The following options can be set:

  • name is required. This is how the field is going to be referenced in the migration. The name itself can be arbitrary. If it contains spaces, you need to put double quotation marks (") around it when referring to it in the migration.
  • label is optional. This is a description used when presenting details about the migration. For example, in the user interface provided by the Migrate Tools module. When defined, you do not use the label to refer to the field. Keep using the name.
  • selector is required. This is another XPath-like string to find the field to import. The value must be relative to the location specified by the item_selector configuration. In the example, the fields are direct children of the records to migrate. Therefore, only the property name is specified (e.g., unique_id). If you had nested objects or arrays, you would use a slash (/) character to go deeper in the hierarchy. This will be demonstrated in the image and paragraph migrations.

Finally, you specify an ids array of field names that would uniquely identify each record. As already stated, the unique_id field serves that purpose. The following snippet shows part of the process, destination, and dependencies configuration of the node migration:

process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_json_source_image
    source: src_photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_json_source_image
    - udm_json_source_paragraph
  optional: []

The source for setting the image reference is src_photo_file. Again, this is the name of the field, not the label or the selector. The configuration of the migration lookup plugin and dependencies point to two JSON migrations that come with this example. One is for migrating images and the other for migrating paragraphs.

Migrating paragraphs from a JSON file

Let’s consider an example where the records to migrate have many levels of nesting. The following snippets show part of the local JSON file and source plugin configuration for the paragraph migration:

{
  "data": {
    "udm_book_paragraph": [
      {
        "book_id": "B10",
        "book_details": {
          "title": "The definite guide to Drupal 7",
          "author": "Benjamin Melançon et al."
        }
      },
      {...},
      {...}
    ]
  }
}
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json
  item_selector: data/udm_book_paragraph
  fields:
    - name: src_book_id
      label: 'Book ID'
      selector: book_id
    - name: src_book_title
      label: 'Title'
      selector: book_details/title
    - name: src_book_author
      label: 'Author'
      selector: book_details/author
  ids:
    src_book_id:
      type: string

The plugin, data_fetcher_plugin, data_parser_plugin and urls configurations have the same values as in the node migration. The item_selector and ids configurations are slightly different to represent the path to paragraph records and the unique identifier field, respectively.

The interesting part is the value of the fields configuration. Taking data/udm_book_paragraph as a starting point, the records with paragraph data have a nested structure. Notice that book_details is an object with two properties: title and author. To refer to them, the selectors are book_details/title and book_details/author, respectively. Note that you can go as many levels deep in the hierarchy as needed to find the value that should be assigned to the field. Each level in the hierarchy is separated by a slash (/).

In this example, the target is a single paragraph type. But a similar technique can be used to migrate multiple types. One way to configure the JSON file is to have two properties. paragraph_id would contain the unique identifier for the record. paragraph_data would be an object with a property to set the paragraph type. This would also have an arbitrary number of extra properties with the data to be migrated. In the process section, you would iterate over the records to map the paragraph fields.
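
The following is a hypothetical sketch of what the source section could look like for such a file. The property names (paragraph_id, paragraph_data, type) and the file path are assumptions for illustration; they are not part of the example repository. The src_paragraph_type field would then be mapped to the type entity property in the process section:

source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/my_module/sources/mixed_paragraphs.json
  item_selector: data/udm_mixed_paragraphs
  fields:
    - name: src_paragraph_id
      label: 'Paragraph ID'
      selector: paragraph_id
    - name: src_paragraph_type
      label: 'Paragraph type'
      selector: paragraph_data/type
  ids:
    src_paragraph_id:
      type: string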

The following snippet shows part of the process configuration of the paragraph migration:

process:
  field_ud_book_paragraph_title: src_book_title
  field_ud_book_paragraph_author: src_book_author

Migrating images from a JSON file

Let’s consider an example where the records to migrate have more data than needed. The following snippets show part of the local JSON file and source plugin configuration for the image migration:

{
  "data": {
    "udm_photos": [
      {
        "photo_id": "P01",
        "photo_url": "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg",
        "photo_dimensions": [240, 351]
      },
      {...},
      {...}
    ]
  }
}
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json
  item_selector: data/udm_photos
  fields:
    - name: src_photo_id
      label: 'Photo ID'
      selector: photo_id
    - name: src_photo_url
      label: 'Photo URL'
      selector: photo_url
  ids:
    src_photo_id:
      type: string

The plugin, data_fetcher_plugin, data_parser_plugin and urls configurations have the same values as in the node migration. The item_selector and ids configurations are slightly different to represent the path to image records and the unique identifier field, respectively.

The interesting part is the value of the fields configuration. Taking data/udm_photos as a starting point, the records with image data have extra properties that are not used in the migration. Particularly, the photo_dimensions property contains an array with two values representing the width and height of the image, respectively. To ignore this property, you simply omit it from the fields configuration. In case you wanted to use it, the selectors would be photo_dimensions/0 for the width and photo_dimensions/1 for the height. Note that you use a zero-based numerical index to get the values out of arrays. Like with objects, a slash (/) is used to separate each level in the hierarchy. You can go as far as necessary in the hierarchy.
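
For reference, the following fragment of the source section's fields configuration sketches how those two extra fields could be defined if you wanted to import the dimensions. The field names are arbitrary and are not part of the example module:

  fields:
    - name: src_photo_width
      label: 'Photo width'
      selector: photo_dimensions/0
    - name: src_photo_height
      label: 'Photo height'
      selector: photo_dimensions/1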

The following snippet shows part of the process configuration of the image migration:

process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: src_photo_url

JSON file location

When using the file data fetcher plugin, you have three options to indicate the location of the JSON files in the urls configuration:

  • Use a relative path from the Drupal root. The path should not start with a slash (/). This is the approach used in this demo. For example, modules/custom/my_module/json_files/example.json.
  • Use an absolute path pointing to the JSON file location in the file system. The path should start with a slash (/). For example, /var/www/drupal/modules/custom/my_module/json_files/example.json.
  • Use a stream wrapper.

Being able to use stream wrappers gives you many more options. For instance:

  • Files located in the public, private, and temporary file systems managed by Drupal. This leverages functionality already available in Drupal core. For example: public://json_files/example.json.
  • Files located in profiles, modules, and themes. You can use the System stream wrapper module or apply this core patch to get this functionality. For example, module://my_module/json_files/example.json.
  • Files located in remote servers including RSS feeds. You can use the Remote stream wrapper module to get this functionality. For example, https://understanddrupal.com/json-files/example.json.

Migrating remote JSON files

Migrate Plus provides another data fetcher plugin named http. You can use it to fetch files using the http and https protocols. Under the hood, it uses the Guzzle HTTP Client library. In a future blog post we will explain this data fetcher in more detail. For now, the udm_json_source_node_remote migration demonstrates a basic setup for this plugin. Note that only the data_fetcher_plugin and urls configurations are different from the local file example. The following snippet shows part of the configuration to read a remote JSON file for the node migration:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    - https://api.myjson.com/bins/110rcr
  item_selector: data/udm_people
  fields: ...
  ids: ...

And that is how you can use JSON files as the source of your migrations. Many more configurations are possible. For example, you can provide authentication information to get access to protected resources. You can also set custom HTTP headers. Examples will be presented in a future entry.

What did you learn in today’s blog post? Have you migrated from JSON files before? If so, what challenges have you found? Did you know that you can read local and remote files? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services.  Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 17 2019
Aug 17

Today we will learn how to migrate content from a Comma-Separated Values (CSV) file into Drupal. We are going to use the latest version of the Migrate Source CSV module which depends on the third-party library league/csv. We will show how to configure the source plugin to read files with or without a header row. We will also talk about a new feature that allows you to use stream wrappers to set the file location. Let’s get started.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD CSV source migration whose machine name is ud_migrations_csv_source. It comes with three migrations: udm_csv_source_paragraph, udm_csv_source_image, and udm_csv_source_node.

You can get the Migrate Source CSV module using composer: composer require drupal/migrate_source_csv. This will also download its dependency: the league/csv library. The example assumes you are using the 8.x-3.x branch of the module, which requires composer to be installed. If your Drupal site is not composer-based, you can use the 8.x-2.x branch. Continue reading to learn the difference between the two branches.

Understanding the example set up

This migration will reuse the same configuration from the introduction to paragraph migrations example. Refer to that article for details on the configuration: the destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain CSV migrations. The end result will again be nodes containing an image and a paragraph with information about someone’s favorite book. The major difference is that we are going to read from a CSV file.

Note that you can literally swap migration sources without changing any other part of the migration. This is a powerful feature of ETL frameworks like Drupal’s Migrate API. Although possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some machine names had to be changed to avoid conflicts with other examples in the demo repository.

Migrating CSV files with a header row

In any migration project, understanding the source is very important. For CSV migrations, the primary thing to consider is whether or not the file contains a row of headers. Other things to consider are what characters to use as delimiter, enclosure, and escape character. For now, let’s consider the following CSV file whose first row serves as column headers:

unique_id,name,photo_file,book_ref
1,Michele Metts,P01,B10
2,Benjamin Melançon,P02,B20
3,Stefan Freudenberg,P03,B30

This file will be used in the node migration. The four columns are used as follows:

  • unique_id is the unique identifier for each record in this CSV file.
  • name is the name of a person. This will be used as the node title.
  • photo_file is the unique identifier of an image that was created in a separate migration.
  • book_ref is the unique identifier of a book paragraph that was created in a separate migration.

The following snippet shows the configuration of the CSV source plugin for the node migration:

source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_people.csv
  ids: [unique_id]

The name of the plugin is csv. Then you define the path pointing to the file itself. In this case, the path is relative to the Drupal root. Finally, you specify an ids array of column names that would uniquely identify each record. As already stated, the unique_id column serves that purpose. Note that there is no need to specify all the column names from the CSV file. The plugin will automatically make them available. That is the simplest configuration of the CSV source plugin.

The following snippet shows part of the process, destination, and dependencies configuration of the node migration:

process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_csv_source_image
    source: photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_csv_source_image
    - udm_csv_source_paragraph
  optional: []

Note that the source for setting the image reference is photo_file. In the process pipeline you can directly use any column name that exists in the CSV file. The configuration of the migration lookup plugin and dependencies point to two CSV migrations that come with this example. One is for migrating images and the other for migrating paragraphs.

Migrating CSV files without a header row

Now let’s consider two examples of CSV files that do not have a header row. The following snippets show the example CSV file and source plugin configuration for the paragraph migration:

B10,The definite guide to Drupal 7,Benjamin Melançon et al.
B20,Understanding Drupal Views,Carlos Dinarte
B30,Understanding Drupal Migrations,Mauricio Dinarte

source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_book_paragraph.csv
  ids: [book_id]
  header_offset: null
  fields:
    - name: book_id
    - name: book_title
    - name: 'Book author'

When you do not have a header row, you need to specify two more configuration options. header_offset has to be set to null. fields has to be set to an array where each element represents a column in the CSV file. You include a name for each column following the order in which they appear in the file. The name itself can be arbitrary. If it contains spaces, you need to put quotes (') around it. After that, you set the ids configuration to one or more columns using the names you defined.

In the process section you refer to source columns as usual, writing their names and adding quotes if they contain spaces. The following snippet shows how the process section is configured for the paragraph migration:

process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: 'Book author'

The final example will show a slight variation of the previous configuration. The following two snippets show the example CSV file and source plugin configuration for the image migration:

P01,https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg
P02,https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg
P03,https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg

source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv
  ids: [photo_id]
  header_offset: null
  fields:
    - name: photo_id
      label: 'Photo ID'
    - name: photo_url
      label: 'Photo URL'

For each column defined in the fields configuration, you can optionally set a label. This is a description used when presenting details about the migration. For example, in the user interface provided by the Migrate Tools module. When defined, you do not use the label to refer to source columns. You keep using the column name. You can see this in the value of the ids configuration.

The following snippet shows part of the process configuration of the image migration:

process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: photo_url

CSV file location

When setting the path configuration you have three options to indicate the CSV file location:

  • Use a relative path from the Drupal root. The path should not start with a slash (/). This is the approach used in this demo. For example, modules/custom/my_module/csv_files/example.csv.
  • Use an absolute path pointing to the CSV location in the file system. The path should start with a slash (/). For example, /var/www/drupal/modules/custom/my_module/csv_files/example.csv.
  • Use a stream wrapper. This feature was introduced in the 8.x-3.x branch of the module. Previous versions cannot make use of them.

Being able to use stream wrappers gives you many options for setting the location to the CSV file. For instance:

  • Files located in the public, private, and temporary file systems managed by Drupal. This leverages functionality already available in Drupal core. For example: public://csv_files/example.csv.
  • Files located in profiles, modules, and themes. You can use the System stream wrapper module or apply this core patch to get this functionality. For example, module://my_module/csv_files/example.csv.
  • Files located in remote servers including RSS feeds. You can use the Remote stream wrapper module to get this functionality. For example, https://understanddrupal.com/csv-files/example.csv.

CSV source plugin configuration

The configuration options for the CSV source plugin are very well documented in the source code. They are included here for quick reference:

  • path is required. It contains the path to the CSV file. Starting with the 8.x-3.x branch, stream wrappers are supported.
  • ids is required. It contains an array of column names that uniquely identify each record.
  • header_offset is optional. It is the index of the record to be used as the CSV header and, thereby, the source of each record's field names. It defaults to zero (0) because the index is zero-based. For CSV files with no header row the value should be set to null.
  • fields is optional. It contains a nested array of names and labels to use instead of a header row. If set, it will overwrite the column names obtained from header_offset.
  • delimiter is optional. It contains the one-character column delimiter. It defaults to a comma (,). For example, if your file uses tabs as the delimiter, you set this configuration to \t. See the sketch after this list.
  • enclosure is optional. It contains one character used to enclose the column values. Defaults to double quotation marks (").
  • escape is optional. It contains one character used for character escaping in the column values. It defaults to a backslash (\).
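
As an illustration of the delimiter, enclosure, and escape options, the following hypothetical sketch reads a tab-separated file with a header row that includes a unique_id column. The path is made up for the example:

source:
  plugin: csv
  path: modules/custom/my_module/csv_files/example.tsv
  ids: [unique_id]
  # Tab-separated values enclosed in double quotation marks.
  delimiter: "\t"
  enclosure: '"'
  # Redundant because a backslash is already the default escape character.
  escape: "\\"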

Important: The configuration options changed significantly between the 8.x-3.x and 8.x-2.x branches. Refer to this change record for a reference on how to configure the plugin for the 8.x-2.x branch.

And that is how you can use CSV files as the source of your migrations. Because this is such a common need, moving the CSV source plugin into Drupal core was considered. The effort is currently on hold, and it is unclear whether it will materialize during Drupal 8’s lifecycle. The maintainers of the Migrate API are focusing their efforts on other priorities at the moment. You can read this issue to learn about the motivation and context for offering this functionality in Drupal core.

Note: The Migrate Spreadsheet module can also be used to migrate data from CSV files. It also supports Microsoft Office Excel and LibreOffice Calc (OpenDocument) files. The module leverages the PhpOffice/PhpSpreadsheet library.

What did you learn in today’s blog post? Have you migrated from CSV files before? Did you know that it is now possible to read files using stream wrappers? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 15 2019
Aug 15

Today we will present an introduction to paragraphs migrations in Drupal. The example consists of migrating paragraphs of one type, then connecting the migrated paragraphs to nodes. A separate image migration is included to demonstrate how they are different. At the end, we will talk about behavior that deletes paragraphs when the host entity is deleted. Let’s get started.

Example mapping for paragraph reference field.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD paragraphs migration introduction whose machine name is ud_migrations_paragraph_intro. It comes with three migrations: ud_migrations_paragraph_intro_paragraph, ud_migrations_paragraph_intro_image, and ud_migrations_paragraph_intro_node. One content type, one paragraph type, and four fields will be created when the module is installed.

Note: Configuration placed in a module’s config/install directory will be copied to Drupal’s active configuration. And if those files have a dependencies/enforced/module key, the configuration will be removed when the listed modules are uninstalled. That is how the content type, the paragraph type, and the fields are automatically created and deleted.
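
As a hypothetical illustration, a config/install file for the content type could carry an enforced dependency like the following excerpt. The exact file contents in the example repository may differ:

# Excerpt of a file like config/install/node.type.ud_paragraphs.yml
langcode: en
status: true
dependencies:
  enforced:
    module:
      - ud_migrations_paragraph_intro
name: UD Paragraphs
type: ud_paragraphs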

You can get the Paragraphs module using composer: composer require drupal/paragraphs. This will also download its dependency: the Entity Reference Revisions module. If your Drupal site is not composer-based, you can get the code for both modules manually.

Understanding the example set up

The example code creates one paragraph type named UD book paragraph (ud_book_paragraph). It has two “Text (plain)” fields: Title (field_ud_book_paragraph_title) and Author (field_ud_book_paragraph_author). A new UD Paragraphs (ud_paragraphs) content type is also created. This has two fields: Image (field_ud_image) and Favorite book (field_ud_favorite_book) containing references to images and book paragraphs imported in separate migrations. The words in parentheses represent the machine names of the different elements.

The paragraph migration

Migrating into a paragraph type is very similar to migrating into a content type. You specify the source, process the fields making any required transformation, and set the destination entity and bundle. The following code snippet shows the source, process, and destination sections:

source:
  plugin: embedded_data
  data_rows:
    - book_id: 'B10'
      book_title: 'The definite guide to Drupal 7'
      book_author: 'Benjamin Melançon et al.'
    - book_id: 'B20'
      book_title: 'Understanding Drupal Views'
      book_author: 'Carlos Dinarte'
    - book_id: 'B30'
      book_title: 'Understanding Drupal Migrations'
      book_author: 'Mauricio Dinarte'
  ids:
    book_id:
      type: string
process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph

The most important part of a paragraph migration is setting the destination plugin to entity_reference_revisions:paragraph. This plugin is actually provided by the Entity Reference Revisions module. It is very important to note that paragraph entities are revisioned. This means that when you want to create a reference to them, you need to provide two IDs: target_id and target_revision_id. Regular entity reference fields like files, images, and taxonomy terms only require the target_id. This will be further explained with the node migration.

The other configuration that you can optionally set in the destination section is default_bundle. The value will be the machine name of the paragraph type you are migrating into. You can do this when all the paragraphs for a particular migration definition file will be of the same type. If that is not the case, you can leave out the default_bundle configuration and add a mapping for the type entity property in the process section.
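
For instance, if the source rows included a hypothetical paragraph_type column holding the bundle machine name, the mapping could look like this sketch, leaving default_bundle out of the destination:

process:
  # paragraph_type is a hypothetical source column with the bundle machine name.
  type: paragraph_type
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  plugin: 'entity_reference_revisions:paragraph'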

You can execute the paragraph migration with this command: drush migrate:import ud_migrations_paragraph_intro_paragraph. After running the migration, there is not much you can do to verify that it worked. Contrary to other entities, there is no user interface, available out of the box, that lists all paragraphs in the system. One way to verify if the migration worked is to manually create a View that shows paragraphs. Another way is to query the database directly. You can inspect the tables that store the paragraph fields’ data. In this example, the tables would be:

  • paragraph__field_ud_book_paragraph_author for the current author.
  • paragraph__field_ud_book_paragraph_title for the current title.
  • paragraph_r__8c3a9563ac for all the author revisions.
  • paragraph_r__3fa7e9863a for all the title revisions.

Each of those tables contains information about the bundle (paragraph type), the entity id, the revision id, and the migrated field value. Table names are derived from the machine names of the fields. If they are too long, the field name will be hashed to produce a shorter table name. Having to query the database is not ideal. Unfortunately, the options available to check if a paragraph migration worked are limited at the moment.

The node migration

The node migration will serve as the host for both referenced entities: images and paragraphs. The image migration is very similar to the one explained in a previous article. This time, the focus will be the paragraph migration. Both of them are set as dependencies of the node migration, so they need to be executed in advance. The following snippet shows how the source, destinations, and dependencies are set:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: 'Michele Metts'
      photo_file: 'P01'
      book_ref: 'B10'
    - unique_id: 2
      name: 'Benjamin Melançon'
      photo_file: 'P02'
      book_ref: 'B20'
    - unique_id: 3
      name: 'Stefan Freudenberg'
      photo_file: 'P03'
      book_ref: 'B30'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - ud_migrations_paragraph_intro_image
    - ud_migrations_paragraph_intro_paragraph
  optional: []

Note that photo_file and book_ref both contain the unique identifier of records in the image and paragraph migrations, respectively. These can be used with the migration_lookup plugin to map the reference fields in the nodes to be migrated. ud_paragraphs is the machine name of the target content type.

The mapping of the image reference field follows the same pattern as the one explained in the article on migration dependencies. Using the migration_lookup plugin, you indicate which migration should be searched for the images. You also specify which source column contains the unique identifiers that match those in the image migration. This operation will return a single value: the file ID (fid) of the image. This value can be assigned to the target_id subfield of field_ud_image to establish the relationship. The following code snippet shows how to do it:

field_ud_image/target_id:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_image
  source: photo_file

Paragraph field mappings

Before diving into the paragraph field mapping, let’s think about what needs to be done. Paragraphs are revisioned entities. To make a reference to them, you need two IDs: their entity id and their entity revision id. These two values need to be assigned to two subfields of the paragraph reference field: target_id and target_revision_id respectively. You have to come up with a process pipeline that complies with this requirement. There are many ways to do it, and the specifics will depend on your field configuration. In this example, the paragraph reference field allows an unlimited number of paragraphs to be associated, but only of one type: ud_book_paragraph. Another thing to note is that even though the field allows you to add as many paragraphs as you want, the example migrates exactly one paragraph.

With those considerations in mind, the mapping of the paragraph field will be a two step process. First, use the migration_lookup plugin to get a reference to the paragraph. Second, use the fetched values to set the paragraph reference subfields. The following code snippet shows how to do it:

pseudo_mbe_book_paragraph:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_paragraph
  source: book_ref
field_ud_favorite_book:
  plugin: sub_process
  source:
    - '@pseudo_mbe_book_paragraph'
  process:
    target_id: '0'
    target_revision_id: '1'

The first step is a normal migration_lookup procedure. The important difference is that instead of getting a single value, like with images, the paragraph lookup operation will return an array of two values. The format is like [3, 7] where the 3 represents the entity id and the 7 represents the entity revision id of the paragraph. Note that the array keys are not named. To access those values, you would use the index of the elements starting with zero (0). This will be important later. The returned array is stored in the pseudo_mbe_book_paragraph pseudofield.

The second step is to set the target_id and target_revision_id subfields. In this example, field_ud_favorite_book is the machine name of the paragraph reference field. Remember that it is configured to accept an arbitrary number of paragraphs, and each will require passing an array of two elements. This means you need to process an array of arrays. To do that, you use the sub_process plugin to iterate over an array of paragraph references. In this example, the structure to iterate over would be like this:

[
  [3, 7]
]

Let’s dissect how to do the mapping of the paragraph reference field. The source configuration of the sub_process plugin contains an array of paragraph references. In the example, that array has a single element: the '@pseudo_mbe_book_paragraph' pseudofield. The quotes (') and at sign (@) are required to reuse an element that appears before in the process pipeline. Then, in the process configuration, you set the subfields for the paragraph reference field. It is worth noting that at this point you are iterating over a list of paragraph references, even if that list contains only one element. If you had more than one paragraph to migrate, whatever you defined in process will apply to all of them.

The process configuration is an array of subfield mappings. The left side of the assignment is the name of the subfield you want to set. The right side of the assignment is an array index of the paragraph reference being processed. Remember that this array does not have named keys, so you use their numerical index to refer to them. The example sets the target_id subfield to the element in the 0 index and the target_revision_id subfield to the element in the 1 index. Using the example data, this would be target_id: 3 and target_revision_id: 7. The quotes around the numerical indexes are important. If not used, the migration will not find the indexes and the paragraphs will not be associated. The end result of this operation will be something like this:

'field_ud_favorite_book' => array (1) [
  array (2) [
    'target_id' => string (1) "3"
    'target_revision_id' => string (1) "7"
  ]
]

There are three ways to run the migrations: manually, executing dependencies, and using tags. The following code snippet shows the three options:

# 1) Manually.
$ drush migrate:import ud_migrations_paragraph_intro_image
$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node

# 2) Executing dependencies.
$ drush migrate:import ud_migrations_paragraph_intro_node --execute-dependencies

# 3) Using tags.
$ drush migrate:import --tag='UD Paragraphs Intro'

And that is one way to map paragraph reference fields. In the end, all you have to do is set the target_id and target_revision_id subfields. The process pipeline that gets you to that point can vary depending on how your paragraphs are configured. The following is a non-exhaustive list of things to consider when migrating paragraphs:

  • How many paragraph types can be referenced?
  • How many paragraph instances are being migrated? Is this a multivalue field?
  • Do paragraphs have translations?
  • Do paragraphs have revisions?

Do migrated paragraphs disappear upon node rollback?

Paragraphs migrations are affected by a particular behavior of revisioned entities. If the host entity is deleted, and the paragraphs do not have translations, the whole paragraph gets deleted. That means that deleting a node will cause the referenced paragraphs’ data to be removed. How does this affect your migration workflow? If the migration of the host entity is rolled back, the paragraphs will be removed, but the Migrate API will not know about it. In this example, if you run a migrate status command after rolling back the node migration, you will see that the paragraph migration indicates that there are no pending elements to process. The file migration for the images will report the same, but in that case, the images will remain on the system.

In any migration project, it is common that you do rollback operations to test new field mappings or fix errors. Thus, chances are very high that you will stumble upon this behavior. Thanks to Damien McKenna for helping me understand this behavior and tracking it to the rollback() method of the EntityReferenceRevisions destination plugin. So, what do you do to recover the deleted paragraphs? You have to rollback both migrations: node and paragraph. And then, you have to import the two again. The following snippet shows how to do it:

# 1) Rollback both migrations.
$ drush migrate:rollback ud_migrations_paragraph_intro_node
$ drush migrate:rollback ud_migrations_paragraph_intro_paragraph

# 2) Import both migrations again.

$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node

What did you learn in today’s blog post? Have you migrated paragraphs before? If so, what challenges have you found? Did you know paragraph reference fields require two subfields to be set? Did you know that deleting the host entity also deletes referenced paragraphs? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 14 2019
Aug 14

Today we will learn how to migrate addresses into Drupal. We are going to use the field provided by the Address module which depends on the third-party library commerceguys/addressing. When migrating addresses you need to be careful with the data that Drupal expects. The address components can change per country. The way to store those components also varies per country. These and other important considerations will be explained. Let’s get started.

Example address field process mapping.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD address whose machine name is ud_migrations_address. The migration to execute is udm_address. Notice that this migration writes to a content type called UD Address and one field: field_ud_address. This content type and field will be created when the module is installed. They will also be removed when the module is uninstalled. The demo module itself depends on the following modules: address and migrate.

Note: Configuration placed in a module’s config/install directory will be copied to Drupal’s active configuration. And if those files have a dependencies/enforced/module key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created and deleted.

The recommended way to install the Address module is using composer: composer require drupal/address. This will grab the Drupal module and the commerceguys/addressing library that it depends on. If your Drupal site is not composer-based, an alternative is to use the Ludwig module. Read this article if you want to learn more about this option. In the example, it is assumed that the module and its dependency were obtained via composer. Also, keep an eye on the Composer Support in Core Initiative as they make progress.

Source and destination sections

The example will migrate three addresses from the following countries: Nicaragua, Germany, and the United States of America (USA). This makes it possible to show how different countries expect different address data. As usual, for any migration you need to understand the source. The following code snippet shows how the source and destination sections are configured:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: 'Michele'
      last_name: 'Metts'
      company: 'Agaric LLC'
      city: 'Boston'
      state: 'MA'
      zip: '02111'
      country: 'US'
    - unique_id: 2
      first_name: 'Stefan'
      last_name: 'Freudenberg'
      company: 'Agaric GmbH'
      city: 'Hamburg'
      state: ''
      zip: '21073'
      country: 'DE'
    - unique_id: 3
      first_name: 'Benjamin'
      last_name: 'Melançon'
      company: 'Agaric SA'
      city: 'Managua'
      state: 'Managua'
      zip: ''
      country: 'NI'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_address

Note that not every address component is set for all addresses. For example, the Nicaraguan address does not contain a ZIP code. And the German address does not contain a state. Also, the Nicaraguan state is fully spelled out: Managua. On the contrary, the USA state is a two letter abbreviation: MA for Massachusetts. One more thing that might not be apparent is that the USA ZIP code belongs to the state of Massachusetts. All of this is important because the module does validation of addresses. The destination is the custom ud_address content type created by the module.

Available subfields

The Address field has 13 subfields available. They can be found in the schema() method of the AddressItem class. Fields are not required to have a one-to-one mapping between their schema and the form widgets used for entering content. This is particularly true for addresses because input elements, labels, and validations change dynamically based on the selected country. The following is a reference list of all subfields for addresses:

  1. langcode for language code.
  2. country_code for country.
  3. administrative_area for administrative area (e.g., state or province).
  4. locality for locality (e.g. city).
  5. dependent_locality for dependent locality (e.g. neighbourhood).
  6. postal_code for postal or ZIP code.
  7. sorting_code for sorting code.
  8. address_line1 for address line 1.
  9. address_line2 for address line 2.
  10. organization for company.
  11. given_name for first name.
  12. additional_name for middle name.
  13. family_name for last name.

Properly describing an address is not trivial. For example, there are discussions to add a third address line component. Check this issue if you need this functionality or would like to participate in the discussion.

Address subfield mappings

In the example, only 9 out of the 13 subfields will be mapped. The following code snippet shows how to do the processing of the address field:

field_ud_address/given_name: first_name
field_ud_address/family_name: last_name
field_ud_address/organization: company
field_ud_address/address_line1:
  plugin: default_value
  default_value: 'It is a secret ;)'
field_ud_address/address_line2:
  plugin: default_value
  default_value: 'Do not tell anyone :)'
field_ud_address/locality: city
field_ud_address/administrative_area: state
field_ud_address/postal_code: zip
field_ud_address/country_code: country

The mapping is relatively simple. You specify a value for each subfield. The tricky part is to know the name of the subfield and the value to store in it. The format for an address component can change among countries. The easiest way to see what components are expected for each country is to create a node for a content type that has an address field. With this example, you can go to /node/add/ud_address and try it yourself. For simplicity's sake, let’s consider only 3 countries:

  • For USA, city, state, and ZIP code are all required. And for state, you have a specific list from which you need to select.
  • For Germany, the company is moved above first and last name. The ZIP code label changes to Postal code and it is required. The city is also required. It is not possible to set a state.
  • For Nicaragua, the Postal code is optional. The State label changes to Department. It is required and offers a predefined list to choose from. The city is also required.

Pay very close attention. The available subfields will depend on the country. Also, the form labels change per country or language settings. They do not necessarily match the subfield names. Moreover, the values that you see on the screen might not match what is stored in the database. For example, a Nicaraguan address will store the full department name like Managua. On the other hand, a USA address will only store a two-letter code for the state like MA for Massachusetts.

Something else that is not apparent even from the user interface is data validation. For example, let’s consider that you have a USA address and select Massachusetts as the state. Entering the ZIP code 55111 will produce the following error: Zip code field is not in the right format. At first glance, the format is correct, a five-digit code. The real problem is that the Address module is validating if that ZIP code is valid for the selected state. It is not valid for Massachusetts. 55111 is a ZIP code for the state of Minnesota which makes the validation fail. Unfortunately, the error message does not indicate that. Nine-digit ZIP codes are accepted as long as they belong to the state that is selected.

Finding expected values

Values for the same subfield can vary per country. How can you find out which value to use? There are a few ways, but they all require varying levels of technical knowledge or access to resources:

  • You can inspect the source code of the address field widget. When the country and state components are rendered as select input fields (dropdowns), you can have a look at the value attribute for the option that you want to select. This will contain the two-letter code for countries, the two-letter abbreviations for USA states, and the fully spelled string for Nicaraguan departments.
  • You can use the Devel module. Create a node containing an address. Then use the devel tab of the node to inspect how the values are stored. It is not recommended to have the devel module in a production site. In fact, do not deploy the code even if the module is not enabled. This approach should only be used in a local development environment. Make sure no module or configuration is committed to the repo nor deployed.
  • You can inspect the database. Look for the records in a table named node__field_[field_machine_name], if migrating nodes. First create some example nodes via the user interface and then query the table. You will see how Drupal stores the values in the database.

If you know a better way, please share it in the comments.

The commerceguys addressing library

With version 8 came many changes in the way Drupal is developed. Now there is an intentional effort to integrate with the greater PHP ecosystem. This involves using already existing libraries and frameworks, like Symfony. But also, making code written for Drupal available as external libraries that could be used by other projects. commerceguys\addressing is one example of a library that was made available as an external library. That being said, the Address module also makes use of it.

Explaining how the library works or where it fetches its data is beyond the scope of this article. Refer to the library documentation for more details on the topic. We are only going to point out some things that are relevant for the migration. For example, the ZIP code validation happens at the validatePostalCode() method of the AddressFormatConstraintValidator class. There is no need to know this for a migration project. But the key thing to remember is that the migration can be affected by third-party libraries outside of Drupal core or contributed modules. Another example is the value for the state subfield. The Address module expects a subdivision as listed in one of the files in the resources/subdivision directory.

Does the validation really affect the migration? We have already mentioned that the Migrate API bypasses Form API validations. And that is true for address fields as well. You can migrate a USA address with state Florida and ZIP code 55111. Both are invalid because you need to use the two-letter state code FL and use a valid ZIP code within the state. Notwithstanding, the migration will not fail in this case. In fact, if you visit the migrated node you will see that Drupal happily shows the address with the data that you entered. The problem arises when you need to use the address. If you try to edit the node you will see that the state will not be preselected. And if you try to save the node after selecting Florida you will get the validation error for the ZIP code.

These validation issues might be hard to track because no error will be thrown by the migration. The recommendation is to migrate a sample combination of countries and address components. Then, manually check if editing a node shows the migrated data for all the subfields. Also check that the address passes Form API validations upon saving. This manual testing can save you a lot of time and money down the road. After all, if you have an ecommerce site, you do not want to be shipping your products to wrong or invalid addresses. ;-)

Technical note: The commerceguys/addressing library actually follows ISO standards. Particularly, ISO 3166 for country and state codes. It also uses CLDR and Google's address data. The dataset is stored as part of the library’s code in JSON format.

Migrating countries and zone fields

The Address module offers two more field types: Country and Zone. Both have only one subfield, value, which is selected by default. For country, you store the two-letter country code. For zone, you store a serialized version of a Zone object.
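
For example, mapping a hypothetical country field named field_ud_country from a source column holding two-letter codes could be sketched like this:

process:
  # country is a hypothetical source column with ISO codes like 'US' or 'DE'.
  field_ud_country/value: country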

What did you learn in today’s blog post? Have you migrated address before? Did you know the full list of subcomponents available? Did you know that data expectations change per country? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 13 2019
Aug 13

Today we will learn how to migrate dates into Drupal. Depending on your field type and configuration, there are various possible combinations. You can store a single date or a date range. You can store only the date component or also include the time. You might have timezones to take into account. Importing the node creation date requires a slightly different configuration. In addition to the examples, a list of things to consider when migrating dates is also presented.

Example syntax for date migrations.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD date whose machine name is ud_migrations_date. The migration to execute is udm_date. Notice that this migration writes to a content type called UD Date and to three fields: field_ud_date, field_ud_date_range, and field_ud_datetime. This content type and fields will be created when the module is installed. They will also be removed when the module is uninstalled. The module itself depends on the following modules provided by Drupal core: datetime, datetime_range, and migrate.

Note: Configuration placed in a module’s config/install directory will be copied to Drupal’s active configuration. And if those files have a dependencies/enforced/module key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created.

PHP date format characters

To migrate dates, you need to be familiar with the format characters of the date PHP function. Basically, you need to find a pattern that matches the date format you need to migrate to and from. For example, January 1, 2019 is described by the F j, Y pattern.

As mentioned in the previous post, you need to pay close attention to how you create the pattern. Upper and lowercase letters represent different things like Y and y for the year with four digits versus two digits, respectively. Some date components have subtle variations like d and j for the day with or without leading zeros, respectively. Also, take into account white spaces and date component separators. If you need to include a literal letter like T it has to be escaped with \T. If the pattern is wrong, an error will be raised, and the migration will fail.

Date format conversions

For date conversions, you use the format_date plugin. You specify a from_format based on your source and a to_format based on what Drupal expects. In both cases, you will use the PHP date function's format characters to assemble the required patterns. Optionally, you can define the from_timezone and to_timezone configurations if conversions are needed. Just like any other migration, you need to understand your source format. The following code snippet shows the source and destination sections:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      node_title: 'Date example 1'
      node_creation_date: 'January 1, 2019 19:15:30'
      src_date: '2019/12/1'
      src_date_end: '2019/12/31'
      src_datetime: '2019/12/24 19:15:30'
destination:
  plugin: 'entity:node'
  default_bundle: ud_date

Node creation time migration

The node creation time is migrated using the created entity property. The source column that contains the data is node_creation_date. An example value is January 1, 2019 19:15:30. Drupal expects a UNIX timestamp like 1546370130. The following snippet shows how to do the transformation:

created:
  plugin: format_date
  source: node_creation_date
  from_format: 'F j, Y H:i:s'
  to_format: 'U'
  from_timezone: 'UTC'
  to_timezone: 'UTC'

Following the documentation, F j, Y H:i:s is the from_format and U is the to_format. In the example, it is assumed that the source is provided in UTC. UNIX timestamps are expressed in UTC as well. Therefore, the from_timezone and to_timezone are both set to that value. Even though they are the same, it is important to specify both configurations keys. Otherwise, the from timezone might be picked from your server’s configuration. Refer to the article on user migrations for more details on how to migrate when UNIX timestamps are expected.

Date only migration

The Date module provided by core offers two storage options. You can store the date only, or you can choose to store the date and time. First, let’s consider a date only field. The source column that contains the data is src_date. An example value is '2019/12/1'. Drupal expects date only fields to store data in Y-m-d format like '2019-12-01'. No timezones are involved in migrating this field. The following snippet shows how to do the transformation.

field_ud_date/value:
  plugin: format_date
  source: src_date
  from_format: 'Y/m/j'
  to_format: 'Y-m-d'

Date range migration

The Date Range module provided by Drupal core allows you to have a start and an end date in a single field. The src_date and src_date_end source columns contain the start and end date, respectively. This migration is very similar to date only fields. The difference is that you need to import an extra subfield to store the end date. The following snippet shows how to do the transformation:

field_ud_date_range/value: '@field_ud_date/value'
field_ud_date_range/end_value:
  plugin: format_date
  source: src_date_end
  from_format: 'Y/m/j'
  to_format: 'Y-m-d'

The value subfield stores the start date. The source column used in the example is the same used for the field_ud_date field. Drupal uses the same format internally for date only and date range fields. Considering these two things, it is possible to reuse the field_ud_date mapping to set the start date of the field_ud_date_range field. To do it, you type the name of the previously mapped field in quotes (') and precede it with an at sign (@). Details on this syntax can be found in the blog post about the migrate process pipeline. One important detail is that when field_ud_date was mapped, the value subfield was specified: field_ud_date/value. Because of this, when reusing that mapping, you must also specify the subfield: '@field_ud_date/value'. The end_value subfield stores the end date. The mapping is similar to field_ud_date except that the source column is src_date_end.

Note: The Date Range module does not come enabled by default. To be able to use it in the example, it is set as a dependency of the demo migration module.
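For reference, declaring that dependency is a short addition to the module's .info.yml file. The following is a hypothetical excerpt based on the core modules mentioned earlier:

# Hypothetical excerpt of ud_migrations_date.info.yml.
dependencies:
  - drupal:datetime
  - drupal:datetime_range
  - drupal:migrate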

Datetime migration

A date and time field stores its value in Y-m-d\TH:i:s format. Note it does not include a timezone. Instead, UTC is assumed by default. In the example, the source column that contains the data is src_datetime. An example value is 2019/12/24 19:15:30. Let’s assume that all dates are provided with a timezone value of America/Managua. The following snippet shows how to do the transformation:

field_ud_datetime/value:
  plugin: format_date
  source: src_datetime
  from_format: 'Y/m/j H:i:s'
  to_format: 'Y-m-d\TH:i:s'
  from_timezone: 'America/Managua'
  to_timezone: 'UTC'

If you need the timezone to be dynamic, things get a bit harder. The from_timezone and to_timezone settings expect a literal value. It is not possible to read a source column to set these configurations. An alternative is that your source column includes timezone information like 2019/12/24 19:15:30 -07:00. In that case, you would need to tweak the from_format to include the timezone component and leave out the from_timezone configuration.
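The following is a minimal sketch of that alternative, assuming the source column stored values like 2019/12/24 19:15:30 -07:00; it is not part of the example module:

field_ud_datetime/value:
  plugin: format_date
  source: src_datetime
  # The P format character matches an offset like -07:00,
  # so from_timezone is not needed.
  from_format: 'Y/m/j H:i:s P'
  to_format: 'Y-m-d\TH:i:s'
  to_timezone: 'UTC'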

Things to consider

Date migrations can be tricky because they can be affected by things outside of the Migrate API. Here is a non-exhaustive list of things to consider:

  • For date and time fields, the transformation might be affected by your server’s timezone if you do not manually set the from_timezone configuration.
  • People might see the date and time according to the preferences in their user profile. That is, two users might see a different value for the same migrated field if their preferred timezones are not the same.
  • For date only fields, the user might see a time depending on the format used to display them. A list of available formats can be found at /admin/config/regional/date-time.
  • A field can always be configured to be presented in a specific timezone. This would override the site’s timezone and the user’s preferred timezone.

What did you learn in today’s blog post? Did you know that entity properties and date fields expect different destination formats? Did you know how to do timezone conversions? What challenges have you found when migrating dates and times? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 11 2019
Aug 11

Today we complete the user migration example. In the previous post, we covered how to migrate email, timezone, username, password, and status. This time, we cover creation date, roles, and profile pictures. The source, destination, and dependencies configurations were explained already. Therefore, we are jumping straight to the process transformations in this entry.

Example field mapping for user migration

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD users, whose machine name is ud_migrations_users. The two migrations to execute are udm_user_pictures and udm_users. Notice that both migrations belong to the same module. Refer to this article to learn where the module should be placed.

The example assumes Drupal was installed using the standard installation profile. Particularly, we depend on a Picture (user_picture) image field attached to the user entity. The word in parentheses represents the machine name of the image field.

The explanation below is only for the user migration. It depends on a file migration to get the profile pictures. One motivation to have two migrations is for the images to be deleted if the file migration is rolled back. Note that other techniques exist for migrating images without having to create a separate migration. We have covered two of them in the articles about subfields and constants and pseudofields.

Migrating user creation date

Have a look at the previous post for details on the source values. For reference, the user creation time is provided by the member_since column, and one of the values is April 4, 2014. The following snippet shows how the various user date related properties are set:

created:
  plugin: format_date
  source: member_since
  from_format: 'F j, Y'
  to_format: 'U'
changed: '@created'
access: '@created'
login: '@created'

The created entity property stores a UNIX timestamp of when the user was added to Drupal. The value itself is an integer representing the number of seconds since the epoch. For example, 280299600 represents Sun, 19 Nov 1978 05:00:00 GMT. Kudos to the readers who knew this is Drupal's default expire HTTP header. Bonus points if you knew it was chosen in honor of someone’s birthdate. ;-)

Back to the migration, you need to transform the provided date from Month day, year format to a UNIX timestamp. To do this, you use the format_date plugin. The from_format is set to F j, Y which means your source date consists of:

  • The full textual representation of a month: April.
  • Followed by a space character.
  • Followed by the day of the month without leading zeros: 4.
  • Followed by a comma and another space character.
  • Followed by the full numeric representation of a year using four digits: 2014

If the value of from_format does not make sense, you are not alone. It is actually assembled from format characters of the date PHP function. When you need to specify the from and to formats, you basically need to look at the documentation and assemble a string that matches the desired date format. You need to pay close attention because upper and lowercase letters represent different things, like Y and y for the year with four digits versus two digits, respectively. Some date components have subtle variations, like d and j for the day with or without leading zeros, respectively. Also, take into account white spaces and date component separators. To finish the plugin configuration, you need to set the to_format configuration to something that produces a UNIX timestamp. If you look again at the documentation, you will see that U does the job.

The changed, access, and login entity properties are also dates in UNIX timestamp format. changed indicates when the user account was last updated. access indicates when the user last accessed the site. login indicates when the user last logged in. For brevity, the same value assigned to created is also assigned to these three entity properties. The at sign (@) means copy the value of a previous mapping in the process pipeline. If needed, each property can be set to a different value or left unassigned. None is actually required.
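If the source did provide separate values, each mapping would look just like the one for created. A hypothetical sketch, assuming a last_updated column existed in the source:

changed:
  plugin: format_date
  # The last_updated column is an assumption for illustration purposes.
  source: last_updated
  from_format: 'F j, Y'
  to_format: 'U'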

Migrating user roles

For reference, the roles are provided by the user_roles column, and one of the values is forum moderator, forum admin. It is a comma separated list of roles from the legacy system which need to be mapped to Drupal roles. It is possible that the user_roles column is not provided at all in the source. The following snippet shows how the roles are set:

roles:
  - plugin: skip_on_empty
    method: process
    source: user_roles
  - plugin: explode
    delimiter: ','
  - plugin: callback
    callable: trim
  - plugin: static_map
    map:
      'forum admin': administrator
      'webmaster': administrator
    default_value: null

First, the skip_on_empty plugin is used to skip the processing of the roles if the source column is missing. Then, the explode plugin is used to break the list into an array of strings representing the roles. Next, the callback plugin invokes the trim PHP function to remove any leading or trailing whitespace from the role names. Finally, the static_map plugin is used to manually map values from the legacy system to Drupal roles. All of these plugins have been explained previously. Refer to other articles in the series or the plugin documentation for details on how to use and configure them.

There are some things that are worth mentioning about migrating roles using this particular process pipeline. If the comma separated list includes spaces before or after the role name, you need to trim the value because the static map will perform an equality check. Having extraneous space characters will produce a mismatch.

Also, you do not need to map the anonymous or authenticated roles. Drupal users are assumed to be authenticated and cannot be anonymous. Any other role needs to be mapped manually to its machine name. You can find the machine name of any role in its edit page. In the example, only two out of four roles are mapped. Any role that is not found in the static map will be assigned the value null as indicated in the default_value configuration. After processing, the null value will be ignored, and no role will be assigned. But you could use this feature to assign a default role in case the static map does not produce a match.
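For example, a hypothetical variant of the last plugin in the chain could assign an editor role whenever the legacy role has no direct equivalent. The content_editor machine name is an assumption and would have to exist on the destination site:

- plugin: static_map
  map:
    'forum admin': administrator
    'webmaster': administrator
  # Hypothetical fallback role assigned when no match is found.
  default_value: content_editor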

Migrating profile pictures

For reference, the profile picture is provided by the user_photo column, and one of the values is P01. This value corresponds to the unique identifier of one record in the udm_user_pictures file migration, which is part of the same demo module.  It is important to note that the user_picture field is not a user entity property. The field is created by the standard installation profile and attached to the user entity. You can find its configuration in the “Manage fields” tab of the “Account settings” configuration page at /admin/config/people/accounts. The following snippet shows how profile pictures are set:

user_picture/target_id:
  plugin: migration_lookup
  migration: udm_user_pictures
  source: user_photo

Image fields are entity references. Their target_id property needs to be an integer number containing the file id (fid) of the image. This can be obtained using the migration_lookup plugin. Details on how to configure it can be found in this article. You could simply use user_picture as your field mapping because target_id is the default subfield and could be omitted. Also note that the alt subfield is not mapped. If present, its value will be used for the alternative text of the image. But if it is not specified, like in this example, Drupal will automatically generate an alternative text out of the username. An example value would be: Profile picture for user michele.
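If you wanted to provide the alternative text yourself, it could be mapped like any other subfield. A minimal sketch, reusing the public_name column that is already present in the source of this example:

# Sets the image's alternative text from the source data instead of
# letting Drupal generate one from the username.
user_picture/alt: public_name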

Technical note: The user entity contains other properties you can write to. For a list of available options, check the baseFieldDefinitions() method of the User class defining the entity. Note that more properties can be available up in the class hierarchy.

And with that, we wrap up the user migration example. We covered how to migrate a user’s mail, timezone, username, password, status, creation date, roles, and profile picture. Along the way, we presented various process plugins that had not been used previously in the series. We showed a couple of examples of process plugin chaining to make sure the migrated data is valid and in the format expected by Drupal.

What did you learn in today’s blog post? Did you know how to process dates for user entity properties? Have you migrated user roles before? Did you know how to import profile pictures? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series is made possible thanks to these generous sponsors. Contact us if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 10 2019
Aug 10

Today we are going to learn how to migrate users into Drupal. The example code will be explained in two blog posts. In this one, we cover the migration of email, timezone, username, password, and status. In the next one, we will cover creation date, roles, and profile pictures. Several techniques will be implemented to ensure that the migrated data is valid. For example, making sure that usernames are not duplicated.

Although the example is standalone, we will build on many of the concepts that had already been covered in the series. For instance, a file migration is included to import images used as profile pictures. This topic has been explained in detail in a previous post, and the example code is pretty similar. Therefore, no explanation is provided about the file migration to keep the focus on the user migration. Feel free to read other posts in the series if you need a refresher.

Example field mapping for user migration

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD users, whose machine name is ud_migrations_users. The two migrations to execute are udm_user_pictures and udm_users. Notice that both migrations belong to the same module. Refer to this article to learn where the module should be placed.

The example assumes Drupal was installed using the standard installation profile. Particularly, we depend on a Picture (user_picture) image field attached to the user entity. The word in parentheses represents the machine name of the image field.

The explanation below is only for the user migration. It depends on a file migration to get the profile pictures. One motivation to have two migrations is for the images to be deleted if the file migration is rolled back. Note that other techniques exist for migrating images without having to create a separate migration. We have covered two of them in the articles about subfields and constants and pseudofields.

Understanding the source

It is very important to understand the format of your source data. This will guide the transformation process required to produce the expected destination format. For this example, it is assumed that the legacy system from which users are being imported did not have unique usernames. Emails were used to uniquely identify users, but that is not desired in the new Drupal site. Instead, a username will be created from a public_name source column. Special measures will be taken to prevent duplication as Drupal usernames must be unique. Two more things to consider. First, source passwords are provided in plain text (never do this!). Second, some elements might be missing in the source like roles and profile picture. The following snippet shows a sample record for the source section:

source:
  plugin: embedded_data
  data_rows:
    - legacy_id: 101
      public_name: 'Michele'
      user_email: '[email protected]'
      user_timezone: 'America/New_York'
      user_password: 'totally insecure password 1'
      user_status: 'active'
      member_since: 'January 1, 2011'
      user_roles: 'forum moderator, forum admin'
      user_photo: 'P01'
  ids:
    legacy_id:
      type: integer

Configuring the destination and dependencies

The destination section specifies that user is the target entity. When that is the case, you can set an optional md5_passwords configuration. If it is set to true, the system will take an MD5 hashed password and convert it to the encryption algorithm that Drupal uses. For more information on password migrations, refer to these articles for basic and advanced use cases. To migrate the profile pictures, a separate migration is created. The dependency of user on file is added explicitly. Refer to these articles for more information on migrating images and files and on setting dependencies. The following code snippet shows how the destination and dependencies are set:

destination:
  plugin: 'entity:user'
  md5_passwords: true
migration_dependencies:
  required:
    - udm_user_pictures
  optional: []

Processing the fields

The interesting part of a user migration is the field mapping. The specific transformation will depend on your source, but some arguably complex cases will be addressed in the example. Let’s start with the basics: verbatim copies from source to destination. The following snippet shows three mappings:

mail: user_email
init: user_email
timezone: user_timezone

The mail, init, and timezone entity properties are copied directly from the source. Both mail and init are email addresses. The difference is that mail stores the current email, while init stores the one used when the account was first created. The former might change if the user updates their profile, while the latter will never change. The timezone needs to be a string taken from a specific set of values. Refer to this page for a list of supported timezones.

name:
  - plugin: machine_name
    source: public_name
  - plugin: make_unique_entity_field
    entity_type: user
    field: name
    postfix: _

The name entity property stores the username. This has to be unique in the system. If the source data contained a unique value for each record, it could be used to set the username. None of the unique source columns (e.g., legacy_id) is suitable to be used as username. Therefore, extra processing is needed. The machine_name plugin converts the public_name source column into a transliterated string with some restrictions: any character that is not a number or letter will be converted to an underscore. The transformed value is sent to the make_unique_entity_field plugin. This plugin makes sure its input value is not repeated in the whole system for a particular entity field. In this example, the username will be unique. The plugin is configured indicating which entity type and field (property) you want to check. If an equal value already exists, a new one is created by appending what you define as postfix plus a number. In this example, there are two records with public_name set to Benjamin. Eventually, the usernames produced by running the process plugin chain will be: benjamin and benjamin_1.

process:
  pass:
    plugin: callback
    callable: md5
    source: user_password
destination:
  plugin: 'entity:user'
  md5_passwords: true

The pass entity property stores the user’s password. In this example, the source provides the passwords in plain text. Needless to say, that is a terrible idea. But let’s work with it for now. Drupal uses portable PHP password hashes implemented by PhpassHashedPassword. Understanding the details of how Drupal converts one algorithm to another will be left as an exercise for the curious reader. In this example, we are going to take advantage of a feature provided by the Migrate API to automatically convert MD5 hashes to the algorithm used by Drupal. The callback plugin is configured to use the md5 PHP function to convert the plain text password into a hashed version. The last part of the puzzle is to set the md5_passwords configuration to true in the destination section. This will take care of converting the already md5-hashed password to the value expected by Drupal.

Note: MD5-hash passwords are insecure. In the example, the password is encrypted with MD5 as an intermediate step only. Drupal uses other algorithms to store passwords securely.

status:
  plugin: static_map
  source: user_status
  map:
    inactive: 0
    active: 1

The status entity property stores whether a user is active or blocked from the system. The source user_status values are strings, but Drupal stores this data as a boolean. A value of zero (0) indicates that the user is blocked while a value of one (1) indicates that it is active. The static_map plugin is used to manually map the values from source to destination. This plugin expects a map configuration containing an array of key-value mappings. The value from the source is on the left. The value expected by Drupal is on the right.

Technical note: Booleans are true or false values. Even though Drupal treats the status property as a boolean, it is internally stored as a tinyint in the database. That is why the numbers zero and one are used in the example. For this particular case, using a number or a boolean value on the right side of the mapping produces the same result.

In the next blog post, we will continue with the user migration. Particularly, we will explain how to migrate the user creation time, roles, and profile pictures.

What did you learn in today’s blog post? Have you migrated user passwords before, either in plain text or hashed? Did you know how to prevent duplicates for values that need to be unique in the system? Were you aware of the plugin that allows you to manually map values from source to destination? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

Next: Migrating users into Drupal - Part 2

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Aug 09 2019
Aug 09

Today we continue the conversation about migration dependencies with a hierarchical taxonomy terms example. Along the way, we will present the process and syntax for migrating into multivalue fields. The example consists of two separate migrations. One to import taxonomy terms accounting for term hierarchy. And another to import into a multivalue taxonomy term field. Following this approach, any node and taxonomy term created by the migration process will be removed from the system upon rollback.

Syntax for multivalue field migration.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD multivalue taxonomy terms, whose machine name is ud_migrations_multivalue_terms. The two migrations to execute are udm_dependencies_multivalue_term and udm_dependencies_multivalue_node. Notice that both migrations belong to the same module. Refer to this article to learn where the module should be placed.

The example assumes Drupal was installed using the standard installation profile. Particularly, a Tags (tags) taxonomy vocabulary, an Article (article) content type, and a Tags (field_tags) field that accepts multiple values. The words in parentheses represent the machine name of each element.

Migrating taxonomy terms and their hierarchy

The example data for the taxonomy terms migration is fruits and fruit varieties. Each row will contain the name and description of the fruit. Additionally, it is possible to define a parent term to establish hierarchy. For example, “Red grape” is a child of “Grape”. Note that no numerical identifier is provided. Instead, the value of the name column is used as a string identifier for the migration. If term names could change over time, it is recommended to have another column that does not change (e.g., an autoincrementing number). The following snippet shows how the source section is configured:

source:
  plugin: embedded_data
  data_rows:
    - fruit_name: 'Grape'
      fruit_description: 'Eat fresh or prepare some jelly.'
    - fruit_name: 'Red grape'
      fruit_description: 'Sweet grape'
      fruit_parent: 'Grape'
    - fruit_name: 'Pear'
      fruit_description: 'Eat fresh or prepare a jam.'
  ids:
    fruit_name:
      type: string

The destination is quite short. The target entity is set to taxonomy terms. Additionally, you indicate which vocabulary to migrate into. If you have terms that would be stored in different vocabularies, you can use the vid property in the process section to assign the target vocabulary. If you write to a single one, the default_bundle key in the destination can be used instead. The following snippet shows how the destination section is configured:

destination:
  plugin: 'entity:taxonomy_term'
  default_bundle: tags

For the process section, three entity properties are set: name, description, and parent. The first two are strings copied directly from the source. In the case of parent, it is an entity reference to another taxonomy term. It stores the taxonomy term id (tid) of the parent term. To assign its value, the migration_lookup plugin is configured similarly to the previous example. The difference is that, in this case, the migration to reference is the same one being defined. This raises an important consideration. Parent terms should be migrated before their children. This way, they can be found by the look up operation. Also note that the look up value is the term name itself, because that is what this migration set as the unique identifier in the source section. The following snippet shows how the process section is configured:

process:
  name: fruit_name
  description: fruit_description
  parent:
    plugin: migration_lookup
    migration: udm_dependencies_multivalue_term
    source: fruit_parent

Technical note: The taxonomy term entity contains other properties you can write to. For a list of available options check the baseFieldDefinitions() method of the Term class defining the entity. Note that more properties can be available up in the class hierarchy.
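If the source contained terms destined for more than one vocabulary, the vid property mentioned earlier could be mapped in the process section instead of relying on default_bundle. A hypothetical sketch follows; both the source_vocabulary column and the food_groups vocabulary are assumptions for illustration purposes:

process:
  vid:
    plugin: static_map
    # Hypothetical source column holding the legacy vocabulary name.
    source: source_vocabulary
    map:
      'fruit': tags
      'vegetable': food_groups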

Migrating multivalue taxonomy terms fields

The next step is to create a node migration that can write to a multivalue taxonomy term field. To stay on point, only one more field will be set: the title, which is required by the node entity. Read this change record for more information on how the Migrate API processes Entity API validation. The following snippet shows how the source section is configured for the node migration:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      thoughtful_title: 'Amazing recipe'
      fruit_list: 'Green apple, Banana, Pear'
    - unique_id: 2
      thoughtful_title: 'Fruit-less recipe'
  ids:
    unique_id:
      type: integer

The fruit_list column contains a comma separated list of taxonomy terms to apply. Note that the values match the unique identifiers of the taxonomy term migration. If you had used numbers as migration identifiers there, you would have to use those numbers in this migration to refer to the terms. An example of that was presented in the previous post. Also note that there is one record that has no terms associated. This will be considered during the field mapping. The following snippet shows how the process section is configured for the node migration:

process:
  title: thoughtful_title
  field_tags:
    - plugin: skip_on_empty
      source: fruit_list
      method: process
      message: 'No fruit_list listed.'
    - plugin: explode
      delimiter: ','
    - plugin: migration_lookup
      migration: udm_dependencies_multivalue_term

The title of the node is a verbatim copy of the thoughtful_title column. The Tags fields, mapped using its machine name field_tags, uses three chained process plugins. The skip_on_empty plugin reads the value of the fruit_list column and skips the processing of this field if no value is provided. This is done to accommodate the fact that some records in the source do not specify tags. Note that the method configuration key is set to process. This indicates that only this field should be skipped and not the entire record. Ultimately, tags are optional in this context and nodes should still be imported even if no tag is associated.

The explode plugin allows you to break a string into an array, using a delimiter to determine where to make the cut. Later, this array is passed to the migration_lookup plugin specifying the term migration as the one to use for the look up operation. Again, the taxonomy term names are used here because they are the unique identifiers of the term migration. Note that neither of these plugins has a source configuration. This is because when process plugins are chained, the result of one plugin is sent as source to be transformed by the next one in line. The end result is an array of taxonomy term ids that will be assigned to field_tags. The migration_lookup is able to process single values and arrays.

The last part of the migration is specifying the destination section and any migration dependencies. Refer to this article for more details on setting migration dependencies. The following snippet shows how both are configured for the node migration:

destination:
  plugin: 'entity:node'
  default_bundle: article
migration_dependencies:
  required:
    - udm_dependencies_multivalue_term
  optional: []

More syntactic sugar

One way to set multivalue fields in Drupal migrations is assigning its value to an array. Another option is to set each value manually using field deltas. Deltas are integer numbers starting with zero (0) and incrementing by one (1) for each element of a multivalue field. Although you could set any delta in the Migrate API, consider the field definition in Drupal. It is possible that limits have been set on the number of values a field can hold. You can specify deltas and subfields at the same time. The full syntax is field_name/field_delta/subfield. The following example shows the syntax for a multivalue image field:

process:
  field_photos/0/target_id: source_fid_first
  field_photos/0/alt: source_alt_first
  field_photos/1/target_id: source_fid_second
  field_photos/1/alt: source_alt_second
  field_photos/2/target_id: source_fid_third
  field_photos/2/alt: source_alt_third

Manually setting a multivalue field is less flexible and error-prone. In today’s example, we showed how to accommodate for the list of terms not being provided. Imagine having to do that for each delta and subfield combination, but the functionality is there in case you need it. In the end, Drupal offers more syntactic sugar so you can write shorter field mappings. Additionally, there are various process plugins that can handle arrays for setting multivalue fields.

Note: There are other ways to migrate multivalue fields. For example, when using the entity_generate plugin provided by Migrate Plus, there is no need to create a separate taxonomy term migration. This plugin is able to create the terms on the fly while running the import process. The caveat is that terms created this way are not deleted upon rollback.

What did you learn in today’s blog post? Have you ever done a taxonomy term migration before? Were you aware of how to migrate hierarchical entities? Did you know you can manually import multivalue fields using deltas? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 09 2019
Aug 09

One of Drupal’s biggest strengths is its data modeling capabilities. You can break the information that you need to store into individual fields and group them in content types. You can also take advantage of default behavior provided by entities like nodes, users, taxonomy terms, files, etc. Once the data has been modeled and saved into the system, Drupal will keep track of the relationship between them. Today we will learn about migration dependencies in Drupal.

As we have seen throughout the series, the Migrate API can be used to write to different entities. One restriction though is that each migration definition can only target one type of entity at a time. Sometimes, a piece of content has references to other elements. For example, a node that includes entity reference fields to users, taxonomy terms, and images. The recommended way to get them into Drupal is writing one migration definition for each. Then, you specify the relationships that exist among them.

Snippet of migration dependency definition

Breaking up migrations

When you break up your migration project into multiple, smaller migrations, they are easier to manage and you have more control over the process pipeline. Depending on how you write them, you can rest assured that imported data is properly deleted if you ever have to roll back the migration. You can also enforce that certain elements exist in the system before others that depend on them can be created. In today’s example, we are going to leverage the example from the previous post to demonstrate this. The portraits imported in the file migration will be used in the image field of nodes of type article.

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD migration dependencies introduction, whose machine name is ud_migrations_dependencies_intro. Last time, the udm_dependencies_intro_image migration was imported. This time, udm_dependencies_intro_node will be executed. Notice that both migrations belong to the same module. Refer to this article to learn where the module should be placed.

Writing the source and destination definition

To keep things simple, the example will only write the node title and assign the image field. A constant will be provided to create the alternative text for the images. The following snippet shows how the source section is configured:

source:
  constants:
    PHOTO_DESCRIPTION_PREFIX: 'Photo of'
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: 'Michele Metts'
      photo_file: 'P01'
    - unique_id: 2
      name: 'David Valdez'
      photo_file: 'P03'
    - unique_id: 3
      name: 'Clayton Dewey'
      photo_file: 'P04'
  ids:
    unique_id:
      type: integer

Remember that in this migration you want to use files that have already been imported. Therefore, no URLs to the image files are provided. Instead, you need a reference to the other migration. Particularly, you need a reference to the unique identifiers for each element of the file migration. In the process section, this value will be used to look up the portrait that will be assigned to the image field.

The destination section is quite short. You only specify that the target is a node entity and the content type is article. Remember that you need to use the machine name of the content type. If you need a refresher on how this is set up, have a look at the articles in the series. It is recommended to read them in order as some examples expand on topics that had been previously covered. The following snippet shows how the destination section is configured:

destination:
  plugin: 'entity:node'
  default_bundle: article

Using previously imported files in image fields

To be able to reuse the previously imported files, the migration_lookup plugin is used. Additionally, an alternative text for the image is created using the concat plugin. The following snippet shows how the process section is configured:

process:
  title: name
  field_image/target_id:
    plugin: migration_lookup
    migration: udm_dependencies_intro_image
    source: photo_file
  field_image/alt:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: ' '

In Drupal, files and images are entity reference fields. That means, they only store a pointer to the file, not the file itself. The pointer is an integer number representing the file ID (fid) inside Drupal. The migration_lookup plugin allows you to query the file migration so imported elements can be reused in node migration.

The migration option indicates which migration to query, specifying its migration id. Additionally, you indicate which columns in your source match the unique identifiers of the migration to query. In this case, the values of the photo_file column in udm_dependencies_intro_node match those of the photo_id column in udm_dependencies_intro_image. If a match is found, this plugin will return the file ID which can be directly assigned to the target_id of the image field. That is how the relationship between the two migrations is established.

Note: The migration_lookup plugin allows you to query more than one migration at a time. Refer to the documentation for details on how to set that up and why you would do it. It also offers additional configuration options.
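A hypothetical sketch of querying two file migrations at once follows; another_file_migration is an assumed migration id used only for illustration:

field_image/target_id:
  plugin: migration_lookup
  # The value is looked up in each of the listed migrations.
  migration:
    - udm_dependencies_intro_image
    - another_file_migration
  source: photo_file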

As a good accessibility practice, an alternative text is set for the image using the alt subfield. Other than that, only the node title is set. And with that, you have two migrations connected between them. If you were to rollback both of them, no file or node would remain in the system.

Being explicit about migration dependencies

The node migration depends on the file migration. That is, it is required for the files to be migrated first before they can be used as images for the nodes. In fact, in the provided example, if you were to import the nodes before the files, the migration would fail and no node would be created. You can be explicit about migration dependencies. To do it, add a new configuration option to the node migration that lists which migrations it depends on. The following snippet shows how this is configured:

migration_dependencies:
  required:
    - udm_dependencies_intro_image
  optional: []

The migration_dependencies key goes at the root level of the YAML definition file. It accepts two configuration options: required and optional. Both accept an array of migration ids. The required migrations are hard prerequisites. They need to be executed in advance or the system will refuse to import the current one. The optional migrations do not have to be executed in advance. But if you were to execute multiple migrations at a time, the system will run them in the order suggested by the dependency hierarchy. Learn more about migration dependencies in this article. Also, check this comment on Drupal.org in case you have problems where the system reports that certain dependencies are not met.

Now that the dependency among migrations has been explicitly established you have two options. Either import each migration manually in the expected order. Or, import the parent migration using the --execute-dependencies flag. When you do that, the system will take care of determining the order in which all migrations need to be imported. The following two snippets will produce the same result for the demo module:

$ drush migrate:import udm_dependencies_intro_image
$ drush migrate:import udm_dependencies_intro_node
$ drush migrate:import udm_dependencies_intro_node --execute-dependencies

In this example, there are only two migrations, but you can have as many as needed. For example, a node with references to users, taxonomy terms, paragraphs, etc. Also note that the parent entity does not have to be a node. Users, taxonomy terms, and paragraphs are all fieldable entities. They can contain references the same way nodes do. In future entries, we will talk again about migration dependencies and provide more examples.

Tagging migrations

The core Migrate API offers another mechanism to execute multiple migrations at a time. You can tag them. To do that, you add a migration_tags key at the root level of the YAML definition file. Its value is an array of arbitrary tag names to assign to the migration. Once set, you run them using the migrate import command with the --tag flag. You can also roll back migrations per tag. The first snippet shows how to set the tags and the second how to execute them:

migration_tags:
  - UD Articles
  - UD Example
$ drush migrate:import --tag='UD Articles,UD Example'
$ drush migrate:rollback --tag='UD Articles,UD Example'

It is important to note that tags and dependencies are different concepts. They allow you to run multiple migrations at a time. It is possible that a migration definition file contains both, either, or neither. The tag system is used extensively in Drupal core for migrations related to upgrading to Drupal 8 from previous versions. For example, you might want to run all migrations tagged ‘Drupal 7’ if you are coming from that version. It is possible to specify more than one tag when running the migrate import command separating each with a comma (,).

Note: The Migrate Plus module offers migration groups to organize migrations similarly to how tags work. This will be covered in a future entry. Just keep in mind that tags are provided out of the box by the Migrate API. On the other hand, migration groups depend on a contributed module.

What did you learn in today’s blog post? Have you used the migration_lookup plugin to query imported elements from a separate migration? Did you know you can set required and optional dependencies? Have you used tags to organize your migrations? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 08 2019
Aug 08

We have already covered two of many ways to migrate images into Drupal. One example allows you to set the image subfields manually. The other example uses a process plugin that accomplishes the same result using plugin configuration options. Although valid ways to migrate images, these approaches have an important limitation. The files and images are not removed from the system upon rollback. In the previous blog post, we talked further about this topic. Today, we are going to perform an image migration that will clean after itself when it is rolled back. Note that in Drupal images are a special case of files. Even though the example will migrate images, the same approach can be used to import any type of file. This migration will also serve as the basis for explaining migration dependencies in the next blog post.

Code snippet for file entity migration

File entity migrate destination

All the examples so far have been about creating nodes. The Migrate API is a full ETL framework able to write to different destinations. In the case of Drupal, the target can be other content entities like files, users, taxonomy terms, comments, etc. Writing to content entities is straightforward. For example, to migrate into files, the destination section is configured like this:

destination:
  plugin: 'entity:file'

You use a plugin whose name is entity: followed by the machine name of your target entity. Other possible values that could be used are user, taxonomy_term, and comment. Remember that each migration definition file can only write to one destination.

Source section definition

The source of a migration is independent of its destination. The following code snippet shows the source definition for the image migration example:

source:
  constants:
    SOURCE_DOMAIN: 'https://agaric.coop'
    DRUPAL_FILE_DIRECTORY: 'public://portrait/'
  plugin: embedded_data
  data_rows:
    - photo_id: 'P01'
      photo_url: 'sites/default/files/2018-12/micky-cropped.jpg'
    - photo_id: 'P02'
      photo_url: ''
    - photo_id: 'P03'
      photo_url: 'sites/default/files/pictures/picture-94-1480090110.jpg'
    - photo_id: 'P04'
      photo_url: 'sites/default/files/2019-01/clayton-profile-medium.jpeg'
  ids:
    photo_id:
      type: string

Note that the source contains relative paths to the images. Eventually, we will need an absolute path to them. Therefore, the SOURCE_DOMAIN constant is created to assemble the absolute path in the process pipeline. Also, note that one of the rows contains an empty photo_url. No file can be created without a proper URL. In the process section we will accommodate for this. An alternative could be to filter out invalid data in a source clean up operation before executing the migration.

Another important thing to note is that the row identifier photo_id is of type string. You need to explicitly tell the system the name and type of the identifiers you want to use. The configuration for this varies slightly from one source plugin to another. For the embedded_data plugin, you do it using the ids configuration key. It is possible to have more than one source column as identifier. For example, if the combination of two columns (e.g. name and date of birth) are required to uniquely identify each element (e.g. person) in the source.
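A hypothetical sketch of a composite identifier follows; the column names are assumptions used only for illustration:

ids:
  # Neither column alone is unique; the combination of both is.
  person_name:
    type: string
  date_of_birth:
    type: string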

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD migration dependencies introduction, whose machine name is ud_migrations_dependencies_intro. The migration to run is udm_dependencies_intro_image. Refer to this article to learn where the module should be placed.

Process section definition

The fields to map in the process section will depend on the target. For files and images, only one entity property is required: uri. It has to be set as a reference to the file using stream wrappers. In this example, the public stream (public://) is used to store the images in a location that is publicly accessible by any visitor to the site. If the file was already in the system and we knew the path, the whole process section for this migration could be reduced to two lines:

process:
  uri: source_column_file_uri

That is rarely the case though. Fortunately, there are many process plugins that allow you to transform the available data. When combined with constants and pseudofields, you can come up with creative solutions to produce the format expected by your destination.

Skipping invalid records

The source for this migration contains one record that lacks the URL to the photo. No image can be imported without a valid path. Let’s accommodate for this. In the same step, a pseudofield will be created to extract the name of the file out of its path.

psf_destination_filename:
  - plugin: callback
    callable: basename
    source: photo_url
  - plugin: skip_on_empty
    method: row
    message: 'Cannot import empty image filename.'

The psf_destination_filename pseudofield uses the callback plugin to derive the filename from the relative path to the image. This is accomplished using the basename PHP function. Also, taking advantage of plugin chaining, the system is instructed to skip processing the row if no filename could be obtained, for example, because an empty source value was provided. This is done by the skip_on_empty plugin, which is also configured to log a message to indicate what happened. In this case, the message is hardcoded. You can make it dynamic to include the ID of the row that was skipped using other process plugins. This is left as an exercise to the curious reader. Feel free to share your answer in the comments below.

Tip: To read the messages log during any migration, execute the following Drush command: drush migrate:messages [migration-id].

Creating the destination URI

The next step is to create the location where the file is going to be saved in the system. For this, the psf_destination_full_path pseudofield is used to concatenate the value of a constant defined in the source and the file name obtained in the previous step. As explained before, order is important when using pseudofields as part of the migrate process pipeline. The following snippet shows how to do it:

psf_destination_full_path:
  - plugin: concat
    source:
      - constants/DRUPAL_FILE_DIRECTORY
      - '@psf_destination_filename'
  - plugin: urlencode

The end result of this operation would be something like public://portrait/micky-cropped.jpg. The URI specifies that the image should be stored inside a portrait subdirectory inside Drupal’s public file system. Copying files to specific subdirectories is not required, but it helps with file organization. Also, some hosting providers might impose limitations on the number of files per directory. Specifying subdirectories for your file migrations is a recommended practice.

Also note that after the URI is created, it gets encoded using the urlencode plugin. This will replace special characters with an equivalent string literal. For example, é and ç will be converted to %C3%A9 and %C3%A7 respectively. Space characters will be changed to %20. The end result is an equivalent URI that can be used inside Drupal, as part of an email, or via another medium. Always encode any URI when working with Drupal migrations.

Creating the source URI

The next step is to assemble an absolute path for the source image. For this, you concatenate the domain stored in a source constant and the image relative path stored in a source column. The following snippet shows how to do it:

psf_source_image_path:
  - plugin: concat
    delimiter: '/'
    source:
      - constants/SOURCE_DOMAIN
      - photo_url
  - plugin: urlencode

The end result of this operation will be something like https://agaric.coop/sites/default/files/2018-12/micky-cropped.jpg. Note that the concat and urlencode plugins are used just like in the previous step. A subtle difference is that a delimiter is specified in the concatenation step. This is because, contrary to the DRUPAL_FILE_DIRECTORY constant, the SOURCE_DOMAIN constant does not end with a slash (/). This was done intentionally to highlight two things. First, it is important to understand your source data. Second, you can transform it as needed by using various process plugins.

Copying the image file to Drupal

Only two tasks remain to complete this image migration: download the image and assign the uri property of the file entity. Luckily, both steps can be accomplished at the same time using the file_copy plugin. The following snippet shows how to do it:

uri:
  plugin: file_copy
  source:
    - '@psf_source_image_path'
    - '@psf_destination_full_path'
  file_exists: 'rename'
  move: FALSE

The source configuration of the file_copy plugin expects an array of two values: the URI to copy the file from and the URI to copy the file to. Optionally, you can specify what happens if a file with the same name exists in the destination directory. In this case, we are instructing the system to rename the file to prevent name clashes. This is done by appending the string _X to the filename, before the file extension. The X is a number starting with zero (0) that keeps incrementing until the filename is unique. The move flag is also optional. If set to TRUE, it tells the system that the file should be moved instead of copied. As you can guess, Drupal does not have access to the file system in the remote server. The configuration option is shown for completeness, but does not have any effect in this example.

In addition to downloading the image and placing it inside Drupal’s file system, the file_copy plugin also returns the destination URI. That is why this plugin can be used to assign the uri destination property. And that’s it, you have successfully imported images into Drupal! Clever use of the process pipeline, isn’t it? ;-)

One important thing to note is that an image’s alternative text, title, width, and height are not associated with the file entity. That information is actually stored in a field of type image. This will be illustrated in the next article. To reiterate, the same approach to migrate images can be used to migrate any file type.

Technical note: The file entity contains other properties you can write to. For a list of available options check the baseFieldDefinitions() method of the File class defining the entity. Note that more properties can be available up in the class hierarchy. Also, this entity does not have multiple bundles like the node entity does.

What did you learn in today’s blog post? Had you created file migrations before? If so, had you followed a different approach? Did you know that you can do complex data transformations using process plugins? Did you know you can skip the processing of a row if the required data is not available? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 07 2019
Aug 07

We have presented several examples as part of this migration blog post series. They started very simple and have been increasing in complexity. Until now, we have been rather optimistic. Get the sample code, install any module dependency, enable the module that defines the migration, and execute it assuming everything works on the first try. But Drupal migrations often involve a bit of trial and error. At the very least, it is an iterative process. Today we are going to talk about what happens after import and rollback operations, how to recover from a failed migration, and some tips for writing definition files.

List of drush commands used in drupal migration workflows

Importing and rolling back migrations

When working on a migration project, it is common to write many migration definition files. Even if you were to have only one, it is very likely that your destination will require many field mappings. Running an import operation to get the data into Drupal is the first step. With so many moving parts, it is easy not to get the expected results on the first try. When that happens, you can run a rollback operation. This instructs the system to revert anything that was introduced when the migration was initially imported. After rolling back, you can make changes to the migration definition file and rebuild Drupal’s cache for the system to pick up your changes. Finally, you can do another import operation. Repeat this process until you get the results you expect. The following code snippet shows a basic Drupal migration workflow:

# 1) Run the migration.
$ drush migrate:import udm_subfields

# 2) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_subfields

# 3) Change the migration definition file.

# 4) Rebuild caches for changes to be picked up.
$ drush cache:rebuild

# 5) Run the migration again
$ drush migrate:import udm_subfields

The example above assumes you are using Drush to run the migration commands. Specifically, the commands provided by Migrate Run or Migrate Tools. You pick one or the other, but not both, as the commands provided by the two modules are the same. If you were to have both enabled, they would conflict with each other and fail.
Another thing to note is that the example uses Drush 9. There were major refactorings between versions 8 and 9 which included changes to the names of the commands. Finally, udm_subfields is the id of the migration to run. You can find the full code in this article.

Tip: You can use Drush command aliases to write shorter commands. Type drush [command-name] --help for a list of the available aliases.
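For example, on installs where the common shorthands are available, the workflow above could be written as follows; the exact aliases depend on your setup, so confirm them with the --help flag:

# Assumed aliases for migrate:import and migrate:rollback.
$ drush mim udm_subfields
$ drush mr udm_subfields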

Technical note: To pick up changes to the definition file, you need to rebuild Drupal’s caches. This is the procedure to follow when creating the YAML files using Migrate API core features and placing them under the migrations directory. It is also possible to define migrations as configuration entities using the Migrate Plus module. In those cases, the YAML files follow a different naming convention and are placed under the config/install directory. For picking up changes, in this case, you need to sync the YAML definition using configuration management workflows. This will be covered in a future entry.

Stopping and resetting migrations

Sometimes, you do not get the expected results due to an oversight in setting a value. On other occasions, fatal PHP errors can occur when running the migration. The Migrate API might not be able to recover from such errors. For example, using a non-existent PHP function with the callback plugin. Give it a try by modifying the example in this article. When these errors happen, the migration is left in a state where no import or rollback operations can be performed.

You can check the state of any migration by running the drush migrate:status command. Ideally, you want them in Idle state. When something fails during import or rollback, you would get the Importing or Rolling back states. To get the migration back to Idle, you stop the migration and reset its status. The following snippet shows how to do it:

# 1) Run the migration.
$ drush migrate:import udm_process_intro

# 2) Some non recoverable error occurs. Check the status of the migration.
$ drush migrate:status udm_process_intro

# 3) Stop the migration.
$ drush migrate:stop udm_process_intro

# 4) Reset the status to idle.
$ drush migrate:reset-status udm_process_intro

# 5) Rebuild caches for changes to be picked up.
$ drush cache:rebuild

# 6) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_process_intro

# 7) Change the migration definition file.

# 8) Rebuild caches for changes to be picked up.
$ drush cache:rebuild

# 9) Run the migration again.
$ drush migrate:import udm_process_intro

Tip: The errors thrown by the Migrate API might not provide enough information to determine what went wrong. An excellent way to familiarize yourself with the possible errors is to intentionally break working migrations. In the example repository of this series, there are many migrations you can modify. Try anything that comes to mind: not leaving a space after a colon (:) in a key-value assignment; not using proper indentation; using wrong subfield names; using invalid values in property assignments; etc. You might be surprised by how the Migrate API deals with such errors. Also, note that many other Drupal APIs are involved. For example, you might get a YAML file parse error or an Entity API save error. Once you have seen an error, it is usually faster to identify the cause and fix it the next time it appears.
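
For instance, a deliberately broken version of an earlier mapping (hypothetical field names) removes the space after the colon. YAML then reads the line as a single plain string instead of a key-value pair, and the migration fails with a message that is not always obvious:

process:
  # Broken on purpose: without a space after the colon, this is parsed as
  # the plain string "title:creative_title" rather than a field mapping.
  title:creative_title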

What happens when you rollback a Drupal migration?

In an ideal scenario, when a migration is rolled back, it cleans up after itself. That means it removes any entity that was created during the import operation: nodes, taxonomy terms, files, etc. Unfortunately, that is not always the case. It is very important to understand this when planning and executing migrations. For example, you might not want to leave behind taxonomy terms or files that are no longer in use. Whether a dependent entity is removed or not depends on how the plugins and entities involved behave.

For example, when using the file_import or image_import plugins provided by Migrate File, the created files and images are not removed from the system upon rollback. When using the entity_generate plugin from Migrate Plus, the created entity also remains in the system after a rollback operation.

In the next blog post, we are going to start talking about migration dependencies. What happens with dependent migrations (e.g., files and paragraphs) when the migration for the host entity (e.g., node) is rolled back? In this case, the Migrate API will perform an entity delete operation on the node. When this happens, referenced files are kept in the system, but paragraphs are automatically deleted. For the curious, this behavior for paragraphs is actually determined by its module dependency: Entity Reference Revisions. We will talk more about paragraphs migrations in future blog posts.

The moral of the story is that the behavior of the migration system might be affected by other Drupal APIs. In the case of rollback operations, make sure to read the documentation or test manually to find out when migrations clean up after themselves and when they do not.

Note: The focus of this section was content entity migrations. The general idea can be applied to configuration entities or any custom target of the ETL process.

Re-import or update migrations

We just mentioned that the Migrate API issues an entity delete action when rolling back a migration. This has another important side effect. Entity IDs (nid, uid, tid, fid, etc.) are going to change every time you rollback and import again. Depending on auto-generated IDs is generally not a good idea, but keep it in mind in case your workflow might be affected. For example, if you are running migrations in a content staging environment, references to the migrated entities can break if their IDs change. Also, if you were to manually update the migrated entities to clean up edge cases, those changes would be lost if you rollback and import again. Finally, keep in mind that test data might remain in the system, as described in the previous section, and could find its way to production environments.

An alternative to rolling back a migration is to not execute this operation at all. Instead, you run an import operation again using the update flag. This tells the system that, in addition to migrating unprocessed items from the source, you also want to update items that were previously imported using their current values. To do this, the Migrate API relies on source identifiers and map tables. You might want to consider this option when your source changes over time, when you have a large number of records to import, or when you want to execute the same migration many times on a schedule.
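
As a sketch, assuming the same Drush commands used earlier in this post and a hypothetical migration ID:

# Initial import.
$ drush migrate:import udm_example

# Later, after the source data changes, refresh previously imported items
# in place instead of rolling back. Entity IDs are preserved.
$ drush migrate:import udm_example --update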

Note: On import operations, the Migrate API issues an entity save action.

Tips for writing Drupal migrations

When working on migration projects, you might end up with many migration definition files. They can set dependencies on each other. Each file might contain a significant number of field mappings. There are many things you can do to make Drupal migrations more straightforward. For example, practicing with different migration scenarios and studying working examples. As a reference to help you in the process of migrating into Drupal, consider these tips:

  • Start from an existing migration. Look for an example online that does something close to what you need and modify it to your requirements.
  • Pay close attention to the syntax of the YAML file. An extraneous space or wrong indentation level can break the whole migration.
  • Read the documentation to know which source, process, and destination plugins are available. One might exist already that does exactly what you need.
  • Make sure to read the documentation for the specific plugins you are using. Many times a plugin offers optional configurations. Understand the tools at your disposal and find creative ways to combine them.
  • Look for contributed modules that might offer more plugins or upgrade paths from previous versions of Drupal. The Migrate ecosystem is vibrant, and lots of people are contributing to it.
  • When writing the migration pipeline, map one field at a time. Problems are easier to isolate if there is only one thing that could break at a time.
  • When mapping a field, work on one subfield at a time if possible. Some field types like images and addresses offer many subfields. Again, try to isolate errors by introducing individual changes each time.
  • Commit to your code repository any and every change that produces the right results. That way, you can go back in time and recover a partially working migration.
  • Learn about debugging migrations. We will talk about this topic in a future blog post.
  • Seek help from the community. Migrate maintainers and enthusiasts are very active and responsive in the #migrate channel of the Drupal Slack.
  • If you feel stuck, take a break from the computer and come back to it later. Resting can do wonders in finding solutions to hard problems.

What did you learn in today’s blog post? Did you know what happens upon importing and rolling back a migration? Did you know that in some cases, data might remain in the system even after rollback operations? Do you have a use case for running migrations with the update flag? Do you have any other advice on writing migrations? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 06 2019
Aug 06

So far we have learned how to write basic Drupal migrations and use process plugins to transform data to meet the format expected by the destination. In the previous entry we learned one of many approaches to migrating images. In today’s example, we will change it a bit to introduce two new migration concepts: constants and pseudofields. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the migrate process pipeline.

Syntax for constants and pseudofields in the Drupal process migration pipeline

Setting and using constants

In the Migrate API, a constant is an arbitrary value that can be used later in the process pipeline. They are set as direct children of the source section. You write a constants key whose value is a list of name-value pairs. Even though they are defined in the source section, they are independent of the particular source plugin in use. The following code snippet shows a generalization for setting and using constants:

source:
  constants:
    MY_STRING: 'http://understanddrupal.com'
    MY_INTEGER: 31
    MY_DECIMAL: 3.1415927
    MY_ARRAY:
      - 'dinarcon'
      - 'dinartecc'
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  process_destination_1: constants/MY_INTEGER
  process_destination_2:
    plugin: concat
    source: constants/MY_ARRAY
    delimiter: ' '

You can set as many constants as you need. Although not required by the API, it is a common convention to write the constant names in all uppercase, using underscores (_) to separate words. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other column provided by the source plugin. Note that to use a constant you need to name the full hierarchy under the source section. That is, the word constants and the name itself separated by a slash (/) symbol. Constants can be used to copy their value directly to the destination or as part of any process plugin configuration.

Technical note: The word constants for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of your particular source plugin. A reason to use a different name is if your source actually contains a column named constants. In that case, you could use defaults or something else. The one restriction is that whatever name you use, you have to use it in the process section to refer to any constant. For example:

source:
  defaults:
    MY_VALUE: 'http://understanddrupal.com'
  plugin: source_plugin_name
  source_plugin_config: source_config_value
process:
  process_destination: defaults/MY_VALUE

Setting and using pseudofields

Similar to constants, pseudofields store arbitrary values for use later in the process pipeline. There are some key differences. Pseudofields are set in the process section. The name is arbitrary as long as it does not conflict with a property name or field name of the destination. The value can be set to a verbatim copy from the source (a column or a constant), or it can use process plugins for data transformations. The following code snippet shows a generalization for setting and using pseudofields:

source:
  constants:
    MY_BASE_URL: 'http://understanddrupal.com'
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  title: source_column_title
  my_pseudofield_1:
    plugin: concat
    source:
      - constants/MY_BASE_URL
      - source_column_relative_url
    delimiter: '/'
  my_pseudofield_2:
    plugin: urlencode
    source: '@my_pseudofield_1'
  field_link/uri: '@my_pseudofield_2'
  field_link/title: '@title'

In the above example, my_pseudofield_1 is set to the result of a concat process transformation that joins a constant and a column from the source section. The resulting value is later used as part of a urlencode process transformation. Note that to use the value from my_pseudofield_1 you have to enclose it in quotes (') and prepend an at sign (@) to the name. The new value obtained from the URL encode operation is stored in my_pseudofield_2. This last pseudofield is used to set the value of the URI subfield of field_link. The example could be simplified, for example, by using a single pseudofield and chaining process plugins. It is presented this way to demonstrate that a pseudofield can be used in direct assignments or as part of process plugin configuration values.

Technical note: If the name of a pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You might have to look at the source code for the entity and the configuration of the bundle. In the case of a node migration, look at the baseFieldDefinitions() method of the Node class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the “Manage fields” section of the content type you are migrating into. The Field API prefixes any field created via the administration interface with the string field_. This reduces the likelihood of name clashes. Other than these two name restrictions, anything else can be used. In any case, the Migrate API will eventually perform an entity save operation which will discard the pseudofields.

Understanding Drupal Migrate API process pipeline

The migrate process pipeline is a mechanism by which the value of any destination property, field, or pseudofield that has been set can be used by anything defined later in the process section. The fact that using a pseudofield requires enclosing its name in quotes and prepending an at sign is actually a requirement of the process pipeline. Let’s see some examples using a node migration:

  • To use the title property of the node entity, you would write @title
  • To use the field_body field of the Basic page content type, you would write @field_body
  • To use the my_temp_value pseudofield, you would write @my_temp_value

In the process pipeline, these values can be used just like constants and columns from the source. The only restriction is that they need to be set before being used. For those familiar with the rewrite results feature of Views, it follows the same idea: you have access to everything defined previously. Anytime you enclose a name in quotes and prepend it with an at sign, you are telling the Migrate API to look for that element in the process section instead of the source section.
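
To illustrate, here is a minimal sketch of a node migration process section that leans on the pipeline; the column and field names are hypothetical:

process:
  title: source_title_column
  my_temp_value:
    plugin: callback
    callable: trim
    source: source_subtitle_column
  # '@my_temp_value' pulls the trimmed value from the process pipeline,
  # not from the source row.
  field_subtitle: '@my_temp_value'
  field_combined:
    plugin: concat
    source:
      - '@title'
      - '@my_temp_value'
    delimiter: ' - '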

Migrating images using the image_import plugin

Let’s practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous entry. The Migrate Files module provides another process plugin named image_import that allows you to directly set all the subfield values in the plugin configuration itself.

As in previous examples, we will create a new module and write a migration definition file to perform the migration. It is assumed that Drupal was installed using the standard installation profile. The code snippets will be compact to focus on particular elements of the migration. The full code is available at https://github.com/dinarcon/ud_migrations The module name is UD Migration constants and pseudofields and its machine name is ud_migrations_constants_pseudofields. The id of the example migration is udm_constants_pseudofields. Refer to this article for instructions on how to enable the module and run the migration. Make sure to download and enable the Migrate Files module. Otherwise, you will get an error like: “In DiscoveryTrait.php line 53: The "file_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...”. Let’s see part of the source definition:

source:
  constants:
    BASE_URL: 'https://agaric.coop'
    PHOTO_DESCRIPTION_PREFIX: 'Photo of'
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      name: 'Michele Metts'
      photo_url: 'sites/default/files/2018-12/micky-cropped.jpg'
      photo_width: '587'
      photo_height: '657'

Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image. Note that this time, the photo_url does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is https://agaric.coop so that value is stored in the BASE_URL constant, which is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant stores the prefix to add to the name to create a photo description. Now, let’s see the process definition:

process:
  title: name
  psf_image_url:
    plugin: concat
    source:
      - constants/BASE_URL
      - photo_url
    delimiter: '/'
  psf_image_description:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: ' '
  field_image:
    plugin: image_import
    source: '@psf_image_url'
    reuse: TRUE
    alt: '@psf_image_description'
    title: '@title'
    width: photo_width
    height: photo_height

The title node property is set directly to the value of the name column from the source. Then, two pseudofields are created. psf_image_url stores a valid absolute URL to the image, built from the BASE_URL constant and the photo_url column from the source. psf_image_description uses the PHOTO_DESCRIPTION_PREFIX constant and the name column from the source to store a description for the image.

For the field_image field, the image_import plugin is used. This time, the subfields are not set manually. Instead, they are assigned using plugin configuration keys. The absence of the id_only configuration allows for this. The URL to the image is set in the source key and uses the psf_image_url pseudofield. The alt key sets the alternative text attribute for the image; in this case, the psf_image_description pseudofield is used. The title key sets the text of the subfield with the same name; here it is assigned the value of the title node property, which was set at the beginning of the process pipeline. Remember that not only pseudofields are available in the pipeline; properties and fields set earlier can be used as well. Finally, the width and height configurations use the columns from the source to set the values of the corresponding subfields.

What did you learn in today’s blog post? Did you know you can define constants in your source as data placeholders for use in the process section? Were you aware that pseudofields can be created in the process section to store intermediary data for process definitions that come next? Have you ever wondered what is the migration process pipeline and how it works? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 05 2019
Aug 05

In the previous entry, we learned how to use process plugins to transform data between source and destination. Some Drupal fields have multiple components. For example, formatted text fields store the text to display and the text format to apply. Image fields store a reference to the file, alternative text, title text, width, and height. The Migrate API refers to a field’s components as subfields. Today we will learn how to migrate into them and how to know which subfields are available.

Examples of migrating into subfields

Getting the example code

Today’s example will consist of migrating data into the `Body` and `Image` fields of the `Article` content type that are available out of the box. This assumes that Drupal was installed using the `standard` installation profile. As in previous examples, we will create a new module and write a migration definition file to perform the migration. The code snippets will be compact to focus on particular elements of the migration. The full code snippet is available at https://github.com/dinarcon/ud_migrations The module name is `UD Migration Subfields` and its machine name is `ud_migrations_subfields`. The `id` of the example migration is `udm_subfields`. Refer to this article for instructions on how to enable the module and run the migration.

source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      name: 'Michele Metts'
      profile: 'freescholar on Drupal.org'
      photo_url: 'https://agaric.coop/sites/default/files/2018-12/micky-cropped.jpg'
      photo_description: 'Photo of Michele Metts'
      photo_width: '587'
      photo_height: '657'

Only one record is presented to keep snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image.

Migrating formatted text

The `Body` field is of type `Text (formatted, long, with summary)`. This type of field has three components: the full text (value) to present, a summary text, and a text format. The Migrate API allows you to write to each component separately by defining subfield targets. The next code snippet shows how to do it:

process:
  field_text_with_summary/value: source_value
  field_text_with_summary/summary: source_summary
  field_text_with_summary/format: source_format

The syntax to migrate into subfields is the machine name of the field and the subfield name separated by a slash (/). Then, a colon (:), a space, and the value. You can set the value to a source column name for a verbatim copy or use any combination of process plugins. It is not required to migrate into all subfields. Each field determines what components are required, so it is possible that not all subfields are set. In this example, only the value and text format will be set.

process:
  body/value: profile
  body/format:
    plugin: default_value
    default_value: restricted_html

The `value` subfield is set to the `profile` source column. As you can see in the first snippet, it contains HTML markup. An `a` tag to be precise. Because we want the tag to be rendered as a link, a text format that allows such tag needs to be specified. There is no information about text formats in the source, but Drupal comes with a couple we can choose from. In this case, we use the `Restricted HTML` text format. Note that the `default_value` plugin is used and set to `restricted_html`. When setting text formats, it is necessary to use its machine name. You can find them in the configuration page for each text format. For `Restricted HTML` that is /admin/config/content/formats/manage/restricted_html.

Note: Text formats are a whole different subject that even has security implications. To keep the discussion on topic, we will only give some recommendations. When you need to migrate HTML markup, you need to know which tags appear in your source, which ones you want to allow in Drupal, and select a text format that accepts what you have whitelisted and filters out any dangerous tags like `script`. As a general rule, you should avoid setting the `format` subfield to use the `Full HTML` text format.

Migrating images

There are different approaches to migrating images. Today, we are going to use the Migrate Files module. It is important to note that Drupal treats images as files with extra properties and behavior. Any approach used to migrate files can be adapted to migrate images.

process:
  field_image/target_id:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    id_only: TRUE
  field_image/alt: photo_description
  field_image/title: photo_description
  field_image/width: photo_width
  field_image/height: photo_height

When migrating any field, you have to use its machine name in the mapping section. For the `Image` field, the machine name is `field_image`. Knowing that, you set each of its subfields:

  • `target_id` stores an integer number which Drupal uses as a reference to the file.
  • `alt` stores a string that represents the alternative text. Always set one for better accessibility.
  • `title` stores a string that represents the title attribute.
  • `width` stores an integer number which represents the width in pixels.
  • `height` stores an integer number which represents the height in pixels.

For the `target_id`, the plugin `file_import` is used. This plugin requires a `source` configuration value with a URL to the file. In this case, the `photo_url` column from the source section is used. The `reuse` flag indicates that if a file with the same location and name exists, it should be used instead of downloading a new copy. When working on migrations, it is common to run them over and over until you get the expected results. Using the `reuse` flag will avoid creating multiple references or copies of the image file, depending on the plugin configuration. The `id_only` flag is set so that the plugin only returns the file identifier used by Drupal instead of an entity reference array. This is done because each subfield is being set manually. For the rest of the subfields (`alt`, `title`, `width`, and `height`) the value is a verbatim copy from the source.

Note: The Migrate Files module offers another plugin named `image_import`. That one allows you to set all the subfields as part of the plugin configuration. An example of its use will be shown in the next article. This example uses the `file_import` plugin to emphasize the configuration of the image subfields.

Which subfields are available?

Some fields have many subfields. Address fields, for example, have 13 subfields. How can you know which ones are available? The answer is found in the class that provides the field type. Once you find the class, look for the `schema` method. The subfields are contained in the `columns` array of the value returned by the `schema` method. Let’s see some examples:

  • The `Text (plain)` field is provided by the StringItem class.
  • The `Number (integer)` field is provided by the IntegerItem class.
  • The `Text (formatted, long, with summary)` is provided by the TextWithSummaryItem class.
  • The `Image` field is provided by the ImageItem class.

The `schema` method defines the database columns used by the field to store its data. When migrating into subfields, you are actually migrating into those particular database columns. Any restriction set by the database schema needs to be respected. That is why you do not use units when migrating width and height for images. The database only expects an integer number representing the corresponding values in pixels. Because of object-oriented practices, sometimes you need to look at the parent class to know all the subfields that are available.

Another option is to connect to the database and check the table structures. For example, the `Image` field stores its data in the `node__field_image` table. Among others, this table has five columns named after the field’s machine name and the subfield:

  • field_image_target_id
  • field_image_alt
  • field_image_title
  • field_image_width
  • field_image_height

Looking at the source code or the database schema is arguably not straightforward. This information is included for reference to those who want to explore the Migrate API in more detail. You can look for migrations examples to see what subfields are available. I might even provide a list in a future blog post. ;-)

Tip: You can use Drupal Console for code introspection and analysis of database table structures. Also, many plugins are defined by classes that end with the string `Item`. You can use your IDE's search feature to find the class using the name of the field as a hint.

Default subfields

Every Drupal field has at least one subfield. For example, `Text (plain)` and `Number (integer)` define only the `value` subfield. The following code snippets are equivalent:

process:
  field_string/value: source_value_string
  field_integer/value: source_value_integer

process:
  field_string: source_value_string
  field_integer: source_value_integer

In examples from previous days, no subfield has been manually set, but Drupal knows what to do. As we have mentioned, the Migrate API offers syntactic sugar to write shorter migration definition files. This is another example. You can safely skip the default subfield and manually set the others as needed. For `File` and `Image` fields, the default subfield is `target_id`. How does the Migrate API know what subfield is the default? You need to check the code again.

The default subfield is determined by the return value of the `mainPropertyName` method of the class providing the field type. Again, object-oriented practices might require looking at the parent classes to find this method. In the case of the `Image` field, it is provided by ImageItem which extends FileItem which extends EntityReferenceItem. It is the latter that contains the `mainPropertyName` method, which returns the string `target_id`.
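
In practice, this means the following two mappings are equivalent, assuming a hypothetical source column that holds the ID of a file that already exists in Drupal:

process:
  field_image/target_id: source_file_id

process:
  field_image: source_file_id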

What did you learn in today’s blog post? Were you aware of the concept of subfields? Did you ever wonder what are the possible destination targets (subfields) for each field type? Did you know that the Migrate API finds the default subfield for you? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 03 2019
Aug 03

In the previous entry, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination or to meet business requirements. Today we will learn more about process plugins and how they work as part of the Drupal migration pipeline.

Syntax for process plugin definition and chaining

Syntactic sugar

The Migrate API offers a lot of syntactic sugar to make it easier to write migration definition files. Field mappings in the process section are an example of this. Each of them requires a process plugin to be defined. If none is manually set, then the get plugin is assumed. The following two code snippets are equivalent in functionality.

process:
  title: creative_title

process:
  title:
    plugin: get
    source: creative_title

The get process plugin simply copies a value from the source to the destination without making any changes. Because this is a common operation, get is considered the default. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:

process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3

The process plugin is configured within an extra level of indentation under the destination field. The plugin key is required and determines which plugin to use. Then, a list of configuration options follows. Refer to the documentation of each plugin to know what options are available. Some configuration options will be required while others will be optional. For example, the concat plugin requires a source, but the delimiter is optional. An example of its use appears later in this entry.

Providing default values

Sometimes, the destination requires a property or field to be set, but that information is not present in the source. Imagine you are migrating nodes. As we have mentioned, it is recommended to write one migration file per content type. If you know in advance that for a particular migration you will always create nodes of type Basic page, then it would be redundant to have a column in the source with the same value for every row. The data might not be needed. Or it might not exist. In any case, the default_value plugin can be used to provide a value when the data is not available in the source.

source: ...
process:
  type:
    plugin: default_value
    default_value: page
destination:
  plugin: 'entity:node'

The above example sets the type property for all nodes in this migration to page, which is the machine name of the Basic page content type. Do not confuse the name of the plugin with the name of its configuration property as they happen to be the same: default_value. Also note that because a (content) type is manually set in the process section, the default_bundle key in the destination section is no longer required. You can see the latter being used in the example of writing your Drupal migration blog post.

Concatenating values

Consider the following migration request: you have a source listing people with first and last name in separate columns. Both are capitalized. The two values need to be put together (concatenated) and used as the title of nodes of type Basic page. The character casing needs to be changed so that only the first letter of each word is capitalized. If there is a need to display them in all caps, CSS can be used for presentation. For example: FELIX DELATTRE would be transformed to Felix Delattre.

Tip: Question business requirements when they might produce undesired results. For instance, if you were to implement this feature as requested DAMIEN MCKENNA would be transformed to Damien Mckenna. That is not the correct capitalization for the last name McKenna. If automatic transformation is not possible or feasible for all variations of the source data, take notes and perform manual updates after the initial migration. Evaluate as many use cases as possible and bring them to the client’s attention.

To implement this feature, let’s create a new module ud_migrations_process_intro, create a migrations folder, and write a migration definition file called udm_process_intro.yml inside it. Follow the instructions in this entry to find the proper location and folder structure or download the sample module from https://github.com/dinarcon/ud_migrations It is the one named UD Process Plugins Introduction, with machine name ud_migrations_process_intro. For this example, we assume a Drupal installation using the standard installation profile which comes with the Basic page content type. Let’s see how to handle the concatenation of first and last name.

id: udm_process_intro
label: 'UD Process Plugins Introduction'
source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      first_name: 'FELIX'
      last_name: 'DELATTRE'
    -
      unique_id: 2
      first_name: 'BENJAMIN'
      last_name: 'MELANÇON'
    -
      unique_id: 3
      first_name: 'STEFAN'
      last_name: 'FREUDENBERG'
  ids:
    unique_id:
      type: integer
process:
  type:
    plugin: default_value
    default_value: page
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: ' '
destination:
  plugin: 'entity:node'

The concat plugin can be used to glue together an arbitrary number of strings. Its source property contains an array of all the values that you want put together. The delimiter is an optional parameter that defines a string to add between the elements as they are concatenated. If not set, there will be no separation between the elements in the concatenated result. This plugin has an important limitation: you cannot use string literals as part of what you want to concatenate. For example, joining the string Hello with the value of the first_name column. All the values to concatenate need to be columns in the source or fields already available in the process pipeline. We will talk about the latter in a future blog post.
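
One workaround, covered in the constants and pseudofields entry of this series, is to store the literal as a constant in the source section and concatenate it like any other column. A minimal sketch with hypothetical names:

source:
  constants:
    GREETING: 'Hello'
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      first_name: 'FELIX'
  ids:
    unique_id:
      type: integer
process:
  title:
    plugin: concat
    source:
      - constants/GREETING
      - first_name
    delimiter: ' '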

To execute the above migration, you need to enable the ud_migrations_process_intro module. Assuming you have Migrate Run installed, open a terminal, switch directories to your Drupal docroot, and execute the following command: drush migrate:import udm_process_intro Refer to this entry if the migration fails. If it works, you will see three basic pages whose title contains the names of some of my Drupal mentors. #DrupalThanks

Chaining process plugins

Good progress so far, but the feature has not been fully implemented. You still need to change the capitalization so that only the first letter of each word in the resulting title is uppercase. Thankfully, the Migrate API allows chaining of process plugins. This works similarly to unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned to the destination field. Let’s see this in action:

id: udm_process_intro
label: 'UD Process Plugins Introduction'
source: ...
process:
  type: ...
  title:
    -
      plugin: concat
      source:
        - first_name
        - last_name
      delimiter: ' '
    -
      plugin: callback
      callable: mb_strtolower
    -
      plugin: callback
      callable: ucwords
destination: ...

The callback process plugin passes a value to a PHP function and returns its result. The function to call is specified in the callable configuration option. Note that this plugin expects a source option containing a column from the source or a value from the process pipeline. That value is sent as the first argument to the function. Because we are using the callback plugin as part of a chain, the source is assumed to be the output of the previous plugin. Hence, there is no need to define a source. So, we concatenate the columns, make them all lowercase, and then capitalize each word.
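
Outside of a chain, the same plugin needs an explicit source. A minimal sketch with a hypothetical field and column name:

process:
  field_machine_name:
    plugin: callback
    callable: trim
    # Standalone use: the value of this source column is passed as the
    # first and only argument to trim().
    source: source_column_name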

Relying on direct PHP function calls should be a last resort. Better alternatives include writing your own process plugins, which encapsulate your business logic separate from the migration definition. The callback plugin comes with its own limitations. For example, you cannot pass extra parameters to the callable function. It will receive the specified value as its first argument and nothing else. In the above example, we could combine the calls to mb_strtolower() and ucwords() into a single call to mb_convert_case($source, MB_CASE_TITLE) if passing extra parameters were allowed.

Tip: You should have a good understanding of your source and destination formats. In this example, one of the values we want to transform is MELANÇON. Because of the cedilla (ç), using strtolower() is not adequate in this case since it would leave that character uppercase (melanÇon). Multibyte string functions (mb_*) are required for proper transformation. ucwords() is not one of them and would present similar issues if the first letter of a word is a special character. Attention should also be given to the character encoding of the tables in your destination database.

Technical note: mb_strtolower is a function provided by the mbstring PHP extension. It does not come enabled by default, or you might not have it installed at all. In those cases, the function would not be available when Drupal tries to call it. The following error is produced when trying to call a function that is not available: The "callable" must be a valid function or method. For Drupal and this particular function, that error would never be triggered, even if the extension is missing. That is because Drupal core depends on some Symfony packages which in turn depend on the symfony/polyfill-mbstring package. The latter provides a polyfill for mb_* functions and has been leveraged since version 8.6.x of Drupal.

What did you learn in today’s blog post? Did you know that syntactic sugar allows you to write shorter plugin definitions? Were you aware of process plugin chaining to perform multiple transformations over the same data? Had you considered character encoding on the source and destination when planning your migrations? Are you making your best effort to avoid the callback process plugin? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 02 2019
Aug 02

In the previous entry, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and running migrations. Now, let’s write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the standard installation profile which comes with the Basic Page content type.

As we progress through the series, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of today's lesson is learning the structure of a migration definition file and how to run it.

Writing the migration definition file

The migration definition file needs to live in a module. So, let’s create a custom one named ud_migrations_first and set Drupal core’s migrate module as a dependency in its *.info.yml file:

type: module
name: UD First Migration
description: 'Example of basic Drupal migration. Learn more at https://understanddrupal.com/migrations.'
package: Understand Drupal
core: 8.x
dependencies:
  - drupal:migrate

Now, let’s create a folder called migrations and inside it a file called udm_first.yml. Note that the extension is yml, not yaml. The final folder structure will be the ud_migrations_first module folder containing the *.info.yml file and a migrations folder with udm_first.yml inside it. The content of the migration definition file will be:

id: udm_first
label: 'UD First migration'
source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      creative_title: 'The versatility of Drupal fields'
      engaging_content: 'Fields are Drupal''s atomic data storage mechanism...'
    -
      unique_id: 2
      creative_title: 'What is a view in Drupal? How do they work?'
      engaging_content: 'In Drupal, a view is a listing of information. It can a list of nodes, users, comments, taxonomy terms, files, etc...'
  ids:
    unique_id:
      type: integer
process:
  title: creative_title
  body: engaging_content
destination:
  plugin: 'entity:node'
  default_bundle: page

YAML is a key-value format with optional nesting of elements. It is very sensitive to white space and indentation. For example, it requires at least one space character after the colon symbol (:) that separates the key from the value. Also, note that each level in the hierarchy is indented by two spaces exactly. A common source of errors when writing migrations is improper spacing or indentation of the YAML files.

A quick glimpse at the file reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options in future entries.

Let’s review each key-value pair in the file. For id, it is customary to set its value to match the filename containing the migration definition, but without the .yml extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The id value should be alphanumeric characters, optionally using underscores to separate words. As for the label key, it is a human readable string used to name the migration in various interfaces.

In this example we are using the embedded_data source plugin. It allows you to define the data to migrate right inside the definition file. To configure it, you define a data_rows key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing “columns” of data to be imported.

A common use case for the embedded_data plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present introduction to Drupal workshops. To save time, I use this plugin to create nodes which are later used in the views creation explanation. Check this repository for an example of this. Note that it uses a different directory structure to define the migrations. That will be explained in future blog posts.

For the destination we are using the entity:node plugin which allows you to create nodes of any content type. The default_bundle key indicates that all nodes to be created will be of type “Basic page”, by default. It is important to note that the value of the default_bundle key is the machine name of the content type. You can find it at /admin/structure/types/manage/page In general, the Migrate API uses machine names for the values. As we explore the system, we will point out when they are used and where to find the right ones.

In the process section you map columns from the source to node properties and fields. The keys are entity property names or the field machine names. In this case, we are setting values for the title of the node and its body field. You can find the field machine names in the content type configuration page: /admin/structure/types/manage/page/fields. Values can be copied directly from the source or transformed via process plugins. This example makes a verbatim copy of the values from the source to the destination. The column names in the source are not required to match the destination property or field name. In this example they are purposely different to make them easier to identify.

You can download the example code from https://github.com/dinarcon/ud_migrations The example above is actually in a submodule in that repository. The same repository will be used for many examples throughout the series. Download the whole repository into the ./modules/custom directory of the Drupal installation and enable the “UD First Migration” module.

Running the migration

Let’s use Drush to run the migrations with the commands provided by Migrate Run. Open a terminal, switch directories to Drupal’s webroot, and execute the following commands.


$ drush pm:enable -y migrate migrate_run ud_migrations_first
$ drush migrate:status
$ drush migrate:import udm_first

The first command enables the core migrate module, the runner, and the custom module holding the migration definition file. The second command shows a list of all migrations available in the system. Only one should be listed with the migration ID udm_first. The third command executes the migration. If all goes well, you can visit the content overview page at /admin/content and see two basic pages created. Congratulations, you have successfully run your first Drupal migration!!!

Or maybe not? Drupal migrations can fail in many ways and sometimes the error messages are not very descriptive. In upcoming blog posts we will talk about recommended workflows and strategies for debugging migrations. For now, let’s mention a couple of things that could go wrong with this example. If after running the drush migrate:status command you do not see the udm_first migration, make sure that the ud_migrations_first module is enabled. If it is enabled, and you do not see it, rebuild the cache by running drush cache:rebuild.

If you see the migration, but you get a YAML parse error when running the migrate:import command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration, so pay close attention. If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the “Basic page” content type is page, not basic_page. This error cannot be fixed from the administration interface. What you have to do is rollback the migration by issuing the following command: drush migrate:rollback udm_first, then fix the default_bundle value, rebuild the cache, and import again.
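
Put together, the recovery steps for that last scenario look like this:

# 1) Undo the partially created content.
$ drush migrate:rollback udm_first

# 2) Edit udm_first.yml and set default_bundle to the correct machine name: page.

# 3) Rebuild caches so the corrected definition is picked up.
$ drush cache:rebuild

# 4) Import again.
$ drush migrate:import udm_first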

Note: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let’s keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements for running migration projects. If you decide to use Migrate Tools make sure to uninstall Migrate Run. Both provide the same Drush commands and conflict with each other if the two are enabled.
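
As a sketch, switching runners could look like this; the package names are the ones published on Drupal.org, but verify the versions appropriate for your project:

# Remove the runner that has provided the migrate:* commands so far.
$ drush pm:uninstall -y migrate_run

# Add and enable Migrate Plus and Migrate Tools instead.
$ composer require drupal/migrate_plus drupal/migrate_tools
$ drush pm:enable -y migrate_plus migrate_tools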

What did you learn in today’s blog post? Did you know that Migrate Plus and Migrate Tools are not hard requirements for Drupal migrations projects? Did you know you can place your YAML files in a migrations directory? What advice would you give to someone writing their first migration? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your friends and colleagues.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Aug 01 2019
Aug 01

The Migrate API is a very flexible and powerful system that allows you to collect data from different locations and store it in Drupal. It is in fact a full-blown extract, transform, and load (ETL) framework. For instance, it could produce CSV files. Its primary use, though, is to create Drupal content entities: nodes, users, files, comments, etc. The API is thoroughly documented, and its maintainers are very active in the #migration Slack channel for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. Today we are starting a blog post series that will cover different migration concepts so that you can apply them to your particular project.

Understanding the ETL process

Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and its result stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let’s try to understand the general idea by following a real life analogy: baking bread. To make some bread you need to obtain various ingredients: wheat flour, salt, yeast, etc. (extracting). Then, you need to combine them in a process that involves mixing and baking (transforming). Finally, when the bread is ready you put it into shelves for display in the bakery (loading). In Drupal, each step is performed by a Migrate plugin:

  • The extract step is provided by source plugins.
  • The transform step is provided by process plugins.
  • The load step is provided by destination plugins.

As it is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases including previous versions of Drupal. There are contributed modules to read from CSV files, XML documents, JSON and SOAP feeds, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, Google Sheets, and much more.

The list of core process plugins is impressive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, look up already migrated data, among other transform operations. Migrate Plus offers more process plugins for DOM manipulation, string replacement, transliteration, etc.

Drupal core provides destination plugins for content and configuration entities. Most of the time targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. This is often used when upgrading sites from Drupal 6 or 7 to Drupal 8. Via a combination of source, process, and destination plugins it is possible to write Commerce Product Variations, Paragraphs, and more.

Technical note: The Migrate API defines another plugin type: `id_map`. They are used to map source IDs to destination IDs. This allows the system to keep track of records that have been imported and roll them back if needed.

Drupal migrations: a two step process

Performing a Drupal migration is a two-step process: writing the migration definitions and executing them. Migration definitions are written in YAML format. These files contain information about how to fetch data from the source, how to process the data, and how to store it in the destination. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use as many process plugins as needed to convert your data from the format defined in the source to the format expected in the destination.

A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the elements to process in a particular migration.

Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using the Migrate Tools module, which provides Drush commands and a user interface (UI) to run migrations. Note that the UI for running migrations only detects those that have been defined as configuration entities using the Migrate Plus module. This is a topic we will cover in the future. For now, we are going to stick to Drupal core’s mechanisms of defining migrations. Contributed modules like Migrate Scheduler, Migrate Manifest, and Migrate Run offer alternatives for executing migrations.

 

This blog post series is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether the migration series or other topics.

Apr 11 2019
Apr 11

Test your Drupal site's functionality in a human-readable format.

Behavior-driven development is a great way to write tests for code because it uses language that real humans can understand. Once you learn about BDD and its benefits, you may want to implement it in your next project. Let's see how to implement BDD in Drupal using Behat with the Mink extension.

Cross-posted from opensource.com.

Install and configure the tools

Since it is good practice to use Composer to manage a Drupal site's dependencies, use it to install the tools for BDD tests: Behat, Mink, and the Behat Drupal Extension. The Behat Drupal Extension lists Behat and Mink among its dependencies, so you can get all of the tools by installing the Behat Drupal Extension package:

composer require drupal/drupal-extension --dev

Behat allows you to write tests in a human-readable format. For example:

Given I am a registered user,
When I visit the homepage,
Then I should see a personalized news feed

Because these tests are supposed to emulate user interaction, you can assume they will be executed within a web browser. That is where Mink comes into play. There are various browser emulators, such as Goutte and Selenium, and they all behave differently and have very different APIs. Mink allows you to write a test once and execute it in different browser emulators. In layman's terms, Mink allows you to control a browser programmatically to emulate a user's action.
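
As an illustration of that idea, here is a hedged sketch of a Mink configuration fragment that registers two browser emulators. The behat.yml file itself is created later in this post, and the selenium2 entries are assumptions that require a running Selenium server:

default:
  extensions:
    Behat\MinkExtension:
      # Headless emulator: fast, but does not execute JavaScript.
      goutte: ~
      # Real-browser driver for JavaScript-dependent scenarios.
      selenium2: ~
      javascript_session: selenium2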

Now that you have the tools installed, you should have a behat command available. If you run it:

./vendor/bin/behat

you should get an error, like:

FeatureContext context class not found and can not be used

Start by initializing Behat:

./vendor/bin/behat --init

This will create two folders and one file, which we will revisit later; for now, running behat without the extra parameters should not yield an error. Instead, you should see an output similar to this:

No scenarios
No steps
0m0.00s (7.70Mb)

Now you are ready to write your first test, for example, to verify that website visitors can leave a message using the site-wide contact form.

Writing test scenarios

By default, Behat will look for files in the features folder that's created when the project is initialized. Files inside that folder should have the .feature extension. Let's test the site-wide contact form. Create a file named contact-form.feature in the features folder with the following content:

Feature: Contact form
  In order to send a message to the site administrators
  As a visitor
  I should be able to use the site-wide contact form

  Scenario: A visitor can use the site-wide contact form
    Given I am at "contact/feedback"
    When I fill in "name" with "John Doe"
    And I fill in "mail" with "[email protected]"
    And I fill in "subject" with "Hello world"
    And I fill in "message" with "Lorem Ipsum"
    And I press "Send message"
    Then I should see the text "Your message has been sent."

Behat tests are written in Gherkin, a human-readable format that follows the Context–Action–Outcome pattern. It consists of several special keywords that, when parsed, will execute commands to emulate a user's interaction with the website.

The sentences that start with the keywords Given, When, and Then indicate the Context, Action, and Outcome, respectively. They are called Steps and they should be written from the perspective of the user performing the action. Behat will read them and execute the corresponding Step Definitions. (More on this later.)

This example instructs the browser to visit a page under the "contact/feedback" link, fill in some field values, press a button, and check whether a message is present on the page to verify that the action worked. Run the test; your output should look similar to this:

1 scenario (1 undefined)
7 steps (7 undefined)
0m0.01s (8.01Mb)

 >> default suite has undefined steps. Please choose the context to generate snippets:

  [0] None
  [1] FeatureContext
 >

Type 0 at the prompt to select the None option. This verifies that Behat found the test and tried to execute it, but it is complaining about undefined steps. These are the Step Definitions, PHP code that will execute the tasks required to fulfill the step. You can check which step definitions are available by running:

./vendor/bin/behat -dl

Currently there are no step definitions, so you shouldn't see any output. You could write your own, but for now, you can use some provided by the Mink extension and the Behat Drupal Extension. Create a behat.yml file at the same level as the features folder (not inside it) with the following contents:

default:
  suites:
    default:
      contexts:
        - FeatureContext
        - Drupal\DrupalExtension\Context\DrupalContext
        - Drupal\DrupalExtension\Context\MinkContext
        - Drupal\DrupalExtension\Context\MessageContext
        - Drupal\DrupalExtension\Context\DrushContext
  extensions:
    Behat\MinkExtension:
      goutte: ~

Step definitions are provided through Contexts. When you initialized Behat, it created a FeatureContext without any step definitions. In the example above, we are updating the configuration file to include this empty context along with others provided by the Behat Drupal Extension. Running ./vendor/bin/behat -dl again produces a list of 120+ steps you can use; here is a trimmed version of the output:

default | Given I am an anonymous user
default | When I visit :path
default | When I click :link
default | Then I (should )see the text :text

Now you can perform lots of actions. Run the tests again with ./vendor/bin/behat. The test should fail with an error similar to:

  Scenario: A visitor can use the site-wide contact form          # features/contact-form.feature:8
    Given I am at "contact/feedback"                              # Drupal\DrupalExtension\Context\MinkContext::assertAtPath()
    When I fill in "name" with "John Doe"                         # Drupal\DrupalExtension\Context\MinkContext::fillField()
    And I fill in "mail" with "[email protected]"                  # Drupal\DrupalExtension\Context\MinkContext::fillField()
    And I fill in "subject" with "Hello world"                    # Drupal\DrupalExtension\Context\MinkContext::fillField()
      Form field with id|name|label|value|placeholder "subject" not found. (Behat\Mink\Exception\ElementNotFoundException)
    And I fill in "message" with "Lorem Ipsum"                    # Drupal\DrupalExtension\Context\MinkContext::fillField()
    And I press "Send message"                                    # Drupal\DrupalExtension\Context\MinkContext::pressButton()
    Then I should see the text "Your message has been sent."      # Drupal\DrupalExtension\Context\MinkContext::assertTextVisible()

--- Failed scenarios:
    features/contact-form.feature:8

1 scenario (1 failed)
7 steps (3 passed, 1 failed, 3 skipped)
0m0.10s (12.84Mb)

The output shows that the first three steps (visiting the contact page and filling in the name and mail fields) worked, but the test fails when it tries to fill in the subject, and the remaining steps are skipped. That is because these steps require you to use the name attribute of the HTML tag that renders the form field.

When I created the test, I purposely used the proper values for the name and mail fields so they would pass. When in doubt, use your browser's developer tools to inspect the source code and find the proper values you should use. By doing this, I found I should use subject[0][value] for the subject and message[0][value] for the message. When I update my test to use those values and run it again, it passes with flying colors and produces an output similar to:

1 scenario (1 passed)
7 steps (7 passed)
0m0.29s (12.88Mb)

Success! The test passes! In case you are wondering, I'm using the Goutte browser. It is a command line browser, and the driver to use it with Behat is installed as a dependency of the Behat Drupal Extension package.
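For reference, here is how the two updated steps look in contact-form.feature; the rest of the scenario stays the same:

    And I fill in "subject[0][value]" with "Hello world"
    And I fill in "message[0][value]" with "Lorem Ipsum"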

Other things to note

As mentioned above, BDD tests should be written from the perspective of the user performing the action. Users don't think in terms of HTML name attributes. That is why writing tests using subject[0][value] and message[0][value] is both cryptic and not very user friendly. You can improve this by creating custom steps at features/bootstrap/FeatureContext.php, which was generated when you initialized Behat.
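For example, you could wrap those field names in friendlier steps inside FeatureContext. The following is only a sketch: the step wording and method names are invented for illustration, and it assumes the subject[0][value] and message[0][value] names found earlier:

<?php

use Behat\Behat\Context\Context;
use Behat\MinkExtension\Context\RawMinkContext;

/**
 * Defines application features from the specific context.
 */
class FeatureContext extends RawMinkContext implements Context {

  /**
   * @When I fill in the subject with :value
   */
  public function fillInSubject($value) {
    // "subject[0][value]" is the name attribute Drupal renders for the
    // subject field of the site-wide contact form.
    $this->getSession()->getPage()->fillField('subject[0][value]', $value);
  }

  /**
   * @When I fill in the message with :value
   */
  public function fillInMessage($value) {
    // Same idea for the message field.
    $this->getSession()->getPage()->fillField('message[0][value]', $value);
  }

}

With these in place, the scenario can say And I fill in the subject with "Hello world", which reads much more naturally.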

Also, if you run the test several times, you will find that it starts failing. This is because Drupal, by default, imposes a limit of five submissions per hour. Each time you run the test, it's like a real user is performing the action. Once the limit is reached, you'll get an error on the Drupal interface. The test fails because the expected success message is missing.
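If you need to run the test repeatedly while developing it, one possible workaround, assuming you are comfortable changing configuration on a development copy of the site, is to temporarily raise the contact module's flood limit with Drush:

# Development only: allow up to 50 submissions per hour.
drush config:set contact.settings flood.limit 50 -y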

This illustrates the importance of debugging your tests. There are some steps that can help with this, like Then print last drush output and Then I break. Better yet is using a real debugger, like Xdebug. You can also install other packages that provide more step definitions specifically for debugging purposes, like Behatch and Nuvole's extensions. For example, you can configure Behat to take a screenshot of the state of the browser when a test fails (if this capability is provided by the driver you're using).
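As a quick illustration, the debugging steps mentioned above can be dropped temporarily into a scenario, for example right after the step you suspect is misbehaving:

    And I press "Send message"
    # Temporarily pause the test run here to inspect what is going on.
    And I break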

Regarding drivers and browser emulators, Goutte doesn't support JavaScript. If a feature depends on JavaScript, you can test it by using the Selenium2Driver in combination with Geckodriver and Firefox. Every driver and browser has different features and capabilities. For example, the Goutte driver provides access to the response's HTTP status code, but the Selenium2Driver doesn't. (You can read more about drivers in Mink and Behat.) For Behat to pick up a JavaScript-enabled driver/browser, you need to annotate the scenario with the @javascript tag. For example:

Feature:
 (feature description)

  @javascript
  Scenario: An editor can select the author of a node from an autocomplete field
    (list of steps)

Another tag that is useful for Drupal sites is @api. This instructs the Behat Drupal Extension to use a driver that can perform operations specific to Drupal; for example, creating users and nodes for your tests. Although you could follow the registration process to create a user and assign roles, it is easier to simply use a step like Given I am logged in as a user with the "Authenticated user" role. For this to work, you need to specify whether you want to use the Drupal or Drush driver. Make sure to update your behat.yml file accordingly. For example, to use the Drupal driver:

default:
  extensions:
    Drupal\DrupalExtension:
      blackbox: ~
      api_driver: drupal
      drupal:
        drupal_root: ./relative/path/to/drupal
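With that configuration in place, a scenario tagged with @api could look like the following sketch; the path, role name, and expected text are illustrative:

  @api
  Scenario: An administrator can visit the configuration page
    Given I am logged in as a user with the "administrator" role
    When I visit "admin/config"
    Then I should see the text "Configuration"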

I hope this introduction to BDD testing in Drupal serves you well. If you have questions, feel free to add a comment below, send me an email at [email protected] (or through the Agaric contact form) or a tweet at @dinarcon.
