Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough

A slick migration trick - convert columns to multi-value field with subfields

Parent Feed: 

In the previous post on A custom quantity price discount for Drupal Commerce we created a compound field for price breaks, which was composed by two subfields. One for a quantity threshold, the other for the price at that threshold.

That post covered everything needed to get quantity discounts working within Drupal Commerce, but for this particular project, we also had to find a way to populate these price breaks through the integration with their Sage accounting system.

Source Data

We're starting with their existing CSV exports, and one of them is a file that provides up to 5 pricebreaks for each project. These files get copied up to the webserver each hour, and then the migration into Drupal will get triggered.

Here's what the data looks like in the source CSV file:

Pricebreak CSV Data

... This file has other pricing data, for customer-specific pricing, so our migration skips those rows. The key thing is that there are 5 pairs of columns for the threshold and pricing, called "BreakQuery1" - "BreakQuery5" and "DiscountMarkup1" - "DiscountMarkup5". They use a very large number (99999999) for the largest tier.

So our migration needs to match these up into pairs, and add each pair as a delta in the price break field for that product variation.

Migration Configuration

Setting up migrations is covered elsewhere, as is importing/exporting configurations. For a project like this, we simply copy an existing migration configuration to a new file, edit it with a text editor, and import it into the site's configuration. For this migration, that was all we needed -- there is no supporting code whatsoever, the entire migration is handled with just a YAML file.

We're going to name this migration after the source file, im_pricecode, so the basic setup goes like this:

  1. Export the site configuration.
  2. Copy another CSV migration file to a new "migrate_plus.migration.im_pricecode.yml" file.
  3. Inside the file, delete the uuid line, and set the id to "im_pricecode". When the configuration is imported, Drupal will assign a new UUID, which will get added when you next export the configuration, and the id needs to match the filename.
  4. Add migration tags/group as appropriate.

That's the basic setup. There are 3 crucial parts to a migration configuration: source, process, and destination. All of the heavy lifting here is in the process block, so we'll tackle that last.

Here's our source section:

source:
  plugin: csv
  path: /path/to/IM_PriceCode.csv
  delimiter: ','
  enclosure: '"'
  header_row_count: 1
  keys:
    - ItemCode
    - PriceCodeRecord
  track_changes: true

This section needs to get set to the data source. (We already have plans to swap this for a webservice call down the road). In this case, we're using the "csv" source plugin provided by the Migrate Source CSV module. Of particular note here:

  • header_row_count here indicates that the first line of the file contains column headers, which will be automatically available on the source $row object in the migration, for processing.
  • keys need to specify a field or set of fields that uniquely identify the row in the CSV file. In our case, the file also contains customer-specific pricing data -- this makes the ItemCode appear multiple times in the file, so we need to add the second column to get a unique key.
  • track_changes stores a hash of the source row in the migration table. On future migration runs, the source row is checked against this hash to determine if there's a change -- if there is, the row will get updated, if not, it will be skipped.

Next, the Destination:

destination:
  plugin: 'entity:commerce_product_variation'
  destination_module: commerce_product
  default_bundle: default
migration_dependencies:
  required:
    - ci_item_variation

The destination is pretty self-explanatory. However, there's one crucial thing here: we are running this migration on top of entities that have already been created by another migration, "ci_item_variation". We need to declare that this migration runs after ci_item_variation, by using migration_dependencies.

Finally, we get to our "process" block. Let's take this a field at a time:

process:
  variation_id:
    -
      plugin: migration_lookup
      migration: ci_item_variation
      source: ItemCode
      no_stub: true
    -
      plugin: skip_on_empty
      method: row

variation_id is the primary key for commerce_product_variation entities. We are running this through two plugins: migration_lookup (formerly known as just migration) and skip_on_empty.

The migration_lookup loads up the existing product_variation, by looking up the variation_id on the previous ci_item_variation migration. We use "no_stub" to make sure we don't create any new variations from possible bad data or irrelevant rows. The second plugin, skip_on_empty, is another safety check to be sure we're not migrating bad data in.

pricing_method:
  -
    plugin: static_map
    source: PricingMethod
    map:
      O: O
    default_value: ''
  -
    plugin: skip_on_empty
    method: row

PricingMethod is what indicates the type of pricing each row in the source file contains. We don't need to migrate this field, but we do want to use it as a filter -- we only want to migrate rows that have this set to "O". So we chain together two more plugins to achieve this filtering, and we assign the result to a dummy field that won't ever get used.

The first static_map plugin maps the "O" rows to "O", and everything else gets set to an empty string.

The next plugin, skip_on_empty, simply skips rows that have the empty string set -- which is now everything other than rows with an "O" in this column.

Now we get to the fun stuff:

break1:
  plugin: get
  source:
    - BreakQuantity1
    - DiscountMarkup1
break2:
  plugin: get
  source:
    - BreakQuantity2
    - DiscountMarkup2
break3:
  plugin: get
  source:
    - BreakQuantity3
    - DiscountMarkup3
break4:
  plugin: get
  source:
    - BreakQuantity4
    - DiscountMarkup4
break5:
  plugin: get
  source:
    - BreakQuantity5
    - DiscountMarkup5

The whole trick to getting the 10 columns collapsed into up to 5 instances of our quantity price break field, each with two sub-fields, is to rearrange the data into a format that matches the field. This involves creating two layers of dummy fields. The first layer are these 5 dummy fields, break1 - break5, which group the BreakQuantityN and DiscountMarkupN fields together. The resut are these 5 dummy fields that each have a BreakQuantity and a DiscountMarkup.

One crucial point here is the order -- these are now set as an array with two values. If we take a look at the source data, the row for ItemCode 23311 will have an array that looks like:

$row->break1 = [
  0 => "9",
  1 => "5.1",
];
$row->break2 = [
  0 => "99999999",
  1 => "4.85",
];
$row->break3 = [
  0 => "0",
  1 => "0",
];
...

Remember that each of these is an indexed array, with 0 and 1 as keys.

Now we create our second layer -- a dummy field that combines the other 5 dummy fields:

breaks:
  plugin: get
  source:
    - '@break1'
    - '@break2'
    - '@break3'
    - '@break4'
    - '@break5'

Prefixing a fieldname with "@" means use the row from the destination (the one we're building up) and not the original source row -- this lets us access the dummy fields we just created. The YAML parser requires them to be quoted.

So now we have a value built up that pretty much matches the entity field structure:

$row->breaks = [
  0 => [
    0 => "9",
    1 => "5.1",
  ],
  1 => [
    0 => "99999999",
    1 => "4.85",
  ],
  2 => [
    0 => "0",
    1 => "0",
  ],
  ...
];

Now we assign this to our real destination field:

field_qty_price_breaks:
  plugin: iterator
  source: '@breaks'
  process:
    threshold:
      plugin: skip_on_empty
      method: process
      source: '0'
    price: '1'

For the actual field values, we need to map the values to the actual subfields -- and we want to skip the empty pricebreak columns. The iterator plugin lets us loop through the breaks, and then assign the subfields with further processing.

The threshold sub-field is 0 if this break is unused, so we can use "skip_on_empty" to skip it -- this time we use the "process" method to only skip this field, and not the entire row. Because the threshold is the first value in the indexed array for each pricebreak, this is the "0" source (and needs to be quoted to be interpreted as an index).

The price sub-field is simply the second value in each of the pricebreak fields, with index "1".

Migrate is cool!

We love working with migrate module in Drupal 8 because of how powerful it is, and how easy to customize. We routinely create process plugins to quickly do some custom transformations, or create source plugins to connect to custom data sources, and I thought I would have to do one of those to solve this problem -- but it turns out some YAML and dummy fields worked with no additional code, pretty much the first time!

Hat tip to MTech for the basic technique! If you have an improvement or suggestion for other cool migrate tricks, leave a comment below! Or if you have a data integration you need done, reach out...

Author: 
Original Post: 

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web