Apr 07 2020

The Views module has long been the killer feature of Drupal, making it easy for a site builder or skilled administrator to essentially create complex SQL queries through a web interface, without knowing SQL. All kinds of things are possible through views: relationships, filters, sorting, access control, aggregation, argument handling, and more.

What makes it even more powerful are views handlers. If there is some corner case you need to address, it's not that hard for a skilled developer to create a custom views handler to insert the SQL you need into the query, while exposing a nice configuration form for the site builder. There are tutorials and instructions all over the place to guide you in creating various kinds of field handlers, and you can also alter the query views generates through some "hook" code.

Yet certain problems can be really hard to solve with views, and recently I came across two that were very similar:

  1. A company holds multi-day seminars and stores each day as a separate date range entry in a multi-value field, but wanted to list upcoming seminars with a single date range formatted with no duplication -- e.g. instead of "April 20, 2020 - April 20, 2020, April 21, 2020 - April 21, 2020, April 22, 2020 - April 22, 2020" they wanted it to just say "April 20 - 22, 2020". They also didn't want the same class to appear 3 times.
  2. They wanted a report showing how many people have registered to attend each upcoming seminar.

These seem like simple, obvious requests -- and yet they are nearly impossible to implement in views without some serious behind the scenes coding.

Deciding upon an approach

How do you solve a problem like this? There are many different ways to go about it. As an experienced Drupal developer, I came up with 4 different ways, and I'm sure there are others:

  1. Create an aggregate view
  2. Create a custom views field handler
  3. Alter the views query
  4. Use a views subquery join handler and create an aggregate field.

Here's a brief run-down of these, with the one I chose and why.

Create an aggregate view

This is the obvious answer for creating reports: Check the box under "Advanced" to make the entire view an aggregate view. This changes each field to either be aggregated by some SQL aggregation function, or be used as a unique row.

This approach is ok for summary reports, but it feels a bit like a sledgehammer, especially for the date range field -- it's a lot of extra work to set up an aggregate view, and it's extremely tricky to get right, especially when you go to filter out data to match exactly what you want. And, I have the tingly sense that I've tried this before and failed -- there are certain things you cannot do when you aggregate the entire view -- I can't necessarily put my finger on just what doesn't work in these cases, but I know it's not going to work in the end.

Create a custom views field handler

Drupal Console makes it really easy to generate a plugin for Drupal 8, and views handlers are all plugins. It's actually really easy to make a custom views handler, especially if you can base it on an existing handler and just override whatever it is you need to make custom.

This seemed at first to be the way to go. For years, taxonomy fields have been multi-valued -- i.e. multiple terms can be associated with a single node -- and the taxonomy field handlers have options to select whether you want an entirely new row for each term, or to collapse all the terms into a single field, perhaps separated by a comma.

You would think this would work well for the date field, and it can certainly give us a count of attendees to an event with a little magic SQL under the hood.

I whipped up a field handler for the date, grabbing the earliest start date and the latest end date, and writing some ugly but workable code to show what the client wanted. Then I hit the problem with this approach: we didn't need "just" a field, but also a filter and a sort.

We only wanted to show future seminars, and we wanted them sorted by date. But as soon as I added a date filter or sort, because there were multiple values, each seminar ended up with multiple rows, one for each of the date range values. Crap.

I could make a filter handler. And a sort handler. But to get them to all use the same query started to feel -- dirty. Lots of repeating code, lots of altering queries or checking on whether the other handlers were already instantiated, lots of stuff that just felt wrong.

Especially when I was in the views_data structures trying to hook them up -- it seemed entirely too hard.
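The formatting half of that date handler (earliest start, latest end, no repeated month or year) doesn't depend on Views at all. Here is a minimal sketch of that logic in plain PHP, not the original handler code: collapse_date_ranges() is a hypothetical helper, and it assumes the ranges arrive as 'Y-m-d' strings. A real handler would read the field's stored values and handle time zones first.

```php
<?php

/**
 * Collapses several [start, end] date ranges into one compact label.
 *
 * Hypothetical helper for illustration. Dates are assumed to be 'Y-m-d'
 * strings, which sort correctly as plain strings, so min()/max() find the
 * overall span.
 */
function collapse_date_ranges(array $ranges): string {
  if (empty($ranges)) {
    return '';
  }
  $first = new DateTimeImmutable(min(array_column($ranges, 0)));
  $last = new DateTimeImmutable(max(array_column($ranges, 1)));

  // Single day: "April 20, 2020".
  if ($first->format('Y-m-d') === $last->format('Y-m-d')) {
    return $first->format('F j, Y');
  }
  // Same month: "April 20 - 22, 2020".
  if ($first->format('Y-m') === $last->format('Y-m')) {
    return $first->format('F j') . ' - ' . $last->format('j, Y');
  }
  // Same year: "April 30 - May 2, 2020".
  if ($first->format('Y') === $last->format('Y')) {
    return $first->format('F j') . ' - ' . $last->format('F j, Y');
  }
  // Crosses into a new year: "December 30, 2020 - January 2, 2021".
  return $first->format('F j, Y') . ' - ' . $last->format('F j, Y');
}

echo collapse_date_ranges([
  ['2020-04-20', '2020-04-20'],
  ['2020-04-21', '2020-04-21'],
  ['2020-04-22', '2020-04-22'],
]), PHP_EOL;
// prints "April 20 - 22, 2020"
```

Note that this only fixes the display. It does nothing about the duplicate rows a date filter or sort produces, which is the part that made the field-handler approach fall apart.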

Alter the views query

I probably spent upwards of 12 hours scouring the Internet, and the Drupal codebase for help or example code that might get me there. This seemed to be a problem that people just rolled their sleeves up and hacked their way through -- usually using a views_query_alter.

No doubt, this is a fast way to get the job done -- just hook into the generated query, and alter it as you see fit.

The biggest problem is, years down the road when you need to change something, you'll look at the view and say WTF? How does this thing even work? You end up with a views UI that just plain lies to you -- it does not make it clear what is really happening.

I'll use a views_query_alter in a pinch, to get the job done -- but if you're reaching for this, it's usually because something wasn't done right.

Use a subquery join

Scouring the codebase, I found a views join handler called "subquery". That's exactly what I want to do -- create a small aggregate subquery, and join the main query to that. This solves both my problems -- I can create a few aggregate fields while keeping the main SQL query non-aggregated.

The problem is, I don't think this handler even works! The details are here: Issue #3125146 on Drupal.org. I could not find a single place in the Drupal code base, or on the Internet, where this plugin was used. And it does not make sense to me how the SQL it generates is useful.

But the concept is extremely useful -- having a subquery join makes a lot of sense. The challenge was, how to create and use it.
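To make the idea concrete, here is a plain-PHP sketch of what joining to an aggregate subquery does. The arrays stand in for the product and registrant tables, and join_registrant_counts() is a hypothetical name; none of this is a Drupal API. Step 1 plays the role of the subquery (group registrants by event and count them), and step 2 is the LEFT JOIN from the main, non-aggregated product rows onto that result.

```php
<?php

/**
 * Mimics a LEFT JOIN onto an aggregate (GROUP BY + COUNT) subquery.
 *
 * Illustration only: $products and $registrants are hypothetical row arrays.
 * The event__target_type condition is left out for brevity.
 */
function join_registrant_counts(array $products, array $registrants): array {
  // Step 1: the aggregate subquery -- one count per event (target) id.
  $counts = [];
  foreach ($registrants as $registrant) {
    $target_id = $registrant['event__target_id'];
    $counts[$target_id] = ($counts[$target_id] ?? 0) + 1;
  }

  // Step 2: the LEFT JOIN -- every product keeps exactly one row, with the
  // count attached, or NULL when nothing matched (as SQL would return).
  $rows = [];
  foreach ($products as $product) {
    $product['num_registrants'] = $counts[$product['product_id']] ?? NULL;
    $rows[] = $product;
  }
  return $rows;
}
```

With two seminars and three registrants for the first one, this returns two rows, with counts of 3 and NULL. A plain join from products straight to registrants would instead return three rows for the first seminar, which is the same row-multiplication problem the date filter caused.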

Implementing an aggregate "dummy" views field

So how to do it? The plugin/handler itself was the easy part. What was extremely hard was to figure out how to hook it up.

In Drupal 8, the views_data structure basically is a registry of all the views handlers, mapped to the data structures they can handle. It is populated through the use of old-style Drupal hooks -- to add items to the views_data structure, you implement a "hook_views_data()" function in your module, and to change existing items, you implement a "hook_views_data_alter()" function. While I found snippets here and there on what to add here, it wasn't until I found a series of posts by Oleksandr Trotsenko about Drupal Views for Developers that I figured out how to hook this all up.

Go read that series first to learn how to create handlers and the views data structure. But there are some huge missing pieces about this data structure that I hope to illuminate here. Because that's where the true power lies...

The subquery join handler

First, the join handler that should be in core:

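<?php

namespace Drupal\mymodule\Plugin\views\join;

use Drupal\Core\Plugin\ContainerFactoryPluginInterface;
use Drupal\views\Plugin\views\join\JoinPluginBase;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Joins the views query to an aggregate subquery built from configuration.
 *
 * The namespace and class name are placeholders; the plugin ID must match
 * the 'join_id' used in hook_views_data().
 *
 * @ingroup views_join_handlers
 *
 * @ViewsJoin("rng_subquery")
 */
class SubqueryJoin extends JoinPluginBase implements ContainerFactoryPluginInterface {

  /**
   * The database connection, used to build the subquery.
   *
   * @var \Drupal\Core\Database\Connection
   */
  protected $database;

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    $instance = new static($configuration, $plugin_id, $plugin_definition);
    $instance->database = $container->get('database');
    return $instance;
  }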

  /**
   * Builds the SQL for the join this object represents.
   *
   * @param \Drupal\Core\Database\Query\SelectInterface $select_query
   *   The select query object.
   * @param array $table
   *   The table information (including its 'alias') for the table being joined.
   * @param \Drupal\views\Plugin\views\query\QueryPluginBase $view_query
   *   The source views query.
   */
  public function buildJoin($select_query, $table, $view_query) {
    $alias = $this->configuration['subquery_alias'];
    $subquery = $this->database->select($this->configuration['subquery_table'], $alias);

    if (!empty($this->configuration['subquery_fields'])) {
      foreach ($this->configuration['subquery_fields'] as $field_alias => $field) {
        $subquery->addField($alias, $field, $field_alias);
      }
    }
    if (!empty($this->configuration['subquery_expressions'])) {
      foreach ($this->configuration['subquery_expressions'] as $field_alias => $expression) {
        $subquery->addExpression($expression, $field_alias);
      }
    }
    if (!empty($this->configuration['subquery_groupby'])) {
      $subquery->groupBy($this->configuration['subquery_groupby']);
    }
    if (!empty($this->configuration['subquery_where'])) {
      foreach ($this->configuration['subquery_where'] as $condition) {
        $subquery->where($condition);
      }
    }

    $right_table = $subquery;

    $left_table = $view_query->getTableInfo($this->leftTable);
    $left_field = "$left_table[alias].$this->leftField";

    // Join condition: the left table's field equals a field of the aliased subquery.
    $condition = "$left_field = $table[alias].$this->field";
    $arguments = [];

    // Tack on the extra.
    // This is just copied verbatim from the parent class, which itself has a
    //   bug: https://www.drupal.org/node/1118100.
    if (isset($this->extra)) {
      if (is_array($this->extra)) {
        $extras = [];
        foreach ($this->extra as $info) {
          // Figure out the table name. Remember, only use aliases provided
          // if at all possible.
          $join_table = '';
          if (!array_key_exists('table', $info)) {
            $join_table = $table['alias'] . '.';
          }
          elseif (isset($info['table'])) {
            $join_table = $info['table'] . '.';
          }

          $placeholder = ':views_join_condition_' . $select_query->nextPlaceholder();

          if (is_array($info['value'])) {
            $operator = !empty($info['operator']) ? $info['operator'] : 'IN';
            // Transform from IN() notation to = notation if just one value.
            if (count($info['value']) == 1) {
              $info['value'] = array_shift($info['value']);
              $operator = $operator == 'NOT IN' ? '!=' : '=';
            }
          }
          else {
            $operator = !empty($info['operator']) ? $info['operator'] : '=';
          }

          $extras[] = "$join_table$info[field] $operator $placeholder";
          $arguments[$placeholder] = $info['value'];
        }

        if ($extras) {
          if (count($extras) == 1) {
            $condition .= ' AND ' . array_shift($extras);
          }
          else {
            $condition .= ' AND (' . implode(' ' . $this->extraOperator . ' ', $extras) . ')';
          }
        }
      }
      elseif ($this->extra && is_string($this->extra)) {
        $condition .= " AND ($this->extra)";
      }
    }
    $select_query->addJoin($this->type, $right_table, $table['alias'], $condition, $arguments);
  }
}

This is basically what the subquery join handler in core should look like.

The "standard" join handler does have a "table formula" configuration that looks like it is meant to provide the functionality we're trying to add here. That works if you create a join handler programmatically and assign a DB Select query object to the "table formula" configuration -- but that's not possible from a views_data structure, because query objects aren't something you can serialize into configuration. And at the very bottom of the buildJoin() method, the second parameter of $select_query->addJoin() is treated as a table name if it's a string, and only added as a subquery if it's a Select object.

So the only way to join a subquery is to build it inside this buildJoin() method, which means we need to pass everything needed to create that subquery in as configuration strings.

Views Data Structure

Here's the meat of the entire thing: Views Data. With a join handler like above, you can create all kinds of aggregate fields, without having to create a handler -- you can just let the subquery handle the aggregation and then use existing handlers for the field, filter, and sort.

This ends up being much more elegant, and less confusing, than any of the other alternatives. However, this structure is really lacking documentation, so that's what I hope to illuminate here. Read the comments inline...

/**
 * Implements hook_views_data().
 */
function mymodule_views_data() {
  // The top level key is usually the entity type or a database table, but it can actually be anything.
  // Inside that top level key, you need to specify a literal 'table' element, and any number of "fields".
  $data['custom_registrants']['table'] = [
    // The crucial thing to add is the 'join' key.
    'join' => [
      // Under the join, the top level key is the "base table" -- i.e. the table on the left side of our join.
      // THIS IS IMPORTANT! Make sure this is the base table for your entity type -- if this base table is in
      // the view, then any fields defined as siblings to the 'table' key will be available in the view.
      'commerce_product_field_data' => [
        // IMPORTANT! 'join_id' must match the Plugin ID defined by the join plugin -- e.g. in the @ViewsJoin annotation.
        'join_id' => 'rng_subquery',
        // The following are configurations available inside the handler.
        // subquery_alias is the alias given to subquery_table inside the subquery.
        'subquery_alias' => 'mymodule_summary',
        'subquery_table' => 'registrant',
        // The key is the alias for the field; the value is the database column name.
        'subquery_fields' => [
          'target_id' => 'event__target_id',
        ],
        // Expressions can use aggregate functions. The key is the alias for the expression.
        'subquery_expressions' => [
          'num_registrants' => 'count(id)',
        ],
        'subquery_where' => [
          "event__target_type = 'commerce_product'",
        ],
        'subquery_groupby' => 'target_id',
        // This is the id field of the base table (the key for the parent array).
        'left_field' => 'product_id',
        // This field should be an alias of a field inside the subquery.
        'field' => 'target_id',
        // This is the right side of the join -- it will be the alias for the entire subquery, and must match
        // the root key of this array structure.
        'table' => 'custom_registrants',
      ],
    ],
  ];
  // This is a sibling of the "table" -- num_registrants is considered a "dummy" field on the "custom_registrants"
  // table -- which itself is the alias of the subquery.
  $data['custom_registrants']['num_registrants'] = [
    'title' => t('Attendee Count'),
    'help' => t('Count of attendees registered for an event'),
    'group' => t('Product'),
    'field' => [
      'title' => t('Attendee count'),
      'help' => t('Count of current attendees'),
      // What type of entity this field is available on.
      'entity_type' => 'commerce_product',
      // This is one of the field aliases inside the subquery.
      'field_name' => 'num_registrants',
      // This is the field handler plugin to use.
      'id' => 'standard',
    ],
    'filter' => [
      'id' => 'numeric',
      'title' => t('Num Registrants'),
      'help' => t('Filter based on the count of attendees'),
      'entity_type' => 'commerce_product',
      'field_name' => 'num_registrants',
    ],
    'sort' => [
      'id' => 'standard',
      'title' => t('Num Registrants'),
      'help' => t('Sort based on count of attendees'),
      'real field' => 'num_registrants',
    ],
  ];

  return $data;
}
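To see what that configuration asks the join handler to build, here is a rough sketch that renders the equivalent SQL from the same keys. sketch_subquery_join_sql() is a hypothetical helper for illustration only; Drupal's database layer generates the real query and adds identifier quoting, table prefixes and placeholders, so the exact text will differ.

```php
<?php

/**
 * Approximately renders the SQL implied by a subquery join definition.
 *
 * Hypothetical helper: the array keys mirror the 'join' definition in
 * mymodule_views_data(), and $left_alias is the alias of the base table.
 */
function sketch_subquery_join_sql(array $join, string $left_alias): string {
  $alias = $join['subquery_alias'];

  // SELECT list: plain columns first, then aggregate expressions.
  $select = [];
  foreach ($join['subquery_fields'] ?? [] as $field_alias => $column) {
    $select[] = "$alias.$column AS $field_alias";
  }
  foreach ($join['subquery_expressions'] ?? [] as $field_alias => $expression) {
    $select[] = "$expression AS $field_alias";
  }

  $subquery = 'SELECT ' . implode(', ', $select)
    . " FROM {$join['subquery_table']} $alias";
  if (!empty($join['subquery_where'])) {
    // Each 'subquery_where' snippet is ANDed, as repeated where() calls are.
    $subquery .= ' WHERE (' . implode(') AND (', $join['subquery_where']) . ')';
  }
  if (!empty($join['subquery_groupby'])) {
    $subquery .= " GROUP BY {$join['subquery_groupby']}";
  }

  // Views join plugins default to a LEFT join when no 'type' is configured.
  $type = $join['type'] ?? 'LEFT';
  return "$type JOIN ($subquery) {$join['table']}"
    . " ON $left_alias.{$join['left_field']} = {$join['table']}.{$join['field']}";
}
```

Fed the join definition above with 'commerce_product_field_data' as the left alias, this produces a LEFT JOIN onto (SELECT ... count(id) AS num_registrants ... GROUP BY target_id) aliased as custom_registrants, which is why the views UI can treat num_registrants like any other column. To check the query Drupal actually generates, the Views UI can show the SQL query in its live preview.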

Now, that may seem like a lot of boilerplate. But the solution is elegant: the six-line 'sort' array is all it takes to make this field sortable. On a report that uses it, you can click the column header to sort from the most attended course to the least attended (or vice versa), without needing a custom handler. And the filter was just as easy!

Apr 07 2020

As we onboard a slew of new clients due to our joining forces with FuseIQ, I wanted to take a moment to explain our stance on maintenance, particularly around applying non-security updates for Drupal and WordPress.

Many people have a tentative approach to applying updates. "If it ain't broke, don't fix it!" is a saying we've all heard for generations, and sometimes it's hard to see changes as anything more than a risk you take that might potentially break things. But that's almost like saying "If I can't see it, it can't hurt me" -- in times of pandemic, does anyone really believe that?

In security circles, there's a saying, "all bugs are security issues." The point being, anything the software does incorrectly bears some level of cost or risk to some set of users. Most people understand by now that if you don't fix a critical security issue, on the Internet you're likely to get found and hacked. But really, what is the cost of not applying a minor, non-security related issue?

Technical Debt

The concept of "technical debt" is that it is the sum total of all the stuff currently broken or outdated in your systems. It's usually related to what your ideal system might be, if you could have that today. If you do a great job building a site that does everything you need it to do, it may have no technical debt on the day of launch (quite unlikely, but at least possible), but this does not last. Why not?

  • Updates to the CMS software
  • Updates to plugins or modules
  • Updates to the software language
  • Updates to the underlying server packages
  • Updates to the underlying operating system
  • Updates to web browsers
  • New development toolkits that do more
  • New devices that are substantially different than those that existed when you launched
  • New or changing business requirements
  • New partnerships
  • New customer preferences
  • New standards for search
  • New ways to reach customers
  • A pandemic changes your whole way of doing business
  • Your entire business model has to change to keep up

... something in that list is likely changing every day. Hopefully mostly in small ways, but as we've all seen, sometimes in drastic ways. Regardless, if you are not keeping your site up to date, you are accruing "technical debt" that will need to get paid sooner or later, or else your site becomes less effective.

And that is the true cost of doing nothing -- the opportunity cost of losing customers you might not have lost if you fully supported the newest device, or were able to communicate effectively to your customers how you can still deliver value to them when their entire world has changed.

The cost of "Security updates only"

Some sites are more complex than others, and some are more brittle than others, more likely to break in unexpected ways if anything changes. This is why many people suggest only applying security updates, and ignoring updates not marked as security-related.

But there are some serious downsides to this approach:

  1. Minor, incremental updates are individually far less risky than major updates -- going from one version to the next is far less likely to break things than skipping 8 versions to apply a critical security update. Which means if there is a critical security update, the site is much more likely to break when you apply it than if you had applied all the interim updates along the way.
  2. Environment changes sometimes get forced on you by hosting companies -- and many non-security updates fix compatibility issues with newer versions of the underlying software. Not applying "regular" updates means more things break on your site when this happens, and often more developer time to fix them.
  3. It's far easier to automate updating everything, compared to just updating one security fix at a time.
  4. Most non-security releases fix bugs or add new features that might benefit your users -- making your site operate better, or giving you new abilities that help you stay relevant.
  5. If you're not fully up-to-date, and it takes longer to apply and test a security update, your site might be vulnerable to attack for a longer window. We've seen vulnerabilities exploited less than 2 days after they were disclosed -- if it takes a week to update your site, you might have to pay the consequences of having a hacked site.

So in short, if your site is not kept fully up to date, it accrues a lot more technical debt. It becomes more expensive every time you apply an update, it carries more risk, and your users don't benefit from any of the improvements that might come with other regular updates.

"But," you might ask, "isn't constant updating going to take a lot more of my time, cause more frequent breakages, and cost me more?"

How to keep Technical Debt under control

You're always going to have some level of technical debt. Your website is never going to be perfect. But it can be plenty good enough to be a huge value to your business or organization -- you just need to care for it in similar ways as you would care for any other property.

If you think about a physical store, it's pretty clear there are regular maintenance needs. Once the store is built, the fixtures installed, inventory purchased and the shelves stocked, you still have constant things that need to happen:

  • Daily janitorial service
  • Fix any broken windows
  • Manage and check security systems
  • Deep-clean carpets occasionally
  • Fix holes in the roof
  • Redesign the storefront
  • Train the staff
  • Come up with new merchandising fixtures to highlight specials

... the point is, successful retailers are constantly doing stuff to their stores, and constantly working with staff to improve sales. Your website, whether or not you do e-commerce, is exactly the same in this regard.

What would happen to your store if you did not have janitorial services? If you left holes in the windows, or the roof? If you did not do a fresh paint coat now and then, or change up the storefront? If you did not train your staff?

The point is, if your website is valuable to your business, you should stay on top of maintenance, and be constantly experimenting to see what works and how to improve sales. If you're not doing this, you will be falling behind your competitors who do.

How Freelock manages updates

As the store owner, you should not be doing janitorial work. You can outsource that easily. (Sure you can joke about being the head janitor, and pick up a broom now and then if you'd like, but it's not the job that only you can do).

Freelock can do all the maintenance work for you, for far less cost than you doing it yourself, or even having a staff member do it. We have automated a large part of the process -- particularly automatically running two kinds of tests with every release, backing up sites before and after every release, and checking every site every night for changes.

It's far easier, and less costly, to apply all updates any time we touch a site, than to limit updates to just security releases. And with tests in place to catch things that break, we can do this with high confidence that the update does not cause major issues.

It is fairly common for an update to cause a minor issue, however. And this is exactly where technical debt comes back in -- it's far easier to pay that upgrade cost one small issue at a time, as we go -- instead of ending up having to fix a dozen small issues that combine into one big showstopping issue, all at once, under pressure due to a known security risk. And by "cost", in this case it's the cost of demanding your attention and ours when we're both swamped with other demands.

So... When we take over a site, there is a higher-than-usual cost to get you set up, brought completely up-to-date, and create the tests to cover your particular critical site needs. Once all of that setup is done, our maintenance cost tends to be lower than many other firms, thanks to our automation as well as our hands-on approach to providing fixes -- once we've hit an issue on one site, and resolved it, we usually can apply that fix immediately to any other site we manage that has the same issue.

Our "protection plan" is the base maintenance we provide, for either Drupal or WordPress. With these plans, we generally apply all updates to all our managed sites, on a monthly basis. We monitor security lists, and if there's a critical security vulnerability we judge to affect your site, we apply that within 1 business day, and security vulnerabilities we deem not a risk for your site we usually apply within 1 week.

Feel free to reach out, or comment below, if you have any questions or feedback!

Cheers,

John

Feb 27 2020

Seems like every day this month I've answered the same question: Why should I use Drupal instead of WordPress? And this is the answer I've come up with. They are entirely different applications, about as different as Microsoft Word is from Microsoft Excel.

WordPress is first and foremost a blogging tool, and it has become widely adopted by designers who are trying to make pretty-looking sites. There's no shortage of beautiful WordPress sites, and if what you need is a relatively simple marketing site with rich content, it's a great tool.

Drupal, on the other hand, is more like a general purpose database tool that can be skinned to look great, but that's not its strength. Much like Excel, it's great for managing lots of items that have the same characteristics. And it's built to work extremely well for this purpose -- managing scads of information.

If you're making a flyer for a yard sale, you wouldn't necessarily pick Excel -- Word is quick and easy to make something like that, and you can pull in all sorts of designs, paste it all in and make something that looks pretty good. You can make a table to show the differences between a couple products, but if you have hundreds of products you start to run into problems.

You can create a bunch of comparisons in Word. There's a full blown macro system built in -- you can certainly do most of what you can do in Excel using Word -- but that doesn't mean it's a good idea!

Wait a minute. WordPress is a CMS, and has a database!

Yes, WordPress calls itself a "content management system." And it has the basic architecture necessary for a publishing workflow. There's an API for creating different post types, adding custom fields, and all the same stuff that Drupal can do.

But... There is no user interface for creating these fields in WordPress core -- you either need to get one of many different (usually proprietary) add-on plugins, or have a developer write code to set up these post types.

In contrast, creating new content types, adding fields, and adding relationships between content types is all in Drupal core, and is the fundamental starting point for building a new Drupal site.

In WordPress, your first decision is typically to pick a theme, and a lot of themes come with a bunch of extra functionality that help you actually build the site. In Drupal, your first decision is how to organize your content -- the theme can be tacked on towards the end of the process, and easily changed at any point.

But everybody says to use WordPress! Are they wrong?

It's not wrong to use Word -- it's a matter of choosing the right tool for the job. Are you creating a brochure? That's way easier to do in Word than Excel. Are you trying to keep track of who has registered for an event? You can do it in Word, but if you do it in Excel you can add filters and calculated columns to keep track of which registrations have been paid, and which ones you need to collect payment for at the door.

You might well be able to find a plugin to help you to do this in Word -- but now suddenly it's not really Word you're using to do your registrations, it's some obscure plugin -- whereas tabulating data is the essence of what Excel does.

That's exactly the difference between WordPress and Drupal -- when you get beyond the core layout, blogging, and basic site functionality that comes with WordPress, you are suddenly in plugin land, and the core functionality is no longer shared by tens of millions of WordPress users -- only by the other users of that specific plugin.

And when you need a different plugin for each piece of functionality, you need to cross your fingers that none of them blows up your site or has some weird conflict with the others.

Drupal is built for this kind of integration of different kinds of things. Need to show events by date? Add a calendar. Need to show them by location? Add a map, and set up geolocation. Need to charge for them? Add commerce. Want your users to rate them? Add a voting module. Everything works together in Drupal, and builds upon everything else -- instead of being one-off islands of functionality that may or may not work with the other things you need.

Ok, I lied. Drupal isn't like Excel, it's more like Access.

A lot more people are familiar with Excel than Access, because it's useful in so many different situations. Access is more of a general purpose database -- compared to Excel, it's a bit harder to prototype things in, but way more powerful. In the 1990s and early 2000s, Access (and competitors like Filemaker Pro) were popular tools to build out entire business operation systems for many small businesses and non-profits.

This is exactly the kind of thing Drupal can do extremely well. It's even organized the same way -- it's really not hard to take an Access database and port everything in it over to Drupal -- and bring the entire application to the web, make it available to your mobile devices, tablets, remote workers, etc. The same business analysis skills even apply. For a huge number of scenarios, most of the work is that of an administrator -- you only really need to bring in a developer to make it look good, or add automation.

Drupal is not "just" a CMS -- it's a general purpose database system you can use to revolutionize your entire business operations. And it can be your website, too! This is particularly useful if you want to remove barriers to how you interact with your customers, organize your internal business processes, or automate some painstaking operational task.

If you would like to learn more about how to make your business run better using Drupal or similar technology, give us a call! We'd be happy to explore some ideas with you.

Sep 09 2019

When you build a new website, going live is relatively easy. You get hold of a domain name, point it at a webhost, put the website code there, and you're up and running!

After a site is live, it gets a lot more complicated.

What's important about deployment?

If you have a simple brochure site, deploying updates doesn't have to be complicated. The more your site does, the more complex deployment becomes. A deployment plan can help you stay out of trouble, keep your site online, and minimize data loss. So when going live with an update to a site, you should ask:

  • How much downtime is acceptable?
  • How much testing do we need before we make a change to the production site?
  • What data could we lose, from the production site?
  • What might go wrong with this deployment strategy?
  • How can we recover if something does go wrong?

A good deployment plan should make you smile with comfort, knowing you have all the bases covered. Are you smiling? If not, read on.

Common deployment strategies

Here are the main strategies we've seen or used for deployment:

  • Do all work in the production environment so there's nothing to deploy
  • Copy the entire new site into the production environment
  • Compile/build a site and put the result into the production environment
  • Dev/Stage/Production pipeline
  • Blue/Green deployments

Let's take a deeper look at each one.

No Deployment - work in production

All too often, this is what you get if you aren't careful hiring a freelancer. This really seems to be the standard approach for most WordPress sites, which to me is horrifying.

Coding is often a process of trying something, breaking things, and then fixing them. Rinse and repeat. If you're doing this on a live production website, your site visitors will see broken pages, weird issues, or sometimes nothing at all. If your site is already getting traffic, working in production is irresponsible and dangerous, especially if you aren't extremely careful about backups and extremely proficient.

The only benefit of the "no deployment" strategy is that it's cheap -- you save the cost of managing a copy of your site and deploying changes.

Copy site to production

This also seems to be a pretty common way of deploying sites -- simply copy the new site in its entirety to the production server and make it live.

For major upgrades, such as going from Drupal 7 to Drupal 8, or changing from one platform to an entirely different one, this is the main strategy we use. And there are definitely times when this strategy makes sense. However, for day-to-day maintenance, theme refreshes, or most typical deployments, this is not a very good approach.

If your production site has a database that changes regularly, you need to be extremely careful not to lose production data. For example, if your site allows user comments, takes e-commerce orders, or manages sales leads, simply copying a new site over it risks losing something.

Save this one for entirely new sites. Don't do this for day to day work -- unless your site doesn't even have a database.

Build site and deploy

"Static site generators" like Gatsby and Jekyll have become quite popular recently, because they generate static sites that do not have a database -- greatly simplifying security. If you're running a full-blown Content Management System (CMS) like Drupal or WordPress, you're putting an application with bugs on the Internet where anyone can attack it. If your site is just a collection of files, there is little for attackers to go after -- they can attack the hosting environment, but your site itself has far less "attack surface."

Gatsby in particular is becoming quite popular as a front-end to Drupal and WordPress -- you write your content in a private CMS on a private LAN, not reachable from the Internet, export the entire site using Gatsby (the build step), and then copy the resulting code up to the web host (much like the previous deployment strategy).

If you use this approach, you still need to consider how to keep your CMS up to date, though if it's not online, updating it in place becomes a far more reasonable proposition.

Dev/Stage/Production pipeline

Now we've reached what we consider to be the "standard" deployment practice -- run 3 copies of your site:

  • Dev, or Development -- this copy is where you do the development work, or at least integrate all the various developer copies, and conduct the initial round of testing.
  • Stage, or Test -- The main purpose of this copy is to test the deployment process itself, and understand what might break when you roll out to production.
  • Production, or Live -- The site that's available to the public.

In general, code flows from dev to production, whereas content/data flows from production to dev. If your site takes orders, collects data using forms, supports ratings/reviews or comments, or does anything sophisticated, you'll probably end up with this deployment strategy.

Several of the more professional hosts, such as Pantheon, Acquia, and WP Engine, provide these 3 environments, along with easy ways to deploy code up to production and copy data down from it.

Many larger companies and highly technical startups have built "continuous integration/continuous deployment" (CI/CD) pipelines along these lines -- including Freelock. "Continuous Integration" kicks off automatic tests after code is pushed to a particular branch, and "Continuous Deployment" automates the deployment of code to production when those tests have passed.

This is the key service we provide to nearly all our clients -- two different kinds of testing, a fully automatic pipeline, automatic backups, release scheduling, release note management, and more. And we've built our pipeline to work with a variety of hosts, including Pantheon and Acquia, but also bare Linux servers at any cloud provider.
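The gate at the heart of CI/CD can be pictured with a minimal Python sketch: run every automated test against a pushed commit, and deploy only if all of them pass. The test names, the `deploy` callback, and the commit IDs are hypothetical placeholders; a real pipeline wires these steps to a CI server and to the host's own deployment tooling.

```python
from typing import Callable, Dict, List, Tuple

# A test receives a commit ID and reports whether it passed.
Test = Callable[[str], bool]


def run_pipeline(
    commit: str,
    tests: Dict[str, Test],
    deploy: Callable[[str], None],
) -> Tuple[bool, List[str]]:
    """Continuous Integration: run every test against the pushed commit.
    Continuous Deployment: deploy to production only if all tests passed.
    Returns (deployed, names_of_failed_tests)."""
    failures = [name for name, test in tests.items() if not test(commit)]
    if failures:
        return False, failures  # production is left untouched
    deploy(commit)
    return True, []


if __name__ == "__main__":
    deployed: List[str] = []
    # Hypothetical checks standing in for real test suites.
    tests: Dict[str, Test] = {
        "unit": lambda c: True,
        "visual_regression": lambda c: not c.endswith("-broken"),
    }
    print(run_pipeline("a1b2c3", tests, deployed.append))         # (True, [])
    print(run_pipeline("d4e5f6-broken", tests, deployed.append))  # (False, ['visual_regression'])
    print(deployed)                                               # ['a1b2c3']
```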

The main downsides of this type of deployment are that it can be slow to deploy, very hard to set up, and prone to breaking as code and standards evolve, and that different platforms have different challenges around deploying configuration changes. For example, when you move a WordPress database to another location, you need to do a string search/replacement in the database to update the URL and the disk location, and you may need to do manual steps after the code gets deployed. Drupal, on the other hand, may put the site in maintenance mode for a few minutes as database updates get applied.
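The WordPress search/replacement step is trickier than it sounds. WordPress stores many settings as PHP-serialized values, which record each string's byte length (for example `s:22:"http://dev.example.com";`), so a plain text replacement that changes a URL's length corrupts them. WP-CLI's `wp search-replace` command exists to handle exactly this. The Python sketch below illustrates the idea on a single value: it recomputes the length prefix and recurses into payloads that are themselves serialized. It assumes well-formed input and uses made-up example URLs; treat it as an illustration, not a replacement for a tested tool.

```python
import re

# Header of a PHP-serialized string: s:<byte length>:"
_STR_HEADER = re.compile(rb's:(\d+):"')


def replace_in_serialized(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new`, fixing the byte-length prefix of any
    PHP-serialized strings so the value still unserializes correctly."""
    out = bytearray()
    pos = 0
    while (m := _STR_HEADER.search(data, pos)) is not None:
        # Text between serialized strings gets a plain replacement.
        out += data[pos:m.start()].replace(old, new)
        start = m.end()
        end = start + int(m.group(1))
        if data[end:end + 2] != b'";':
            # Length prefix doesn't line up, so this isn't a real serialized string.
            out += data[m.start():start]
            pos = start
            continue
        # Recurse: a payload may itself contain serialized data.
        payload = replace_in_serialized(data[start:end], old, new)
        out += b's:%d:"' % len(payload) + payload + b'";'
        pos = end + 2
    out += data[pos:].replace(old, new)
    return bytes(out)


if __name__ == "__main__":
    value = b'a:1:{s:7:"siteurl";s:22:"http://dev.example.com";}'
    print(replace_in_serialized(value, b"http://dev.example.com", b"https://www.example.com"))
    # b'a:1:{s:7:"siteurl";s:23:"https://www.example.com";}'
```

Working on bytes rather than text matters because PHP counts the length in bytes, not characters.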

All in all, when done well, this is a great deployment strategy, but can be very expensive to maintain. That's why our service is such a great value -- we do all the hard work of keeping it running smoothly across many dozens of clients, have automated a lot of the hard bits, and streamlined the setup.

Blue/Green deployments

If even a minute of downtime costs a significant amount of income, you may want to consider a Blue/Green deployment strategy. This is a strategy made for "high availability" -- doing your utmost to both minimize maintenance windows, and provide a rock-solid roll-back option if something goes awry.

With a Blue/Green deployment strategy, you essentially create two full production environments -- "blue" and "green". One of them is live at any given time, while the other is on standby. When you want to deploy an update, you deploy all the new code and configuration changes to the offline environment, and when it's all ready to go, you simply "promote" it to be the live one. For example, if Blue is live, you deploy everything to Green, possibly using a normal dev/stage/prod deployment process. The configuration changes happen while the green site is offline, so the public never gets a "down for maintenance" message. When it's all ready, you promote Green to live, and Blue becomes the offline standby copy. And if you discover a problem after going live, you simply promote Blue back to live, and Green goes into standby where it can get fixed.
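The promote/rollback cycle fits in a small Python sketch. The `BlueGreen` class and release labels are hypothetical; in a real setup the "swap" is typically a change at a load balancer, proxy, or DNS record rather than a variable assignment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlueGreen:
    """Two full production environments; exactly one is live at a time."""
    releases: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None}
    )
    live: str = "blue"

    @property
    def standby(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_standby(self, release: str) -> None:
        # New code and config land on the offline copy; visitors never see this step.
        self.releases[self.standby] = release

    def promote(self) -> None:
        # Point traffic at the standby copy; the old live copy becomes standby.
        self.live = self.standby

    def rollback(self) -> None:
        # Rolling back is the same swap, as long as nothing new has been
        # deployed to standby since the last promotion.
        self.promote()

    def serving(self) -> Optional[str]:
        return self.releases[self.live]


if __name__ == "__main__":
    site = BlueGreen(releases={"blue": "v1", "green": None})
    site.deploy_to_standby("v2")      # Green gets v2 while Blue keeps serving v1
    site.promote()
    print(site.live, site.serving())  # green v2
    site.rollback()
    print(site.live, site.serving())  # blue v1
```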

There is a big downside here -- if your site takes orders, or otherwise changes the production database, there's a window where you could lose data, much like the "Copy Site to Production" strategy. You might be able to somewhat mitigate this by setting the live site to "read only" but still available, while you copy the database to the standby site and then apply config and promote. Or you might be able to create a "journal" or log of changes that you replay on the new site after it gets promoted. Or move to a micro-service architecture -- but then you're just moving the problem into individual microservices that still need a deployment strategy.
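The "journal" idea can be sketched in a few lines of Python: start journaling, copy a snapshot of the live database to the standby copy, keep recording every write the live site accepts in the meantime, then replay those writes in order after promotion. A plain dict stands in for the database, and `Journal` and `write` are hypothetical helpers. A real implementation would also have to handle deletes, conflicting IDs, schema changes in the new release, and writes that arrive during the replay itself.

```python
from typing import Any, Dict, List, Tuple


class Journal:
    """Records writes the live site accepts after the snapshot was taken."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, Any]] = []

    def record(self, key: str, value: Any) -> None:
        self.entries.append((key, value))

    def replay(self, db: Dict[str, Any]) -> None:
        # Apply writes in the order they happened on the old live site.
        for key, value in self.entries:
            db[key] = value
        self.entries.clear()


def write(live_db: Dict[str, Any], journal: Journal, key: str, value: Any) -> None:
    """A write on the live site: apply it there and journal it for the standby."""
    live_db[key] = value
    journal.record(key, value)


if __name__ == "__main__":
    blue = {"order:1": "paid"}               # live site
    journal = Journal()                      # start journaling before the copy
    green = dict(blue)                       # snapshot copied to the standby site
    write(blue, journal, "order:2", "paid")  # arrives while Green is being prepared
    journal.replay(green)                    # after Green is promoted
    print(green == blue)                     # True
```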

Which deployment strategy is best?

There is no "best" deployment strategy -- it's all about tradeoffs, and what is most appropriate for a particular site's situation. If you break up your site into multiple pieces, you may end up using multiple strategies -- but each one might be quite a bit simpler than trying to update the whole. On the other hand, that might actually lower availability, as various pieces end up with different maintenance schedules.

If you're running a PHP-based CMS, and you want to rest easy that your site is up-to-date, running correctly, and with a solid recovery plan if something goes wrong, we can help with that!

Jun 12 2019

Glitzy websites are all the rage these days. Everybody seems to be looking for easy ways to create multimedia-rich pages. Yet there is a big downside to the current trend of page builders -- if you're not careful, you might end up making your long-term content management far harder than it should be.

WordPress 5 made its Gutenberg "Block editor" the standard for all WordPress sites going forward. Drupal 8.7 added a new "Layout Builder" in its core, adding sophisticated layout capabilities. Both of these are playing catchup to Software-as-a-Service (SaaS) offerings like Squarespace and Weebly -- and a whole bunch of 3rd party modules and plugins that have been filling the gap so far.

The goal for all of these is to make it easy to interleave text with photography, video, galleries, and animations using something approaching a drag-and-drop interface. Yet how they go about doing this varies drastically under the hood. In broad strokes, you can group all of these layout builders into one of 3 categories:

Broad categories of layout builders

Field-Oriented
  • Module or plugin: Drupal "Layout Builder", Drupal Panels, Panelizer, Display Suite, custom themes and page-type templates
  • Where items are stored: individual fields are in individual database tables/columns
  • Best for: managing huge numbers of similar items
  • Drawbacks: slightly less flexible -- harder to change up the sequence of rich elements

Repeating Templates
  • Module or plugin: Drupal "Paragraphs", Field Collections, Entity References/Inline Entity Form, Commerce Variations
  • Where items are stored: multiple entities are linked together to build up a page
  • Best for: keeping content and presentation separate, to allow re-skinning down the road, while still making rich authoring relatively easy
  • Drawbacks: not as organized as field-based layouts; harder to extract, search, and aggregate information

Embedded Layouts
  • Module or plugin: WordPress Gutenberg, Drupal Entity Embed, the vast majority of WP layout plugins
  • Where items are stored: everything is dropped into a huge text field
  • Best for: very easy authoring
  • Drawbacks: very poor at reusing information on other pages, inconsistent look across the site, hard to update overall look and feel, finicky to use and get "just right", often has accessibility issues

That's the summary. Now let's take a look under the hood...

How do layout builders store their data, and why should I care?

Which is the best tool -- Excel or Word? Entirely depends on the job you're trying to do, of course. Yet these layout builders are as different as Word and Excel -- some are excellent at creating long documents with lots of variation, while others are far better at preserving data so you can show it in a chart, do math, and more. You wouldn't pick Excel to write an essay, for example, and you shouldn't pick Word to track your finances.

If you are creating a rich landing page for a campaign, a layout builder that takes the Embedded approach can get you there quickly. Lots of support for drag-and-drop, lots of ways to quickly get a result. You can build 5 more while you're at it. But now try to compare things across 50 of these one-off pages, and suddenly not having a template and simple fields to fill in makes the job much harder. You create pages for a bunch of products, then you go to create a product comparison chart, and you're building that table by hand, cut-and-paste.

Or say for example you are a research institution, publishing research papers from dozens of contributors. You can make a nice landing page for each paper, with sections for the author's biography, the category, methodology, supporting organizations, and various other items -- but if you don't put each of these in its own field, it gets a lot trickier to build a nice search interface that will help your visitors find what they are looking for.

What is Content Management?

There are plenty of definitions of Content Management out there, mostly by vendors looking to sell you on a specific system, or pedantic descriptions of how big companies (Enterprises) need all this sophisticated stuff. While we are a vendor trying to sell you on something, let's take a moment to clear away all the B.S.

Website Content Management is about creating, maintaining, curating, and cultivating content on your website for the entire life of the website. The problem with the current focus on Layout Builders is that it's all on that very first step -- Creating. It ignores the rest of the lifecycle of your content.

At Freelock, we believe the longer you keep good content on your website, the more valuable it becomes. And keeping old content maintained and relatively fresh is a big part of that job. A Content Management System can help you keep your old content fresh -- keeping it looking consistent with the rest of your site, bringing in new traffic, guiding people down your sales funnel to become customers, and providing reputation and longevity that assure your customers you're not just another fly-by-night operation.

Embedding all of your rich content into one-off pages hampers this very process, especially when you want to change the look-and-feel of your website -- or find, re-use, or change the overall experience of your oldest content. Let's drill down into these different types of builders to see how they compare, for the longer term.

Field Oriented Layout Builders -- the Excel approach

(Screenshot: Adding a custom block to a page using Drupal Layout Builder)

Drupal excels at information architecture, and so the layout builder Drupal chose to include in its core supports this way of thinking about content. With the ability to easily create fields on different content types, and aggregate content using the powerful "Views" module, Drupal is all about information reusability.

There are dozens of different kinds of fields out there, and an even larger number of ways to use each one. For example, if you add a date field for an event, you can show it on a list of upcoming (or past) events automatically. You can show it on a calendar view. You can show it in a list, or a set of cards.

Add a geolocation field, and now you can show it on a map -- and you can filter that for upcoming events near me. Add a price, and now you can create a "facet" that lets you see items between certain price ranges. All of this power builds on all of the other kinds of information stored in fields, and makes it easy to manage hundreds or thousands of similar items.
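Because the price lives in its own field, a price facet boils down to counting items per range. Here is a minimal Python sketch, assuming each item is a dict with a numeric `price` field and using hypothetical range boundaries. In Drupal you would normally let modules such as Search API and Facets do this rather than writing it by hand; the point is only that it is trivial when the data is structured.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence


def price_facets(items: List[dict], bounds: Sequence[float]) -> Dict[str, int]:
    """Count items per price range.

    `bounds` must be sorted ascending: [0, 50, 100] yields the ranges
    "0-50", "50-100" and "100+". Each range includes its lower bound.
    """
    labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(bounds, bounds[1:])]
    labels.append(f"{bounds[-1]:g}+")
    counts: Counter = Counter()
    for item in items:
        # Index of the last bound that is <= the price.
        i = bisect_right(bounds, item["price"]) - 1
        if i >= 0:  # prices below the first bound fall outside every range
            counts[labels[i]] += 1
    return {label: counts[label] for label in labels}


if __name__ == "__main__":
    products = [{"price": 10}, {"price": 45}, {"price": 75}, {"price": 250}]
    print(price_facets(products, [0, 50, 100]))
    # {'0-50': 2, '50-100': 1, '100+': 1}
```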

The new Drupal Layout Builder lets you easily create a template for showing each of these fields in a particular spot on the page, create multiple columns, drag-and-drop items around. In addition, you can create one-off blocks to highlight certain items, and reuse that on other items -- change up the layout entirely on a single item, if you wish.

Managing field layouts in the future

Down the road, if a product is no longer a special snowflake, hit a button and it reverts to the same layout as all the rest of your products -- the layout is stored in "just" another field on the item.

If you want to show a Google Map or a thumbnail linked to a file, you would have a field for the location to show and another field for the media. Then when you place the location field on the layout template, you would pick the "map" renderer to show a map for the field, and when you want to show the downloadable file, you could specify the size to generate the thumbnail and place it where you want it -- and it will look consistent across all the items in your site.

Want to change your design? Swap out the Google Map renderer for OpenStreetMap, and all of the maps on your site use the new service immediately. Change the thumbnail size for the document, and move it to the left sidebar, and it's done across your site.
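A rough Python sketch shows why this works: the location field stores plain data, and a single display setting decides which renderer turns it into markup. Drupal calls these renderers field formatters, but the functions, markup, and `display_settings` dict below are hypothetical stand-ins, not Drupal's formatter API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Location:
    """What the location field stores: just data, no markup."""
    lat: float
    lon: float


# Hypothetical renderers: each turns the same stored data into different markup.
def google_map(loc: Location) -> str:
    return f'<div class="map map--google" data-lat="{loc.lat}" data-lon="{loc.lon}"></div>'


def openstreetmap(loc: Location) -> str:
    return f'<div class="map map--osm" data-lat="{loc.lat}" data-lon="{loc.lon}"></div>'


RENDERERS: Dict[str, Callable[[Location], str]] = {
    "google": google_map,
    "osm": openstreetmap,
}

# One site-wide display setting, like choosing the renderer on a layout template.
display_settings = {"location": "google"}


def render_location(loc: Location) -> str:
    return RENDERERS[display_settings["location"]](loc)


if __name__ == "__main__":
    events = [Location(47.61, -122.33), Location(45.52, -122.68)]
    display_settings["location"] = "osm"  # the whole redesign is this one change
    print(all("map--osm" in render_location(e) for e in events))  # True
```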

Embedded Layouts - the Word approach

(Screenshot: Gutenberg editor in action)

The new WordPress Gutenberg editor is the poster child for the opposite way of creating rich layouts. Instead of having a field for each kind of data, you start with a blank page and a collection of blocks you can drop into it.

Honestly, I like using Gutenberg -- once you figure out the basics, it's mostly fun to use. Its killer feature is management of "Reusable Blocks" -- create a chunk of boilerplate, save it as a reusable block, and then you can reuse it on any other Gutenberg page. You can keep it in your page as a "reusable block" or you can convert it to a regular block and edit it.

You can create entire templates this way.

This... this is awesome for creating proposals! Or reports, or anything you need to do once, and don't care much about how it will look in 5 years.

It's very rapid for creating pages, and if you are constantly editing some key landing pages, Gutenberg seems like a fine way to go.

However, for content that's going to stick around for years, especially through a site redesign, it's going to be a bit of a nightmare. And right from the start it stops being useful for a huge number of scenarios modern sites are trying to support.

Very little design control

One thing a CMS attempts to do is make your site look consistent. One challenge with Gutenberg and other approaches that largely dump styles as well as content into a big text area is that it makes it much easier to violate your site's design, leading to ugly, confusing, jarring sites. Having spent several years as a professional writer, seeing garish colors and inconsistent fonts and font sizes makes me shudder. I don't want to have to think about what it looks like -- I just want to write it and trust the designer to make it look good.

Useful data is embedded, hard to reuse

I see blocks for "Product Comparisons" for Gutenberg. Wow, drop these in and you get some boxes where you can enter stuff -- cool!

But... you have to enter that stuff. And it already exists, on the product pages. Wait a second -- I thought this was supposed to make my job easier? And... hey, I have a new product that just varies in two of these areas. Which page did I put that product comparison on?

Managing changes in the future

Back to the earlier scenario: now I want to switch from Google Maps to OpenStreetMap. To make this change, I need to do a search and replace -- find every map on my site, and generate a new widget from the new map provider. Lots of manual work. Maybe I can find or create a script, but even so, it feels a little brittle -- if I chose a different map style on one page, I might not find that one. And what if I want to move my document thumbnail to the other side of the page and shrink it? Geez, I have dozens of those to do now.

This is the big "mistake" of embedded layouts -- managing content down the road.

And this is not new to Gutenberg -- the vast majority of page builders for WordPress essentially work the same way, embedding "short codes" into the body, and the only way to find them is search.
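A small Python sketch of that search-and-replace script shows where the brittleness comes from. It assumes a hypothetical `[map lat="..." lon="..."]` shortcode in post bodies and rewrites it to an equally hypothetical `[osm_map ...]` one. Any page where the map was inserted a little differently (an extra style attribute, or a pasted iframe) slips past the pattern and still has to be found and fixed by hand.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical shortcode a page builder might have left in post bodies:
#   [map lat="47.61" lon="-122.33"]
MAP_SHORTCODE = re.compile(r'\[map\s+lat="([-\d.]+)"\s+lon="([-\d.]+)"\]')


def migrate_maps(bodies: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rewrite the known map shortcode to a (hypothetical) OpenStreetMap one.

    Returns the new bodies plus the pages that still contain something that
    looks like a map, which need a manual look.
    """
    new_bodies: Dict[str, str] = {}
    leftovers: List[str] = []
    for slug, body in bodies.items():
        new_body = MAP_SHORTCODE.sub(
            lambda m: f'[osm_map lat="{m.group(1)}" lon="{m.group(2)}"]', body
        )
        new_bodies[slug] = new_body
        # Crude check: a remaining "[map" or Google Maps embed was missed.
        if "[map" in new_body or "google.com/maps" in new_body:
            leftovers.append(slug)
    return new_bodies, leftovers


if __name__ == "__main__":
    pages = {
        "about": 'Visit us: [map lat="47.61" lon="-122.33"]',
        # Same map, but with an extra style attribute the pattern doesn't expect.
        "events": '[map lat="45.52" lon="-122.68" style="satellite"]',
        "contact": '<iframe src="https://www.google.com/maps/embed"></iframe>',
    }
    updated, missed = migrate_maps(pages)
    print(updated["about"])  # Visit us: [osm_map lat="47.61" lon="-122.33"]
    print(missed)            # ['events', 'contact']
```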

This is part of why I've heard many shops say you just need to start over and redo your website from scratch every few years.

If you've kept your content separate from the design, that is simply not true -- having to rebuild your site is entirely the result of having your design too entwined with your content.

Repeating Templates -- a Hybrid

(Screenshot: Nested Paragraphs)

In between these two extremes, there is a third way. The best current example of this approach is the Paragraphs module for Drupal.

Compared to field-based layouts, you can easily make pages with a bunch of varied layouts, changing the layout as desired, one row at a time. If you do this very much with a field-based layout, you end up with a bunch of blocks hanging out that can clutter other parts of your CMS, and you end up constantly tweaking the design to get a result that looks good.

Compared to Embedded layouts, your content is still entirely separate from the design, making it easy to re-skin down the road. And you can still use fields that can be extracted and compared/searched/reused, although doing that effectively takes a fair amount of upfront planning.

We typically create a set of paragraph types, such as:

  • Plain text
  • Pull quote
  • Image and text (image to the left or right)
  • Photo Gallery
  • Large media (video or image)
  • Columns, cards
  • Tab set, Accordion set
  • Slide carousel
  • Embed a view

When creating your content, you can add these in any order you like. We can provide particular classes for color variations, locked down to your brand's style guide.

The design remains very tightly controlled. Data is not necessarily easily reused -- but you can have a section of Paragraphs on content that still uses fields for all of the data management scenarios you like.

Because everything is templated, a re-skin of the site can make changes to one of these templates and it instantly applies everywhere it is used.
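Here is a rough Python sketch of that arrangement. Each page is an ordered list of typed paragraphs, each type has exactly one template, and color variations are checked against a style-guide allowlist. The paragraph types, color names, and markup are hypothetical, and this is not how the Paragraphs module is implemented; it only shows why replacing one template re-skins every page that uses that type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Color variations allowed by the (hypothetical) brand style guide.
BRAND_COLORS = {"default", "brand-blue", "brand-sand"}

# One template per paragraph type. A re-skin edits these, never the content.
TEMPLATES: Dict[str, Callable[[dict], str]] = {
    "plain_text": lambda d: f"<p>{d['text']}</p>",
    "pull_quote": lambda d: f"<blockquote>{d['quote']}</blockquote>",
}


@dataclass
class Paragraph:
    kind: str
    data: dict
    color: str = "default"

    def __post_init__(self) -> None:
        # Authors pick from a fixed menu of types and colors.
        if self.kind not in TEMPLATES:
            raise ValueError(f"unknown paragraph type: {self.kind}")
        if self.color not in BRAND_COLORS:
            raise ValueError(f"color not in style guide: {self.color}")


def render_page(paragraphs: List[Paragraph]) -> str:
    # Authors control the order and the content; templates control the markup.
    return "\n".join(
        f'<section class="color-{p.color}">{TEMPLATES[p.kind](p.data)}</section>'
        for p in paragraphs
    )
```

Swapping `TEMPLATES["pull_quote"]` for a new function changes every pull quote on the site at the next render, without touching any stored content.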

So which layout type should I use?

Should you use Excel or Word? Well, of course, you should use both, if you need them. There are very compelling reasons to use fields: they are essential to Content Management, and they make your work much easier in many, many ways. But there are times when dashing out a quick document, or web page, is needed right now.

By making Gutenberg its default editor, WordPress has gone all-in on the Embedded Layouts approach -- they are trying to focus on being a good page builder, potentially at the expense of content management. Do you need content management? Depends on how much content you have to manage! If you're only talking about having a nice brochure site, and a steady stream of blog or news posts, this is all fine. But the more sophisticated your needs, the more you're starting to go against the grain. You can add fields to WordPress, and create views of content -- but this involves either writing some code or finding plugins that will help you do this, many of which are not free/open source (a discussion for another post).

With Drupal, on the other hand, you can choose all three. You can even have all 3 on the same website! We are already using Gutenberg in Drupal on some sites, and we're using Paragraphs almost everywhere. Meanwhile we are very impressed with the new Layout Builder, and find it just the thing for making attractive layouts for certain types of content. You can have your Word, and your Excel too!

Jun 03 2019

Just ran across a sad story where Digital Ocean is accused of killing a startup:

... the startup spun up a bunch of servers to run some large batches of data processing, which triggered an abuse alert that shut off their account.

While this might have been a bit over-aggressive on Digital Ocean's part, overall, we would side with Digital Ocean here -- we are seeing more attacks from virtual hosts, presumably from attackers that spin up a bunch of virtual machines to attack other servers and then shut them down when the attack is over -- with utility pricing an attacker can do a lot of damage at pretty low cost.

But this really has little to do with Digital Ocean, specifically -- there are a bunch of ways reliance on a single vendor can cause major business disruption. We've written about other incidents along these lines before. No matter what service you use, you need to consider the risks associated with that service -- many of which have nothing to do with who the service is. Risks like:

  • Your account credentials get compromised, and an attacker deletes everything (or something) in it
  • Ransomware that gains access to your credentials, and encrypts your data
  • Hardware failure
  • Vendor goes out of business
  • Vendor gets acquired and changes their terms, or gets shut down (Google is notorious for shutting down services that "only" have 10 million users, and frequently acquires software companies)
  • Vendor otherwise changes their service in a way that no longer meets your needs
  • An attacker compromises your account through another account with insufficient separation from yours

All of these risks have various ways of mitigating them -- measures such as two-factor authentication, keeping offline backups or backups at an entirely unrelated service, using commodity services that make your data portable to other systems.

If you're ignoring these risks, you are making yourself vulnerable. In this case, Digital Ocean is already a commodity provider of virtual machines, so it's easy to switch to another provider if something goes horribly wrong there -- if you have your data safely backed up somewhere else. Apparently for many companies, this is a big "if"...

Many of the same risks apply to Amazon Web Services (AWS), which is considered the gold standard for this type of hosting.

We are customers of both services. We back up all of our Digital Ocean data at AWS, and all of our AWS data in Google Cloud Platform (GCP). We keep our configuration in code so we can spin up replacement servers at any of those providers and restore all data within an hour or two -- and we help our clients do the same.
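One way to keep the "backups at an entirely unrelated service" rule honest is to check it mechanically against an inventory. The Python sketch below assumes a hypothetical inventory recording where each data set lives and where its backups go, with made-up data set and provider labels, and flags any data set that has no backup outside its own provider.

```python
from typing import Dict, List


def unprotected(primary: Dict[str, str], backups: Dict[str, List[str]]) -> List[str]:
    """Return data sets with no backup at a provider other than their own.

    primary: data set name -> provider that hosts it
    backups: data set name -> providers that hold a backup copy
    """
    return sorted(
        name
        for name, provider in primary.items()
        if not any(b != provider for b in backups.get(name, []))
    )


if __name__ == "__main__":
    primary = {"site-db": "digitalocean", "site-files": "aws", "crm": "aws"}
    backups = {"site-db": ["aws"], "site-files": ["gcp"], "crm": ["aws"]}
    print(unprotected(primary, backups))  # ['crm']
```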

The question is, how valuable is your website? If it's not worth much, or you're just starting out, it may make sense to take shortcuts and spend your resources elsewhere. But if it's worth much to your business, you should be protecting it from these risks!

And feel free to reach out if we can help you with some disaster recovery planning, or implementing redundant backups of your Drupal or WordPress site!

May 08 2019

New versions of Drupal core dropped today, to fix a file handling issue.

After assessing the patches, statements, and risks associated with this update, we have decided this is an important update to apply, but not urgent for most of the sites we manage.

Exploitation of the flaw takes two things:

  • The ability to upload a malicious file with "PHAR" encoding embedded -- note this could masquerade as an otherwise innocent file such as a graphics file
  • The ability to pass a file path including the "phar://" stream wrapper prefix to a filesystem command in the code.

The Drupal security team hints that this requires some level of administrative access.
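To illustrate the second condition, here is a hypothetical guard in Python that refuses any file path using a stream wrapper outside an explicit allowlist. This is not how Drupal fixed the issue, and it does nothing about the first condition (spotting a disguised PHAR upload). The allowed `public`, `private` and `temporary` schemes mirror Drupal's standard file schemes. An allowlist is used instead of a blocklist so that `phar://` and any other unexpected wrapper are refused by default.

```python
import re

# Schemes we are willing to pass to filesystem calls. These mirror Drupal's
# standard public://, private:// and temporary:// file schemes.
ALLOWED_SCHEMES = {"public", "private", "temporary"}

# Matches a leading stream-wrapper prefix such as "phar://" or "public://".
_SCHEME = re.compile(r"^\s*([A-Za-z][A-Za-z0-9+.-]*)://")


def is_allowed_path(path: str) -> bool:
    """Reject any path using a stream wrapper outside the allowlist.

    Plain filesystem paths (no "scheme://" prefix) are allowed. Scheme names
    are compared case-insensitively, so "PHAR://" is refused as well.
    """
    match = _SCHEME.match(path)
    if match is None:
        return True
    return match.group(1).lower() in ALLOWED_SCHEMES
```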

We are updating all of our client sites through our normal testing pipeline over the next few days, prioritizing any sites that allow untrusted user uploads.

If your site is on our Protection Plan, allows user uploads, and you have not received a release notification by the end of the day, please reply to our security notice and let us know, and we will expedite the updates to your site!

On another note, our take on this vulnerability is that it is a pretty fundamental issue with PHP, with a lot of different ways an exploit might happen. Drupal is really good in this area in that it has a file management API which most contributed modules use, which provides a central place to put in protection for this kind of attack. This greatly helps the entire Drupal ecosystem protect against this kind of attack!

Other systems (*cough* WordPress) leave much of this up to the plugin authors. We see numerous other PHP CMSs releasing security updates for this today, with one notable absence...

For more info, see the original security advisory.
