
October 16, 2019

Unit tests are the fastest, most reliable kinds of tests: they confirm that the smallest units of your code, i.e. class methods, work as expected.

Unit tests do not require a full environment with a database and external libraries; this makes unit tests extremely fast.

In this article we will look at how to take any PHP code – a Drupal site or module, or indeed any other PHP codebase unrelated to Drupal – and start unit testing it today. We’ll start by setting up tests which work for any PHP code, and then we’ll see how to run your tests on the Drupal testbot if you so desire.

This article accompanies a talk I gave about unit testing at Drupalcamp Ottawa on October 18, 2019; here are the accompanying slides.

Before we start testing

Unit tests are useless unless they are run on every change (commit) to a codebase through continuous integration (CI). And it’s excruciatingly painful to make CI work without some sort of platform-agnostic DevOps setup (we’ll use a Docker-based workflow), so before we even start testing, we’ll set up CI and Docker.

Docker for all things

In the context of this article, we’ll define DevOps as a way to embed all dependencies within our code, meaning we want to limit the number of dependencies on our computer or CI server to run our code. To do this, we will start by installing and starting Docker Desktop.

Once you’ve set it up, confirm you have Docker running:

docker -v
# Docker version 19.03.2, build 6a30dfc

At this point, we can be assured that any code we run through Docker will run on any machine which has Docker installed. In this article we’ll use mostly PHPUnit, so instead of installing and configuring PHPUnit on our computer and our CI server and our colleagues’ computers, we can simply make sure our computer and our CI server have Docker installed, and run:

docker run --rm phpunit/phpunit --version

The first time this is run on an environment, it should result in:

Unable to find image 'phpunit/phpunit:latest' locally
latest: Pulling from phpunit/phpunit
Digest: sha256:bbbb143951f55fe93dbfed9adf130cae8623a1948f5a458e1aabbd175f7cb0b6
Status: Downloaded newer image for phpunit/phpunit:latest
PHPUnit 6.5.13 by Sebastian Bergmann, Julien Breux (Docker) and contributors.

On subsequent runs it will result in:

PHPUnit 6.5.13 by Sebastian Bergmann, Julien Breux (Docker) and contributors.

Installing PHPUnit can also be done through Composer. In this article we won’t use Composer because

  • that would require us to manage a specific version of PHP on each machine;
  • Composer does not work for programming languages other than PHP (say, for example, if we want to unit test Javascript or Python).

Let’s get started!

Host your code on Github or Bitbucket

We will avoid getting ahead of ourselves by learning and using Drupal’s unit test classes (which are based on PHPUnit) and testing infrastructure (we’ll do that below): we want to start by understanding how to unit test any PHP code (Drupal or otherwise).

To that end, we will need to host our code (or a mirror thereof) on non-Drupal infrastructure. Github and Bitbucket both integrate with CircleCI, a free, fast, and easy cloud continuous integration (CI) service with no vendor lock-in; we’ll use CircleCI later on in this article. With understanding of general unit testing principles under your belt, you can later move on to use framework-specific (including Drupal-specific) testing environments if you deem it necessary (for example if you are a contributor to core or to contrib modules which follow Drupal’s testing guidelines).

To demonstrate the principles in this article, I have taken a random Drupal 8 module which, at the time of this writing, has no unit tests: Automatic Entity Label. My selection is completely arbitrary; I don’t use this module myself, and I’m neither recommending it nor advising against it.

So, as my first step, I have added v. 8.x-3.0-beta1 of this module as is to Github, and tagged it as “original”.

You can see the version I uploaded to Github, without tests, here. There are no unit tests – yet.

Start continuous integration

Because, as we mentioned above, automated testing is all but useless without continuous integration (CI) to confirm your tests are passing, the next step is to set up CI. Attaching CircleCI to Github repos is straightforward. I started by adding a test that simply confirms that we can access PHPUnit on our CI environment.

Here are the changes I made to my code to add continuous integration. At this stage, this code only confirms that PHPUnit can be run via Docker, nothing else. If you want to follow along with your own codebase, you can add the same minor changes (in fact you are encouraged to do so). The change to the README.md document is a “Badge” which displays as green if tests pass, and red if they don’t, on the project’s home page. The rest is straightforward.

Once your code is set up for CI integration, create an account and log on to CircleCI using your Github account (Bitbucket works also), select your project from your list of projects (“Set Up Project” button), and start building it (“Start Building” button); that’s it!

Here is my very first build for my version of Auto Entity Label. It is worth unfolding the “Tests” section and looking at the test results:

./scripts/ci.sh
Unable to find image 'phpunit/phpunit:latest' locally
latest: Pulling from phpunit/phpunit
Digest: sha256:bbbb143951f55fe93dbfed9adf130cae8623a1948f5a458e1aabbd175f7cb0b6
Status: Downloaded newer image for phpunit/phpunit:latest
PHPUnit 6.5.13 by Sebastian Bergmann, Julien Breux (Docker) and contributors.

You’ll notice that you have output very similar to what you have on your own computer. That’s the magic of Docker: build once, run anywhere. Without it, Continuous Integration is like pulling teeth.

Setting up PHPUnit to actually run tests

Before we can test anything, PHPUnit needs to know where the tests reside, which tests to run, and how to autoload classes based on their namespace. Different frameworks, including Drupal, have recommendations on all this, but to get a good idea of how PHPUnit works, let’s start from scratch by creating four new files in our project (keep them empty for now):

  • ./phpunit.xml, at the root of our project, will define where our tests are located, and where our autoloader is located.
  • ./phpunit-autoload.php, at the root of our project, is our autoloader; it tells PHPUnit that, for example, the class Drupal\auto_entitylabel\AutoEntityLabelManager corresponds to the file src/AutoEntityLabelManager.php.
  • ./phpunit-bootstrap.php, we’ll leave empty for now, and look at it later on.
  • ./tests/AutoEntityLabelManagerTest.php, which will contain a test for the AutoEntityLabelManager class.

phpunit.xml

In this file, we’ll tell PHPUnit where to find our tests, and where the autoloader is. Different developers have their own preferences for what to put here, and Drupal has specific recommendations, but for now we’ll just use a simple file declaring that our tests are in ./tests (although they could be anywhere), and that the file phpunit-autoload.php (you could name it anything) should be loaded before each test is run:

<?xml version="1.0" encoding="UTF-8"?>
<phpunit bootstrap="phpunit-autoload.php">
  <testsuites>
    <testsuite name="myproject">
      <directory>./tests</directory>
    </testsuite>
  </testsuites>
</phpunit>

phpunit-autoload.php

In this file, we’ll tell PHPUnit how to find files based on namespaces. Different projects do this differently. For example, Drupal 7 has a custom Drupal-only way of autoloading classes; Drupal 8 uses the PSR-4 standard. In our example, we’re telling PHPUnit that any code which uses the class Drupal\auto_entitylabel\Something will load the corresponding file ./src/Something.php:

<?php

/**
 * @file
 * PHPUnit class autoloader.
 *
 * PHPUnit knows nothing about Drupal, so provide PHPUnit with the bare
 * minimum it needs to know in order to find classes by namespace.
 *
 * Used by the PHPUnit test runner and referenced in ./phpunit.xml.
 */

spl_autoload_register(function ($class) {
  if (substr($class, 0, strlen('Drupal\\auto_entitylabel\\')) == 'Drupal\\auto_entitylabel\\') {
    $class2 = str_replace('Drupal\\auto_entitylabel\\', '', $class);
    $path = 'src/' . str_replace('\\', '/', $class2) . '.php';
    require_once $path;
  }
});

phpunit-bootstrap.php

(We’ll leave that one empty for now, but later on we’ll use it to put dummy versions of classes that Drupal code expects to find.)

tests/AutoEntityLabelManagerTest.php

Here is our first test. Let’s start with a very simple unit test: one which tests a pure function with no externalities.

Let’s take AutoEntityLabelManager::auto_entitylabel_entity_label_visible().

Here it is in context, and here is the actual code we want to test:

public static function auto_entitylabel_entity_label_visible($entity_type) {
  // @codingStandardsIgnoreEnd
  $hidden = [
    'profile2' => TRUE,
  ];
  return empty($hidden[$entity_type]);
}

This is actual code which exists in the Auto Entity Label project; I have never tried this function in a running Drupal instance, and I’m not even sure why it’s there, but I can still test it. I assume that if I call AutoEntityLabelManager::auto_entitylabel_entity_label_visible('whatever'), I should get TRUE as a response. This is what I will test for in ./tests/AutoEntityLabelManagerTest.php:

<?php

namespace Drupal\auto_entitylabel\Tests;

use Drupal\auto_entitylabel\AutoEntityLabelManager;
use PHPUnit\Framework\TestCase;

/**
 * Test AutoEntityLabelManager.
 *
 * @group myproject
 */
class AutoEntityLabelManagerTest extends TestCase {

  /**
   * Test for auto_entitylabel_entity_label_visible().
   *
   * @covers ::auto_entitylabel_entity_label_visible
   */
  public function testAuto_entitylabel_entity_label_visible() {
    $this->assertTrue(AutoEntityLabelManager::auto_entitylabel_entity_label_visible('whatever') === TRUE, 'Label "whatever" is visible.');
  }

}

For test methods to be called by PHPUnit, their names need to start with the lowercase word “test”.

(If you have looked at other Drupal unit testing tutorials, you might have noticed that Drupal unit tests are based not on PHPUnit\Framework\TestCase but on Drupal\Tests\UnitTestCase. The latter provides some useful, but not critical, helper code. In our case, using PHPUnit directly without Drupal means we don’t depend on Drupal to run our code; and we can better understand the intricacies of PHPUnit.)

scripts/ci.sh

Finally we’ll need to tweak ./scripts/ci.sh a bit:

docker run --rm -v "$(pwd)":/app phpunit/phpunit \
  --group myproject

Adding -v "$(pwd)":/app shares our code on our host computer or server with a directory called /app on the PHPUnit Docker container, so PHPUnit actually has access to our code. --group myproject runs all tests in the “myproject” group (recall that in tests/AutoEntityLabelManagerTest.php, we have added @group myproject to the class comment).

Here are the changes we made to our code.

Running our first test… and running into our first problem

With all those changes in place, if you run ./scripts/ci.sh, you should have this output:

$ ./scripts/ci.sh
PHPUnit 6.5.13 by Sebastian Bergmann, Julien Breux (Docker) and contributors.

…and this Fatal error…

PHP Fatal error:  Trait 'Drupal\Core\StringTranslation\StringTranslationTrait' not found in /app/src/AutoEntityLabelManager.php on line 16
...

So what’s happening here? It turns out AutoEntityLabelManager uses something called StringTranslationTrait. A PHP trait is a code sharing pattern. It’s a fascinating topic and super useful for writing testable code (we’ll get to it later); but right now we don’t need it and don’t really care about it; it’s just getting in the way of our test. We somehow need to tell PHPUnit that Drupal\Core\StringTranslation\StringTranslationTrait needs to exist, but we don’t really care – right now – what it does.

That’s where our phpunit-bootstrap.php file comes in. In it, we can define Drupal\Core\StringTranslation\StringTranslationTrait so that PHP will not complain that it does not exist.

In phpunit-autoload.php, require phpunit-bootstrap.php:

require_once 'phpunit-bootstrap.php';

And in phpunit-bootstrap.php, define a dummy version of Drupal\Core\StringTranslation\StringTranslationTrait:

<?php

/**
 * @file
 *
 * PHPUnit knows nothing about Drupal. Declare required classes here.
 */

namespace Drupal\Core\StringTranslation {
  trait StringTranslationTrait {}
}

Here is the diff in our repo.

Running our first passing test!

This is a big day for you, it’s the day of your first passing test:

$ ./scripts/ci.sh
PHPUnit 6.5.13 by Sebastian Bergmann, Julien Breux (Docker) and contributors.

.                                                                   1 / 1 (100%)

Time: 124 ms, Memory: 4.00MB

OK (1 test, 1 assertion)

Because of the magic of Docker, the same output can be found on our CI infrastructure’s equivalent passing test (by unfolding the “Tests” section) once we push our code to Github.

Introducing test providers

OK, we’re getting into the jargon of PHPUnit now. To introduce the concept of test providers, consider this: almost every time we run a test, we’d like to bombard our unit (our PHP method) with a variety of inputs and expected outputs, and confirm our unit always works as expected.

The basic testing code is always the same, but the inputs and expected outputs change.

Consider our existing test:

/**
 * Test for auto_entitylabel_entity_label_visible().
 *
 * @covers ::auto_entitylabel_entity_label_visible
 */
public function testAuto_entitylabel_entity_label_visible() {
  $this->assertTrue(AutoEntityLabelManager::auto_entitylabel_entity_label_visible('whatever') === TRUE, 'Label "whatever" is visible.');
}

Maybe calling our method with “whatever” should yield TRUE, but we might also want to test other inputs to make sure we cover every possible use case for the method. In our case, looking at the method, we can reasonably surmise that calling it with “profile2” should yield FALSE. Again, I’m not sure why this is; in the context of this tutorial, all I want to do is to make sure the method works as expected.

So the answer here is to separate the testing code from the inputs and expected outputs. That’s where the provider comes in. We will add arguments to the test code, and define a separate function which calls our test code with different arguments. The end result looks like this (I also like to print_r() the expected and actual output in case they differ, but this is not required):

/**
 * Test for auto_entitylabel_entity_label_visible().
 *
 * @param string $message
 *   The test message.
 * @param string $input
 *   Input string.
 * @param bool $expected
 *   Expected output.
 *
 * @covers ::auto_entitylabel_entity_label_visible
 * @dataProvider providerAuto_entitylabel_entity_label_visible
 */
public function testAuto_entitylabel_entity_label_visible(string $message, string $input, bool $expected) {
  $output = AutoEntityLabelManager::auto_entitylabel_entity_label_visible($input);

  if ($output != $expected) {
    print_r([
      'output' => $output,
      'expected' => $expected,
    ]);
  }

  $this->assertTrue($output === $expected, $message);
}

/**
 * Provider for testAuto_entitylabel_entity_label_visible().
 */
public function providerAuto_entitylabel_entity_label_visible() {
  return [
    [
      'message' => 'Label "whatever" is visible',
      'input' => 'whatever',
      'expected' => TRUE,
    ],
    [
      'message' => 'Label "profile2" is invisible',
      'input' => 'profile2',
      'expected' => FALSE,
    ],
    [
      'message' => 'Empty label is visible',
      'input' => '',
      'expected' => TRUE,
    ],
  ];
}

Here is the diff in GitHub.

At this point, we have one test method being called with three different sets of data, so the same test method is being run three times; running the test now shows three dots:

$ ./scripts/ci.sh
PHPUnit 6.5.13 by Sebastian Bergmann, Julien Breux (Docker) and contributors.

...                                                                 3 / 3 (100%)

Time: 232 ms, Memory: 4.00MB

OK (3 tests, 3 assertions)

Breaking down monster functions

It must be human nature, but over time, during development, functions tend to get longer and longer, and more and more complex. Functions longer than a few lines tend to be hard to test, because of the sheer number of possible execution paths, especially if there are several levels of control statements.

Let’s take, as an example, auto_entitylabel_prepare_entityform(). With its multiple switch and if statements, it has a cyclomatic complexity of 7, the highest in this codebase, according to the static analysis tool Pdepend. If you’re curious about finding your cyclomatic complexity, you can use the magic of Docker, run the following, and take a look at ./php_code_quality/pdepend_output.xml:

mkdir -p php_code_quality && docker run -it --rm -v "$PWD":/app -w /app adamculp/php-code-quality:latest php /usr/local/lib/php-code-quality/vendor/bin/pdepend --suffix='php,module' --summary-xml='./php_code_quality/pdepend_output.xml' .

See adamculp/php-code-quality for more details. But I digress…

Testing this completely would require close to 2 to the power 7 test cases, so the easiest approach is to break it down into smaller functions with a lower cyclomatic complexity (that is, fewer control statements). We’ll get to that in a minute, but first…

Procedural code is not testable, use class methods

For all but pure functions, procedural code like our auto_entitylabel_prepare_entityform(), as well as private and static methods, cannot be tested using mock objects (which we’ll get to later). Therefore, any code you’d like to test should exist within a class. For our purposes, we’ll put auto_entitylabel_prepare_entityform() within a Singleton class, like this, and name it prepareEntityForm(). (You don’t need to use a Singleton; you can use a Drupal service or whatever you want, as long as everything you want to test is a non-static class method.)
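
Here is a minimal sketch of what such a Singleton wrapper might look like (the class name, method signature and bodies are illustrative, not the exact code from the repository):

<?php

namespace Drupal\auto_entitylabel;

/**
 * Singleton wrapper around what used to be procedural code.
 */
class AutoEntityLabelSingleton {

  /**
   * The single instance of this class.
   *
   * @var null|AutoEntityLabelSingleton
   */
  static private $instance;

  /**
   * Get the single instance of this class, creating it if necessary.
   */
  public static function instance() : AutoEntityLabelSingleton {
    if (!isset(self::$instance)) {
      self::$instance = new AutoEntityLabelSingleton();
    }
    return self::$instance;
  }

  /**
   * Non-static version of auto_entitylabel_prepare_entityform().
   */
  public function prepareEntityForm(array &$form, $entity) {
    // The body of the legacy procedural function moves here, so that it can
    // be tested (and partially mocked) as a regular class method.
  }

}

The procedural hook implementation then becomes a one-line wrapper which simply calls AutoEntityLabelSingleton::instance()->prepareEntityForm(...), and all the logic worth testing lives in the class.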

Our second test

So we put our procedural code in a class. But the problem remains: it’s too complex to fully cover with unit tests, so as a next step I recommend surgically removing only those parts of the method we want to test, and putting them in a separate method. Let’s focus on these lines of code, which can lead to this change in our code.
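
As an illustration only (the method names below are hypothetical, not the actual change linked above), the pattern is to pull the lines we care about into their own small method, which then becomes an independently testable unit:

/**
 * The original, complex method keeps its overall flow...
 */
public function prepareEntityForm(array &$form, $entity) {
  // ... lots of existing logic with switch and if statements ...
  $this->addAutoLabelFormElements($form, $entity);
  // ... more existing logic ...
}

/**
 * ...and the lines we want to test now live in a small, focused method.
 *
 * (addAutoLabelFormElements() is a hypothetical name for illustration.)
 */
protected function addAutoLabelFormElements(array &$form, $entity) {
  // Only the few lines we surgically removed go here; they can now be
  // tested, or mocked, in isolation.
}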

Object and method mocking, and stubs

Let’s consider a scenario where we want to add some tests to EntityLabelNotNullConstraintValidator::validate().

Let’s start by splitting the validate method into smaller parts, like this. We will now focus on testing a more manageable method with a lower cyclomatic complexity:

/**
 * Manage typed data if it is valid.
 *
 * @return bool
 *   FALSE if the parent class validation should be called.
 */
public function manageTypedData() : bool {
  $typed_data = $this->getTypedData();
  if ($typed_data instanceof FieldItemList && $typed_data->isEmpty()) {
    return $this->manageValidTypedData($typed_data);
  }
  return FALSE;
}

Recall that in unit testing, we are only testing single units of code. In this case, the unit of code we are testing is manageTypedData(), above.

In order to test manageTypedData() and nothing else, we need to assume that getTypedData() and manageValidTypedData() are doing their jobs; we will not call them, but replace them with stub methods within a mock object.

Calling the real getTypedData() and manageValidTypedData() would interfere with our testing of manageTypedData(), so we need to mock those two methods and make them return whatever we want.

PHPUnit achieves this by making a copy of our EntityLabelNotNullConstraintValidator class, where getTypedData() and manageValidTypedData() are replaced with our own methods which return what we want. So in the context of our test, we do not instantiate EntityLabelNotNullConstraintValidator, but rather, a mock version of that class in which we replace certain methods. Here is how to instantiate that class:

$object = $this->getMockBuilder(EntityLabelNotNullConstraintValidator::class)
  ->setMethods([
    'getTypedData',
    'manageValidTypedData',
  ])
  ->disableOriginalConstructor()
  ->getMock();
// We don't care how getTypedData() figures out what to return to
// manageTypedData, but we do want to see how our function will react
// to a variety of possibilities.
$object->method('getTypedData')
  ->willReturn($input);
// We will assume manageValidTypedData() is doing its job; that's not
// what we are testing here. For our test, it will always return TRUE.
$object->method('manageValidTypedData')
  ->willReturn(TRUE);

In the above example, our new object behaves exactly as EntityLabelNotNullConstraintValidator, except that getTypedData() returns $input (which we’ll define in a provider); and manageValidTypedData() always returns TRUE.

Keep in mind that private methods cannot be mocked, so for that reason I generally avoid using them; use protected methods instead.

Here is our initial test for this.
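
As a sketch (the test and provider names here are illustrative; the real test is in the link above), the complete test method might look like this:

/**
 * Test for manageTypedData().
 *
 * @covers ::manageTypedData
 * @dataProvider providerManageTypedData
 */
public function testManageTypedData(string $message, $input, bool $expected) {
  $object = $this->getMockBuilder(EntityLabelNotNullConstraintValidator::class)
    ->setMethods([
      'getTypedData',
      'manageValidTypedData',
    ])
    ->disableOriginalConstructor()
    ->getMock();
  $object->method('getTypedData')
    ->willReturn($input);
  $object->method('manageValidTypedData')
    ->willReturn(TRUE);

  // manageTypedData() itself is not stubbed, so calling it runs the real
  // code, which in turn calls our stubbed getTypedData() and
  // manageValidTypedData().
  $this->assertTrue($object->manageTypedData() === $expected, $message);
}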

Our provider, at this point, only makes sure that if getTypedData() returns a new \stdClass(), which is not an instance of FieldItemList, then the method we’re testing will return FALSE.

Here is how we could extend our provider to make sure our method reacts correctly if getTypedData() returns a FieldItemList whose isEmpty() method returns TRUE, and FALSE.
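
A sketch of such an extended provider (this assumes our phpunit-bootstrap.php declares a dummy, argument-less FieldItemList class, as we did for StringTranslationTrait, and that the test file has a use Drupal\Core\Field\FieldItemList; statement, so we can extend it anonymously):

/**
 * Provider for testManageTypedData().
 */
public function providerManageTypedData() {
  return [
    [
      'message' => 'Typed data which is not a FieldItemList is not managed',
      'input' => new \stdClass(),
      'expected' => FALSE,
    ],
    [
      'message' => 'An empty FieldItemList is managed',
      // An anonymous class lets us control isEmpty() without a database or
      // a full Drupal environment.
      'input' => new class extends FieldItemList {
        public function isEmpty() {
          return TRUE;
        }
      },
      'expected' => TRUE,
    ],
    [
      'message' => 'A non-empty FieldItemList is not managed',
      'input' => new class extends FieldItemList {
        public function isEmpty() {
          return FALSE;
        }
      },
      'expected' => FALSE,
    ],
  ];
}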

Testing protected methods

Let’s say we want to (partially) test the protected AutoEntityLabelManager::getConfig(); to do so, we need to introduce a new trick.

Start by taking a look at our test code which fails. If you try to run this, you will get:

There was 1 error:

1) Drupal\auto_entitylabel\Tests\AutoEntityLabelManagerTest::testGetConfig
Error: Cannot access protected property Mock_AutoEntityLabelManager_0f5704cf::$config

So we want to test a protected method (getConfig()), and, in order to test it, we need to modify a protected property ($config). These two will result in “Cannot access”-type failures.

The solution is to use a trick known as class reflection; it’s a bit opaque, but it does allow us to access protected properties and methods.

Take a look at some changes which result in a working version of our test.

Copy-pasting is perhaps your best friend here, because this concept kind of plays with your mind. But basically, a ReflectionClass allows us to retrieve properties and methods as objects, then set their visibility using methods of those objects, then set their values or call them using their own methods… As I said, copy-pasting is good, sometimes.
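
A minimal sketch of the reflection approach (the value assigned to $config and the argument passed to getConfig() are placeholders; see the linked changes for the real test):

// Build a mock of AutoEntityLabelManager, with no methods stubbed, and its
// constructor disabled.
$object = $this->getMockBuilder(AutoEntityLabelManager::class)
  ->setMethods(NULL)
  ->disableOriginalConstructor()
  ->getMock();

// Use reflection to make the protected $config property writable...
$reflection = new \ReflectionClass($object);
$property = $reflection->getProperty('config');
$property->setAccessible(TRUE);
$property->setValue($object, 'dummy config');

// ...and to call the protected getConfig() method (passing along whatever
// arguments the real method expects).
$method = $reflection->getMethod('getConfig');
$method->setAccessible(TRUE);
$result = $method->invokeArgs($object, ['some_key']);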

A note about testing abstract classes

There are no abstract classes in Auto Entity Label, but if you want to test an abstract class, here is how to create a mock object:

$object = $this->getMockBuilder(MyAbstractClass::class)
  ->setMethods(NULL)
  ->disableOriginalConstructor()
  ->getMockForAbstractClass();

Using traits

Consider the following scenario: a bunch of your code uses the legacy drupal_set_message() function. You might have something like:

class a extends some_class {
  public function a() {
    ...
    drupal_set_message('hello');
    ...
  }
}

class b extends some_other_class {
  public function b() {
    ...
    drupal_set_message('world');
    ...
  }
}

Your tests will complain if you try to call, or mock, drupal_set_message() when unit-testing a::a() or b::b(), because drupal_set_message() is procedural and you can’t do much with it (thankfully there is less and less procedural code in Drupal modules, but you’ll still find a lot of it).

So in order to make drupal_set_message() mockable, you might want to do something like:

class a extends some_class {
  protected function drupalSetMessage($x) {
    drupal_set_message($x);
  }
  public function a() {
    ...
    $this->drupalSetMessage('hello');
    ...
  }
}

class b extends some_other_class {
  protected function drupalSetMessage($x) {
    drupal_set_message($x);
  }
  public function b() {
    ...
    $this->drupalSetMessage('world');
    ...
  }
}

Now, however, we’re in code duplication territory, which is not cool (well, not much of what we’re doing is cool, not in the traditional sense anyway). We can’t define a base class which has drupalSetMessage() as a method because PHP doesn’t (and probably shouldn’t) support multiple inheritance. That’s where traits come in: they are a code reuse technique exactly adapted to this situation:

trait commonMethodsTrait {
  protected function drupalSetMessage($x) {
    drupal_set_message($x);
  }
}

class a extends some_class {
  use commonMethodsTrait;

  public function a() {
    ...
    $this->drupalSetMessage('hello');
    ...
  }
}

class b extends some_other_class {
  use commonMethodsTrait;

  public function b() {
    ...
    $this->drupalSetMessage('world');
    ...
  }
}

Drupal uses this a lot: the t() function is peppered throughout most of core and contrib; earlier in this article we ran into StringTranslationTrait, which allows developers to use $this->t() instead of the legacy t(), therefore making it mockable when testing methods which use it. The great thing about this approach is that we do not even need Drupal’s StringTranslationTrait when running our tests: we can mock t() even if a dummy version of StringTranslationTrait is used.
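
As a hedged sketch (using the dummy class names from the example above, and assuming those classes live in a namespace so that a() is an ordinary method), mocking the trait-provided method in a test might look like this:

public function testA() {
  // Stub only drupalSetMessage(); the a() method itself stays real.
  $object = $this->getMockBuilder(a::class)
    ->setMethods(['drupalSetMessage'])
    ->disableOriginalConstructor()
    ->getMock();

  // We can even assert that our unit calls drupalSetMessage() exactly once,
  // with the expected argument, without ever touching the real
  // drupal_set_message(). Mocking a trait-provided t() works the same way.
  $object->expects($this->once())
    ->method('drupalSetMessage')
    ->with('hello');

  $object->a();
}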

Check out this test for an example.

What about Javascript, Python and other languages?

PHP has PHPUnit; other languages also have their test suites, and they, too, can run within Docker. Javascript has AVA; Python has unittest.

All unit test frameworks support mocking.

Let’s look a bit more closely at AVA. We do not want to install and maintain it on all our developers’ machines and on our CI server, so we’ll use a Dockerized version of AVA. We can download that project and, specifically, run tests against example 3:

git clone git@github.com:dcycle/docker-ava.git
docker run -v $(pwd)/example03/test:/app/code \
  -v $(pwd)/example03/code:/mycode dcycle/ava

The result here, again due to the magic of Docker, should be a passing test run.

So what’s going on here? We have some sample Javascript code which has a function we’d like to test:

module.exports = {
  dangerlevel: function(){
    return this.tsunamidangerlevel() * 4 + this.volcanodangerlevel() * 10;
  },

  tsunamidangerlevel: function(num){
    // Call some external API.
    return this_will_fail_during_testing();
    // During tests, we want to ignore this function.
  },

  volcanodangerlevel: function(num){
    // Call some external API.
    return this_will_fail_during_testing();
    // During tests, we want to ignore this function.
  }
}

In this specific case we’d like to mock tsunamidangerlevel() and volcanodangerlevel() during unit testing: we don’t care that this_will_fail_during_testing() is unknown to our test code. Our test could look something like this:

import test from 'ava'
import sinon from 'sinon'

var my = require('/mycode/dangerlevel.js');

test('Danger level is correct', t => {
  sinon.stub(my, 'tsunamidangerlevel').returns(1);
  sinon.stub(my, 'volcanodangerlevel').returns(2);

  t.true(my.dangerlevel() == 24);
})

What we’re saying here is that if tsunamidangerlevel() returns 1 and volcanodangerlevel() returns 2, then dangerlevel() should return 24.

The Drupal testbot

Edit (December 10, 2019): until this issue is fixed I recommend using the CircleCI technique and not testing on the Drupal infrastructure.

Drupal has its own Continuous Integration infrastructure, or testbot. It’s a bit more involved to reproduce its results locally; still, you might want to use it if you are developing a Drupal module, and indeed you’ll have to use it if you are submitting patches to core.

In fact, it is possible to tweak our code a bit to allow it to run on the Drupal testbot and CircleCI.

Here are some changes to our code which allow exactly that. Let’s go over the changes required:

  • Tests need to be in ./tests/src/Unit;
  • The @group name should be unique to your project (you can use your project’s machine name);
  • The tests should have the namespace Drupal\Tests\my_project_machine_name\Unit or Drupal\Tests\my_project_machine_name\Unit\Sub\Folder (for example Drupal\Tests\my_project_machine_name\Unit\Plugin\Validation);
  • The unit tests have access to Drupal code. This can actually be annoying: for example, we can no longer just create an anonymous class for FieldItemList; instead, we need to create a mock object using disableOriginalConstructor(). This is because the unit test code, being aware of Drupal, knows that FieldItemList requires parameters to its constructor, and therefore complains when we don’t provide any (as in the case of an anonymous class); see the sketch after this list.
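
For example, a testbot-friendly version of the test might be structured like this (a sketch; the exact code is in the linked changes, and only the relevant part of the test is shown):

<?php

namespace Drupal\Tests\auto_entitylabel\Unit\Plugin\Validation;

use Drupal\Core\Field\FieldItemList;
use Drupal\Tests\UnitTestCase;

/**
 * Test EntityLabelNotNullConstraintValidator on the Drupal testbot.
 *
 * @group auto_entitylabel
 */
class EntityLabelNotNullConstraintValidatorTest extends UnitTestCase {

  /**
   * Test for manageTypedData().
   */
  public function testManageTypedData() {
    // Because this test runs with Drupal's autoloader, FieldItemList is the
    // real class, whose constructor requires arguments; so we mock it rather
    // than extending it anonymously.
    $typed_data = $this->getMockBuilder(FieldItemList::class)
      ->disableOriginalConstructor()
      ->getMock();
    $typed_data->method('isEmpty')
      ->willReturn(TRUE);
    // ...the rest of the test proceeds as in the previous sections.
  }

}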

To make sure this works, I created a project (it has to be a full project, as far as I can tell; it can’t be a sandbox project, or at least I didn’t figure out how to do this with a sandbox project) at Unit Test Tutorial. I then activated automated testing under the Automated testing tab.

The results can be seen on the Drupal testbot. Look for these lines specifically:

20:32:38 Drupal\Tests\auto_entitylabel\Unit\AutoEntityLabelSingletonT   2 passes
20:32:38 Drupal\Tests\auto_entitylabel\Unit\AutoEntityLabelManagerTes   4 passes
20:32:38 Drupal\Tests\auto_entitylabel\Unit\Plugin\Validation\EntityL   1 passes
20:32:38 Drupal\Tests\auto_entitylabel\Unit\Form\AutoEntityLabelFormT   1 passes

My main annoyance with using the Drupal testbot is that it’s hard to test locally; you need to have access to a Drupal instance with PHPUnit installed as a dev dependency, and a database. To remedy this, the Drupal Tester Docker project can be used to run Drupal-like tests locally; here is how:

git clone https://github.com/dcycle/drupal-tester.git
cd drupal-tester/
mkdir -p modules
cd modules
git clone --branch 8.x-1.x https://git.drupalcode.org/project/unit_test_tutorial.git
cd ..
./scripts/test.sh "--verbose --suppress-deprecations unit_test_tutorial"
docker-compose down -v

This will give you more or less the same results as the Drupal testbot:

Drupal\Tests\auto_entitylabel\Unit\AutoEntityLabelManagerTes   4 passes
Drupal\Tests\auto_entitylabel\Unit\AutoEntityLabelSingletonT   2 passes
Drupal\Tests\auto_entitylabel\Unit\Form\AutoEntityLabelFormT   1 passes
Drupal\Tests\auto_entitylabel\Unit\Plugin\Validation\EntityL   1 passes

In conclusion

Our promise, from the title of this article, is “Start unit testing your PHP code today”. Hopefully the tricks herein will allow you to do just that. My advice to you, dear testers, is to start by using Docker locally, then to make sure you have Continuous Integration set up (on Drupal testbot or CircleCI, or, as in our example, both), and only then start testing.

Happy coding!



April 07, 2019

Accessibility tests can be automated to a degree, but not completely; to succeed at accessibility, it needs to be a mindset shared by developers, UX and front-end folks, business people and other stakeholders. In this article, we will attempt to run tests and produce meaningful metrics which can help teams who are already committed to producing more accessible websites.

Premise

Say your team is developing a Drupal 8 site and you have decided that you want to reduce its accessibility issues by 50% over the course of six months.

In this article, we will look at a subset of accessibility issues which can be automatically checked – color contrast, placement of tags and HTML attributes, for example. Furthermore, we will only test the code itself with some dummy data, not actual live data or environment. Therefore, if you use the approach outlined in this article, it is best to do so within a global approach which includes stakeholder training; and automated and manual monitoring of live environments, all of which are outside the scope of this article.

Approach

Your team is probably perpetually “too busy” to fix accessibility issues; and therefore too busy to read and process reports with dozens, perhaps hundreds, of accessibility problems on thousands of pages.

Instead of expecting teams to process accessibility reports, we will use a threshold approach:

First, determine a standard towards which you’d like to work, for example WCAG 2.0 AA is more stringent than WCAG 2.0 A (but if you’re working on a U.S. Government website, WCAG 2 AA is mandated by the Americans with Disabilities Act). Be realistic as to the level of effort your team is ready to deploy.

Next (we’ll see how to do this later), figure out which pages you’d like to test against: perhaps one article, one event page, the home page, perhaps an internal page for logged in users.

In this article, to keep things simple, we’ll test for:

  • the home page;
  • a public-facing internal page, /node/1;
  • the /user page for users who are logged in;
  • the node editing form at /node/1/edit (for users who are logged in, obviously).

Running accessibility checks on each of the above pages, we will end up with our baseline threshold, the current number of errors, for example this might be:

  • 6 for the home page
  • 6 for /node/1
  • 10 for /user
  • 10 for /node/1/edit

We will then make our tests fail if there are more errors on a given page than we allow for. The test should pass at first, and this approach meets several objectives:

  • First, have an idea of the state of your site: are there 10 accessibility errors on the home page, or 1000?
  • Fail immediately if a developer opens a pull request where the number of accessibility errors increases past the threshold for any given page. For example, if a widget is added to the /user page which makes the number of accessibility errors jump to 12 (in this example), we should see a failure in our continuous integration infrastructure because 12 exceeds the threshold of 10.
  • Provide your team with the tools to reduce the threshold over time. Concretely, a discussion with all stakeholders can be had once the initial metrics are in place; a decision might be made that we want to reduce thresholds for each page by 50% within 6 months. This allows your technical team to justify the prioritization of time spent on accessibility fixes vs. other tasks seen by able-bodied stakeholders as having a more direct business value.

Principles

Principle #1: Docker for everything

Because we want to run tests on a continuous integration server, we want to avoid dependencies. Specifically, we want a system which does not require us to install specific versions of MySQL, PHP, headless browsers, accessibility checkers, etc. All our dependencies will be embedded into our project using Docker and Docker Compose. That way, all you need to install in order to run your project and test for accessibility (and indeed other tests) is Docker, which in most cases includes Docker Compose.

Principle #2: A starter database

In our continuous integration setup, we will be testing our code on every commit. Although it can be useful to test, or monitor, a remote environment such as the live or staging site, this is not what this article is about. This means we need some way to include dummy data in our codebase. We will do this by adding dummy data into a “starter database” committed to version control. (Be careful not to rely on this starter database to move configuration to the production site – use configuration management for that – we only want to store dummy data in our starter database; all configuration should be in code.) In our example, our starter database will contain node/1 with some realistic dummy data. This is required because as part of our test we want to run accessibility checks against /node/1 and /node/1/edit.

A good practice during development would be that for new data types, say a new content type “sandwich”, a new version of the starter database be created with, say, node/2 of type “sandwich”, with realistic data in all its fields. This will allow us to add an accessibility test for /node/2, and /node/2/edit if we wish.

Tools

Don’t forget, as per principle #1, above, you will never need to install anything other than Docker on your computer or CI server, so don’t attempt to install these tools locally; they will run on Docker containers which will be built automatically for you.

  • Pa11y: There are dozens of tools to check for accessibility; in this article we’ve settled on Pa11y because it provides clear error reports; and allows the concept of a threshold above which the script fails.
  • Chromium: In order to check a page for accessibility issues without actually having a browser open, a so-called headless browser is needed. Chromium is a fully functional browser which works on the command line and can be scripted. This works under the hood and you will have no need to install it or interact with it directly, it’s just good to know it’s there.
  • Puppeteer: most accessibility tools, including Pa11y, are good at testing one page. Say, if you point Pa11y to /node/1 or the home page, it will generate nice reports with thresholds. However if you point Pa11y to /user or /node/1/edit it will see those pages anonymously, which is not what we want to test. This is where Puppeteer, a browser scripting tool, comes into play. We will use Puppeteer later on to log into our site and save the markup of /user and /node/1/edit as /dom-captures/user.html and /dom-captures/node-1-edit.html, respectively, which will then allow Pa11y to access and test those paths anonymously.
  • And of course, Drupal 8, although you could apply the technique in this article to any web technology, because our accessibility checks are run against the web pages just like an end user would see them; there is no interaction with Drupal.

Setup

To follow along, you can install and start Docker Desktop and download the Dcycle Drupal 8 starterkit.

git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/deploy.sh

You are also welcome to fork the project and link it to a free CircleCI account, in which case continuous integration tests should start running immediately on every commit.

A few minutes after running ./scripts/deploy.sh, you should see a login link to a full Drupal installation on a random local port (for example http://0.0.0.0:32769) with some dummy data (/node/1). Deploying this site locally or on a CI server such as Circle CI is a one-step, one-dependency process.

In the rest of this article we will refer to this local environment as http://0.0.0.0:YOUR_PORT; always substitute your own port number (in our example 32769) for YOUR_PORT.

Introducing Pa11y

We will use a Dockerized version of Pa11y, dcycle/pa11y, here is how it works against, say, amazon.com:

docker run --rm dcycle/pa11y:1 https://amazon.com

No site that I know of has zero accessibility issues; so you’ll see a bunch of issues in this format:

• Error: This element's role is "presentation" but contains child elements with semantic meaning.
  ├── WCAG2AA.Principle1.Guideline1_3.1_3_1.F92,ARIA4
  ├── #navFooter > div:nth-child(2)
  └── <div class="navFooterVerticalColumn navAccessibility" role="presentation"><div class="navFooterVerticalRo...</div>

Running Pa11y against a local site

Developers and continuous integration servers will need to run Pa11y against a local site. We would be tempted to run Pa11y on 0.0.0.0:YOUR_PORT, but that won’t work because Pa11y is being run inside its own container and will not have access to the host machine. You could give it access, but that raises another issue: the port is not guaranteed to be the same at every run, which requires ugly logic to figure out the port. Ugh! Instead, we will attach Pa11y to the Docker network used by our Starter site, in this case called starterkit_drupal8site_default (you can use docker network ls to list networks). Because our docker-compose.yml file defines the Drupal container as having the name drupal and port 80 (the default port), we can now run:

docker run --network starterkit_drupal8site_default \
  --rm dcycle/pa11y:1 http://drupal

This has some errors, just as we expected. Before doing anything else, type echo $?; this will give a non-zero code, meaning running this will make your continuous integration script fail. However, because we decided earlier that we will tolerate, for now, 6 errors on the home page, let’s set a threshold of 6 (or however many errors you get – there are 6 at the time of this writing) instead of the default zero:

docker run --network starterkit_drupal8site_default \
  --rm dcycle/pa11y:1 http://drupal --threshold 6

If you run echo $? right after, you should get the “passing” exit code of zero. There, we’ve met our threshold, so we will not have a failure!

How about pages where you need to be logged in?

The above solution breaks down, though, when you want to test http://drupal/node/1/edit. Although it will produce results, what we are actually checking against here is the “Access denied” page, not /node/1/edit when we are logged in. We will approach this in the following way:

  • Set a random password for user 1;
  • Use Puppeteer (see “Tools”, above) to click around your local site with its dummy data, do whatever you want to, and, every step of the way, save the DOM (the document object model, or the current markup after it has been processed by Javascript) as a temporary flat file, named, say, http://drupal/dom-captures/user.html;
  • Use Pa11y to test the temporary file we just created.

Putting it all together

In our Drupal 8 Starterkit, we can test the entire process. Start by running the Puppeteer script:

./scripts/end-to-end-tests.sh

What does this look like?

Astute readers have realized that using Puppeteer to click through the site to create our dom captures has the added benefit of confirming that our site functionality works as expected, which is why I called the script end-to-end-tests.sh.

To confirm this actually worked, you can visit, in an incognito window, the captured pages, for example http://0.0.0.0:YOUR_PORT/dom-captures/user.html and http://0.0.0.0:YOUR_PORT/dom-captures/node-1-edit.html.

Yes it looks like you’re logged in, but you are not: these are anonymous webpages which Pa11y can check.

So if this worked correctly (and it should, because we have it under continuous integration), we can run our Pa11y tests against all these pages:

./scripts/a11y-tests.sh
echo $?

You will see the errors, but because the number of errors is below our threshold, the exit code will be zero, allowing our Continuous integration tests to pass.

Conclusion

Making a site accessible is, in my opinion, akin to making a site secure: it is not something to add to a to-do list, but rather an approach including all site stakeholders. Neither is accessibility something which can be automated; it really is a team culture. However, approaches like the one outlined in this article, or whatever works in your organization, will give teams metrics to facilitate the integration of accessibility into their day-to-day operations.



March 14, 2019

Often, during local Drupal development (or if we’re really unlucky, in production), we get the dreaded message, “Unable to send e-mail. Contact the site administrator if the problem persists.”

This can make it hard to debug anything email-related during local development.

Enter Mailhog

Mailhog is a dummy SMTP server with a browser GUI, which means you view all outgoing messages with a Gmail-type interface.

It is a major pain to install, but we can automate the entire process with the magic of Docker.

Let’s see how it works, and discuss after. Follow along by installing Docker Desktop – no other dependencies are required – and installing a Drupal 8 starterkit:

git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/deploy.sh

This will install the following Docker containers: a MySQL server with a starter database, a configured Drupal site, and Mailhog. You will see something like this at the end of the output:

If all went well you can now access your site at:

=> Drupal: http://0.0.0.0:32791/user/reset/...
=> Dummy email client: http://0.0.0.0:32790

You might be seeing different port numbers instead of 32791 and 32790, so use your own instead of the example ports.

Now, the magic

(In my example, DRUPAL_PORT is 32791 and MAILHOG_PORT is 32790. In your case it will probably be different.)

As you can see, all emails produced by Drupal are now visible on a cool GUI!

So how does it work?

A dedicated “Mailhog” Docker container, based on the Mailhog Docker image, is defined in our docker-compose.yml file. It exposes port 8025 for public GUI access, which is mapped to a random unused port on the host computer (in the above example, 32790). Port 1025 is Mailhog’s SMTP port, as you can see in the Mailhog Dockerfile. We are not mapping port 1025 to a random port on the host computer because it’s only needed in the Drupal container, not the host machine.

In the same docker-compose.yml, the “drupal” container (service) defines a link to the “mail” service; this means that when you are inside the Drupal container, you can access the Mailhog SMTP server “mail” at port 1025.

In the Starterkit’s Dockerfile, we download the SMTP module, and in our configuration, we install SMTP (0, in this case, is the module’s weight; it doesn’t mean “disabled”!).

Next, configuration: because this is for local development, we are leaving SMTP off in the exported configuration; in production we don’t want SMTP to link to Mailhog. Then, in our overridden settings, we enable SMTP and set the server to “mail” and the port to 1025.
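
For illustration, the settings override might look something like the following sketch (this assumes the Drupal 8 SMTP contrib module’s configuration keys; the project’s actual override file may differ):

<?php

// In a local-only settings override (not in exported configuration), point
// outgoing mail at the Mailhog container defined in docker-compose.yml.
$config['smtp.settings']['smtp_on'] = TRUE;
$config['smtp.settings']['smtp_host'] = 'mail';
$config['smtp.settings']['smtp_port'] = '1025';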

Now, you can debug sent emails in a very realistic way!

You can remove the starterkit environment by running:

docker-compose down -v



October 27, 2018

This article discusses how to use HTTPS for local development if you use Docker and Docker Compose to develop Drupal 7 or Drupal 8 (indeed any other platform as well) projects. We’re assuming you already have a technique to deploy your code to production (either a build step, rsync, etc.).

In this article we will use the Drupal 8 site starterkit, a Docker Compose-based Drupal application that comes with everything you need to build a Drupal site with a few commands (including local HTTPS); we’ll then discuss how HTTPS works.

If you want to follow along, install and launch the latest version of Docker, make sure ports 80 and 443 are not used locally, and run these commands:

cd ~/Desktop
git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/https-deploy.sh

The script will prompt you for a domain (for example my-website.local) to access your local development environment. You might also be asked for your password if you want the script to add “127.0.0.1 my-website.local” to your /etc/hosts file. (If you do not want to supply your password, you can add that line to /etc/hosts before running ./scripts/https-deploy.sh).

After a few minutes you will be able to access a Drupal environment on http://my-website.local and https://my-website.local. For https, you will need to explicitly accept the certificate in the browser, because it’s self-signed.

Troubleshooting: if you get a connection error, try using an incognito (private) window in your browser, or a different browser.

Being a security-conscious developer, you probably read through ./scripts/https-deploy.sh before running it on your computer. If you haven’t, you are encouraged to do so now, as we will be explaining how it works in this article.

You cannot use Let’s Encrypt locally

I often see questions related to setting up Let’s Encrypt for local development. This is not possible because the idea behind Let’s Encrypt is to certify that you own the domain on which you’re working; because no one uniquely owns localhost, or my-project.local, no one can get a certificate for it.

For local development, the Let’s Encrypt folks suggest using trusted, self-signed certificates instead, which is what we are doing in our script.

(If you are interested in setting up Let’s Encrypt for a publicly-available domain, this article is not for you. You might be interested, instead, in Letsencrypt HTTPS for Drupal on Docker and Deploying Letsencrypt with Docker-Compose.)

Make sure your project works without https first

So let’s look at how the ./scripts/https-deploy.sh script we used above works.

Let’s start by making sure our project works without https, then add https access in a separate container.

In our starterkit project, you can run:

./scripts/deploy.sh

At the end of that scripts, you will see something like:

If all went well you can now access your site at:

 => http://0.0.0.0:32780/user/reset/...

Docker is serving our application using a random non-secure port, in this case 32780, and mapping it to port 80 on our container.

If you use Docker Compose for local development, you might have several applications running at the same time on different host ports, all mapped to port 80 on their respective containers. At the end of this article you should be able to see each of them on port 443, each under its own local domain.

The secret to all your local projects sharing port 443 is a reverse proxy container which receives requests to port 443, and indeed port 80 also, and acts as a sort of traffic cop to direct traffic to the appropriate container.

That is why your individual projects should not directly use ports 80 and/or 443.

Adding an Nginx proxy container in front of your project’s container

An oft-seen approach to making your project available locally via HTTPS is to fiddle with your Dockerfile, installing openssl, setting up the certificate there; and rebuilding your container. This can work, but I would argue that it has significant drawbacks:

  • If you have several projects running on https port 443 locally, you could only develop one at a time because you only have one 443 port on your host machine.
  • You would need to maintain the SSL portion of your code for each of your projects.
  • It would go against the principle of separation of concerns which makes containers so robust.
  • You would be reinventing the wheel: there’s already a well-maintained Nginx proxy image which does exactly what you want.
  • Your job as a software developer is not to set up SSL.
  • If you decide to deploy your project to a production Kubernetes cluster, it would no longer make sense for each of your Apache containers to support SSL.

For all those reasons, we will loosely couple our project with the act of serving it via HTTPS; we’ll leave our project alone and place an Nginx proxy in front of it to deal with the SSL/HTTPS portion of our local deployment.

Local https for one or more running projects

In this example we set up only one starterkit application, but real-world developers often need HTTPS with more than one application. Because you only have one local 443 port for HTTPS, we need a way to differentiate between our running applications.

Our approach will be for each of our projects to have an assigned local domain. This is why the https script we used in our example asked you to choose a domain like starterkit-drupal8.local.

Our script stored this information in the .env file at the root of your project, and also made sure it resolves to localhost in your /etc/hosts file.

Launching the Nginx reverse proxy

To me the terms “proxy” and “reverse proxy” are not intuitive. I’ll try to demystify them here.

The term “proxy” means something which represents something else; that term is already widely used to denote a web client being hidden from the server. So, a server might deliver content to a proxy which then delivers it to the end user, thereby hiding the end user from the server.

In our case we want to do the reverse: the client (you) is not placing a proxy in front of itself; rather, the application is placing a proxy in front of itself, thereby hiding the project server from the browser: the browser communicates with Nginx, and Nginx communicates with your project.

Hence, “reverse proxy”.

Our reverse proxy uses a widely used and well-maintained GitHub project. The script you used earlier in this article launched a container based on that image.

Linking the reverse proxy to our application

With our starterkit application running on a random port (something like 32780) and our nginx proxy application running on ports 80 and 443, how are the two linked?

We now need to tell our Nginx proxy that when it receives a request for domain starterkit-drupal8.local, it should display our starterkit application.

There are a few steps to this, most handled by our script:

  • Your project’s docker-compose.yml file should look something like this: it needs to contain the environment variable VIRTUAL_HOST=${VIRTUAL_HOST}. This takes the VIRTUAL_HOST environment variable that our script added to the ./.env file, and makes it available inside the container.
  • Our script assumes that your project contains a ./scripts/deploy.sh file, which deploys our project to a random, non-secure port.
  • Our script assumes that only the Nginx Proxy container is published on ports 80 and 443, so if these ports are already used by something else, you’ll get an error.
  • Our script appends VIRTUAL_HOST=starterkit-drupal8.local to the ./.env file.
  • Our script attempts to add 127.0.0.1 starterkit-drupal8.local to our /etc/hosts file, which might require a password.
  • Our script finds the network your project is running on locally (all Docker Compose projects run on their own local named network), and gives the reverse proxy access to it.

That’s it!

You should now be able to access your project locally with https://starterkit-drupal8.local (port 443) and http://starterkit-drupal8.local (port 80), and apply this technique to any number of Docker Compose projects.

Troubleshooting: if you get a connection error, try using an incognito (private) window in your browser, or a different browser; also note that you need to explicitly trust the certificate.

You can copy paste the script to your Docker Compose project at ./scripts/https-deploy.sh if:

  • Your ./docker-compose.yml contains the environment variable VIRTUAL_HOST=${VIRTUAL_HOST};
  • You have a script, ./scripts/deploy.sh, which launches a non-secure version of your application on a random port.

Happy coding!



October 05, 2018

I recently ran into a series of weird issues on my Acquia production environment which I traced back to some code I deployed which depended on my site being served securely using HTTPS.

Acquia Staging environments don’t use HTTPS by default and require you to install SSL certificates using a tedious manual process, which in my opinion is outdated, because competitors such as Platform.sh, Pantheon, Aegir, and even Github Pages support lots of automation around HTTPS using Let’s Encrypt.

Anyhow, because staging did not have HTTPS, I could not test some code I deployed, which ended up costing me an evening debugging an outage on a production environment. (Any difference between environments will eventually result in an outage.)

I found a great blog post which explains how to set up Let’s Encrypt on Acquia environments, Installing (FREE) Let’s Encrypt SSL Certificates on Acquia, by Chris at Redfin solutions, May 2, 2017. Although the process is very well documented, I made some tweaks:

  • First, I prefer using Docker-based solutions rather than installing software on my computer. So, instead of installing certbot on my Mac, I opted to use the Certbot Docker Image; this has two advantages for me: first, I don’t need to install certbot on every machine I use this script on; and second, I don’t need to worry about updating certbot, as the Docker image is updated automatically. Of course, this does require that you install Docker on your machine.
  • Second, I automated everything I could. This resulted in this gist (a “gist” is basically a single file hosted on Github), a script which you can install locally.

Running the script

When you put the script locally on your computer (I added it to my project code), at, say ./scripts/set-up-letsencrypt-acquia-stage.sh, and run it:

  • the first time you run it, it will tell you where to put your environment information (in ./acquia-stage-letsencrypt-environments/environment-my-acquia-project-one.source, ./acquia-stage-letsencrypt-environments/environment-my-acquia-project-two.source, etc.), and what to put in those files.
  • the next time you run it, it will automate what it can and tell you exactly what you need to do manually.

I tried this and it works for creating new certs, and should work for renewals as well!



April 07, 2018

The documented process for setting up a local environment and running tests locally is, in my opinion, so complex that it can be a barrier to even determined developers.

For those wishing to locally test and develop core patches, I think it is possible to automate the process down to a few steps and few minutes; here is an example with a core issue, #2273889 Don’t use one language’s plural index formula with another language’s string in the case of untranslated strings using format_plural(), which, at the time of this writing, results in the number 0 being displayed as 1 in certain cases.

Is it possible to start useful local development on this within 10 minutes on a computer with nothing installed except Docker? Let’s try…

Step 1: install Docker

Install and launch Docker. Everything we need, Apache web server, MySql server, Drush, Drupal, will reside on Docker containers, so we won’t need to install anything locally except Docker.

Step 2: launch a dev environment

I have created a project hosted on GitHub which will help you set up everything you need in Docker containers, with no local dependencies other than Docker and no manual steps. Set it up by running:

git clone https://github.com/dcycle/drupal8_core_dev_helper.git && \
  cd drupal8_core_dev_helper && \
  ./scripts/deploy.sh

This will create everything you need: a webserver container and database container, and your Drupal core code which will be placed in ./drupal8_core_dev_helper/drupal; near the end of the output of ./scripts/deploy.sh, you will see a login link to your development environment. Confirm you can access that local development environment at an address like http://0.0.0.0:SOME-PORT. (The port is random.)

The first time you run this, it will have to download Docker images with Drupal, MySQL, and install everything you need for local development. Future runs will be a lot faster.

See the project’s README for more details.

In your dev environment, you can confirm that the problem exists (provided the issue has not yet been fixed) by following the instructions in the “To reproduce this problem:” section of the issue description on your local development environment.

Any calls to drush can be run on the Docker container like so:

docker-compose exec drupal /bin/bash -c 'drush ...'

For example:

docker-compose exec drupal /bin/bash -c 'drush en locale language -y'

If you want to run drush directly, you can connect to your container like so:

docker-compose exec drupal /bin/bash

This will result in the following prompt on the container:

root@<container-id>:/var/www/html#

Now you can run drush commands directly on the container:

drush eval "print_r(\Drupal::translation()->formatPlural(0, '1 whatever', '@count whatevers', array(), array('langcode' => 'fr')) . PHP_EOL);"

Because the drupal8_core_dev_helper project also pre-installs devel on your environment, you can also confirm the problem exists by visiting /devel/php and executing:

dpm((string) (\Drupal::translation()->formatPlural(0, '1 whatever', '@count whatevers', array(), array('langcode' => 'fr'))));

Whether you do this by Drush or /devel/php, the result should be the same if the issue has not been resolved: 1 whatever instead of 0 whatevers.

Step 3: get a local version of the patch and apply it

In this example, we’ll look at the patch in comment #32 of our formatPlural issue, referenced above. If the issue has been resolved since this blog post was written, follow along with another patch.

cd drupal8_core_dev_helper
curl https://www.drupal.org/files/issues/2018-04-07/2273889-31-core-8.5.x-plural-index-no-test.patch -O
cd ./drupal && patch -p1 < ../2273889-31-core-8.5.x-plural-index-no-test.patch

You have now patched your local version of Drupal. You can try the “0 whatevers” test again and the bug should be fixed.

Running tests

Now the real fun begins… and the “fast-track” ends.

For any patch to be considered for inclusion in Drupal core, it will need to (a) not break existing tests; and (b) provide a test which, without the patch, confirms that the problem exists.

Let’s head back to comment #32 of issue #2273889 and see if our patch is breaking anything. Clicking on “PHP 7 & MySQL 5.5 23,209 pass, 17 fail” will bring us to the test results page, which at first glance seems indecipherable. You’ll notice that our seemingly simple change to the PluralTranslatableMarkup.php file is causing a number of tests to fail: HelpEmptyPageTest, EntityTypeTest…

Let’s start by finding the test which is most likely to be directly related to our change by searching on the test results page for the string “PluralTranslatableMarkupTest” (this is the name of the class we changed, with the word Test appended), which shows that it is failing:

Testing Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest
.E

We need to figure out where that file resides, by typing:

cd /path/to/drupal8_core_dev_helper/drupal/core
find . -name 'PluralTranslatableMarkupTest.php'

This tells us it is at ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php.

Because we have a predictable Docker container, we can relatively easily run this test locally:

cd /path/to/drupal8_core_dev_helper
docker-compose exec drupal /bin/bash -c 'cd core && \
  ../vendor/bin/phpunit \
  ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php'

You should now see the test results for only PluralTranslatableMarkupTest:

PHPUnit 6.5.7 by Sebastian Bergmann and contributors.

Testing Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest
.E                                                                  2 / 2 (100%)

Time: 16.48 seconds, Memory: 6.00MB

There was 1 error:

1) Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest::testPluralTranslatableMarkupSerialization with data set #1 (2, 'plural 2')
Error: Call to undefined method Mock_TranslationInterface_4be32af3::getStringTranslation()

/var/www/html/core/lib/Drupal/Core/StringTranslation/PluralTranslatableMarkup.php:150
/var/www/html/core/lib/Drupal/Core/StringTranslation/PluralTranslatableMarkup.php:121
/var/www/html/core/tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php:31

ERRORS!
Tests: 2, Assertions: 1, Errors: 1.

How to fix this, indeed whether this will be fixed, is a whole nother story, a story fraught with dependency injection, mock objects, method stubs… More an adventure, really, than a story. An adventure which deserves to be told, just not right now.



January 24, 2018

Here are a few things I learned about caching for REST resources.

There are probably better ways to accomplish this, but here is what works for me.

Let’s say we have a REST resource that looks something like this in .../my_module/src/Plugin/rest/resource/MyRestResource.php and we have enabled it using the Rest UI module and given anonymous users permission to view it:

<?php

namespace Drupal\my_module\Plugin\rest\resource;

use Drupal\rest\ResourceResponse;

/**
 * This is just an example.
 *
 * @RestResource(
 *   id = "this_is_just_an_example",
 *   label = @Translation("Display the title of node 1"),
 *   uri_paths = {
 *     "canonical" = "/api/v1/get"
 *   }
 * )
 */
class MyRestResource extends ResourceBase {

  /**
   * {@inheritdoc}
   */
  public function get() {
    $node = node_load(1);
    $response = new ResourceResponse(
      [
        'title' => $node->getTitle(),
        'time' => time(),
      ]
    );
    return $response;
  }

}

Now, we can visit http://example.localhost/api/v1/get?_format=json and we will see something like:

{"title":"Some Title","time":1516803204}

Reloading the page, ‘time’ stays the same. That means caching is working; we are not re-computing our Json output each time someone requests it.

How to invalidate the cache when the title changes

If we edit node 1 and change its title to, say, “Another title”, and reload http://example.localhost/api/v1/get?_format=json, we’ll see the old title. To make sure the cache is invalidated when this happens, we need to provide cacheability metadata to our response telling it when it needs to be recomputed.

Our node, when it’s loaded, contains within it all the caching metadata needed to describe when it should be recomputed: when the title changes, when new filters are added to the text format that’s being used, etc. We can add this information to our ResourceResponse like this:

...
$response->addCacheableDependency($node);
return $response;
...

When we clear our cache with drush cr and reload our page, we’ll see something like:

{"title":"Another title","time":1516804411}

Even more fun is changing the title of node 1 and reloading our Json page, and seeing the title and time change without clearing the cache:

{"title":"Yet another title","time":1516804481}

How to set custom cache invalidation events

Let’s say you want to trigger a cache rebuild for some reason other than those defined by the node itself (title change, etc.).

A real-world example might be events: an “upcoming events” page should only display events which start later than now. If we invalidate the cache every day, then we’ll never show yesterday’s events in our events feed. Here, we need to add our custom cache invalidation event, in this case “rebuild events feed”.

For the purpose of this demo, we won’t actually build an events feed, but we’ll see how cron might be able to trigger cache invalidation.

Let’s add the following code to our response:

...
use Drupal\Core\Cache\CacheableMetadata;
...
$response->addCacheableDependency($node);
$response->addCacheableDependency(CacheableMetadata::createFromRenderArray([
  '#cache' => [
    'tags' => [
      'rebuild-events-feed',
    ],
  ],
]));
return $response;
...

This uses Drupal’s cache tags concept and tells Drupal that when the cache tag ‘rebuild-events-feed’ is invalidated, all cacheable responses which have that cache tag should be invalidated as well. I prefer this to the ‘max-age’ cache tag because it allows us more fine-grained control over when to invalidate our caches.

On cron, we could only invalidate ‘rebuild-events-feed’ if events have passed since our last invalidation of that tag, for example.
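Here is a rough sketch of what that cron logic might look like; the module name my_module, the state key, and the helper my_module_events_started_between() are all hypothetical, and the actual detection of passed events depends on how your events are stored:

/**
 * Implements hook_cron().
 *
 * Sketch only: invalidate the 'rebuild-events-feed' cache tag when at least
 * one event has started since the last invalidation.
 */
function my_module_cron() {
  $state = \Drupal::state();
  $last = $state->get('my_module.events_feed_last_invalidation', 0);
  $now = \Drupal::time()->getRequestTime();

  // Hypothetical helper: has any event started between $last and $now?
  if (my_module_events_started_between($last, $now)) {
    \Drupal::service('cache_tags.invalidator')
      ->invalidateTags(['rebuild-events-feed']);
    $state->set('my_module.events_feed_last_invalidation', $now);
  }
}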

For this example, we’ll just invalidate it manually. Clear your cache to begin using the new code (drush cr), then load the page, you will see something like:

{"hello":"Yet another title","time":1516805677}

As always, the time remains the same no matter how many times you reload the page.

Let’s say you are in the midst of a cron run and you have determined that you need to invalidate your cache for responses which have the cache tag ‘rebuild-events-feed’; you can run:

\Drupal::service('cache_tags.invalidator')->invalidateTags(['rebuild-events-feed'])

Let’s do it in Drush to see it in action:

drush ev "\Drupal::service('cache_tags.invalidator')->\
  invalidateTags(['rebuild-events-feed'])"

We’ve just invalidated our ‘rebuild-events-feed’ tag and, hence, Responses that use it.

This one is beyond my competence level, but I wanted to mention it anyway.

Let’s say you want to output your node’s URL to Json, you might consider computing it using $node->toUrl()->toString(). This will give us “/node/1”.

Let’s add it to our code:

...
'title' => $node->getTitle(),
'url' => $node->toUrl()->toString(),
'time' => time(),
...

This results in a very ugly error which completely breaks your site (at least at the time of this writing): “The controller result claims to be providing relevant cache metadata, but leaked metadata was detected. Please ensure you are not rendering content too early.”.

The problem, it seems, is that Drupal detects that the URL object, like the node we saw earlier, contains its own internal information which tells it when its cache should be invalidated. Converting it to a string prevents the Response from being informed about that information somehow (again, if someone can explain this better than me, please leave a comment), so an exception is thrown.

The ‘toString()’ function has an optional parameter, “$collect_bubbleable_metadata”, which can be used to get not just a string, but also information about when its cache should be invalidated. In Drush, this will look something like:

drush ev 'print_r(node_load(1)->toUrl()->toString(TRUE))'
Drupal\Core\GeneratedUrl Object
(
    [generatedUrl:protected] => /node/1
    [cacheContexts:protected] => Array
        (
        )

    [cacheTags:protected] => Array
        (
        )

    [cacheMaxAge:protected] => -1
    [attachments:protected] => Array
        (
        )

)

This changes the return type of toString(), though: toString() no longer returns a string but a GeneratedUrl, so this won’t work:

...
'title' => $node->getTitle(),
'url' => $node->toUrl()->toString(TRUE),
'time' => time(),
...

It gives us the error “Could not normalize object of type Drupal\Core\GeneratedUrl, no supporting normalizer found”.

ohthehugemanatee commented on Drupal.org on how to fix this. Integrating his suggestion, our code now looks like:

...
$url = $node->toUrl()->toString(TRUE);
$response = new ResourceResponse(
  [
    'title' => $node->getTitle(),
    'url' => $url->getGeneratedUrl(),
    'time' => time(),
  ]
);
$response->addCacheableDependency($node);
$response->addCacheableDependency($url);
...

This will now work as expected.

With all the fun we’re having, though, let’s take this a step further, let’s say we want to export the feed of frontpage items in our Response:

$url = $node->toUrl()->toString(TRUE);
$view = \Drupal\views\Views::getView("frontpage"); 
$view->setDisplay("feed_1");
$view_render_array = $view->render();
$rendered_view = render($view_render_array);

$response = new ResourceResponse(
  [
    'title' => $node->getTitle(),
    'url' => $url->getGeneratedUrl(),
    'view' => $rendered_view,
    'time' => time(),
  ]
);
$response->addCacheableDependency($node);
$response->addCacheableDependency($url);
$response->addCacheableDependency(CacheableMetadata::createFromRenderArray($view_render_array));

You will not be surprised to see the “leaked metadata was detected” error again… In fact you have come to love and expect this error at this point.

Here is where I’m completely out of my league; according to Crell, “[i]f you [use render() yourself], you’re wrong and you should fix your code”, but I’m not sure how to get a rendered view without using render() myself… I’ve implemented a variation on a comment on Drupal.org by mikejw suggesting the use of a different render context to prevent Drupal from complaining.

...
use Drupal\Core\Render\RenderContext;
...
$view_render_array = NULL;
$rendered_view = NULL;
\Drupal::service('renderer')->executeInRenderContext(new RenderContext(), function () use ($view, &$view_render_array, &$rendered_view) {
  $view_render_array = $view->render();
  $rendered_view = render($view_render_array);
});

If we check to make sure we have this line in our code:

$response->addCacheableDependency(CacheableMetadata::createFromRenderArray($view_render_array));

we’re telling our Response’s cache to invalidate whenever our view’s cache invalidates. So, for example, if we have several nodes promoted to the front page in our view, we can modify any one of them and our entire Response’s cache will be invalidated and rebuilt.




December 18, 2017

I recently needed to port hundreds of Drupal 7 webforms with thousands of submissions from Drupal 7 to Drupal 8.

My requirements were:

  • Node ids need to remain the same
  • Webforms need to be treated as data: they should be ignored by config export and import, just like nodes and taxonomy terms are. The reasoning is that in my setup, forms are managed by site editors, not developers. (This is not related to migration per se, but was a success criterion for my migration so I’ll document my solution here)

Migration from Drupal 7

I could not find a reliable upgrade or migration path from Drupal 7 to Drupal 8. I found webform_migrate lacks documentation (I don’t know where to start) and migrate_webform is meant for Drupal 6, not Drupal 7 as a source.

I settled on my own combination of tools and workflows to perform the migration, all of them available on my Github account.

Using version 8.x-5.x of webform, I started by enabling webform, webform_node and webform_ui on my Drupal 8 site, this gives me an empty webform node type.

I then followed the instructions for a basic migration, which is outside the scope of this article. I have a project on Github which I use as a starting point for my Drupal 6 and 7 to 8 migrations. The blog post Custom Drupal-to-Drupal Migrations with Migrate Tools, Drupalize.me, April 26, 2016 by William Hetherington provides more information on performing a basic migration of data.

Once you have set up your migration configurations as per those instructions, you should be able to run:

drush migrate-import upgrade_d7_node_webform --execute-dependencies

And you should see something like:

Processed 25 items (25 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_node_type'
Processed 11 items (11 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user_role'
Processed 0 items (0 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user_role'
Processed 95 items (95 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user'
Processed 109 items (109 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_node_webform'

At this point I had all my webforms as nodes with the same node ids on Drupal 7 and Drupal 8, however this does nothing to import the actual forms or submissions.

Importing the data itself

I found that the most efficient way of importing the data was to create my own Drupal 8 module, which I have published on Dcycle’s Github account, called webform_d7_to_d8. (I have decided against publishing this on Drupal.org because I don’t plan on maintaining it long-term, and I don’t have the resources to combine efforts with existing webform migration modules.)

I did my best to make that module self-explanatory, so you should be able to follow the steps in the README file, which I will summarize here:

Start by giving your Drupal 8 site access to your Drupal 7 database in ./sites/default/settings.php:

$databases['upgrade']['default'] = array (
  'database' => 'drupal7database',
  'username' => 'drupal7user',
  'password' => 'drupal7password',
  'prefix' => '',
  'host' => 'drupal7host',
  'port' => '3306',
  'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
  'driver' => 'mysql',
);

Run the migration with or without options:

drush ev 'webform_d7_to_d8()'

or

drush ev 'webform_d7_to_d8(["nid" => 123])'

or

drush ev 'webform_d7_to_d8(["simulate" => TRUE])'

More detailed information can be found in the module’s README file.

Treating webforms as data

Once you have imported your webforms to Drupal 8, they are treated as configuration, that is, the Webform module assumes that developers, not site builders, will be creating the forms. This may be fine in many cases, however my use case is that site editors want to create and edit forms directly on the production site, and we don’t want those forms to be tracked by the configuration management system.

Jacob Rockowitz pointed me in the right direction for making sure webforms are not treated as configuration. For that purpose I am using Drush CMI tools by Previous Next and documented on their blog post, Introducing Drush CMI tools, 24 Aug. 2016.

Once you install Drush CMI tools in your ~/.drush folder and run drush cc drush, you can use drush cexy and drush cimy instead of drush cex and drush cim in your configuration management process. Here is how and why:

Normally, if you develop your site locally and, say, add a content type or field, or remove a content type or field, you can run drush cex to export your newly created configuration. Then, your colleagues can pull your code and run drush cim to pull your configuration. drush cim can also be used in continuous integration, preproduction, dev, and production environments.

The problem is that drush cex exports all configuration, and drush cim deletes everything in the database which is not in configuration. In our case, we don’t want to consider webforms as configuration but as data, just as nodes and taxonomy terms are: we don’t want them to be exported along with other configuration; and if they exist on a target environment we want to leave them as they are.

Using Drush CMI tools, you can add a file such as the following to ~/.drush/config-ignore.yml:

# See http://blog.dcycle.com/blog/2017-12-18
ignore:
  - webform.webform.*

This has to be done on all developers’ machines or, if you use Docker, on a shared Docker container (which is outside the scope of this article).

Now, for exporting configuration, run:

drush cexy --destination='/path/to/config/folder'

Now, webforms will not be exported along with other configuration.

We also need to avoid erasing webforms on target environments: if you create a webform on a target environment, then run drush cim, you will see something like:

webform.webform.webform_9521   delete
webform.webform.webform_8996   delete
webform.webform.webform_8991   delete
webform.webform.webform_8986   delete

So, we need to avoid deleting webforms on the target environment when we import configuration. We could just do drush cim --partial, but that avoids deleting anything at all, not just webforms.

Drush CMI tools provides an alternative:

drush cimy --source=/path/to/config/folder

This works much like drush cim --partial, but it allows you to specify another parameter, --delete-list=/path/to/config-delete.yml

Then, in config-delete.yml, you can specify items that you actually want to delete on the target environment, for example content types, fields, and views which do not exist in code. This is dependent on your workflow and the way to set it up is documented on the Drush CMI tools project homepage.

With this in place, we’ll have our Drupal 7 webforms on our Drupal 8 site.



October 03, 2017

This article is about serving your Drupal Docker container, and/or any other container, via HTTPS with a valid Let’s Encrypt SSL certificate.

Edit: if you’re having trouble with Docker-Compose, read this follow-up post.

Step one: make sure you have a public VM

To follow along, create a new virtual machine (VM) with Docker, for example using the “Docker” distribution in the “One-click apps” section of Digital Ocean.

This will not work on localhost, because in order to use Let’s Encrypt, you need to demonstrate ownership over your domain(s) to the outside world.

In this tutorial we will serve two different sites, one simple HTML site and one Drupal site, each using standard ports, on the same Docker host, using a reverse proxy, a container which sits in front of your other containers and directs traffic.

Step two: Set up two domains or subdomains you own and point them to your server

Start by making sure you have two domains which point to your server, in this example we’ll use:

  • test-one.example.com will be a simple HTML site.
  • test-two.example.com will be a Drupal site.

Step three: create your sites

We do not want to map our containers’ ports directly to our host ports using -p 80:80 -p 443:443 because we will have more than one app using the same port (the secure 443). Port mapping will be the responsibility of the reverse proxy (more on that later). Replace example.com with your own domain:

DOMAIN=example.com
docker run -d \
  -e "VIRTUAL_HOST=test-one.$DOMAIN" \
  -e "LETSENCRYPT_HOST=test-one.$DOMAIN" \
  -e "[email protected]$DOMAIN" \
  --expose 80 --name test-one \
  httpd
docker run -d \
  -e "VIRTUAL_HOST=test-two.$DOMAIN" \
  -e "LETSENCRYPT_HOST=test-two.$DOMAIN" \
  -e "[email protected]$DOMAIN" \
  --expose 80 --name test-two \
  drupal

Now you have two running sites, but they’re not yet accessible to the outside world.

Step four: a reverse proxy and Let’s Encrypt

The term “proxy” means something which represents something else. In our case we want to have a webserver container which represents our Drupal and html containers. The Drupal and html containers are effectively hidden behind a proxy. Why “reverse”? On its own, the term “proxy” conventionally means that the web user is hidden from the server. If it is the web servers that are hidden (in this case the Drupal or html containers), we use the term “reverse proxy”.

Let’s Encrypt is a free certificate authority which certifies that you are the owner of your domain.

We will use nginx-proxy as our reverse proxy. Because that does not take care of certificates, we will use LetsEncrypt companion container for nginx-proxy to set up and maintain Let’s Encrypt certificates.

Let’s start by creating an empty directory which will contain our certificates:

mkdir "$HOME"/certs

Now, following the instructions of the LetsEncrypt companion project, we can set up our reverse proxy:

docker run -d -p 80:80 -p 443:443 \
  --name nginx-proxy \
  -v "$HOME"/certs:/etc/nginx/certs:ro \
  -v /etc/nginx/vhost.d \
  -v /usr/share/nginx/html \
  -v /var/run/docker.sock:/tmp/docker.sock:ro \
  --label com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy \
  --restart=always \
  jwilder/nginx-proxy

And, finally, start the LetsEncrypt companion:

docker run -d \
  --name nginx-letsencrypt \
  -v "$HOME"/certs:/etc/nginx/certs:rw \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --volumes-from nginx-proxy \
  --restart=always \
  jrcs/letsencrypt-nginx-proxy-companion

Wait a few minutes for "$HOME"/certs to be populated with your certificate files, and you should now be able to access your sites at https://test-one.example.com and https://test-two.example.com.

A note about renewals

Let’s Encrypt certificates last 3 months, so we generally want to renew every two months. LetsEncrypt companion container for nginx-proxy states that it automatically renews certificates which are set to expire in less than a month, and it checks this hourly, although there are some renewal-related issues in the issue queue.

It seems to also be possible to force renewals by running:

docker exec nginx-letsencrypt /app/force_renew

So it might be worth being on the lookout for failed renewals and forcing them if necessary.

Edit: domain-specific configurations

I used this technique to create a Docker registry, and make it accessible securely:

docker run \
  --entrypoint htpasswd \
  registry:2 -Bbn username password > auth/htpasswd

docker run -d --expose 5000 \
  -e "VIRTUAL_HOST=mydomain.example.com" \
  -e "LETSENCRYPT_HOST=mydomain.example.com" \
  -e "[email protected]" \
  -e "REGISTRY_AUTH=htpasswd" \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  --restart=always -v "$PWD"/auth:/auth \
  --name registry registry:2

But when trying to push an image, I was getting “413 Request Entity Too Large”. This is an error with the nginx-proxy, not the Docker registry. To fix this, you can set domain-specific configurations, in this example we are allowing a maximum of 600M to be passed but only to the Docker registry at mydomain.example.com:

docker exec nginx-proxy /bin/bash -c 'cp /etc/nginx/vhost.d/default /etc/nginx/vhost.d/mydomain.example.com'
docker exec nginx-proxy /bin/bash -c 'echo "client_max_body_size 600M;" >> /etc/nginx/vhost.d/mydomain.example.com'
docker restart nginx-proxy

Enjoy!

You can now bask in the knowledge that your cooking blog will not be man-in-the-middled.



February 28, 2017

As the maintainer of Realistic Dummy Content, having procrastinated long and hard before releasing a Drupal 8 version, I decided to leave my (admittedly inelegant) logic intact and abstract away the Drupal 7 code, with the goal of plugging in Drupal 7 or 8 code at runtime.

Example original Drupal 7 code

// Some logic.
$updated_file = file_save($drupal_file);
// More logic.

Example updated code

Here is a simplified example of how the updated code might look:

// Some logic.
$updated_file = Framework::instance()->fileSave($drupal_file);
// More logic.

abstract class Framework {

  static private $instance;

  static function instance() {
    if (!self::$instance) {
      if (defined('VERSION')) {
        self::$instance = new Drupal7();
      }
      else {
        self::$instance = new Drupal8();
      }
    }
    return self::$instance;
  }

  abstract function fileSave($drupal_file);

}

class Drupal8 extends Framework {
  public function fileSave($drupal_file) {
    $drupal_file->save();
    return $drupal_file;
  }
}

class Drupal7 extends Framework {
  public function fileSave($drupal_file) {
    return file_save($drupal_file);
  }
}

Once I have defined fileSave(), I can simply replace every instance of file_save() in my legacy code with Framework::instance()->fileSave().

In theory, I can then identify all Drupal 7 code in my module and abstract it away.

Automated testing

As long as I surgically replace Drupal 7 code such as file_save() with “universal” code such as Framework::instance()->fileSave(), without doing anything else, without giving in to the impulse of “improving” the code, I can theoretically test only Framework::instance()->fileSave() itself on Drupal 7 and Drupal 8, and as long as both versions behave the same, my underlying code should work. My approach to automated tests is: if it works and you’re not changing it, there is no need to test it.

Still, I want to make sure my framework-specific code works as expected. To set up my testing environment, I have used Docker-compose to set up three containers: Drupal 7, Drupal 8, and MySQL. I then have a script which builds the sites, installs my module on each, then runs a selftest() function which can test the abstracted functions such as fileSave() and make sure they work.
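Realistic Dummy Content’s actual self-test is more involved, but a minimal sketch of the idea could look like the following; mymodule_selftest() is a hypothetical name, and the sketch assumes the file module is enabled so that file_save_data() exists on both Drupal 7 and Drupal 8:

function mymodule_selftest() {
  $framework = Framework::instance();
  print 'Detected framework: ' . get_class($framework) . PHP_EOL;

  // file_save_data() exists in both Drupal 7 and Drupal 8 (file module),
  // so we can use it to create something for our abstracted fileSave()
  // to re-save.
  $file = file_save_data('selftest', 'public://selftest.txt');
  $saved = $framework->fileSave($file);

  // The abstraction promises that fileSave() returns the file object on
  // every framework; anything else means the abstraction is broken.
  if (!is_object($saved)) {
    print 'fileSave() did not return a file object; failing.' . PHP_EOL;
    exit(1);
  }
  print 'fileSave() behaves as expected.' . PHP_EOL;
}

The build script can then call this function, for example via drush ev, and fail the build when it exits with a non-zero code.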

This can then be run on a continuous integration platform such as CircleCI, which generates a cool status badge.


Extending to Backdrop

Once your module is structured in this way, it is relatively easy to add new related frameworks, and I’m much more comfortable releasing a Drupal 9 update in 2021 (or whenever it’s ready).

I have included experimental Backdrop code in Realistic Dummy Content to prove the point. Backdrop is a fork of Drupal 7.

abstract class Framework {

  static private $instance;

  static function instance() {
    if (!self::$instance) {
      if (defined('BACKDROP_BOOTSTRAP_SESSION')) {
        self::$instance = new Backdrop();
      }
      elseif (defined('VERSION')) {
        self::$instance = new Drupal7();
      }
      else {
        self::$instance = new Drupal8();
      }
    }
    return self::$instance;
  }
}

// Most of Backdrop's API is identical to D7, so we only need to override
// what differs, such as fileSave().
class Backdrop extends Drupal7 {
  public function fileSave($drupal_file) {
    file_save($drupal_file);
    // Unlike Drupal 7, Backdrop returns a result code, not the file itself,
    // in file_save(). We are expecting the file object.
    return $drupal_file;
  }
}

Disadvantages of this approach

Having just released Realistic Dummy Content 7.x-2.0-beta1 and 8.x-2.0-beta1 (which are identical), I can safely say that this approach was a lot more time-consuming than I initially thought.

Drupal 7 class autoloading is incompatible with Drupal 8 autoloading. In Drupal 7, classes cannot (to my knowledge) use namespaces, and must be added to the .info file, like this:

files[] = includes/MyClass.php

Once that is done, you can define MyClass in includes/MyClass.php, then use MyClass anywhere you want in your code.

Drupal 8 uses PSR-4 autoloading with namespaces, so I decided to create my own autoloader to use the same system in Drupal 7, something like:

spl_autoload_register(function ($class_name) {
  if (defined('VERSION')) {
    // We are in Drupal 7.
    $parts = explode('\\', $class_name);
    // Remove "Drupal" from the beginning of the class name.
    array_shift($parts);
    $module = array_shift($parts);
    $path = 'src/' . implode('/', $parts);
    if ($module == 'MY_MODULE_NAME') {
      module_load_include('php', $module, $path);
    }
  }
});

Hooks have different signatures in Drupal 7 and 8; in my case I was lucky and the only hook I need for Drupal 7 and 8 is hook_entity_presave() which has a similar signature and can be abstracted.
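This is not the module’s actual code, but a sketch of how such a shared hook might delegate to the abstraction; entityPresave() is a hypothetical abstracted method:

/**
 * Implements hook_entity_presave().
 *
 * In Drupal 7 this hook receives ($entity, $type); in Drupal 8 it receives
 * only the entity. Accepting an optional second argument lets a single
 * implementation satisfy both signatures.
 */
function realistic_dummy_content_entity_presave($entity, $type = NULL) {
  Framework::instance()->entityPresave($entity, $type);
}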

Deeply-nested associative arrays are a staple of Drupal 7, so a lot of legacy code expects this type of data. Shoehorning Drupal 8 to output something like Drupal 7’s field_info_fields(), for example, was a painful experience:

public function fieldInfoFields() {
  $return = array();
  $field_map = \Drupal::entityManager()->getFieldMap();
  foreach ($field_map as $entity_type => $fields) {
    foreach ($fields as $field => $field_info) {
      $return[$field]['entity_types'][$entity_type] = $entity_type;
      $return[$field]['field_name'] = $field;
      $return[$field]['type'] = $field_info['type'];
      $return[$field]['bundles'][$entity_type] = $field_info['bundles'];
    }
  }
  return $return;
}

Finally, making Drupal 8 work like Drupal 7 makes it hard to use Drupal 8’s advanced features such as Plugins. However, once your module is “universal”, adding Drupal 8-specific functionality might be an option.

Using this approach for website upgrades

This approach might remove a lot of the risk associated with complex site upgrades. Let’s say I have a Drupal 7 site with a few custom modules: each module can be made “universal” in this way. If automated tests are added for all subsequent development, migrating the functionality to Drupal 8 might be less painful.

A fun proof of concept, or real value?

I’ve been toying with this approach for some time, and had a good time (yes, that’s my definition of a good time!) implementing it, but it’s not for everyone or every project. It can have value, though, if your use case includes preserving legacy functionality without leveraging Drupal 8’s modern features, while reducing risk. The jury is still out on whether maintaining a single universal branch will really be more efficient than maintaining two separate branches for Realistic Dummy Content, and whether the approach can reduce risk during site upgrades of legacy custom code, which I plan to try on my next upgrade project.



October 02, 2016

Unless you work exclusively with Drupal developers, you might be hearing some criticism of the Drupal community, among them:

  • We are almost cult-like in our devotion to Drupal;
  • maintenance and hosting are expensive;
  • Drupal is really complicated;
  • we tend to be biased toward Drupal as a solution to any problem (the law of the instrument).

It is true that Drupal is a great solution in many cases; and I love Drupal and the Drupal community.

But we can only grow by getting off the Drupal island, and being open to objectively assessing whether or not Drupal is the right solution for a given use case and a given client.

“if you love something, set it free” —Unknown origin.

Case study: the Dcycle blog

I have built my entire career on Drupal, and I have been accused (with reason) several times of being biased toward Drupal; in 2016 I am making a conscious effort to be open to other technologies and assess my commitment to Drupal more objectively.

The result has been that I now tend to use Drupal for what it’s good at, data-heavy web applications with user-supplied content. However, I have integrated other technologies to my toolbox: among them node.js for real-time websocket communication, and Jekyll for sites that don’t need to be dynamic on the server-side. In fact, these technologies can be used alongside Drupal to create a great ecosystem.

My blog has looked like this for quite some time:

Very ugly design.

It seemed to be time to refresh it. My goals were:

  • Keep the same paths and path aliases to all posts, for example blog/96/catching-watchdog-errors-your-simpletests and blog/96 and node/96 should all redirect to the same page;
  • Keep comment functionality;
  • Apply an open-source theme with minimal changes;
  • It should be easy for myself to add articles using the markdown syntax;
  • There should be a contact form.

My knee-jerk reaction would have been to build a Drupal 8 site, but looking at my requirements objectively, I realized that:

  • Comments can easily be exported to Disqus using the Disqus Migrate module;
  • For my contact form I can use formspree.io;
  • Other than the above, there is no user-generated content;
  • Upgrading my blog between major versions every few years is a problem with Drupal;
  • Security updates and hosting require a lot of resources;
  • Backups of the database and files need to be tested every so often, which also requires resources.

I eventually settled on moving this blog away from Drupal toward Jekyll, a static website generator which has the following advantages over Drupal for my use case:

  • What is actually publicly available is static HTML, ergo no security updates;
  • Because of its simplicity, testing backups is super easy;
  • My site can be hosted on GitHub using GitHub pages for free (although HTTPS was not supported for custom domain names at the time of writing, Github pages now supports secure HTTPS via Let’s Encrypt);
  • All content and structure is stored in my git repo, so adding a blog post is as simple as adding a file to my git repo;
  • No PHP, no MySQL, just plain HTML and CSS: my blog now feels lightning fast;
  • Existing free and open-source templates are more plentiful for Jekyll than for Drupal, and if I can’t find what I want, it is easier to convert an HTML template to Jekyll than it is to convert it to Drupal (for me anyway).
  • Jekyll offers plugins for all of my project’s needs, including the Jekyll Redirect From gem to define several paths for a single piece of content, including a canonical URL (permalink).

In a nutshell, Jekyll works by regenerating an entirely new static website every time a change is made to underlying structured data, and putting the result in a subdirectory called _site. All content and layout is structured in the directory hierarchy, and no database is used.

Exporting content from Drupal to Jekyll

Depending on the complexity of your content, this will likely be the longest part of your migration, and will necessitate some trial and error. For the technical details of my own migration, see my blog post Migrating content from Drupal to Jekyll.

What I learned

I set out with the goal of performing the entire migration in less than a few days, and I managed to do so, all the while learning more about Jekyll. I decided to spend as little time as possible on the design, instead reusing brianmaierjr’s open-source Long Haul Jekyll theme. I estimate that I have managed to perform the migration to Jekyll in about 1/5th the time it would have taken me to migrate to Drupal 8, and I’m saving on hosting and maintenance as well. Some of my clients are interested in this approach as well, and are willing to trade an administrative backend for a large reduction in risk and cost.

So how do users enter content?

Being the only person who updates this blog, I am comfortable adding my content (text and images) as files in Github, but most non-technical users will prefer a backend. A few notes on this:

  • First, I have noticed that even though it is possible for clients to modify their Drupal site, many actually do not;
  • Many editors consider the Drupal backend to be very user-unfriendly to begin with, and may be willing to accept the more technical Github interface, plus a little training, instead, if it saves them development time.
  • I see a big future for Jekyll frontends such as Prose.io which provide a neat editing interface (including image insertion) for editors of Jekyll sites hosted on GitHub.

Conclusion

I am not advocating replacing your Drupal sites with Jekyll, but in some cases we may benefit as a community by adding tools other than the proverbial hammer to our toolbox.

Static site generators such as Jekyll are one example of this, and with the interconnected web, making use of Drupal for what it’s good at will be, in the long term, good for Drupal, our community, our clients, and ourselves as developers.



September 19, 2016

Docker is now available natively on Mac OS in addition to Linux. Docker is also included with CoreOS which you can run on remote Virtual Machines, or locally through Vagrant.

Once you have installed Docker and Git, locally or remotely, you don’t need to install anything else.

In these examples we will leverage the official Drupal and MySQL Docker images. We will use the MySQL image as is, and we will add Drush to our Drupal image.

Docker is efficient with caching: these scripts will be slow the first time you run them, but very fast thereafter.

Here are a few scripts I often use to set up quick Drupal 7 or 8 environments for module evaluation and development.

Keep in mind that using Docker for deployment to production is another topic entirely and is not covered here; also, these scripts are meant to be quick and dirty; docker-compose might be useful for more advanced usage.

Port mapping

In all cases, using -p 80, I map port 80 of Drupal to any port that happens to be available on my host, and in these examples I am using Docker for Mac OS, so my sites are available on localhost.

I use DRUPALPORT=$(docker ps|grep drupal7-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g') to figure out the current port of my running containers. When your containers are running, you can also just docker ps to see port mapping:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
f1bf6e7e51c9        drupal8-image       "apache2-foreground"     15 seconds ago      Up 11 seconds       0.0.0.0:32771->80/tcp   drupal8-container
...

In the above example (scroll right to see more output), http://localhost:32771 will show your Drupal 8 site.

Using Docker to evaluate, patch or develop Drupal 7 modules

I can set up a quick environment to evaluate one or more Drupal 7 modules. In this example I’ll evaluate Views.

mkdir ~/drupal7-modules-to-evaluate
cd ~/drupal7-modules-to-evaluate
git clone --branch 7.x-3.x https://git.drupal.org/project/views.git
# add any other modules for evaluation here.

echo 'FROM drupal:7' > Dockerfile
echo 'RUN curl -sS https://getcomposer.org/installer | php' >> Dockerfile
echo 'RUN mv composer.phar /usr/local/bin/composer' >> Dockerfile
echo 'RUN composer global require drush/drush:8' >> Dockerfile
echo 'RUN ln -s /root/.composer/vendor/drush/drush/drush /bin/drush' >> Dockerfile
echo 'RUN apt-get update && apt-get upgrade -y' >> Dockerfile
echo 'RUN apt-get install -y mysql-client' >> Dockerfile
echo 'EXPOSE 80' >> Dockerfile

docker build -t drupal7-image .
docker run --name d7-mysql-container -e MYSQL_ROOT_PASSWORD=root -d mysql
docker run -v $(pwd):/var/www/html/sites/all/modules --name drupal7-container -p 80 --link d7-mysql-container:mysql -d drupal7-image

DRUPALPORT=$(docker ps|grep drupal7-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g')

# wait for mysql to fire up. There's probably a better way of doing this...
# See stackoverflow.com/questions/21183088
# See https://github.com/docker/compose/issues/374
sleep 15

docker exec drupal7-container /bin/bash -c "echo 'create database drupal'|mysql -uroot -proot -hmysql"
docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush si -y --db-url=mysql://root:[email protected]/drupal"
docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush en views_ui -y"
# enable any other modules here. Dependencies will be downloaded
# automatically

echo -e "Your site is ready, you can log in with the link below"

docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush uli -l http://localhost:$DRUPALPORT"

Note that we are mounting sites/all/modules as a volume (rather than copying it into the image), so any change we make to our local copy of Views will quasi-immediately be reflected on the container, making this a good technique to develop modules or write patches to existing modules.

When you are finished you can destroy your containers, noting that all data will be lost:

docker kill drupal7-container d7-mysql-container
docker rm drupal7-container d7-mysql-container

Using Docker to evaluate, patch or develop Drupal 8 modules

Our script for Drupal 8 modules is slightly different:

  • ./modules is used on the container instead of ./sites/all/modules;
  • Our Dockerfile is based on drupal:8, not drupal:7;
  • Unlike with Drupal 7, your database is not required to exist prior to installing Drupal with Drush;
  • In my tests I need to chown /var/www/html/sites/default/files to www-data:www-data to enable Drupal to write files.

Here is an example where we are evaluating the Token module for Drupal 8:

mkdir ~/drupal8-modules-to-evaluate
cd ~/drupal8-modules-to-evaluate
git clone --branch 8.x-1.x https://git.drupal.org/project/token.git
# add any other modules for evaluation here.

echo 'FROM drupal:8' > Dockerfile
echo 'RUN curl -sS https://getcomposer.org/installer | php' >> Dockerfile
echo 'RUN mv composer.phar /usr/local/bin/composer' >> Dockerfile
echo 'RUN composer global require drush/drush:8' >> Dockerfile
echo 'RUN ln -s /root/.composer/vendor/drush/drush/drush /bin/drush' >> Dockerfile
echo 'RUN apt-get update && apt-get upgrade -y' >> Dockerfile
echo 'RUN apt-get install -y mysql-client' >> Dockerfile
echo 'EXPOSE 80' >> Dockerfile

docker build -t drupal8-image .
docker run --name d8-mysql-container -e MYSQL_ROOT_PASSWORD=root -d mysql
docker run -v $(pwd):/var/www/html/modules --name drupal8-container -p 80 --link d8-mysql-container:mysql -d drupal8-image

DRUPALPORT=$(docker ps|grep drupal8-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g')

# wait for mysql to fire up. There's probably a better way of doing this...
# See stackoverflow.com/questions/21183088
# See https://github.com/docker/compose/issues/374
sleep 15

docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush si -y --db-url=mysql://root:[email protected]/drupal"
docker exec drupal8-container /bin/bash -c "chown -R www-data:www-data /var/www/html/sites/default/files"
docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush en token -y"
# enable any other modules here.

echo -e "Your site is ready, you can log in with the link below"

docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush uli -l http://localhost:$DRUPALPORT"

Again, when you are finished you can destroy your containers, noting that all data will be lost:

docker kill drupal8-container d8-mysql-container
docker rm drupal8-container d8-mysql-container



July 06, 2015

If you are using a site deployment module, and running simpletests against it in your continuous integration server using drush test-run, you might come across Simpletest output like this in your Jenkins console output:

Starting test MyModuleTestCase.                                         [ok]
...
WD rules: Unable to get variable some_variable, it is not           [error]
defined.
...
MyModuleTestCase 9 passes, 0 fails, 0 exceptions, and 7 debug messages  [ok]
No leftover tables to remove.                                           [status]
No temporary directories to remove.                                     [status]
Removed 1 test result.                                                  [status]
 Group  Class  Name

In the above example, the Rules module is complaining that it is misconfigured. You will probably be able to confirm this by installing a local version of your site along with rules_ui and visiting the rules admin page.

Here, it is rules which is logging a watchdog error, but it could be any module.

However, this will not necessarily cause your test to fail (see 0 fails), and more importantly, your continuous integration script will not fail either.

At first you might find it strange that your console output shows [error], but that your script is still passing. Your script probably looks something like this:

set -e
drush test-run MyModuleTestCase

So: drush test-run outputs an [error] message, but is still exiting with the normal exit code of 0. How can that be?

Well, your test is doing exactly what you are asking of it: it is asserting that certain conditions are met, but you have never explicitly asked it to fail when a watchdog error is logged within the temporary testing environment. This is normal: consider a case where you want to assert that a given piece of code logs an error. In your test, you will create the necessary conditions for the error to be logged, and then you will assert that the error has in fact been logged. In this case your test will fail if the error has not been logged, but will succeed if the error has been logged. This is why the test script should not fail every time there is an error.

But in our above example, we have no way of knowing when such an error is introduced; to ensure more robust testing, let’s add a teardown function to our test which asserts that no errors were logged during any of our tests. To make sure that the tests don’t fail when errors are expected, we will allow for that as well.

Add the following code to your Simpletest (if you have several tests, consider creating a base test for all of them to avoid reusing code):

/**
 * {@inheritdoc}
 */
function tearDown() {
  // See http://dcycleproject.org/blog/96/catching-watchdog-errors-your-simpletests
  $num_errors = $this->getNumWatchdogEntries(WATCHDOG_ERROR);
  $expected_errors = isset($this->expected_errors) ? $this->expected_errors : 0;
  $this->assertTrue($num_errors == $expected_errors, 'Expected ' . $expected_errors . ' watchdog errors and got ' . $num_errors . '.');

  parent::tearDown();
}

/**
 * Get the number of watchdog entries for a given severity or worse
 *
 * See http://dcycleproject.org/blog/96/catching-watchdog-errors-your-simpletests
 *
 * @param $severity = WATCHDOG_ERROR
 *   Severity codes are listed at https://api.drupal.org/api/drupal/includes%21bootstrap.inc/group/logging_severity_levels/7
 *   Lower numbers are worse severity messages, for example an emergency is 0, and an
 *   error is 3.
 *   Specify a threshold here, for example for the default WATCHDOG_ERROR, this function
 *   will return the number of watchdog entries which are 0, 1, 2, or 3.
 *
 * @return
 *   The number of watchdog errors logged during this test.
 */
function getNumWatchdogEntries($severity = WATCHDOG_ERROR) {
  $results = db_select('watchdog')
      ->fields(NULL, array('wid'))
      ->condition('severity', $severity, '<=')
      ->execute()
      ->fetchAll();
  return count($results);
}

Now, all your tests which have this code will fail if there are any watchdog errors in it. If you are actually expecting there to be errors, then at some point in your test you could use this code:

$this->expected_errors = 1; // for example
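For example, a test which deliberately triggers an error might look like this sketch, where mymodule_do_something_that_logs_an_error() is a hypothetical function under test:

/**
 * Sketch: a test during which exactly one watchdog error is expected.
 */
public function testErrorIsLogged() {
  // Tell our tearDown() that one watchdog error is expected, so it does
  // not fail this test because of it.
  $this->expected_errors = 1;

  // Hypothetical code under test which is supposed to log the error.
  mymodule_do_something_that_logs_an_error();

  // Our tearDown() will now assert that exactly 1 error (no more, no
  // fewer) was logged; other assertions about the behaviour can go here.
}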



June 10, 2015

Edit: this blog post is deprecated; see blog.dcycle.com/unit instead!

To me, modern code must be tracked by a continuous integration server, and must have automated tests. Anything else is legacy code, even if it was rolled out this morning.

In the last year, I have adopted a policy of never modifying any legacy code, because even a one-line change can have unanticipated effects on functionality, plus there is no guarantee that you won’t be re-fixing the same problem in 6 months.

This article will focus on a simple technique I use to bring legacy Drupal code under a test harness (hence transforming it into modern code), which is my first step before working on it.

Unit vs. functional testing

If you have already written automated tests for Drupal, you know about Simpletest and the concept of functional web-request tests with a temporary database: the vast majority of tests written for Drupal 7 code are based on the DrupalWebTestCase, which builds a Drupal site from scratch, often installing something like a site deployment module, using a temporary database, and then allows your test to make web requests to that interface. It’s all automatic and temporary environments are destroyed when tests are done.

It’s great, it really simulates how your site is used, but it has some drawbacks: first, it’s a bit of a pain to set up: your continuous integration server needs to have a LAMP stack or spin up Vagrant boxes or Docker containers, you need to set up virtual hosts for your code, and most importantly, it’s very time-consuming, because each test case in each test class creates a brand new Drupal site, installs your modules, and destroys the environment.

(I even had to write a module, Simpletest Turbo, to perform some caching, or else my tests were taking hours to run (at which point everyone starts ignoring them) – but that is just a stopgap measure.)

Unit tests, on the other hand, don’t require a database, don’t do web requests, and are lightning fast, often running in less than a second.

This article will detail how I use unit testing on legacy code.

Typical legacy code

Typically, you will be asked to make a “small change” to a function which is often 200+ lines long, and uses global variables, performs database requests, and REST calls to external services. But I’m not judging the authors of such code – more often than not, git blame tells me that I wrote it myself.

For the purposes of our example, let’s imagine that you are asked to make a change to a function which returns a “score” for the current user.

function mymodule_user_score() {
  global $user;
  $user = user_load($user->uid);
  $node = node_load($user->field_score_nid['und'][0]['value']);
  return $node->field_score['und'][0]['value'];
}

This example is not too menacing, but it’s still not unit testable: the function calls the database, and uses global variables.

Now, the above function is not very elegant; our first task is to ignore our impulse to improve it. Remember: we’re not going to even touch any code that’s not under a test harness.

As mentioned above, we could write a subclass of DrupalWebTestCase which provisions a database, we could create a node, a user, populate it, and then run the function.

But we would rather write a unit test, which does not need externalities like the database or global variables.

But our function depends on externalities! How can we ignore them? We’ll use a technique called dependency injection. There are several approaches to dependency injection, and Drupal 8 code supports it very well with PHPUnit, but we’ll use a simple implementation which requires the following steps:

  • Move the code to a class method
  • Move dependencies into their own methods
  • Write a subclass which replaces dependencies (not logic) with mock implementations
  • Write a test
  • Then, and only then, make the “small change” requested by the client

Let’s get started!

Move the code to a class method

For dependency to work, we need to put the above code in a class, so our code will now look like this:

class MyModuleUserScore {
  function mymodule_user_score() {
    global $user;
    $user = user_load($user->uid);
    $node = node_load($user->field_score_nid['und'][0]['value']);
    return $node->field_score['und'][0]['value'];
  }
}

function mymodule_user_score() {
  $score = new MyModuleUserScore();
  return $score->mymodule_user_score();
}

That wasn’t that hard, right? I like to keep each of my classes in its own file, but for simplicity’s sake let’s assume everything is in the same file.

Move dependencies into their own methods

There are a few dependencies in this function: global $user, user_load(), and node_load(). None of these are available to unit tests, so we need to move them out of the function, like this:

class MyModuleUserScore {
  function mymodule_user_score() {
    $user = $this->globalUser();
    $user = $this->user_load($user->uid);
    $node = $this->node_load($user->field_score_nid['und'][0]['value']);
    return $node->field_score['und'][0]['value'];
  }

  function globalUser() {
    global $user;
    return $user;
  }

  function user_load($uid) {
    return user_load($uid);
  }

  function node_load($nid) {
    return node_load($nid);
  }

}

Your dependency methods should generally only contain one line. The above code should behave in exactly the same way as the original.

Override dependencies in a subclass

Our next step will be to provide mock versions of our dependencies. The trick here is to make our mock versions return values which are expected by the main function. For example, we can surmise that our user is expected to have a field_score_nid, which is expected to contain a valid node id. We can also make similar assumptions about how our node is structured. Let’s make mock responses with these assumptions:

class MyModuleUserScoreMock extends MyModuleUserScore {
  function globalUser() {
    return (object) array(
      'uid' => 123,
    );
  }

  function user_load($uid) {
    if ($uid == 123) {
      return (object) array(
        'field_score_nid' => array(
          LANGUAGE_NONE => array(
            array(
              'value' => 234,
            ),
          ),
        ),
      );
    }
  }

  function node_load($nid) {
    if ($nid == 234) {
      return (object) array(
        'field_score' => array(
          LANGUAGE_NONE => array(
            array(
              'value' => 3000,
            ),
          ),
        ),
      );
    }
  }

}

Notice that our return values are not meant to be complete: they only contain the minimal data expected by our function. The user object returned by our mock user_load() does not even contain a uid property! But that does not matter, because our function never reads it after loading.

Write a test

It is now possible to write a unit test for our logic without requiring the database. You can copy the contents of this sample unit test to your module folder as mymodule.test, add files[] = mymodule.test to your mymodule.info file, enable the simpletest module and clear your cache.
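
If you don’t have that sample handy, here is a minimal sketch of what mymodule.test might contain; the class name matches the run-tests.sh command further down, and the getInfo() metadata is purely illustrative:

class mymoduleTestCase extends DrupalUnitTestCase {

  /**
   * Standard Simpletest metadata, displayed in the testing UI.
   */
  public static function getInfo() {
    return array(
      'name' => 'mymodule unit tests',
      'description' => 'Test the user score logic without a database.',
      'group' => 'mymodule',
    );
  }

  public function testModule() {
    // See below.
  }

}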

There remains the task of actually writing the test: in your testModule() function, the following lines will do:

public function testModule() {
  // load the file or files where your classes are located. This can
  // also be done in the setUp() function.
  module_load_include('module', 'mymodule');

  $score = new MyModuleUserScoreMock();
  $this->assertTrue($score->mymodule_user_score() == 3000, 'User score function returns the expected score');
}

Run your test

All that’s left now is to run your test:

php ./scripts/run-tests.sh --class mymoduleTestCase

Then add the above line to your continuous integration server to make sure you’re notified when someone breaks it.

Your code is now ready to be fixed

Now, when your client asks for a small or big change, you can use test-driven development to implement it. For example, let’s say your client wants all scores to be multiplied by 10 (30000 should be the score when 3000 is the value in the node):

  • First, modify your unit test to make sure it fails: make the test expect 30000 instead of 3000
  • Next, change your code iteratively until your test passes (a sketch of the change follows this list).
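
Once the test expects 30000, the change itself is small. Here is a sketch, assuming the multiplier is simply hard-coded in the method; only the return statement of MyModuleUserScore::mymodule_user_score() changes:

// In MyModuleUserScore, the logic stays testable because the dependency
// wrapper methods are untouched; only the calculation changes.
function mymodule_user_score() {
  $user = $this->globalUser();
  $user = $this->user_load($user->uid);
  $node = $this->node_load($user->field_score_nid['und'][0]['value']);
  // Scores are now multiplied by 10, as requested by the client.
  return $node->field_score['und'][0]['value'] * 10;
}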

What’s next

This has been a very simple introduction to dependency injection and unit testing for legacy code: if you want to do even more, you can make your Mock subclass as complex as you wish, simulating corrupt data, nodes which don’t load, and so on.

I highly recommend getting familiar with PHPUnit, which is part of Drupal 8, and which takes dependency injection to a whole new level: Juan Treminio’s “Unit Testing Tutorial Part I: Introduction to PHPUnit”, March 1, 2013 is the best introduction I’ve found.

I do not recommend doing away entirely with functional, database, and web tests. But a layered approach, where most of your tests are unit tests and functional tests are used sparingly, will keep your test runs below an acceptable duration, making them all the more useful, and increasing the overall quality of new and even legacy code.


Feb 23 2015
Feb 23

February 23, 2015

Continuous integration (CI) is the practice of running a series of checks on every push of your code, to make sure it is always in a potentially deployable state; and to make sure you are alerted as soon as possible if it is not.

Continuous integration and Drupal projects

This blog post is aimed at module maintainers, and we’ll look at how to use CI for modules hosted on Drupal.org. I’ll use as an example a project I’m maintaining, Realistic Dummy Content.

The good news is that Drupal.org has a built-in CI service for hosted modules: to use it, project maintainers need to click on the “Automated Testing” tab of their projects, enable automated testing, and make sure some tests are defined.

Once you have enabled automated testing, every submitted patch will be applied to the code and tested, and the main branches will be tested continually as well.

If you’re not sure how to write tests, you can learn by example by looking at the test code of any module which has automated testing enabled.

Limitations of the Drupal.org QA system

The system described above is great, and in this blog post we’ll explore how to take it a bit further. Drupal’s CI service runs your code on a new Drupal site with PHP 5.3 enabled. We know this by looking at the log for a test on Realistic Dummy Content, which contains:

[13:50:02] Database backend [mysql] loaded.
...
[simpletest.db] =>
[test.php.version] => 5.3
...

For the sake of this article, let’s say we want to use SQLite with PHP 5.5, and we also want to run checks from the Coder project’s coder_review module. We can’t achieve this within the Drupal.org infrastructure, but it is possible using Docker, CircleCI, and GitHub. Here is how.

Step 1: get a local CoreOS+Docker environment

Let’s start by setting up a local development environment on which we can run Docker. Docker is a system which uses Linux containers to run your software and all its dependencies in an isolated environment.

If you need a primer on Docker, check out Getting Started with Docker on Servers for Hackers (March 20, 2014), and A quick intro to Docker for a Drupal project.

Docker works best on CoreOS, which you can install quite easily on any computer using Vagrant and VirtualBox, as explained at Running CoreOS on Vagrant.

Step 2: Add a Dockerfile to your project

Because, in this example, we want to run tests which require changing things on the server, we’ll use the Docker container management system to simulate an Ubuntu machine over which we have complete control.

To see how this works, download the latest dev version of realistic_dummy_content to your CoreOS VM, take a look at the included files ./Dockerfile and ./scripts/test.sh to see how they are structured, then run the test script:

./scripts/test.sh

Without any further configuration, you will see tests run on the desired environment: Ubuntu with the correct version of PHP, SQLite, and coder review. (You can also see the results on CircleCI on the project’s CI dashboard if you unfold the “test” section – we’ll see how to set that up for your project later on).

Setting up Docker for your own project is just a question of copy-pasting a few scripts.
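
The wrapper script itself can stay very short. Here is a hypothetical sketch, not the actual script shipped with Realistic Dummy Content; it assumes your Dockerfile builds an image whose default command runs your test suite, so your CI server only ever needs Docker:

#!/bin/bash
#
# Hypothetical ./scripts/test.sh: build an image from the project's
# Dockerfile (which, by assumption, installs Drupal, the module, PHP 5.5,
# SQLite and coder), then run that image's test command in a throwaway
# container. The script fails (non-zero exit code) if the tests fail.
set -e
docker build -t myproject_test .
docker run --rm myproject_test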

Step 3: Make sure there is a mirror of your project on GitHub

Having test results on your command line is nice, but there is no reason to run them yourself. For that we use continuous integration (CI) servers, which run the tests every time someone commits something to your codebase.

Some of you might be familiar with Jenkins, which I use myself and which is great, but for open source projects, there are free CI services out there: the two I know of, CircleCI and Travis CI, synchronize with GitHub, not with Drupal.org, so you need a mirror of your project on GitHub.

Note that it is possible, using the tool HubDrop, to mirror your project on GitHub, but it’s not on your account, whereas the CI tools sync only with projects on your own account. My solution has been to add a ./scripts/mirror.sh script to Realistic Dummy Content, and call it once every ten minutes via a Jenkins job on my personal Jenkins server. If you don’t have access to a Jenkins server you can also use a cron job on any server to do this.
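
The mirroring script itself can be as simple as a bare clone pushed to GitHub. Here is a hypothetical sketch, not the project’s actual script; the repository URLs are placeholders:

#!/bin/bash
#
# Hypothetical mirror script, run every few minutes by Jenkins or cron:
# clone the canonical Drupal.org repository as a bare mirror and push all
# branches and tags to a GitHub mirror.
set -e
TMP=$(mktemp -d)
git clone --mirror http://git.drupal.org/project/realistic_dummy_content.git "$TMP"
cd "$TMP"
git push --mirror git@github.com:example/realistic_dummy_content.git
cd /
rm -rf "$TMP"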

The mirror of Realistic Dummy Content on GitHub is here.

Step 4: Choose a CI service and link it to your GitHub account

As mentioned above, two of the CI tools out there are CircleCI and Travis CI. One of my requirements is that the CI tool integrate well with Docker, because that’s my DevOps tool of choice.

As mentioned in Faster Builds with Container-Based Infrastructure and Docker (Mathias Meyer, Travis CI blog, 17 Dec. 2014), Travis CI is moving towards Docker: its new infrastructure is based on Docker, but it does not let you run your own Docker containers.

Circle CI, on the other hand, seems to provide more flexibility with regards to Docker, as explained in the article Continuous Integration and Delivery with Docker on CircleCI’s website.

Although Travis is a great, widely-used tool (Drush uses it), we’ll use CircleCI because I found it easier to set up with Docker.

Once you open a CircleCI account and link it to your GitHub account, you will be able to turn on CI for your mirrored project, in my case Realistic Dummy Content.

Step 5: Add a circle.yml file to your project

In order for Circle CI to know what to do with your project, it needs a circle.yml file at the root of your project. If you look at the circle.yml file at the root Realistic Dummy Content, it is actually quite simple:

machine:
  services:
    - docker

test:
  override:
    - ./scripts/test.sh

That’s it! Commit your circle.yml file, and if mirroring with GitHub works correctly, Circle CI will test your build. Debug any errors you may have, and voilà!

Here is the result of a recent Realistic Dummy Content build on CircleCI: unfold the “test” section to see the complete output: PHP version, SQLite database, coder review…

Conclusion

We have seen how you can easily add Docker support to make sure the tests and checks you run on your code are in a controlled environment, with the extensions you need (one could imagine a module which requires some external system like ApacheSolr installed on the server – Docker allows this too). This is one concrete application of DevOps: reducing the risk of glitches where “tests pass on my dev machine but not on my CI server”.


Feb 18 2015
Feb 18

February 18, 2015

I recently added Docker support to Realistic Dummy Content, a project I maintain on Drupal.org. It is now possible (with Docker installed, preferably on a CoreOS VM) to run ./scripts/dev.sh directly from the project directory (use the latest dev version if you try this), and have a development environment, sans MAMP.

I don’t consider myself an expert in Docker, virtualization, DevOps and config management, but here, nonetheless, is my experience. If I’m wrong about something, please leave a comment!

Intro: Docker and DevOps

The DevOps movement, popularized starting in about 2010, promises to include environment information along with application information in the same git repo for smoother development, testing, and production environments. For example, if your Drupal module requires version 5.4 of PHP, along with a given library, then that information should be somewhere in your Git repo. Building an environment for testing, development or production should then use that information and not be dependent on anything which is unversioned. Docker is a tool which is anchored in the DevOps movement.

DevOps: the Config management approach

The family of tools which has been around for a while now includes Puppet, Chef, and Ansible. These tools are configuration management tools: they define environment information (the PHP version should be 5.3, Apache mod_rewrite should be on, etc.) and make sure a given environment conforms to that information.

I have used Puppet, along with Vagrant, to deliver applications, including my Jenkins server hosted on GitHub.

Virtualization and containers

Using Puppet and Vagrant, you need to use Virtualization: create a Virtual Machine on your host machine.

Docker works with a different principle: instead of creating a VM on top of your host OS, Docker uses containers, so resources are shared. The article Getting Started with Docker (Servers for Hackers, 2014/03/20) contains some graphics which demonstrate how much more efficient containers are as opposed to virtualization.

Puppet and Vagrant are slow; Docker is fast

Puppet and Vagrant together work for packaging software and environment configuration, but the process is excruciatingly slow: it can take several minutes to launch an environment. My reaction to this has been to cringe every time I have to do it.

Docker, on the other hand, uses caching aggressively: if a server was already in a given state, Docker uses a cached version of it to move along faster. So, when building a container, Docker goes through a series of steps, and caches each step to make it lightning fast.

One example: launching a dev environment of the Jenkins Vagrant project on Mac OS takes over five minutes, but launching a dev environment of my Drupal project Realistic Dummy Content (which uses Docker), takes less than 15 seconds the first time it is run once the server code has been downloaded, and, because of caching, less than one (1) second subsequent times if no changes have been made. Less than one second to fire up a full-fledged development environment which is functionally independent from your host. That’s huge to me.

Configuration management is idempotent, Docker is not

Before we move on, note that Docker is not incompatible with config management tools, but Docker does not require them. Here is why I think, in many cases, config management tools are not necessary.

The config management tools such as Puppet are idempotent: you define how an environment should be, and the tools run whatever steps are necessary to make it that way. This sounds like a good idea in theory, but it looks like this in practice. I have come to the conclusion that this is not the way I think, and it forces me to relearn how to think of my environments. I suspect that many developers have a hard time wrapping their heads around idempotence.

Docker is not idempotent; it defines a series of steps to get to a given state. If you like idempotence, one of the steps can be to run a puppet manifest; but if, like me, you think idempotence is overrated, then you don’t need to use it. Here is what a Dockerfile looks like: I understood it at first glance, and it doesn’t require me to learn a new way of thinking.
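
For readers who have never seen one, here is a generic example (not any particular project’s Dockerfile): a plain sequence of steps, each of which is cached, starting from a known base image.

# Generic example of a Dockerfile: each instruction below is one step,
# executed in order and cached by Docker.
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get -y install apache2 php5 php5-mysql
COPY . /var/www/html
EXPOSE 80
CMD ["apache2ctl", "-D", "FOREGROUND"]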

The CoreOS project

The CoreOS project has seen the promise of Docker and containers. It is an OS which ships with Docker, Git, and a few other tools, but is designed so that everything you do happens within containers (using the included Docker, and eventually Rocket, a tool they are building). The result is that CoreOS is tiny: it takes 10 seconds to build a CoreOS instance on DigitalOcean, for example, but almost a minute to set up a CentOS instance.

Because Docker does not work on Mac OS without going through hoops, I decided to use Vagrant to set up a CoreOS VM on my Mac, which is speedy and works great.

Docker for deploying to production

We have seen that Docker can work for quickly setting up dev and testing environments. Can it be used to deploy to production? I don’t see why not, especially if used with CoreOS. For an example see the blog post Building an Internal Cloud with Docker and CoreOS (Shopify, Oct. 15, 2014).

In conclusion, I am just beginning to play with Docker, and it just feels right to me. I remember working with Joomla in 2006, when I discovered Drupal, and it just felt right, and I have made a career of it since then. I am having the same feeling now discovering Docker and CoreOS.

I am looking forward to your comments explaining why I am wrong about not liking idempotence, how to make config management and virtualization faster, and how and why to integrate config management tools with Docker!


Feb 09 2015
Feb 09

February 09, 2015

To get the most of this blog post, please read and understand Getting Started with Docker (Servers for Hackers, 2014/03/20). Also, all the steps outlined here have been done on a Vagrant CoreOS virtual machine (VM).

I recently needed a really simple non-production Drupal Docker image on which I could run tests. b7alt/drupal (which you can find by typing docker search drupal, or on GitHub) worked for my needs, except that it did not have the cURL PHP library installed, so drush en simpletest -y was throwing an error.

Therefore, I decided to create a new Docker image which is based on b7alt/drupal, but with the php5-curl library installed.

I started by creating a new local directory (on my CoreOS VM), which I called docker-drupal:

mkdir docker-drupal

In that directory, I created a Dockerfile which takes b7alt/drupal as its base, and runs apt-get install curl.

FROM b7alt/drupal

RUN apt-get update
RUN apt-get -y install curl

(You can find this code at my GitHub account at alberto56/docker-drupal.)

When you run this you will get:

docker build .
...
Successfully built 55a8c8999520

That hash is a Docker image ID, and your hash might be different. You can run it and see if it works as expected:

docker run -d 55a8c8999520
c9a98bdcab4e027e8571bde71ee92b4380247a44ef9314749ef5680864de2928

In the above, we are telling Docker to create a container based on the image we just created (55a8c8999520). The resulting container hash is displayed (yours might be different). We are using -d so that our container runs in the background. You can see that the container is actually running by typing:

docker ps
CONTAINER ID        IMAGE               COMMAND...
c9a98bdcab4e        55a8c8999520        "/usr/bin/supervisor...

This tells you that there is a running container (c9a98bdcab4e) based on the image 55a8c8999520. Again, your hashes will be different. Let’s log into that container now:

docker exec -it c9a98bdcab4e bash
root@c9a98bdcab4e:/#

To make sure that cUrl is successfully installed, I will figure out where Drupal resides on this container, and then try to enable Simpletest. If that works, I will consider my image a success, and exit from my container:

root@c9a98bdcab4e:/# find / -name 'index.php'
/srv/drupal/www/index.php
root@c9a98bdcab4e:/# cd /srv/drupal/www
root@c9a98bdcab4e:/srv/drupal/www# drush en simpletest -y
The following extensions will be enabled: simpletest
Do you really want to continue? (y/n): y
simpletest was enabled successfully.                   [ok]
root@c9a98bdcab4e:/srv/drupal/www# exit
exit

Now I know that my 55a8c8999520 image is good for now and for my purposes; I can create an account on Docker.com and push it to my account for later use:

docker build -t alberto56/docker-drupal .
docker push alberto56/docker-drupal

Anyone can now run this Docker image by simply typing:

docker run alberto56/docker-drupal

One thing I had a hard time getting my head around was having a GitHub project and a Docker project, which are different but linked. The GitHub project is the recipe for creating an image, whereas the Docker project is the image itself.

Once we start thinking of our environments like this (as entities which should be versioned and shared), the risk of differences between environments is greatly reduced. I was used to running simpletests for my projects on an environment which is managed by hand; when I got a strange permissions error on the test environment, I decided to start using Docker and version control to manage the container where tests are run.


Feb 06 2015
Feb 06

February 06, 2015

I have been using Simpletest on Drupal 7 for several years, and, used well, it can greatly enhance the quality of your code. I like to practice test-driven development: writing a failing test first, then running it multiple times, each time tweaking the code, until the test passes.

Simpletest works by spawning a completely new Drupal site (ignoring your current database), running tests, and destroying the database. Sometimes, a test will fail and you’re not quite sure why. Here are two tips to help you debug why your tests are failing:

Tip #1: debug()

The Drupal debug() function can be placed anywhere in your test or your source code, and the result will appear on the test results page in the GUI.

For example, if when you are playing around with the dev version of your site, things work fine, but in the test, a specific node contains invalid data, you can add this line anywhere in your test or source code which is being called during your test:

...
debug($node);
...

This will provide formatted output of your $node variable, alongside your test results.

Tip #2: die()

Sometimes the temporary test environment’s behaviour seems to make no sense. And it can be frustrating to not be able to simply log into it and play around with it, because it is destroyed after the test is over.

To understand this technique, here is quick primer on how Simpletest works:

  • In Drupal 7, running a test requires a host site and database. This is basically an installed Drupal site with Simpletest enabled, and your module somewhere in the modules directory (the module you are testing does not have to be enabled).
  • When you run a test, Simpletest creates a brand-new installation of Drupal using a special prefix simpletest123456 where 123456 is a random number. This allows Simpletest to have an isolated environment where to run tests, but on the same database and with the same credentials as the host.
  • When your test does something, like call a function, or load a page with, for example, $this->drupalGet('user'), the host environment is ignored and the temporary environment (which uses the prefixed database tables) is used. In the previous example, the test loads the “user” page using a real HTTP call. Simpletest knows to use the temporary environment because the call is made using a specially-crafted user agent.
  • When the test is over, all tables with the prefix simpletest123456 are destroyed.

If you have ever tried to run a test on a host environment which already contains a prefix, you will understand why you can get “table name too long” errors in certain cases: Simpletest is trying to add a prefix to another prefix. That’s one reason to avoid prefixes when you can, but I digress.

Now you can try this: somewhere in your test code, add die(), this will kill Simpletest, leaving the temporary database intact.

Here is an example: a colleague recently was testing a feature which exported a view. In the dev environment, the view was available to users with the role manager, as was expected. However when the test logged in as a manager user and attempted to access the view, the result was an “Access denied” page.

Because we couldn’t easily figure it out, I suggested adding die() to play around in the environment:

...
$this->drupalLogin($manager);
$this->drupalGet('inventory');
die();
$this->assertNoText('denied', 'A manager accessing the inventory page does not see "access denied"');
...

Now, when the test was run, we could:

  • wait for it to crash,
  • then examine our database to figure out which prefix the test was using,
  • change the database prefix in sites/default/settings.php from '' to (for example) 'simpletest73845' (see the snippet after this list).
  • run drush uli to get a one-time login.
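
For reference, the third step above amounts to a one-line change in settings.php; the prefix shown is an example, so use whichever prefix you found in your own database:

// sites/default/settings.php: temporarily point the host site at the
// temporary test environment's tables. Remove this line when you are done
// debugging.
$databases['default']['default']['prefix'] = 'simpletest73845';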

Now, it was easier to debug the source of the problem by visiting the views configuration for inventory: it turns out that features exports views with access by role using the role ID, not the role name (the role ID can be different for each environment). Simply changing the access method for the view from “by role” to “by permission” made the test pass, and prevented a potential security flaw in the code.

(Another reason to avoid “by role” access in views is that User 1 often does not have the role required, and it is often disconcerting to be user 1 and have “access denied” to a view.)

So in conclusion, Simpletest is great when it works as expected and when you understand what it does, but when you don’t, it is always good to know a few techniques for further investigation.


Jan 20 2015
Jan 20

January 20, 2015

When building a Drupal 7 site, one oft-used technique is to keep the entire Drupal root under git (for Drupal 8 sites, I favor having the Drupal root one level up).

Starting a new project can be done by downloading an unversioned copy of D7, and initializing a git repo, like this:

Approach #1

drush dl
cd drupal*
git init
git add .
git commit -am 'initial project commit'
git remote add origin ssh://me@example.com/myproject

Another trick I learned from my colleagues at the Linux Foundation is to get Drupal via git and have two origins, like this:

Approach #2

git clone --branch 7.x http://git.drupal.org/project/drupal.git drupal
cd drupal
git remote rename origin drupal
git remote add origin ssh://me@example.com/myproject

This second approach lets you push changes to your own repo, and pull changes from the Drupal git repo. This has the advantage of keeping track of Drupal project commits, and your own project commits, in a unified git history.

git push origin 7.x
git pull drupal 7.x

If you are tight for space though, there might be one inconvenience: Approach #2 keeps track of the entire Drupal 7.x commit history, for example we are now tracking in our own repo commit e829881 by natrak, on June 2, 2000:

git log |grep e829881 --after-context=4
commit e8298816587f79e090cb6e78ea17b00fae705deb
Author: natrak <>
Date:   Fri Jun 2 18:43:11 2000 +0000

    CVS drives me nuts *G*

All of this information takes disk space: Approach #2 takes 156MB, vs. 23MB for Approach #1. This may add up if you are working on several projects, and especially if, for each project, you have several environments for feature branches. If you have a continuous integration server tracking multiple projects and spawning new environments for each feature branch, several gigabytes of disk space can be used.

If you want to streamline the size of your git repos, you might want to try the --depth option of git clone, like this:

Approach #3

git clone --branch 7.x --depth 1 http://git.drupal.org/project/drupal.git drupal
cd drupal
git remote rename origin drupal
git remote add origin ssh://me@example.com/myproject

Adding the --depth parameter here reduces the initial size of your repo to 18MB in my test, which interestingly is even less than Approach #1. Even though your repo is now linked to the Drupal git repo, running git log will show that the entire history is not being stored.
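
One caveat: a shallow clone omits most of the history, so operations which need it (running git bisect across old core commits, for example) won’t work. If you later need the full history, you can convert the shallow clone into a complete one:

git fetch --unshallow drupal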


Dec 03 2014
Dec 03

December 03, 2014

What is content? What is configuration? At first glance, the question seems simple, almost quaint, the kind one finds oneself patiently answering for the benefit of Drupal novices: content is usually information like nodes and taxonomy terms, while content types, views and taxonomy vocabularies are usually configuration.

Content lives in the database of each environment, we say, while configuration is exportable via Features or other mechanisms and should live in the Git repo (this has been called code-driven development).

Still, a definition of content and configuration is naggingly elusive: why “usually”? Why are there so many edge cases? We’re engineers, we need precision! I often feel like I’m trying to define what a bird is: every child knows what a bird is, but it’s hard to define it. Ostriches can’t fly; platypuses lay eggs but aren’t birds.

Why the distinction?

I recently saw an interesting comment titled “A heretic speaks” on a blog post about code-driven development. It sums up some of the uneasiness about the place of configuration in Drupal: “Drupal was built primarily with site builders in mind, and this is one reason [configuration] is in the database”.

In effect, the primary distinction in Drupal is between code (Drupal core and contrib), and the database, which contains content types, nodes, and everything else.

As more complex sites were being built, a new distinction had to be made between two types of information in the database: configuration and content. This was required to allow development in a dev-stage-production workflow where features being developed outside of a production site could be deployed to production without squashing the database (and existing comments, nodes, and the like). We needed to move those features into code and we called them “configuration”.

Thus the features module was born, allowing views, content types, and vocabularies (but not nodes and taxonomy terms) to be developed outside of the database, and then deployed into production.

Drupal 8’s config management system takes that one step further by providing a mature, central API to deal with this.

The devil is in the details

This is all fine and good, but edge cases soon begin to arise:

  • What about an “About us” page? It’s a menu item (deployable) linking to a node (content). Is it config? Is it content?
  • What about a “Social media” menu and its menu items? We want a Facebook link to be deployable, but we don’t want to hard-code the actual link to our client’s Facebook page (which feels like content) – we probably don’t even know what that link is during development.
  • What about a block whose placement is known, but whose content is not? Is this content? Is it configuration?
  • What about a view which references a taxonomy term ID in a hard-coded filter? We can export the view, but the taxonomy term has an incremental ID and is not guaranteed to work on all environments.

The wrong answer to any of these questions can lead to a misguided development approach which will come back to haunt you afterward. You might wind up using incremental IDs in your code or deploying something as configuration which is, in fact, content.

Defining our terms

At the risk of irking you, dear reader, I will suggest doing away with the terms “content” and “configuration” for our purposes: they are just too vague. Because we want a formal definition with no edge cases, I propose that we use these terms instead (we’ll look at each in detail a bit further on):

  • Code: this is what our deliverable is for a given project. It should be testable, versioned, and deployable to any number of environments.
  • Data: this is whatever is potentially different on each environment to which our code is deployed. One example is comments: On a dev environment, we might generate thousands of dummy comments for theming purposes, but on prod there might be a few dozen only.
  • Placeholder content: this is any data which should be created as part of the installation process, meant to be changed later on.

Code

This is what our deliverable is for a given project. This is important. There is no single answer. Let’s take the following examples:

  • If I am a contributor to the Views contrib project, my deliverable is a system which allows users to create views in the database. In this case I will not export many particular views.

  • For another project, my deliverable may be a website which contains a set number of lists (views). In this case I may use features (D7) or config management (D8) to export all the views my client asked for. Furthermore, I may enable views_ui (the Views User interface) only on my development box, and disable it on production.

  • For a third project, my deliverable may be a website with a number of set views, plus the ability for the client to add new ones. In this case only certain views will be in code, and I will enable the Views UI as a dependency of my site deployment module. The views my client creates on production will be data.

Data

A few years ago, I took a step back from my day-to-day Drupal work and thought about what my main pain points were and how to do away with them. After consulting with colleagues, looking at bugs which took longest to fix, and looking at major sources of regressions, I realized that the one thing all major pain points had in common were our deployment techniques.

It struck me that cloning the database from production to development was wrong. Relying on production data to do development is sloppy and will cause problems. It is better to invest in realistic dummy content and a good site deployment module, allowing the standardized deployment of an environment in a few minutes from any commit.

Once we remove data from the development equation in this way, it is easier to define what data is: anything which can differ from one environment to the next without overriding a feature.

Furthermore, I like to think of production as just another environment, there is nothing special about it.

A new view or content type created on production outside of our development cycle resides on the database, is never used during the course of development, and is therefore data.

Nodes and taxonomy terms are data.

What about a view which is deployed through Features and later changed on another environment? That’s a tough one; I’ll get to it (see Overridden features, below).

Placeholder content

Let’s get back to our “About us” page. Three components are involved here:

  • The menu which contains the “About us” menu item. These types of menus are generally deployable, so let’s call them code.
  • The “About us” node itself which has an incremental nid which can be different on each environment. On some environments it might not even exist.
  • The “About us” menu item, which should link to the node.

Remember: we are not cloning the production database, so the “About us” node does not exist anywhere. For situations such as this, I suggest the use of placeholder content.

For sake of argument, let’s define our deliverable for this sample project as follows:

"Define an _About us_ page which is modifiable".

We might be tempted to figure out a way to assign a unique ID to our “About us” node to make it deployable, and devise all kinds of techniques to make sure it cannot be deleted or overridden.

I have an approach which I consider more logical for these situations:

First, in my site deployment module’s hook_update_N(), create the node and the menu item, bypassing features entirely. Something like:

function mysite_deploy_update_7023() {
  $node = new stdClass();
  $node->title = 'About us';
  $node->body[LANGUAGE_NONE][0]['format'] = 'filtered_html';
  $node->body[LANGUAGE_NONE][0]['value'] = 'Lorem ipsum...';
  $node->type = 'page';
  node_object_prepare($node);
  $node->uid = 1;
  $node->status = 1;
  $node->promote = 0;
  node_save($node);

  $menu_item = array(
    'link_path' => 'node/' . $node->nid,
    'link_title' => 'About us',
    'menu_name' => 'my-existing-menu-exported-via-features',
  );

  menu_link_save($menu_item);
}

If you wish, you can also implement hook_requirements() in your custom module, to check that the About us page has not been accidentally deleted, that the menu item exists and points to a valid path.
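
Here is a minimal sketch of what such a check might look like. The menu name matches the one used in the update hook above; everything else (the requirement key, titles, and the exact logic) is illustrative, and it assumes the core Menu module is enabled:

/**
 * Implements hook_requirements().
 */
function mysite_deploy_requirements($phase) {
  $requirements = array();
  if ($phase == 'runtime') {
    $found = FALSE;
    // Look for an "About us" item in our deployed menu which points to a
    // path that still resolves (i.e. the node has not been deleted).
    foreach (menu_load_links('my-existing-menu-exported-via-features') as $link) {
      if ($link['link_title'] == 'About us' && drupal_valid_path($link['link_path'])) {
        $found = TRUE;
      }
    }
    $requirements['mysite_deploy_about_us'] = array(
      'title' => t('About us placeholder content'),
      'value' => $found ? t('Present') : t('Missing or broken'),
      'severity' => $found ? REQUIREMENT_OK : REQUIREMENT_WARNING,
    );
  }
  return $requirements;
}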

What are the advantages of placeholder content?

  • It is deployable in a standard manner: any environment can simply run drush updb -y and the placeholder content will be deployed.
  • It can be changed without rendering your features (D7) or configuration (D8) overridden. This is a good thing: if our incremental deployment script calls features_revert() or drush fra -y (D7) or drush cim -y (D8), all overrides to features are wiped out. We do not want changes made to our placeholder content to be deleted.
  • It can be easily tested. All we need to do is make sure our site deployment module’s hook_install() calls all hook_update_N()s; then we can enable our site deployment module within our simpletest, and run any tests we want against a known good starting point.

Overridden features

Although it is easy to override features on production, I would not recommend it. It is important to define with your client and your team what is code and what is data. Again, this depends on the project.

When a feature gets overridden, it is a symptom that someone does not understand the process. Here are a few ways to mitigate this:

  • Make sure your features are reverted (D7) or your configuration is imported (D8) as part of your deployment process, and automate that process with a continuous integration server. That way, if anyone overrides a feature on production, it won’t stay overridden long.
  • Limit administrator permissions so that only user 1 can override features (this can be more trouble than it’s worth though).
  • Implement hook_requirements() to check for overridden features, warning you on the environment’s dashboard if a feature has been overridden.

Some edge cases

Now, with our more rigorous approach, how do our edge cases fare?

Social media menu and items: Our deliverable here is the existence of a social media menu with two items (Twitter and Facebook), whose links can be changed at any time on production without triggering an overridden feature. For this I would use placeholder content. Still, we need to theme each button separately, and our CSS does not know the incremental IDs of the menu items we are creating. I have successfully used the Menu Attributes module to associate classes with menu items, allowing easy theming. Here is an example, assuming menu_attributes exists and menu-social has been exported as a feature.

/**
 * Add facebook and twitter menu items
 */
function mysite_deploy_update_7117() {
  $item = array(
    'link_path' => 'http://twitter.com',
    'link_title' => 'Twitter',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'twitter',
      )
    )
  );
  menu_link_save($item);
  $item = array(
    'link_path' => 'http://facebook.com',
    'link_title' => 'Facebook',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'facebook',
      )
    )
  );
  menu_link_save($item);
}

The above code creates the menu items linking to Facebook and Twitter home pages, so that content editors can put in the correct links directly on production when they have them.

Placeholder content is just like regular data but it’s created as part of the deployment process, as a service to the webmaster.

A block whose placement is known, but whose content is not. It may be tempting to use the box module, which makes blocks exportable with Features. But in this case the block is more like placeholder content, so it should be deployed outside of Features. And if you create your block programmatically, its ID is incremental and it cannot be deployed with Context; it should be placed in a region directly, again programmatically, in a hook_update_N().

Another approach here is to create a content type and a view with a block display, fetching the last published node of that content type and displaying it at the right place. If you go that route (which seems a bit overengineered to me), you can then place your block with the context module and export it via features.

A view which references a taxonomy term ID in its filter: If a view requires access to a specific taxonomy term ID, then perhaps taxonomy is the wrong tool here. Taxonomy terms are data: they can be deleted and their names can be changed. It is not a good idea for a view to reference a specific taxonomy term. (Your view can use taxonomy terms for contextual filters without a problem, but we don’t want to hard-code a specific term in a non-contextual filter – see this issue for an example of how I learned this the hard way; I’ll get around to fixing that soon…).

For this problem I would suggest rethinking our use of a taxonomy term. Rather, I would define a select field with a set number of options (with defined keys and values). These are deployable and guaranteed not to change without triggering a Features override, so our views can safely use them. If you are implementing this change on an existing site, you will need to update all nodes from the old to the new technique in a hook_update_N() – and probably add an automated test to make sure you’re updating the data correctly. This is one more reason to think things through properly at the outset of your project, not midway through.
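
As a sketch of that migration, with hypothetical field, content type and term names (and no batching, which a real site with many nodes would need):

/**
 * Migrate nodes from the old taxonomy term field to the new select field.
 */
function mysite_deploy_update_7040() {
  // Map old taxonomy term names to the new list field's stable keys.
  $map = array(
    'Draft' => 'draft',
    'Published' => 'published',
  );
  $nids = db_query("SELECT nid FROM {node} WHERE type = :type", array(':type' => 'report'))->fetchCol();
  foreach (node_load_multiple($nids) as $node) {
    $items = field_get_items('node', $node, 'field_old_status_term');
    if ($items && ($term = taxonomy_term_load($items[0]['tid'])) && isset($map[$term->name])) {
      $node->field_new_status[LANGUAGE_NONE][0]['value'] = $map[$term->name];
      node_save($node);
    }
  }
}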

In conclusion

Content and configuration are hard to define, I prefer the following definitions:

  • Code: deployable, deliverable, versioned, tested piece of software.
  • Data: anything which can differ from one environment to the next.
  • Placeholder content: any data which should be created as part of the deployment process.

In my experience, what fits in each category depends on each project. Defining these with your team as part of your sprint planning will allow you to create a system with fewer edge cases.


Sep 10 2014
Sep 10

September 10, 2014

What is code-driven development and why is it done?

Code-driven development is the practice of placing all development in code. How can development not be in code?, you ask.

In Drupal, what makes your site unique is often configuration which resides in the database: the current theme, active modules, module-specific configuration, content types, and so on.

For the purpose of this article, our goal will be for all configuration (the current theme, the content types, module-specific config, the active module list…) to be in code, and only content to be in the database. There are several advantages to this approach:

  • Because all our configuration is in code, we can package all of it into a single module, which we’ll call a site deployment module. When enabled, this module should provide a fully workable site without any content.
  • When a site deployment module is combined with generated content, it becomes possible to create new instances of a website without cloning the database. Devel’s devel_generate module, and Realistic Dummy Content can be used to create realistic dummy content. This makes on-ramping new developers easy and consistent.
  • Because unversioned databases are not required to be cloned to set up new environments, your continuous integration server can set up new instances of your site based on a known good starting point, making tests more robust.

Code-driven development for Drupal 7

Before moving on to D8, let’s look at a typical D7 workflow. The technique I use for developing in Drupal 7 is to make sure I have one or more features with my content types, views, contexts, and so on, as well as a site deployment module which contains, in its .install file, update hooks which revert my features when needed, enable new modules, and programmatically set configuration which can’t be exported via Features. That way,

  • incrementally deploying sites is as simple as calling drush updb -y (to run new update hooks).
  • deploying a site for the first time (or redeploying it from scratch) requires creating the database, enabling our site deployment module (which runs all our update hooks), and optionally generating dummy content. For example: drush si -y && drush en mysite_deploy -y && drush en devel_generate && drush generate-content 50.

I have been using this technique for a few years on all my D7 projects and, in this article, I will explore how something similar can be done in D8.

New in Drupal 8: configuration management

If, like me, you are using features exclusively to deploy websites (as opposed to using it to bundle generic functionality, for example having a “blog” feature, or a “calendar” feature you can add to any site), config management will replace features in D8. In D7, context is used to provide the ability to export block placement to features, and strongarm exports variables. In D8, variables no longer exist, and block placement is now exportable. All of these modules are thus no longer needed.

They are replaced by the concept of configuration management, a central API for importing and exporting configuration as yml files.

Configuration management and site UUIDs

In Drupal 8, sites are now assigned a UUID on install and configuration can only be synchronized between sites having the same UUID. This is fine if the site has been cloned at some point from one environment to another, but as mentioned above, we are avoiding database cloning: we want it to be possible to install a brand new instance of a site at any time.

We thus need a mechanism to assign the same UUID to all instances of our site, but still allow us to reinstall it without cloning the database.

The solution I am using is to assign a site UUID in the site deployment module. Thus, in Drupal 8, my site deployment module’s .module file looks like this:

/**
 * @file
 * site deployment functions
 */
use Drupal\Core\Extension\InfoParser;

/**
 * Updates dependencies based on the site deployment's info file.
 *
 * If during the course of development, you add a dependency to your
 * site deployment module's .info file, increment the update hook
 * (see the .install module) and this function will be called, making
 * sure dependencies are enabled.
 */
function mysite_deploy_update_dependencies() {
  $parser = new InfoParser;
  $info_file = $parser->parse(drupal_get_path('module', 'mysite_deploy') . '/mysite_deploy.info.yml');
  if (isset($info_file['dependencies'])) {
    \Drupal::service('module_installer')->install($info_file['dependencies'], TRUE);
  }
}

/**
 * Set the UUID of this website.
 *
 * By default, reinstalling a site will assign it a new random UUID, making
 * it impossible to sync configuration with other instances. This function
 * is called by site deployment module's .install hook.
 *
 * @param $uuid
 *   A uuid string, for example 'e732b460-add4-47a7-8c00-e4dedbb42900'.
 */
function mysite_deploy_set_uuid($uuid) {
  \Drupal::configFactory()->getEditable('system.site')
    ->set('uuid', $uuid)
    ->save();
}

And the site deployment module’s .install file looks like this:

/**
 * @file
 * site deployment install functions
 */

/**
 * Implements hook_install().
 */
function mysite_deploy_install() {
  // This module is designed to be enabled on a brand new instance of
  // Drupal. Setting its uuid here will tell this instance that it is
  // in fact the same site as any other instance. Therefore, all local
  // instances, continuous integration, testing, dev, and production
  // instances of a codebase will have the same uuid, enabling us to
  // sync these instances via the config management system.
  // See also https://www.drupal.org/node/2133325
  mysite_deploy_set_uuid('e732b460-add4-47a7-8c00-e4dedbb42900');
  for ($i = 8001; $i < 9000; $i++) {
    $candidate = 'mysite_deploy_update_' . $i;
    if (function_exists($candidate)) {
      $candidate();
    }
  }
}

/**
 * Update dependencies and revert features
 */
function mysite_deploy_update_8003() {
  // If you add a new dependency during your development:
  // (1) add your dependency to your .info file
  // (2) increment the number in this function name (example: change
  //     8003 to 8004)
  // (3) now, on each target environment, running drush updb -y
  //     will call the mysite_deploy_update_dependencies() function
  //     which in turn will enable all new dependencies.
  mysite_deploy_update_dependencies();
}

The only real difference between a site deployment module for D7 and D8, thus, is that the D8 version must define a UUID common to all instances of a website (local, dev, prod, testing…).

Configuration management directories: active, staging, deploy

Out of the box, there are two directories which can contain config management yml files:

  • The active directory, which is always empty and unused. It used to be there to store your active configuration, and it is still possible to do so, but I’m not sure how. We can ignore this directory for our purposes.
  • The staging directory, which can contain .yml files to be imported into a target site. (For this to work, as mentioned above, the .yml files will need to have been generated by a site having the same UUID as the target site, or else you will get an error message – on the GUI the error message makes sense, but on the command line you will get the cryptic “There were errors validating the config synchronization.”).

I will propose a workflow which ignores the staging directory as well, for the following reasons:

  • First, the staging directory is placed in sites/default/files/, a directory which contains user data and is explicitly ignored in Drupal’s example.gitignore file (which makes sense). In our case, we want this information to reside in our git directory.
  • Second, my team has come to rely heavily on reinstalling Drupal and our site deployment module when things get corrupted locally. When you reinstall Drupal using drush si, the staging directory is deleted, so even if we did have the staging directory in git, we would be prevented from running drush si -y && drush en mysite_deploy -y, which we don’t want.
  • Finally, you might want your config directory to be outside of your Drupal root, for security reasons.

For all of these reasons, we will add a new “deploy” configuration directory and put it in our git repo, but outside of our Drupal root.

Our directory hierarchy will now look like this:

mysite
  .git
  deploy
    README.txt
    ...
  drupal_root
    CHANGELOG.txt
    core
    ...

You can also have your deploy directory inside your Drupal root, but keep in mind that certain configuration information is sensitive, containing email addresses and the like. We’ll see later on how to tell Drupal where it can find your “deploy” directory.

Getting started: creating your Drupal instance

Let’s get started. Make sure you have version 7.x of Drush (compatible with Drupal 8), and create your git repo:

mkdir mysite
cd mysite
mkdir deploy
echo "Contains config meant to be deployed, see http://dcycleproject.org/blog/68" >> deploy/README.txt
drush dl drupal-8.0.x
mv drupal* drupal_root
cp drupal_root/example.gitignore drupal_root/.gitignore
git init
git add .
git commit -am 'initial commit'

Now let’s install our first instance of the site:

cd drupal_root
echo 'create database mysite'|mysql -uroot -proot
drush si --db-url=mysql://root:root@localhost/mysite -y

Now create a site deployment module: here is the code that works for me. We’ll set the correct site UUID in mysite_deploy.install later. Add this to git:

git add drupal_root/modules/custom
git commit -am 'added site deployment module'

Now let’s tell Drupal where our “deploy” config directory is:

  • Open sites/default/settings.php
  • Find the lines beginning with $config_directories
  • Add $config_directories['deploy'] = '../deploy';

Edit: at the time of this writing, using a config directory name other than ‘sync’ will cause an issue with Config Split.

We can now perform our first export of our site configuration:

cd drupal_root
drush config-export deploy -y

You will now notice that your “deploy” directory is filled with your site’s configuration files, and you can add them to git.

git add .
git commit -am 'added config files'

Now we need to sync the site UUID from the database to the code, to make sure all subsequent instances of this site have the same UUID. Open deploy/system.site.yml and find the uuid property, for example:

uuid: 03821007-701a-4231-8107-7abac53907b1
...

Now add this same value to your site deployment module’s .install file, for example:

...
function mysite_deploy_install() {
  mysite_deploy_set_uuid('03821007-701a-4231-8107-7abac53907b1');
...

Let’s create a view! A content type! Position a block!

To see how to export configuration, create some views and content types, position some blocks, and change the default theme.

Now let’s export our changes

cd drupal_root
drush config-export deploy -y

Your git repo will be changed accordingly

cd ..
git status
git add .
git commit -am 'changed theme, blocks, content types, views'

Deploying your Drupal 8 site

At this point you can push your code to a git server, and clone it to a dev server. For testing purposes, we will simply clone it directly

cd ../
git clone mysite mysite_destination
cd mysite_destination/drupal_root
echo 'create database mysite_destination'|mysql -uroot -proot
drush si --db-url=mysql://root:root@localhost/mysite_destination -y

If you visit mysite_destination/drupal_root with a browser, you will see a plain new Drupal 8 site.

Before continuing, we need to open sites/default/settings.php on mysite_destination and add $config_directories['deploy'] = '../deploy';, as we did on the source site.

Now let the magic happen. Let’s enable our site deployment module (to make sure our instance UUID is synched with our source site), and import our configuration from our “deploy” directory:

drush en mysite_deploy -y
drush config-import deploy -y

Now, on your destination site, you will see all your views, content types, block placements, and the default theme.

This deployment technique, which can be combined with generated dummy content, allows one to create new instances very quickly for new developers, testing, demos, continuous integration, and for production.

Incrementally deploying your Drupal 8 site

What about changes you make to the codebase once everything is already deployed? Let’s change a view and run:

cd drupal_root
drush config-export deploy -y
cd ..
git commit -am 'more fields in view'

Let’s deploy this now:

cd ../mysite_destination
git pull origin master
cd drupal_root
drush config-import deploy -y

As you can see, incremental deployments are as easy and standardized as initial deployments, reducing the risk of errors, and allowing incremental deployments to be run automatically by a continuous integration server.

Next steps and conclusion

Some aspects of your site’s configuration (what makes your site unique) still can’t be exported via the config management system, for example enabling new modules; for that we’ll use update hooks as in Drupal 7. As of this writing Drupal 8 update hooks can’t be run with Drush on the command line due to this issue.

Also, although a great GUI exists for importing and exporting configuration, I chose to do it on the command line so that I could easily create a Jenkins continuous integration job to deploy code to dev and run tests on each push.

For Drupal projects developed with a dev-stage-prod continuous integration workflow, the new config management system is a great productivity boost.


Jul 30 2014
Jul 30

July 30, 2014

I had this checklist documented internally, but I keep referring back to it so I’ll make it available here in case anyone else needs it. The idea here is to document a minimum (not an ideal) set of modules and tasks which I do for almost all projects.

Questions to ask of a client at the project launch

  • Is your site bilingual? If so is there more than one domain? (if so, and you are exporting your languages as Features, your domain is exported with it. If your domains are different on different environments, you might want to use language_domain to override the domains per environment)
  • What type of compatibility do you need: tablet, mobile, which versions of IE?
  • How do you see your post-launch support and core/module update contract?
  • Do you need SSL support?
  • What is your hosting arrangement?
  • Do you have a contact form?
  • What is your anti-spam method? Note that CAPTCHA is no longer useful; I like Mollom, but it’s giving me more and more false positives with time. Honeypot has given me good results as well.
  • Is WYSIWYG required? I strongly suggest using Markdown instead.
  • Confirm that all emails are sent in plain text, not HTML. If you’re sending out HTML mail, do it right.
  • Do you need an on-site search utility? If so, some thought, and resources, need to go into it or it will be frustrating.
  • What kind of load do you expect on your site (anonymous and admin users)? This information can be used for load testing.
  • If you already have a site, should old paths of critical content map to paths on the new site?
  • Should users be allowed to create accounts? (Consider spam, and whether an admin should approve new accounts.)

Here is what should get done in the first Agile sprint, aka Sprint Zero:

  • If you are using continuous integration, a Jenkins job for tracking the master branch: this job should fail if any test fails on the codebase, or if quality metrics (code review, for example, or pdepend metrics) reach predefined thresholds.
  • A Jenkins job for pushing to dev. This is triggered by the first job if tests pass. It pushes the new code to the dev environment, and updates the dev environment’s database. The database is never cloned; rather, a site deployment module is used.
  • An issue queue is set up and the client is given access to it, and training on how to use it.
  • A wiki is set up.
  • A dev environment is set up. This is where the code gets pushed automatically if all tests pass.
  • A prod environment is set up. This environment is normally updated manually after each end of sprint demo.
  • A git repo is set up with a basic Drupal site.
  • A custom module is set up in sites/*/modules/custom: this is where custom functions go.
  • A site deployment module is set up in sites/all/modules/custom. All deployment-related code and dependencies go here; a .test file and an .install file should be included.
  • A site development module is set up in sites/*/modules/custom, which is meant to contain all modules required or useful for development, as dependencies.
  • A custom theme is created.
  • An initial feature is created in sites/*/modules/features. This is where all your features will be added.
  • A “sites/*/modules/patches” folder is created (with a README.txt file, to make sure it goes into git). This is where core and contrib patches should go. Your site’s maintainers should apply these patches when core or contrib modules are updated. Patch names here should include the node id and comment number on Drupal.org.

Basic module list (always used)

Development modules (not enabled on production)

I normally create a custom development module with these as dependencies:

I make sure this module is in my repo but it is not enabled unless used:

Experimental modules

  • dcycle: a module that is in active development, not ready for prime time yet, but where I try to add all my code to help with testing, etc.

Multilingual modules

  • i18n
  • potx
  • l10n_update
  • entity_translation if you need the same node id to display in several languages. This is useful if you have references to nodes which should be translated.
  • title if you are using entity translations and your titles can be multilingual.

Launch checklist

  • Design a custom 404, error and maintenance page.
  • Path, alias and permalink strategy. (Might require pathauto.)
  • Think of adding revisions to content types to avoid clients losing their data.
  • Don’t display errors on production.
  • Optimize CSS, JS and page caching.
  • Views should be cached.
  • System messages are properly themed.
  • Prevent very simple passwords.
  • Use syslog instead of dblog on prod.

In conclusion

Most shops, and most developers, have some sort of checklist like this. Mine is not any better or worse than most, but can be a good starting point. Another note: I’ve seen at least three Drupal teams try, and fail, to implement a “Drupal Starter kit for Company XYZ” and keep it under version control. The problem with that approach, as opposed to a checklist, is that it’s not lightweight enough: it is a software product which needs maintenance, and after a while no one maintains it.


May 23 2014
May 23

May 23, 2014

One of the techniques I use to make sure I write tests is to write them before I do anything else, which is known as test-driven development. If you develop your functionality before writing a test, in most cases you will never write the test to go with it, because you will be pressured to move on to new features.

I have found, though, that when writing tests, our team tends to think only about the happy path: what happens if everything goes according to plan.

Let me give a quick example: let’s say you are developing a donation system for anonymous users to make donations on your site. The user story calls for a form where a donation amount can be entered before redirecting the user to the payment form. Using test-driven development and Drupal’s Simpletest framework, we might start by writing something like this in our site deployment module’s .test file:

// @file mysite_deploy.test

class MysiteDonate extends DrupalWebTestCase {

  ...

  public function testSite() {

    $edit = array(
      'amount' => 420,
    );
    ...
    $this->drupalPost('donate', $edit, 'Donate now!');
    ...
    $this->assertText('You are about to donate $420', 'The donation amount has been recorded');
  }

  ...
}

When you first run this test it will fail, and your job as a developer will be to make this test pass. That’s test-driven development.

The problem with this approach is that it only defines the happy path: what should happen when all goes according to plan. It makes no provision for the sad path: what happens if a user puts something other than a number? What happens if 0 is entered? These are known as sad paths, and most teams never think about them until they occur (human nature, I guess).

To make sure we think about the sad path, I start by making sure the right questions are asked during our Agile sprint planning sessions. In the case of the “donation” user story mentioned above, the following business questions should be asked during sprint planning:

  • What’s the minimum donation? Obviously it should not be possible to donate $0, but is $0.01 OK?
  • Is there a maximum donation? Should the system bring you to the checkout page if you enter 1 billion dollars in the donation box?

Often, the client will not have thought of that, and will answer something like: sure there should be a minimum and a maximum, and we also want site administrators to be able to edit those. If the team agrees to this (and the extra work it entails), then the admin interface, too, should be tested.

Once the sprint planning session is over, I will start by writing the test based on business considerations above, and also integrating other sad paths I can think of, into my test.

Here is what our test might look like now, assuming we have a setUp() function which enables our site deployment module and dependent features (including roles); and we are using the loginAsRole() method, documented here:

// @file mysite_deploy.test

class MysiteDonate extends DrupalWebTestCase {

  ...

  public function testSite() {

    // Manage minimum and maximum donation amounts.
    $this->drupalGet('admin/option');
    $this->assertText('Access denied', 'Non-admin users cannot access the configuration page');
    $this->loginAsRole('administrator');
    $edit = array(
      'minimum' => '50',
      'maximum' => $this->randomName(),
    );
    $this->drupalPost('admin/option', $edit, 'Save');
    $this->assertText('Minimum and maximum donation amounts must be numeric');
    $edit['maximum'] = '40';
    $this->drupalPost('admin/option', $edit, 'Save');
    $this->assertText('Minimum amount must be equal to or less than maximum donation amount');
    $edit['minimum'] = '30';
    $this->drupalPost('admin/option', $edit, 'Save');
    $this->assertText('Minimum and maximum donation amounts have been saved');
    $this->drupalLogout();

    // Make a donation, sad path
    $edit = array(
      'amount' => '<script>alert("hello!")</script>',
    );
    $this->drupalPost('donate', $edit, 'Donate now!');
    $this->assertText('Donation amount must be numeric', 'Intercept non-numeric input.');
    $edit['amount'] = 29;
    $this->drupalPost('donate', $edit, 'Donate now!');
    $this->assertText('Thanks for your generosity, but we do not accept donations below $30.');
    $edit['amount'] = 41;
    $this->drupalPost('donate', $edit, 'Donate now!');
    $this->assertText('Wow, $41! Do not do this through our website, please contact us and we will discuss this over the phone.');

    // Make a donation, happy path
    $edit['amount'] = 30;
    $this->drupalPost('donate', $edit, 'Donate now!');
    $this->assertText('You are about to donate $30', 'The donation amount has been recorded');
  }

  ...
}

The above example is a much more complete portrait of what your site should do, and documenting everything in a failing test even before you or someone else starts coding ensures you don’t forget validations and the like.
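
To give an idea of where this leads, here is a sketch of the kind of validation handler such a test might eventually drive; the form ID, variable names and default values are assumptions, not part of the user story above:

/**
 * Validation handler for the (hypothetical) donation form.
 */
function mysite_donate_form_validate($form, &$form_state) {
  $amount = $form_state['values']['amount'];
  if (!is_numeric($amount)) {
    form_set_error('amount', t('Donation amount must be numeric'));
    return;
  }
  // Minimum and maximum amounts are editable by administrators.
  $minimum = variable_get('mysite_donate_minimum', 30);
  $maximum = variable_get('mysite_donate_maximum', 40);
  if ($amount < $minimum) {
    form_set_error('amount', t('Thanks for your generosity, but we do not accept donations below $@min.', array('@min' => $minimum)));
  }
  elseif ($amount > $maximum) {
    form_set_error('amount', t('Wow, $@amount! Do not do this through our website, please contact us and we will discuss this over the phone.', array('@amount' => $amount)));
  }
}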

One interesting thing to note about our complete test is that sad paths actually take up a lot more effort than the happy path. There are many advantages to thinking of them first:

  • The client can be involved in making business decisions which can affect the sad path.
  • The entire team (including the client) is made aware as early as possible about sad path considerations, and the extra work they entail.
  • Nothing is taken for granted as obvious: time is set aside for sad path development.
  • The sad path becomes an integral part of your user story which can be part of the demo. Often in Agile sprint reviews, if no one has ever thought of the sad path, only the happy path is demonstrated.
  • There is less technical debt associated with sad path development: you are less likely to get a panicked call from your client once your site goes live about getting dozens of 50 cent donations when the payment processor is taking a dollar in fees.
  • Your code will be more secure: you will think about how your system can be hacked and integrate hacking attempts (and the appropriate response) directly into your test.
  • You will be more confident putting a failing test on a feature branch and handing it to junior developers: they will be less likely to forget something.
  • Thinking of the sad path can make you reconsider how to define your features: a contact form or commenting system can seem trivial when you only think of the happy path. However, when you take into account how to deal with spam, you might decide to not allow comments at all, or to allow only authenticated users to post comments or use the contact form.

Note that as in all test-driven development, your test is not set in stone. It is like any other code: developers can modify it as long as they follow the spirit of your test. For example, maybe your config page is not admin/option but something else. Developers should feel that they own the test and can change it to fit the real system.


Apr 22 2014
Apr 22

April 22, 2014

My development team is using a site deployment module which, when enabled, deploys our entire website (with translations, views, content types, the default theme, etc.).

We defined about 30 tests (and counting) which are linked to Agile user stories and confirm that the site is doing what it’s supposed to do. These tests are defined in Drupal’s own Simpletest framework, and work as follows: for every test, our site deployment module is enabled on a new database (the database is never cloned), which can take about two minutes; the test is run, and then the temporary database is destroyed.

This created the following problem: because we were deploying our site 30 times during our test run, a single test run was taking over 90 minutes. Furthermore, we are halfway into the project, and we anticipate doubling, perhaps tripling our test coverage, which would mean our tests would take over four hours to run.

Now, we have a Jenkins server which performs all the tests every time a change is detected in Git, but even so, when several people are pushing to the git repo, test results which are 90 minutes old tend to be harder to debug, and developers tend to ignore, subvert and resent the whole testing process.

We could combine tests so the site would be deployed less often during the testing process, but this causes another problem: tests which are hundreds of lines long, and which validate unrelated functionality, are harder to debug than short tests, so it is not a satisfactory solution.

When we look at what is taking so long, we notice that a majority of the processing power goes to installing (deploying) our testing environment for each test, only for it to be destroyed after a very short test.

Enter Simpletest Turbo, which provides very simple code to cache your database once the setUp() function is run, so the next test can simply reuse the same database starting point rather than recreate everything from scratch.

Although Simpletest Turbo is in early stages of development, I have used it to almost quadruple the speed of my tests, as you can see from this Jenkins trend chart:

I know: my tests are failing more than I would like them to, but now I’m getting feedback every 25 minutes instead of every 95 minutes, so failures are easier to pinpoint and fix.

Furthermore, fairly little time is spent deploying the site: this is done once, and the following tests use a cached deployment, so we are not merely speeding up our tests (as we would if we were adding hardware): we are streamlining duplicate effort. It thus becomes relatively cheap to add new independent tests, because they are using a cached site setup.


Feb 26 2014
Feb 26

February 26, 2014

Many Drupal projects now under maintenance suffer from technical debt: a lot of the functionality is in the database and outside of git, and the code lacks automated testing. Furthermore, the functionality is often brittle: a change to one feature breaks something seemingly unrelated.

As our community and our industry mature, teams are increasingly interested in automated testing. Having worked on several Drupal projects with and without automated testing, I’ve come to the conclusion that any line of code which is not subject to automated testing is legacy code; and I agree with Michael Feathers who stated in his book Working Effectively with Legacy Code[1] that a site with zero automated tests is a legacy site from the moment you deliver it.

But the road to automated testing for Drupal is, as I’ve learned the hard way, strewn with obstacles, and first-time implementations of automated testing tend to fail. Here are a few tips to keep in mind if your team is willing to implement automated testing.

Tip #1: Use a continuous integration server

Tests are only useful if someone actually runs them. If you don’t automate running the test suite on each push to your git repo, no one will run your tests, however good their intentions are.

The absolute first thing you need to do is set up a continuous integration (CI) server which runs a script every time your git repo changes. To make this easier I’ve set up a project on GitHub which uses Vagrant and Puppet to set up a quick Jenkins server tailored for use with Drupal.

Even before starting to write tests, make sure your continuous integration job actually runs on your master branch. When your project passes tests (which is easy at first because you won’t have tests), your project will be marked as stable.

Notice that I mentioned the master branch: although git has advanced branching features, the only branch you should track in your CI server is your stable branch (often master, although for projects with more than one stable release, like Drupal itself, you may have two or three stable branches).

It is important at this point to get the team (including the client) used to seeing the continuous integration dashboard, ideally by having a monitor in a visible place (this team even plugged Jenkins into a stop light, which really grabs attention in case of a failure). If your code is flagged as failed by your CI server, you want it to be known as soon as possible, and you want the entire team to have responsibility for fixing it immediately. Your main enemy here is failure fatigue: if your master branch is broken, and no one is working at fixing it, you will get used to seeing failures and you will fail at implementing automated testing.

Eventually, you will want to add value to your continuous integration job by running Code Review tests, and other code analysis tools like Pdepend. With these kinds of tools, you can get a historical perspective on metrics like adherence to Drupal coding standards, the number of lines of code per function, code abstraction, and the like. I even like to have my Jenkins job take a screenshot of my site on every push (using PhantomJS), and comparing the latest screenshot to the previous one using ImageMagick’s compare utility.

Basically, any testing and analysis you can do on the command line should be done within your continuous integration job.

If done right, and if you have high confidence in your test suite, you can eventually use your CI server to deploy continuously to preproduction, but let’s not get ahead of ourselves.

Tip #2: Test your code, not the database

Most Drupal developers I’ve talked to create their local development environment by bringing their git repo up to date, and cloning the production database.

They also tend to clone the production or preproduction database back to Jenkins in their continuous integration.

For me, this is the wrong approach, as I’ve documented in this blog post.

Basically, any tests you write should reside in your git repo and be limited to testing what’s in the git repo. If you try to test the production database, here is a typical scenario:

  • Someone will do something to your database which will break a test.

  • Your Jenkins job will clone the database, run the test, and fail.

  • Another person will make another change to the database, and your test will now pass.

You will now see a history of failures which will indicate problems outside of your code. These will be very hard to reproduce and fix.

Keep in mind that the tests you write should depend on a known good starting point: you should be able to consistently reproduce an environment leading to a success or a failure. Drupal’s Simpletests completely ignore the current host database and create a new database from scratch just for testing, then destroy that database.

How to do this? First, I always use a site deployment module whose job it is to populate the database with everything that makes your site unique: enabling the site deployment module should enable all modules used by your site, and, using Features and related modules, deploy all views, content types, and the like, set all variables and set the default theme. The site deployment module can then be used by new developers on your team who need a development environment, and also by the CI server, all without cloning the database. If you need dummy content for development, you can use Devel’s devel_generate utility, along with this trick to make your generated content more realistic.
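
As a rough sketch, the install hook of such a site deployment module might look something like this (the module, Feature, variable and theme names are hypothetical):

/**
 * Implements hook_install().
 */
function mysite_deploy_install() {
  // Enable everything the site needs, including exported Features.
  module_enable(array('views', 'features', 'mysite_feature'));
  // Revert the Feature so exported views, content types, etc. match the code.
  features_revert_module('mysite_feature');
  // Set variables and the default theme.
  variable_set('site_frontpage', 'node');
  theme_enable(array('mysite_theme'));
  variable_set('theme_default', 'mysite_theme');
}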

When a bug is reported on your production site, you should reproduce it consistently in your dummy content, and then run your test against the simulation, not the real data. An example of this is the use of Wysiwyg: often, lorem ipsum works fine, but once the client starts copy-pasting from Word, all kinds of problems arise. Simulated Word-generated markup is the kind of thing your test should set up, and then test against.

If you are involved in a highly-critical project, you might eventually want to run certain tests on a clone of your production database, but this, in my opinion, should not be attempted until you have proper test coverage and metrics for your code itself. If you do test a clone of your production database and a bug is found, reproduce the bug in a simulation, add a test to confirm the bug, and fix your code. Fixing your code to deal with a problem in production without simulating the problem first, and testing the simulation, just results in more legacy code.

Tip #3: Understand the effort involved

Testing is time-consuming. If your client or employer asks for it, that desire needs to come with the appropriate resources. Near the beginning of a project, you can easily double all time estimates, and the payoff will come later on.

Stakeholders cannot expect the same velocity for a project with and without automated testing: if you are implementing testing correctly, your end-of-sprint demos will contain fewer features. On the other hand, once you have reached your sweet spot (see chart, above), the more manageable number of bugs will mean you can continue working on features.

Tip #4: Start gradually

Don’t try to test everything at once. If your team is called upon to “implement automated testing” on a project, you are very likely to succumb to test paralysis if you try to implement it all at once.

When working with legacy sites, or even new sites for which there is pressure to deliver fast, I have seen many teams never deliver a single test, instead delivering excuses such as “it’s really simple, we don’t need to test it”, or “we absolutely had to deliver it this week”. In reality, we tend to see “automated testing” as insurmountable and try to weasel our way out of it.

To overcome this, I often start a project with a single test: find a function in your code which you can run against a unit test (no database required), and write your first test. In Drupal, you can use a Simpletest Unit test (as in this example) and then run it straight from the browser.
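
Here is a sketch of what such a first test might look like; the mymodule module and the mymodule_format_title() function are hypothetical:

/**
 * Unit tests for pure functions which do not require a database.
 */
class MyModuleUnitTestCase extends DrupalUnitTestCase {

  public static function getInfo() {
    return array(
      'name' => 'My module unit tests',
      'description' => 'Tests pure functions which do not require a database.',
      'group' => 'My module',
    );
  }

  public function setUp() {
    // Unit tests do not install modules, so load the .module file explicitly.
    drupal_load('module', 'mymodule');
    parent::setUp();
  }

  public function testFormatTitle() {
    // A pure function can be tested in milliseconds, with no database.
    $this->assertEqual(mymodule_format_title(' hello world '), 'Hello world', 'Titles are trimmed and capitalized.');
  }
}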

Once you’re satisfied, add this line to your CI job so the test is run on every push:

drush test-run mytestgroup

Once that is done, it becomes easier for developers to write their own tests by adding it to the test file already present.

Tip #5: Don’t overestimate how good a developer you are

We all think we’re good developers, and really we can’t imagine anything ever going wrong with our code, I mean, it’s so elegant! Well, we’re wrong.

I’ve seen really intelligent people write code which looks really elegant, but still breaks.

I’ve seen developers never write tests for the simple stuff because it’s too simple, and never write tests for the more complex stuff because they never practiced with the simple stuff.

Even though you’re positive your code is so robust it will never break, just test it.

Tip #6: Start with the low-hanging fruit

This is an error I made myself and which proved very painful. Consider a system with three possible use cases for the end user. Each use case uses the same underlying calls to the database, and the same underlying pure functions.

Now, let’s say you are using a high-level testing framework like Behat and Selenium to test the rich user interface and you write three tests, one for each use case. You think (wrongly, as we’ll see) that you don’t need unit tests, because whatever it is you want to test with your unit tests is already tested by your high-level rich user interface tests.

Don’t forget, your specs also call for you to support IE8, IE9, Webkit (Safari) and Firefox. You can set up Jenkins to run the rich GUI tests via Selenium Grid on a Windows VM, and other fancy stuff.

This approach is wrong, because when you start having 5, 8, 10, 20 use cases, you will be tempted to just keep implementing dozens of new, expensive rich GUI tests, and your tests will end up taking hours.

In my experience, if your entire test suite takes more than two hours to run, developers will start resenting the process and ignoring the test results, and you are back to square one.

In his book Succeeding with Agile, Mike Cohn came up with the idea of a test pyramid, as shown in the diagram below (you can learn more about the concept in this blog post).

Based on this concept, we quickly realize that:

  • Several steps are redundant among the GUI use cases.
  • The exact same underlying functionality is tested several times over.

Thinking of this from a different angle, we can start by testing our pure functions using unit tests. This will make for lightning-fast tests, and will get the team into the habit of not mixing UI functions, database functions and pure functions (for an example of what not to do, see Drupal’s own block_admin_display_form_submit).
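
As a small sketch of what this separation might look like (all function names are hypothetical), compare a submit handler which delegates to a pure function with the pure function itself, which can be unit tested in isolation:

// Hard to unit test on its own: depends on form state and the variables table.
function mymodule_settings_form_submit($form, &$form_state) {
  variable_set('mymodule_threshold', mymodule_normalize_threshold($form_state['values']['threshold']));
}

// Easy to unit test: a pure function with no database or UI dependency.
function mymodule_normalize_threshold($value) {
  $value = (int) $value;
  return max(0, min($value, 100));
}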

Once you have built up a suite of unit tests which actually has value, move on to the next step: tests which require the database. This requires some variation of a site deployment module or another technique to bring the database to a known-good starting point before you run the test; it is harder to grasp, and setting up a CI job for these types of tests is more difficult too. However, your team will more likely be willing to work hard to overcome these obstacles because of the success they achieved with unit tests.

All of the above can be done with Drupal’s core Simpletest.

Finally, when you are satisfied with your unit test suites and your database tests, you can move outside of Drupal and on to targeted tests (not all use cases, only a few to make sure your widgets work) with Behat, Mink, Selenium, and Windows/IE VMs. If you start with the fancy stuff, though, or have too much of it, the risk of failure is much greater.

Tip #7: Don’t underestimate developers’ ability to avoid writing tests

If you implement all the tips you’ve seen until now in this article, something curious will happen: no one will write any tests. Not even you.

Here’s the psychology behind not writing tests:

  • You really have the intention of writing tests, you just want to get your feature working first.
  • You work hard at getting your feature ready for the end-of-sprint demo.
  • You show off your feature to the team and they like it.
  • You don’t write any tests.

The above will happen to you. And keep in mind, you’re actually very interested in automated testing (enough to have read this article until now!). Now imagine your teammates, who are less interested in automated testing. They don’t stand a chance.

These are some techniques to get people to write tests:

The first is used by the Drupal project itself and is based on peer review of patches. If you submit a patch to core and it does not contain tests, it will not make it in. This requires that all code be reviewed before making it into your git repo’s stable branch. There are tools for this, like Phabricator, but I’ve never successfully implemented this approach (if you have, let me know!).

The second approach is to write your tests before writing a new feature or fixing a bug. This is known as test-driven development (TDD) and it generally requires people to see things from a different angle. Here is a typical scenario of TDD:

  • A bug comes in for project xyz, and you are assigned to it.

  • You write a test for it. If you don’t know something (no function exists yet, so you don’t know what it’s called; no field exists yet, so you don’t know how to target it), just put something feasible. If you’re dealing with the body field in your test, just use body. Try to test all conceivable happy paths and sad paths.

  • Now switch modes: your goal is to make the test pass. This is an iterative process which entails writing code and changing your test as well (your test is code too, don’t forget!). For example, perhaps the body field’s machine name is not body but something like field_body[und][0]. If such is the case, change the test, as long as the spirit of the test remains.

The above techniques, and code coverage tools like code_coverage or the experimental cover, which I like, will help you write tests, but changing a team’s approach can only be achieved through hard work, evangelizing, presentations, blogging, and the like.

Tip #8: Don’t subvert your process

When it becomes challenging to write tests, you might figure that, just this once, you’ll not test something. A typical example I’ve seen of this, in project after project, is communication with outside systems and outside APIs. Because we’re not controlling the outside system, it’s hard to test it, right? True, but not impossible. If you’ve set aside enough time in your estimates to do things right, you will be able to implement mock objects, making sure you test everything.

For example, in this blog post, I demonstrate how I used the Mockable module to define mock objects to test integration between Drupal and a content deployment system.
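
The general idea, independently of any particular module, is to have your code depend on an interface and swap in a fake implementation during tests. The following sketch is a generic illustration of that idea, not the Mockable module’s actual API, and all names are hypothetical:

interface MysitePaymentGatewayInterface {
  // Charges an amount on the external system and returns TRUE on success.
  public function charge($amount);
}

// The real implementation calls the external API; it is never used in tests.
class MysiteLivePaymentGateway implements MysitePaymentGatewayInterface {
  public function charge($amount) {
    // Call the third-party API here.
    return TRUE;
  }
}

// The mock implementation returns predictable results, including sad paths.
class MysiteMockPaymentGateway implements MysitePaymentGatewayInterface {
  public function charge($amount) {
    return $amount > 0;
  }
}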

You will come across situations where implementing testing seems very hard, but however much effort I put into implementing automated testing for something, I have never regretted it.

Bonus tip: the entire team should own the tests

Your tests cannot be imposed by any one member of the team if they are to succeed. Instead, agree on what should be tested during your sprint planning.

For example, some developers (myself included) like to have close to zero Drupal styling errors. Others don’t really see the point of using two spaces instead of a tab. Unless you agree on what defines a failure (more than 100 minor styling errors? 1000? No threshold at all?), developers will feel resentful of having to fix them.

Because in Agile your client is part of the team as well, it is a good idea to involve them in defining what you are testing, providing them with the costs and benefits of each test. Perhaps your client doesn’t know what a MySQL query is, but if told that keeping the number of queries below 100 on the home page (something that can be tracked automatically) will keep performance up, they will be more likely to accept the associated extra cost.

Conclusion

Automated testing is about much more than tools (often the tools are quite simple to set up). The human aspect and the methodology are much more important to get your automated testing project off the ground.

[1] See Jez Humble and David Farley’s Continuous Delivery, Addison Wesley.


Jan 20 2014
Jan 20

January 20, 2014

Drupal uses incremental IDs for such data as taxonomy terms and nodes, but not content types or vocabularies. If, like me, you believe your site’s codebase should work with different environments and different databases, your incremental IDs can be different on each environment, causing your code to break.

But wait, you are thinking, I have only one environment: my production environment.

Even if such is the case, there are advantages to be able to spawn new environments independently of the production environment without cloning the database upstream:

  • Everything you need to create your website, minus the content, is under version control. The production database, being outside version control, should not be needed to install a new environment. See also “what is a deployment module?”.
  • New developers can be up and running with a predictable environment and dummy content.
  • Your automated tests, using Drupal’s Simpletest, by default deploy a new environment without cloning the database.
  • For predictable results in your continuous integration server, it is best to deploy a new environment. The production database is unpredictable and unversioned. If you test it, your test results will be unpredictable as well.
  • Maybe in the future you’ll need a separate version of your site with different data (for a new market, perhaps).

Even if you choose to clone the database upstream for development, testing and continuous integration, it is still a good idea to avoid referencing incremental IDs of a particular database, because at some point you might decide that it is important to be able to have environments with different databases.

Example #1: using node IDs in CSS and in template files

I have often seen this: particular pages (say, nodes 56 and 400) require particular markup, so we see template files like page--node--56.tpl.php and CSS like this:

.page-node-56 #content,
.page-node-400 #content {
   ...
}

When, as developers, we decide to use this type of code on a website, we are tightly coupling our code, which is under version control, to our database, which is not under version control. In other words our project as a whole can no longer be said to be versioned as it requires a database clone to work correctly.

This also creates all sorts of problems: if, for example, a new node needs to be created which has the same characteristics as nodes 56 and 400, one must fiddle with both the database (to create the node) and the code. Furthermore, creating automated tests for something like this is hard because the approach is not based on any underlying logic.

A better approach to this problem might be to figure out why nodes 56 and 400 are somehow different than the others. The solution will depend on your answer to that question, and maybe these nodes need to be of a different content type; or maybe some other mechanism should be used. In all cases, though, their ID should be irrelevant to their specificity.

Example #2: filtering a view by taxonomy tag

You might have a website which uses Drupal’s default implementation of articles, with a tag taxonomy field. You might decide that all articles tagged with “blog” should appear in your blog, and you might create a new view, filtered to display all articles with the “blog” tag.

Now, you might export your view into a feature and, perhaps, make your feature a dependency of a site deployment module (so that enabling this module on a new environment will deploy your blog feature, and do everything else necessary to make your site unique, such as enabling the default theme, etc.).

It is important to understand that with this approach, you are in effect putting an incremental ID into code. Your view is in fact filtering by the ID of the “blog” taxonomy term as it happens to exist on the site used to create the view. When creating the view, we have no idea what this ID is, but we are saying that in order for our view to work, the “blog” taxonomy term needs to have the same ID on all environments.

Here is an example of how this bug will play out:

  • This being the most important feature of your site, when creating new environments, the “blog” taxonomy term might always have the ID 1 because it is the first taxonomy term created; you might also be in the habit of cloning your database for new environments, in which case the problem will remain latent.
  • You might decide that such a feature is too “simple” to warrant automated testing; but even if you do define an automated test, your test will run on a new database and will need to create the “blog” taxonomy term in order to validate. Because your tests are separate and simple, the “blog” taxonomy term is probably the only term created during testing, so it, too, will have ID 1, and thus your test will pass.
  • Your continuous integration server which monitors changes to your versioned code will run tests against every push, but, again, on a new database, so your tests will pass and your code will be fine.

This might go on for quite some time until, on a given environment, someone decides to create another term before creating the “blog” term. Now the “blog” term will have ID 2, which will break your feature.

Consider, furthermore, that your client decides to create a new view for “jobs” and use the same tag mechanism as for the blog; and perhaps other tags as well. Before long, your entire development cycle becomes dependent on database cloning to work properly.

To come up with a better approach, it is important to understand what we are trying to accomplish; and what taxonomy terms are meant to be used for:

  • The “blog” category here is somehow, logically, immutable and means something very specific. Furthermore, the existence of the blog category is required for our site. Even if its name changes, the key (or underlying identity) of the blog category should always be the same.
  • Taxonomy terms are referenced with incremental IDs (like nodes) and thus, when writing our code, their IDs (and even their existence) cannot be counted upon.

In this case, we are using taxonomy terms for the wrong purpose. Taxonomy terms, like nodes, are meant to be potentially different for each environment: our code should not depend on them.

A potential solution in this case would be to create a new field for articles, perhaps a multiple selection field, with “blog” as one of the possible values. Now, when we create a view filtered by the value “blog” in our new field, we are no longer referencing an incremental ID in our code.
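
For example, such a field could be defined in code; here is a sketch using an update hook in a site deployment module (the field name, allowed values and update number are illustrative only, and the field could equally well be exported with Features):

/**
 * Add a "Display section" field so views can filter on a stable value
 * rather than on a taxonomy term ID.
 */
function mysite_deploy_update_7009() {
  field_create_field(array(
    'field_name' => 'field_display_section',
    'type' => 'list_text',
    'cardinality' => FIELD_CARDINALITY_UNLIMITED,
    'settings' => array(
      'allowed_values' => array('blog' => 'Blog', 'jobs' => 'Jobs'),
    ),
  ));
  field_create_instance(array(
    'field_name' => 'field_display_section',
    'entity_type' => 'node',
    'bundle' => 'article',
    'label' => 'Display section',
  ));
}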

I myself made this very mistake with my own website code without realizing it. The code for this website (the one you are reading) is available on Github and the issue for this problem is documented here (I’ll try to get around to fixing it soon!).

Deploying a fix to an existing site

If you apply these practices from the start of a project, it is relatively straightforward. However, what if a site is already in production with several articles already labelled “blog” (as is the case on the Dcycle website itself)? In this case we need to incrementally deploy the fix. For this, a site deployment module can be of use: in your site deployment module’s .install file, you can add a new update hook to update all your existing articles labelled “blog”, something like:

/**
 * Use a machine name rather than an incremental ID to display blog items.
 */
function mysite_deploy_update_7010() {
  // deploy the new version of the view to the target site
  features_revert(array('mysite_feature' => array('views_view')));
  ...
  // cycle through your nodes and add "blog" to your new field for any
  // content labelled "blog".
}

Of course, you need to test this first with a clone of your production site, perhaps even adding an automated test to make sure your function works as expected. Also, if you have a lot of nodes, you might need to use the “sandbox” feature of hook_update_N() to avoid timeouts.
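
For reference, here is a sketch of what the node-cycling part might look like as a sandboxed (batched) update hook; the update number, content type and field handling are illustrative only:

/**
 * Populate the new field on existing articles, 50 nodes at a time.
 */
function mysite_deploy_update_7011(&$sandbox) {
  if (!isset($sandbox['progress'])) {
    $sandbox['progress'] = 0;
    $sandbox['max'] = db_query("SELECT COUNT(nid) FROM {node} WHERE type = 'article'")->fetchField();
  }
  // Process 50 nodes per pass to avoid timeouts.
  $nids = db_query_range("SELECT nid FROM {node} WHERE type = 'article' ORDER BY nid", $sandbox['progress'], 50)->fetchCol();
  foreach ($nids as $nid) {
    $node = node_load($nid);
    // ... inspect the node's tags and populate the new field here ...
    node_save($node);
    $sandbox['progress']++;
  }
  $sandbox['#finished'] = $sandbox['max'] ? $sandbox['progress'] / $sandbox['max'] : 1;
}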

Once all is tested, all that needs to be done, on each environment (production, every developer’s laptop, etc.), is run drush updb -y on the command line.

Conclusion

Drupal makes it very easy to mix incremental IDs into views and code, and this will work well if you always use the same database on every environment. However, you will quickly run into problems if you want to write automated tests or deploy new sites without cloning the database. Being aware of this can help you write more logical, consistent and predictable code.


Jan 07 2014
Jan 07

January 07, 2014

It is generally agreed that cloning the database downstream (that is, from development toward production) is a bad idea, if only because by doing so all production content is lost; most developers use Features, Context, some variation on a site deployment module, or a rudimentary written procedure to move new configuration downstream.

However, in a dev-stage-production workflow, the database is often still periodically cloned back upstream:

In such an approach, anything not in Features or a site deployment module exists solely in the database. For example: any content, your default theme, and other information (such as variables not exported with Strongarm or block placement information not exported with Context) are defined only in your database and not in code. Therefore, to create a realistic development environment, it is tempting to clone your database.

I’ll explain why I think database cloning is the wrong approach, and then look at other ways to achieve the same goals. Finally, I’ll look at some situations where cloning the database is a good idea.

Why is cloning the database the wrong approach?

Cloning the database is wrong for the following reasons:

  • The database is not under version control.
  • The database is not a known-good starting point.
  • Database cloning makes automated testing harder.
  • Database cloning makes continuous integration harder.
  • What if there is more than one “production” site?
  • Your production database may be very large.
  • Your production database may contain sensitive data.
  • Fixes to a cloned database will “work on my machine”, but not elsewhere.

The database is not under version control

In Drupal, the database contains all configuration, content types, variables, views, and content; and none of this is under version control.

A good development practice is to put everything except content into code, and into version control, via Features, Context, Strongarm, and a site deployment module. These are code and can be kept under version control.

The database is not a known-good starting point

An important aspect of writing modern software is automated testing, and the knowledge that our tests will always yield the same result. This is the concept of a known good starting point, discussed in the book Continuous Delivery. The production database changes continually, for example when new comments or content are added. If your tests, either manual or automated, depend on a cloned production database, there is always a chance that different versions of the database will yield different test results.

Database cloning makes automated testing harder

Because of the importance of having a known-good starting point, Drupal automated tests which require the database always work in the following manner:

  • Build a brand-new temporary (throw-away) database from scratch.
  • Perform a plain installation.
  • Create the required content and set the required configuration.
  • Perform the test.
  • Discard the throw-away database.

For example, let’s say a block should appear once there are at least 20 registered users on your site. The only way to accurately test this is to have your test control the number of users, and check for the presence or absence of the block. If the only way to deploy a new environment of your site is to clone the database, your test has no reliable way of creating the conditions (active theme, block placement, active modules) it needs.

However, if you are using Features and a site deployment module, all your test needs to do for the above example is the following (see the sketch after this list):

  • Enable your site deployment module.
  • Make sure the special block does not appear.
  • Create the 20 users.
  • Make sure the block does appear.
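
Here is a rough sketch of such a test; the block text and module names are hypothetical:

/**
 * Tests that the promotional block appears only once 20 users are registered.
 */
class MysiteUserCountBlockTestCase extends DrupalWebTestCase {

  public static function getInfo() {
    return array(
      'name' => 'User count block',
      'description' => 'The promotional block appears only once 20 users are registered.',
      'group' => 'Mysite',
    );
  }

  public function setUp() {
    // The site deployment module sets up the theme, block placement and modules.
    parent::setUp(array('mysite_deploy'));
  }

  public function testBlockVisibility() {
    // The block should not appear on a freshly deployed site.
    $this->drupalGet('');
    $this->assertNoText('Join our growing community', 'Block is hidden while there are fewer than 20 users.');
    // Create the 20 users.
    for ($i = 0; $i < 20; $i++) {
      $this->drupalCreateUser();
    }
    // The block should now appear.
    $this->drupalGet('');
    $this->assertText('Join our growing community', 'Block appears once 20 users exist.');
  }
}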

Database cloning makes continuous integration harder

Continuous integration (CI) and continuous deployment are quite popular these days, with good reason: without CI, automated testing is not that useful, because developers tend to ignore tests.

Basically, CI runs a script on every push to version control. So: every time there is a change to the code base, the tests can be run and either pass or fail.

I have seen many shops experiment with continuous integration, and in many cases the Drupal site is recreated by cloning the production database. Therefore, the CI server’s test site is always in an unknown, unversioned state. So when a test fails, it is impossible to say whether a change to the database caused the failure, or a change to the code did.

In my experience this causes frustration and confusion, and eventually will cause your CI server to be worthless, and hence abandoned.

What if there is more than one “production” site?

When we are cloning the production site’s database, what do we mean exactly? Take the following example: we are developing a code base for a university with dozens of faculties. Each faculty uses the same code base but a different theme, and some slightly different settings.

It doesn’t make sense for new developers to clone one production database rather than another for development, so often a random choice is made, leading to uncertainty.

Consider your codebase to be a software product which can be deployed on any number of sites, just as any software. Would it make sense for developers of a word processor to clone the computer of one of their clients during routine development?

Your production database may be very large

Beyond the logical considerations, cloning production databases can be unwieldy: you need to remove cache tables, and find a mechanism to either copy all files and images, ignore them, or use placeholder files and images (none of which feels quite right). Even then, you can quickly find yourself with very large databases.

Your production database may contain sensitive data

Once your production site actually starts being used, you end up with a lot of sensitive data there: email addresses, hashed passwords, order history, addresses, or worse. Consider the consequences if you dump this database on a developer’s laptop (which will eventually be stolen or lost).

Fixes to a cloned database will “work on my machine”, but not elsewhere

So you’ve cloned a database to your laptop, changed some configuration on administration pages, and now the problem seems fixed; you’ve demoed it for your team and your client. The next part is messy, though: a list of admin screens to click through on the production site to reproduce the fix (ugh!), or, as I’ve already seen, cloning the development database downstream (double-ugh!). Both methods are error-prone and do not record the fix in version control, so a month from now you’ll forget how it was done. In fact, you will find yourself in a Sisyphean effort of repeatedly fixing the same problem over and over, and explaining to your clients and your team, with the help of out-of-date wiki pages, email exchanges and undecipherable comments on issue queues, that you are not an incompetent oaf.

What are the alternatives to database cloning?

We generally clone the database to have a realistic development environment. Among other things, during development, we need to have:

  • The same configuration and features.
  • Realistic content.
  • Some exact problem-causing content.

This is possible without cloning the database. Here are some tips and techniques.

Getting the same configuration and features as production

In an ideal world any Drupal site should be deployable without cloning the database, by getting the code from git and enabling the site deployment module.

You are most likely, however, to inherit a site which is a mess: no site deployment module, no tests, with Features, if they exist at all, likely to be overridden on the production site. On some projects you’d be lucky to even have a git repo.

One might think that for such sites, which we’ll call legacy sites for the purpose of this article, cloning the production database is the only viable option. Unfortunately, that is true, but it should only be a temporary solution, to give you time to extract the important configuration into code, and to create a site deployment module.

Let’s say, for example, I get a work request to “fix a little bug on a site which is almost ready”. The first thing I do is clone the entire site to my laptop, database and all, and determine which configurations, features, and variables are affected by the bug. Let’s say the site in question has 20 content types, 20 views, 50 enabled modules, three languages and a custom theme.

But the bug in question only affects 2 content types, one view, 3 modules and does not require the custom theme or i18n. I would start by generating a feature (if one does not exist) with the required views and content types, and a site deployment module with the feature as a dependency and a basic automated test. Now I can use test-driven development to fix my bug, push everything back to version control and to my continuous integration server, and deploy to production using drush.

Thus, every time an issue is being worked on, a site gradually moves from being a legacy site to a modern, tested site with continuous integration (don’t do it all at once as you will get discouraged).

Realistic content

For developers, Devel’s devel_generate module is great for generating lorem ipsum content with dummy images, so even if you don’t clone your database, you can still get, say, 50 (or 1000) blog posts with 5 (or 50) comments each.

During automated testing, several DrupalWebTestCase API functions allow you to create as much dummy content as you want, being as specific as you want.
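
For example, inside a DrupalWebTestCase test method, you might create a node which resembles real-world content; the field structure shown below is the default article body field, and the markup is illustrative:

// Create content which is as specific as the situation you want to reproduce.
$node = $this->drupalCreateNode(array(
  'type' => 'article',
  'title' => 'My test article',
  'body' => array(LANGUAGE_NONE => array(array(
    'value' => '<div>Markup which resembles what real editors paste in</div>',
    'format' => 'full_html',
  ))),
));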

Some exact problem-causing content

I recently had to deal with a bug where a site’s “layout was periodically going berserk”. That was the exact issue title, and I was lucky because my client was thoughtful enough to provide a screenshot and even the source code.

This problem could be tracked down to an often-seen misconfiguration of views and marked-up content: views would trim all body fields to 100 characters, which works fine with standard lorem ipsum, but in the real world the client was using markup in the content, so if a <div> tag opened before the 100-character mark but closed after it, the closing tag would be omitted, screwing up the HTML.

Several colleagues who are used to cloning the database concluded that this is a limitation of generated content.

I see this situation as more of an opportunity, and have come up with a way of altering generated lorem ipsum to suit your needs. So, when starting to work on such an issue, first make sure that your generated content better reflects real content, both for developers and for the automated tests.

When is it OK to clone the database?

“Don’t clone the database” is a good rule of thumb, but cloning the database is a good idea in cases such as the following:

  • For backups and restores.
  • For hard-to-debug “production-only” problems.
  • As a temporary measure to update a legacy site.
  • For preproduction environments.

For backups and restores

Code is not everything. The database contains your content, so you need to have a strategy to clone your database somewhere nightly, test it often, and make sure you can restore it. This is most easily done by cloning the database.

For hard-to-debug “production-only” problems

Once in a while, you will have a problem which only manifests itself on a production site. Reproducing this type of problem systematically can be best achieved by cloning your production database to figure out what the problem is (never work directly on production, of course).

As a temporary measure to update a legacy site

As mentioned in “Getting the same configuration and features as production”, above, most projects are a complete mess once you get your hands on them. We’ll call these legacy sites. The only way to move important configuration information into code is often to clone these sites temporarily until you have working Features and a site deployment module.

For preproduction environments

For some critical projects, you might decide to continually deploy, but not directly to production. In such circumstances, you might have your Jenkins projects continually deploy to a preproduction site (cloned from production before each deployment), to give the team, and the client, a few hours or a day to walk through the changes before approving them for deployment to production.

Conclusion

Since becoming interested in Drupal dev-stage-prod, deployment and testing, I have often come across colleagues who systematically clone the database, and I have always felt uneasy about it; in writing this post I have set out to explain why. The post turned out a lot longer than I thought, and the main take-away is that we should all consider our sites as software products, not single-use sites.

As software products, we need standardized deployment methods, both initial and incremental, via a site deployment module.

As software products, we also need to implement modern testing and continuous integration techniques.

As software products, we need to be able to deploy anywhere, with no environment dependent on any other.

Such a focus on reproducibility will hopefully pave the way to more dependable tests, a better understanding of what is content and what is configuration, and faster, more efficient and more consistent development.


Dec 13 2013
Dec 13

December 13, 2013

Edit (2016-10-03): This website is no longer Drupal-based.

Deployments are often one of the most pain-inducing aspects of the Drupal development cycle. I have talked to Drupal developers in several shops, and have found that best practices are often ignored in favor of cloning databases downstream, manually reproducing content on prod environments, following a series of error-prone manual steps on each environment, and other bad practices, all of which should be thrown out the door.

In this article I am referring to deployment of site configuration (not content) on a Drupal 7 site. Configuration refers to such aspects of your site as the default theme, CSS aggregation status, content types, views, vocabularies, and the like.

Taking best practices to the extreme, it is possible to deploy continually, dozens of times a day, automatically. The following procedure is a proof of concept, and you will probably want to adapt it to your needs, introducing a manual step perhaps, if only to make sure your deployments happen on fixed schedule.

Still, I have started using the exact procedure discussed herein to deploy the website you are currently reading, dcycleproject.org. Furthermore, the code is on Github, so anyone can reproduce the dcycle project site, without the content (I’ll detail how later on).

In an effort to demonstrate that the principles of the Dcycle manifesto work well for a real — albeit simple — website, I have started to deploy changes to the Dcycle website itself via a site deployment module with automatic testing, a continuous integration server (Jenkins), and continuous deployment. In fact, if you visit the dcycle website, you might come across a maintenance page. This is a deployment in action, and chances are it’s happening automatically.

So what is continuous deployment? For our purposes, it is a site development method which follows these principles:

  • The site’s code is under version control (we are using Git).
  • Our site is deployed, initially and incrementally, via a site deployment module.
  • Automatic testing confirms that the features we have developed actually work.
  • Two branches exist: master, on which development occurs, and prod, considered stable, on which all tests have passed.
  • A continuous integration server (in our case Jenkins) monitors the master branch, and moves code to the prod branch only if automated tests pass.
  • The production site is never changed directly, but via a job in the continuous integration server.
  • Once the prod branch is updated, so is the production site.
  • Databases are never cloned, except to move legacy sites to your local development environment. (A legacy site, in the context of this article, is a site which you can’t deploy (minus the content) without cloning the database).

A typical workflow happens like this:

  • Code is committed and pushed to the master branch in the git repo.
  • Jenkins picks up on the change, runs the tests, and they fail. The prod branch and the production site are untouched.
  • The problem leading to the failing test is fixed, and the code is pushed, again to the master branch in the git repo.
  • Jenkins picks up on the change, runs the tests, and this time they pass. The prod branch is updated automatically, and the production site itself is updated. Automatically.

Before getting started, make sure you have the following.

  • A continuous integration server, which can be on your laptop if you wish. Jenkins is easy to set up, and took me five minutes to install on Mac OS, and another five minutes on CentOS. Just follow the instructions.
  • A central git repo. You can fork the code for the dcycleproject.org website if you like.
  • A webserver on your laptop. I am using MAMP.
  • Access to your production website on the command line via SSH.
  • SSH public-private key access to the production server, to avoid being asked for passwords. This is important for Jenkins to modify the production server automatically.

We won’t be using git hooks or Drupal’s GUI.

More often than not, we are working on existing Drupal sites, not new ones, and we don’t have the luxury of redeveloping everything with best practices. So we’ll start with a single issue, either a bug or feature request. Here is a real-life example for the Dcycle website:

I like the idea of each article having its ID reflected in the URL, as is the case with Stack Overflow. I want the path of my articles to be in the format blog/12345/title-of-the-post. I also want it to be possible to shorten the path and have it redirect to the full path, so for example blog/12345 redirects to blog/12345/title-of-the-post, as is the case on Stack Overflow.

So, I started out with the goal of implementing this feature using continuous deployment and automated tests.

If your site has a site deployment module or something like it, download your code from git and deploy the site locally, using these commands, substituting your own site deployment module name and database credentials for those in the example:

echo 'create database example' | mysql -uroot -proot
drush si --db-url=mysql://root:root@localhost/example --account-name=root --account-pass=root
drush en example_deploy -y

If you want to try this at home and create a local version of the Dcycle website, make sure you have a webserver, PHP and MySQL installed, and run the following commands (if you want to actually modify the code, fork it first and use your project URL instead of mine). This example uses MAMP.

cd /Applications/MAMP/htdocs
git clone https://github.com/alberto56/dcyclesite.git dcyclesample
cd dcyclesample
echo 'create database dcyclesample' | mysql -uroot -proot
drush si --db-url=mysql://root:root@localhost/dcyclesample --account-name=root --account-pass=root
drush en dcycle_deploy -y

The above will yield an empty website. Adding some generated content will make development easier:

drush en devel_generate -y
drush generate-content 50

If there is no site deployment module, you can clone the database, but don’t make a habit of it!

To work well with continuous deployment, your site needs to have a consistent way of being initially and incrementally deployed. To achieve this, I recommend the use of a site deployment module.

Create one for your site (one already exists for the Dcycle website code, if you are using that), and make sure the .install file contains everything necessary to deploy your site. To make sure initial deployment and incremental deployment result in the same state, I just call all my update hooks from my install hook, and that has worked fine for me. Your .install file might look something like:

/**
 * @file
 * sites/default/modules/custom/dcycle_deploy/dcycle_deploy.install
 * Initial and incremental deployment of this website.
 */

/**
 * Implements hook_install().
 */
function dcycle_deploy_install() {
  for ($i = 7001; $i < 8000; $i++) {
    $candidate = 'dcycle_deploy_update_' . $i;
    if (function_exists($candidate)) {
      $candidate();
    }
  }
}

/**
 * Admin menu
 */
function dcycle_deploy_update_7007() {
  module_enable(array('admin_menu_toolbar'));
  module_disable(array('toolbar'));
}

...

/**
 * Set dark_elegant as theme
 */
function dcycle_deploy_update_7015() {
  theme_enable(array('dark_elegant'));
  variable_set('theme_default', 'dark_elegant');
}

The above code sets the default theme and changes the toolbar to admin_menu_toolbar, which I prefer.

Because these features were deployed at different times (the theme was changed after the toolbar was changed), numbered update hooks are used.

Notice how the install hook cycles through all the update hooks, ensuring that our initial deployment and incremental deployments result in the same state. For any given environment, now, regardless of the previous state, bringing it up to date is simply a matter of updating the database. The following script can now be used on any environment:

drush vset maintenance_mode 1
drush updb -y
drush cc all
drush vset maintenance_mode 0

The above puts the site in maintenance mode, runs all the update hooks which have not yet been run, clears the caches and takes the site out of maintenance mode.

We now have standardized deployment, both initial and incremental.

For continuous deployment to be of any use, we need to have very high confidence in our tests. A good first step to that end is for our tests to actually exist. And a good way to ensure that your tests exist is to write them before anything else. This is Test-driven development.

If you cloned my git repo for this site, the “short path” feature, introduced above, has already been implemented and tested, so the test passes. Still, here is the code I had written, which initially was failing. You might want to write something similar, or add a new test method (a public function whose name starts with “test”) to your .test file for another feature.

/**
 * @file
 * sites/default/modules/custom/dcycle_deploy/dcycle_deploy.test
 * This file contains the testing code for this module
 */

// Test should run with this number of blog posts.
define('DCYCLE_DEPLOY_TEST_BLOG_COUNT', 5);

/**
 * The test case
 */
class dcyclesiteTestCase extends DrupalWebTestCase {
  /**
   * Info for this test case.
   */
  public static function getInfo() {
    return array(
      'name' => t('dcyclesite: basic test'),
      'description' => t('describe test.'),
      'group' => 'dcyclesite',
    );
  }

  /*
   * Enable your module
   */
  public function setUp() {
    // set up a new site with default core modules, dcyclesite, and
    // dependencies.
    parent::setUp('dcycle_deploy');
  }

  /*
   * Test case for dcyclesite.
   */
  public function testModule() {
    $this->loginAsRole('administrator');
    $blogs = array();
    for ($i = 1; $i <= DCYCLE_DEPLOY_TEST_BLOG_COUNT; $i++) {
      $this->drupalCreateNode(array('type' => 'article', 'title' => 'É' . $blogs[$i] = $this->randomName()));
      foreach (array('blog', 'node') as $base) {
        // passing alias => TRUE, otherwise, the test converts our call
        // to the alias before the query.
        $this->drupalGet($base . '/' . $i, array('alias' => TRUE));
        // assertUrl() does not work here, because the current url (node/1)
        // equals node/1 and equals also its alias. We want it to equal its
        // alias only.
        $url = $this->getUrl();
        global $base_url;
        $expected = $base_url . '/blog/' . $i . '/e' . strtolower($blogs[$i]);
        $this->assertEqual($url, $expected , format_string('Blog can be accessed using @base/x and will redirect correctly because the end url (@url) is equal to @expected.', array('@base' => $base, '@url' => $url, '@expected' => $expected)));
      }
    }
  }

  /*
   * Login as administrator role.
   *
   * This can be a useful for tests in your deployment module, especially
   * if you create several roles in a Feature dependency.
   *
   * @param $role = 'administrator'
   *   Log in as any role, or administrator by default.
   */
  public function loginAsRole($role = 'administrator') {
    // Get all of the roles in the system.
    $roles = user_roles();
    // Find the index for the role we want to assign to the user.
    $index = array_search($role, $roles);
    // Get the permissions for the role.
    $permissions = user_role_permissions(array(array_search($role, $roles) => $role));
    // Create the user with the permissions.
    $user = $this->drupalCreateUser(array_keys($permissions[$index]));
    // Assign the role.
    $user->roles[$index] = $role;
    // Log in as this user
    if (!($user = user_save($user))) {
      throw new Exception(format_string('cannot save user with role @r', array('@r' => $role)));
    }
    $this->drupalLogin($user);
  }

}

Don’t forget to reference your test in your .info file:

...
files[] = dcycle_deploy.test
...

What are we doing in the automatic test, above?

Take a look at the setUp() function, which does everything required to create a new environment of this website. Because we have used a site deployment module, “everything” is simply a matter of enabling that module.

The key here is that whether the scenario works or not on any given environment (local, prod, etc.) is irrelevant: it needs to work based on a known good starting point. Databases are moving targets, so testing your code against an existing database proves little (unless you want to monitor your production environment, which is a more advanced use case and outside the scope of this article). Therefore, we bring a throw-away testing database to a known state, run the test against it, and know that the test will always yield the same result. The concept of a known good starting point is discussed in the book Continuous Delivery.

Given a new throw-away environment, testModule() now runs the test, defining a scenario which should work: we are logging in as an administrator, creating new blog posts (making sure to use foreign characters in the title), and then making sure the foreign characters are transliterated to ASCII characters and that our content redirects correctly when using only the ID. Let’s enable Simpletest now and make sure our test is visible and fails:

drush en simpletest -y

Now log into your site, and visit admin/config/development/testing, and run your test. If the desired functionality has not yet been developed, your test should fail.

At this point let’s switch gears and focus our energy on making our test pass. This normally involves several code iterations, and running the test dozens of times, until it passes.

An important note for test-driven development: the initial test is an approximation, and may have to be modified during coding. The spirit of the test, as opposed to the letter of the test, should be conserved.

Test-driven development has the interesting side effect that it makes it easier for teams to collaborate: if I am working with a team in a different time zone, it is less error-prone for me to instruct them to “make the test work on branch xyz and then merge it to master”, rather than explain everything I have in mind.

In the case of the task at hand, I wrote some custom code in a new module, dcyclesite, and then enabled some new modules and configuration. Don’t forget, all operations which modify the database have to be done in update hooks. Here is a partial example of how my site deployment module’s .install file looks after I made the test pass:

/**
 * Enable some custom code
 */
function dcycle_deploy_update_7022() {
  # this is where my custom code is; check my github code if you
  # are curious.
  module_enable(array('dcyclesite'));
}

/**
 * Pattern for articles
 */
function dcycle_deploy_update_7023() {
  variable_set('pathauto_node_article_pattern', 'blog/[node:nid]/[node:title]');
}

/**
 * Enable transliteration
 */
function dcycle_deploy_update_7024() {
  module_enable(array('transliteration'));
}

Coming back to continuous deployment, we need our tests to be run every time code is pushed to our master branch, and running our tests will eventually be done by our Jenkins server.

The idea behind Jenkins is quite simple: run a script as a reaction to an event. The event is a change in the master branch. Jenkins, though, does not know how to fiddle around in a GUI. Therefore we must make it possible to run the tests in the command line. Fortunately this is quite easy:

drush test-run dcyclesite

The above runs all tests in the test group “dcyclesite”. Change “dcyclesite” to whatever your group name is. For tests to run correctly from the command line, you must make sure your base_url is set correctly in your sites/default/settings.php file. This depends on your environment but must reflect the URL at which your site is accessible, for example:

$base_url = 'http://localhost:8888/mywebsite';

Now, running drush test-run dcyclesite should fail if there is a failing test. I normally develop code using Simpletest in the GUI, and use the command-line for regression testing in Jenkins.

Now the fun starts: create a Jenkins job to continuously deploy. Simply put, we need our Jenkins server to monitor the master branch of git and, if everything passes, move our code to the prod branch and, if we are feeling particularly confident in our tests, deploy to the production site.

Jenkins is one of the easiest pieces of software I have ever installed. You can probably get a CI server up and running in a matter of minutes by downloading the appropriate package on the Jenkins website.

Once that is done, set up Jenkins:

  • Make sure the jenkins user on your Jenkins server has an SSH key pair, and has public-private SSH key access to both the git repo and the production server. Note that this will allow anyone with access to your Jenkins server to access your production server and your git repo, so apply security best practices to your Jenkins server!
  • In Jenkins’s plugin manager page, install the Git plugin and the post-build task plugin, which allows you to add a second script if the first script succeeds.

Now create a single job with the following attributes:

  • Source code management: git.
  • Source code repository URL: the complete URL to your git repo. If you get an error here, make sure you can access it via the command line (you might need to accept the server’s fingerprint). In the “Advanced…” section, set the name of your repo to origin.
  • Branches to build: master.
  • Build triggers: Poll SCM every minute (type “* * * * *”).
  • Add build step: execute shell: drush test-run dcyclesite. If you are on Mac OS X, you might have to add [ $? = 0 ] || exit $? as explained here, otherwise your job will never fail.
  • Add post-build action “git publisher”. Push only if build succeeds to the prod branch of your origin.
  • Add another action, “post-build task”, selecting “Run script only if all previous steps were successful”, and “Escalate script execution status to job status”. This is a script to actually deploy your site.

In the last script, Jenkins will log into your remote site, pull the prod branch, and update your database. You might also want to back up your database here; in my case I have a separate job which periodically backs up my database. Here is some sample deployment code:

# set the site to maintenance mode
ssh [email protected] "cd /path/to/drupal && drush vset maintenance_mode 1 &&
# get the latest version of the code
git pull origin prod &&
# update the database
drush updb -y &&
# set maintenance mode to off
drush vset maintenance_mode 0 &&
# finally clear the cache
drush cc all"

Note that you can also use rsync if you don’t want to have git on your production server. Whatever you use, the trick is for deployments to production to happen through Jenkins, not through a human.

Now save your job and run it. It won’t work yet; don’t worry, this is normal, we haven’t finished yet.

To run tests, Jenkins will need a database, but we haven’t yet set one up. It will also need HTTP access to its workspace. Let’s do all this now.

Return to configure your Jenkins job, and in “Advanced project options”, click “Advanced…”. Click “Use custom workspace” and set a path which will be available via an URL. For example, if your Jenkins server is on your Mac and you are using MAMP, you can set this to /Applications/MAMP/htdocs/mysite.jenkins. This workspace will be available, for example, via http://localhost:8888/mysite.jenkins/.

Switch to your jenkins user and install a plain Drupal site with the Simpletest module:

sudo su jenkins
echo 'create database mysitejenkins' | mysql -uroot -proot
drush si --db-url=mysql://root:root@localhost/mysitejenkins --account-name=root --account-pass=root

Note that you don’t need to deploy your full site here using drush en example_deploy -y: all this site is really needed for is hosting tests. So we just need a plain Drupal site with Simpletest enabled:

drush en simpletest -y

Now set the base url of your workspace in sites/default/settings.php:

$base_url = 'http://localhost:8888/mysite.jenkins';

That’s basically all there is to it! However, because of the sheer number of steps involved, it is probable that you will run into a problem and need to debug something or other. I would appreciate hearing from you, noting any pitfalls and comments you may have. With the above steps in place, you will be able to make any change to the codebase, adding tests and functionality, test locally, and push to master. Then sit back and look at your Jenkins dashboard and your production site. If all goes well, you will see your job kick off, and after a few minutes, if you refresh your production site, you will see it is in maintenance mode. Some time later, your Jenkins job will end in success and your production site, with your new code, will be live again! Now bring your colleagues into the fold: this technique scales very well.


Nov 22 2013
Nov 22

November 22, 2013

In a Drupal development-staging-production workflow, the best practice is for new features and bug fixes to be developed locally, then moved downstream to the staging environment, and later to production.

Just how changes are pushed downstream varies, but typically the process includes Features, manual changes to the production user interface, drush commands, and written procedures.

Some examples include:

  • A view which is part of a Feature called xyz_feature is modified; the feature is updated and pushed to the git repo; and then the feature is reverted using drush fr xyz_feature on the production site.
  • A new default theme is added to the development site and tested, and pushed to the git repo; and then the new theme is selected as default on the production site’s admin/appearance page.
  • Javascript aggregation is set on the dev site’s admin/config/development/performance page, and once everything works locally, it is set on the production site via the user interface.

This approach is characterized by the following properties:

  • Each incremental deployment is different and must be documented as such.
  • If there exist several environments, one must keep track manually of what “remains to be done” on each environment.
  • The production database is regularly cloned downstream to a staging environment, but it is impossible to tell when it was last cloned.
  • If an environment is out of date and does not contain any important data, it can be deleted and the staging environment can be re-cloned.
  • Many features (for example javascript aggregation) are never in version control, at best only documented in an out-of-date wiki, at worst in the memory of a long-gone developer.
  • New developers clone the staging database to create a local development environment.
  • Automated functional testing by a continuous integration server, if done at all, uses a clone of the staging database.

The main issue I have with this approach is that it overly relies on the database to store important configuration, and the database is not under version control. There is no way to tell who did what, and when.

Using a deployment module aims to meet the following goals:

  • Everything except content should be in version control: views, the default theme, settings like Javascript aggregation, etc.
  • Incremental deployments should always be performed following the same procedure.
  • Initial deployments (for example for a new developer or for a throwaway environment during an automated test) should be possible without cloning the database.
  • Tests should be run against a known-good starting point, not a clone of a database.
  • New developers should be up and running without having to clone a database.

Essentially, anything not in version control is unreliable, and cloning the database today can yield a bug which won’t be present if you clone the database tomorrow. So we’ll avoid cloning the database in most cases.

The Dcycle manifesto states that each site should have a deployment module whose job it is to keep track of deployment-related configuration. Once you have settled on a namespace for your project, for example example, by convention your deployment module should reside in sites/*/modules/custom/example_deploy.

Let’s now say that we are starting a project, and our first order of business is to create a specific view: we will create the view, export it as a feature, and make the feature a dependency of our deployment module. Starting now, if all your code is under version control, all new environments (production, continuous integration, testing, new local sites) are deployed the same way, simply by creating a database and enabling the deployment module. Using Drush, you would call something like:

echo 'create database example' | mysql -uroot -proot
drush si --db-url=mysql://root:root@localhost/example --account-name=root --account-pass=root
drush en example_deploy -y

The first line creates the database; the second line is the equivalent of clicking through Drupal’s installation procedure; and the third line activates the deployment module.

Because you have set your feature to be a dependency of your deployment module, it is activated and your view is deployed.
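For reference, declaring that dependency is a single line in the deployment module’s .info file (the module and feature names below are placeholders):

; in your deploy module's .info file, for example example_deploy.info
dependencies[] = xyz_feature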

We want the incremental deployment procedure to always be the same. Also, we don’t want it to involve cloning the database, because the database is in an unknown state (it is not under version control). Another reason we don’t want to clone the database is that we want to practice our incremental deployment procedure as much as possible, ideally several times a day, to catch any problems before we apply it to the production site.

My incremental deployment procedure, for all my Drupal projects, uses Drush and Registry rebuild, and goes as follows once the new code has been fetched via git:

drush rr
drush vset maintenance_mode 1
drush updb -y
drush cc all
drush cron
drush vset maintenance_mode 0

The first line (drush rr) rebuilds the registry in case we moved module files since the last deployment. A typical example is moving contrib modules from sites/all/modules/ to sites/all/modules/contrib/: without rebuilding the registry, your site will be broken and all following commands will fail.

drush vset maintenance_mode 1 sets the site to maintenance mode during the update.

drush updb -y runs all update hooks for contrib modules, core, and, importantly, your deployment module (we’ll get back to that in a second).

drush cc all clears all caches, which can fix some problems during deployment.

On some projects, I have found that running drush cron at this point helps avoid hard-to-diagnose problems.

Finally, move your site out of maintenance mode: drush vset maintenance_mode 0.

Our goal is for all our deployments (features, bug fixes, new modules…) to be channelled through hook_update_N()s, so that the incremental deployment procedure introduced above will trigger them. Simply put, hook_update_N() functions are called only once for each environment.

Each environment tracks the last hook_update_N() called, and when drush updb -y is called, it checks the code for new hook_update_N() and runs them if necessary. (drush updb -y is the equivalent of visiting the update.php page, but the latter method is unsupported by the Dcycle procedure, because it requires managing PHP timeouts, which we don’t want to do).

hook_update_N() is the same tried-and-true mechanism used to update database schemas for Drupal core and contrib modules, so we are not introducing anything new.

Now let’s see how a few common tasks can be accomplished the Dcycle way:

Instead of fiddling with the production environment, leaving no trace of what you’ve done, here is an ideal workflow for enabling Javascript aggregation:

First, in your issue tracker, create an issue explaining why you want to enable aggregation, and take note of the issue number (for example #12345).

Next, figure out how to enable aggregation in code. In this case, a little reverse-engineering is required: on your local site, visit admin/config/development/performance and inspect the “Aggregate JavaScript files” checkbox, noting its name property: preprocess_js. This is likely to be a variable. You can confirm that it works by calling drush vset preprocess_js 1 and reloading admin/config/development/performance. Call drush vset preprocess_js 0 to turn it back off again. Many configuration pages work this way, but in some cases you’ll need to work a bit more in order to figure out how to affect a change programmatically, which has the neat side effect of providing you a better understanding of how Drupal works.

Now, simply add the following code to a hook_update_N() in your deployment module’s .install file:

/**
 * #12345: Enable javascript aggregation
 */
function example_deploy_update_7001() {
  variable_set('preprocess_js', 1);
  // you can also do this with Features and the Strongarm module.
}

Now, calling drush updb -y on any environment, including your local environment, should enable Javascript aggregation.

It is important to realize that hook_update_N()s are only called on environments where the deployment module is already in place, and not on new deployments. To make sure that new deployments and incremental deployments behave similarly, I call all my update hooks from my hook_install, as described in a previous post:

/**
 * Implements hook_install().
 *
 * See http://dcycleproject.org/node/43
 */
function example_deploy_install() {
  for ($i = 7001; $i < 8000; $i++) {
    $candidate = 'example_deploy_update_' . $i;
    if (function_exists($candidate)) {
      $candidate();
    }
  }
}

Once you are satisfied with your work, commit it to version control:

git add sites/all/modules/custom/example_deploy/example_deploy.install
git commit -am '#12345 Enabled javascript aggregation'
git push origin master

Now you can deploy this functionality to any other environment using the standard incremental deployment procedure, ideally after your continuous integration server has given you the green (or in the case of Jenkins, blue) light.

If we already have a feature which is a dependency of our deployment module, we can modify our view; update our features using the Features interface at admin/structure/features or using drush fu xyz_feature -y; then add a new hook_update_N() to our deployment module:

/**
 * #12346: Change view to remove html tags from trimmed body
 */
function example_deploy_update_7002() {
  features_revert(array('xyz_feature' => array('views_view')));
}

In the above example, views_view is the machine name of the Features component affecting views. If you want to revert other components, make sure you’re using the 2.x branch of Features, visit the page at admin/structure/features/xyz_feature/recreate (where xyz_feature is the machine name of your feature), and you’ll find the machine names of each component next to its human name (for example node for content types, filter for text formats, etc.).
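For example, a single update hook can revert several component types of the same feature at once; the update number and component list below are illustrative:

/**
 * Revert several components of xyz_feature in one go.
 */
function example_deploy_update_7005() {
  features_revert(array(
    'xyz_feature' => array('views_view', 'node', 'filter'),
  ));
}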

Say we create a new default theme xyz and want to enable it:

/**
 * #12347: New theme for the site
 */
function example_deploy_update_7003() {
  theme_enable(array('xyz'));
  variable_set('theme_default', 'xyz');
}

I normally remove toolbar on all my sites and put admin_menu’s admin_menu_toolbar instead. To deploy the change, add admin_menu to sites/*/modules/contrib and add the following code to your deployment module:

/**
 * #12348: Add a drop-down menu instead of the default menu for admins
 */
function example_deploy_update_7004() {
  // make sure admin_menu has been downloaded and added to your git repo,
  // or this will fail.
  module_enable(array('admin_menu_toolbar'));
  module_disable(array('toolbar'));
}

Of course, nothing prevents clueless users from modifying views, modules and settings on the production site directly, so I like to implement hook_requirements() to perform certain checks on each environment: for example, if Javascript aggregation is turned off, you might see a red line on admin/reports/status saying “This site is designed to use Javascript aggregation, please turn it back on”. You might also check that none of your Features are overridden, that the right theme is enabled, and so on. If this technique is used correctly, when a bug is reported on the production site, the admin/reports/status page will let you know if any settings on the production site differ from what you intended, and from what your automated tests expect.
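Here is a minimal sketch of such a check, assuming a deployment module called example_deploy and verifying only JavaScript aggregation; you can add as many checks as you like using the same pattern:

/**
 * Implements hook_requirements().
 *
 * Sketch: warn on admin/reports/status if JavaScript aggregation has been
 * turned off directly on an environment.
 */
function example_deploy_requirements($phase) {
  $requirements = array();
  if ($phase == 'runtime') {
    $aggregation = variable_get('preprocess_js', 0);
    $requirements['example_deploy_preprocess_js'] = array(
      'title' => t('JavaScript aggregation'),
      'value' => $aggregation ? t('Enabled') : t('Disabled'),
      'severity' => $aggregation ? REQUIREMENT_OK : REQUIREMENT_ERROR,
      'description' => $aggregation ? '' : t('This site is designed to use Javascript aggregation, please turn it back on.'),
    );
  }
  return $requirements;
}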

Now that everything we do is in version control, we no longer need to clone databases, except in some very limited circumstances. We can always fire up a new environment and add dummy content for development or testing; and, provided we’re using the same commit and the same operating system and version of PHP, etc., we’re sure to always get the same result (which is not the case with database cloning).

Specifically, I normally add a .test file in my deployment module which enables the deployment module on a test environment, and runs tests to make sure things are working as expected.

Once that is done, it becomes easy to create a Jenkins continuous integration job to monitor the master branch, and confirm that a new environment can be created and simpletests pass.


Nov 13 2013
Nov 13

November 13, 2013

I recently inherited a Drupal project which periodically imported content from a Nuxeo server, synchronizing it with Drupal nodes, thus creating, updating and deleting nodes as need be. Nuxeo content was in no case modified by Drupal.

The Nuxeo server was set up by a third-party provider with whom I had no contact.

The site was not using mock objects or automated testing. A custom Drupal module was used, which leveraged the CMIS module.

A series of problems were occurring with the setup, among them:

  • Nuxeo Categories were supposed to map to Drupal taxonomy terms, but if a category was deleted from Nuxeo, the corresponding taxonomy term was not removed from the node in Drupal
  • If more than one category was added to a Nuxeo content item, only the first was imported to Drupal
  • The site used what seemed like a custom implementation of the Nuxeo API, so it was hard to get help from the community. The custom implementation returned Nuxeo content IDs for some content items and Nuxeo revision IDs for others. After some testing, I did not manage to figure out under which circumstances content IDs or revision IDs were used.
  • The Nuxeo server’s clock was a few minutes late, and the custom module was comparing timestamps rather than revision numbers. As a result, if a Nuxeo content item was modified less than five minutes after its previous modification, synchronisation did not occur correctly.

These problems had a common symptom from the client’s perspective: “synchronisation does not happen correctly”. They also caused a common emotion: frustration.

In order to implement a robust fix, trial and error was not enough. Here are the steps I followed to reproduce the problems, implement testing, and finally fix them.

Making sure I had access to a “staging” Nuxeo folder allowed me to do testing without messing up production data; I could also control the number of content items, making testing that much faster. This was a convenient stop-gap measure until I could set up mock objects and internal testing.

This might go without saying, but of course you should have a local environment before attempting to modify Drupal. We use git for version control, and I just grabbed a copy of the production database to my local dev site. Because Drupal is only reading Nuxeo data, not modifying it, this is not too risky.

Before setting up a true mock object for interaction with Nuxeo, I first had to set up a level of abstraction, in effect a sort of switch, between Drupal and Nuxeo. Later on I would plug in a mock object.

The CMIS module is not using automated testing, and does not seem to allow for any form of mocking or simulation, from what I can tell (a search of the strings ‘mock’, ‘simulate’, or ‘simulation’ came up empty for that project’s code).

So now I had to decide between these two approaches:

  • provide a patch for CMIS providing mock object functionality.
  • provide a level of abstraction between the custom code and CMIS itself.

Because there is no real standard way (yet) to provide mock object functionality in Drupal modules, I have been using my own solution for a few projects: the Mockable module. Although I have released a beta version, the module is not widely used and I would rather wait for this or some other solution to be more accepted before submitting patches for third-party modules. I therefore decided to use Mockable between my own module and CMIS.

The Mockable module is not meant to be active on production sites. Here is how I used it:

First, I downloaded Mockable and activated the mockable module (but not the other modules in the Mockable project) on the local development site.

Then, I identified the lines of code in my custom module which interacted with an external system (in this case the CMIS module). Here is one example:

...
$object = cmisapi_getProperties('default',$document_id);
...

cmisapi_getProperties() is defined in CMIS and I did not want to modify that module, so I decided to mock it instead.

I started by installing Devel, and calling my custom code from the devel/php page with different sets of data on my Nuxeo staging environment. Adding a var dump or dpm() call helped me figure out the structure of the response from cmisapi_getProperties() in different circumstances:

...
$object = cmisapi_getProperties('default',$document_id);
// this should be removed after testing. dpm() is defined in the devel
// module.
dpm($object);
...

Once I had a good idea of how this function works, I defined a new set of functions instead of calling cmisapi_getProperties() directly:

...
$object = cmis_ms_get_properties('default',$document_id);
...

/**
 * Mockable version of CMIS's cmisapi_getProperties().
 */
function cmis_ms_get_properties($type, $document_id) {
  if (module_exists('mockable')) {
    // If the Mockable module is active, as it might be on testing
    // and dev environments (not on prod), then call cmisapi_getProperties()
    // or its mock version (if it exists) depending on whether mocking is
    // turned on or not.
    $return = mockable('cmisapi_getProperties', $type, $document_id);
  }
  else {
    $return = cmisapi_getProperties($type, $document_id);
  }
  return $return;
}

Now that my abstraction layer was in place, all I had to do was define some functions and objects to replace, in my development and continuous integration environments, those used in production.

/**
 * Mock version of CMIS's cmisapi_getProperties(), which will be called
 * instead of cmisapi_getProperties() if the Mockable version is installed
 * and mocking is turned on (using drush mockable-set).
 */
function cmis_ms_get_properties_mockable_mock($type, $document_id) {
  $return = new stdClass;
  ...

  // Do whatever you want here to best simulate all possible responses of
  // the real cmisapi_getProperties()

  return $return;
}

cmisapi_getProperties() was not the only function which interacted with the third-party system. new SoapClient($address, $options) and other such calls were scattered across my code. I had to figure out how each of these worked and mock them appropriately.

Your mock objects or mock functions can be as simple or complex as you need them to be. In my case I am using variables to switch between different simulated behaviours. For example, I can easily simulate a timeout or 500 error on my remote system.
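For instance, the body of the mock function above could branch on a variable; the variable name and behaviours below are assumptions for illustration only:

// Inside cmis_ms_get_properties_mockable_mock(): let tests choose which
// remote behaviour to simulate.
switch (variable_get('cmis_ms_mock_behaviour', 'normal')) {
  case 'timeout':
    // Simulate a remote server which never answers.
    throw new Exception('Simulated Nuxeo timeout');

  case 'server_error':
    // Simulate an HTTP 500 response from the remote server.
    throw new Exception('Simulated Nuxeo 500 error');

  default:
    // Fall through to the normal simulated response built above.
    break;
}

A test can then call variable_set('cmis_ms_mock_behaviour', 'timeout') before synchronizing, to confirm that timeouts are handled gracefully.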

Now that my mock function is in place, I can turn on mocking. With the Mockable module, this can be done with a GUI or with Drush:

drush mockable-set

Starting now, I need to understand and reproduce exactly, in my mock object, how each error occurs.

In the case of taxonomy import problems, I can set cmis_ms_get_properties_mockable_mock() to always return two categories, and confirm that they don’t get imported into Drupal.

Reproducing the problem manually with a mock object is a step in the right direction, but we need to make sure that once it’s fixed, it stays fixed. To do that I added the following .test file to my custom module (and linked to it in my .info file).

/**
 * The test case
 */
class mymoduleTestCase extends DrupalWebTestCase {
  /**
   * Info for this test case.
   */
  public static function getInfo() {
    return array(
      'name' => t('mymodule: basic test'),
      'description' => t('describe test.'),
      'group' => 'mymodule',
    );
  }

  /*
   * Enable your module
   */
  public function setUp() {
    // set up a new site with default core modules, mymodule, and
    // dependencies.
    parent::setUp('mymodule', 'mockable');
  }

  /*
   * Test case for mymodule.
   */
  public function testModule() {
    // start using mock objects
    mockable_set();
  
    // sync with our mock version of Nuxeo, in which documents all have
    // two categories. Note that before adding this function to a test,
    // I had to modify it to use mockable functions and objects instead
    // of always interacting with external servers. Because we called
    // mockable_set(), above, if we have correctly defined mock objects,
    // the external server should not be hit at this point. A good way
    // of making sure is to deactivate internet access during the local
    // test.
    mymodule_sync_nuxeo();
  
    $node = node_load(1);
  
    $taxonomy_count = count($node->field_tags[LANGUAGE_NONE]);
    $this->assertTrue($taxonomy_count == 2, format_string('We were expecting 2 taxonomy terms and we have obtained @count', array('@count' => $taxonomy_count)));
  }
}

When I ran this test, I could confirm that the test failed because even though my mock object was defining two categories, only the first ended up as a taxonomy term on my node.

Now that I had a failing test, my job was to make sure the test passed. Now the trial and error phase can really begin:

  1. Try something in code (note: both your test and your logic are “code”).
  2. Run the test.
  3. If the test still fails, make sure your test’s logic makes sense and go back to step 1.
  4. Do a manual test. If it fails, go back to step 1.
  5. If the test passes, do a manual test, commit your code and push to master.

To avoid this test being broken by another change in the future, you can set up a continuous integration server (Jenkins, for example), and configure it so that it runs your test, and indeed all tests for your project, on each commit:

drush test-run mymodule

Only once all of this is done can we confidently show our fix to the client and mark it as fixed. A bug should be marked as fixed, or a new feature marked as done, when:

  • A test exists.
  • Mock objects are used to define external system behaviour.
  • The test passes.
  • Ideally, the test is checked with every new commit to avoid regressions.


Nov 13 2013
Nov 13

November 13, 2013

Let’s say you are working locally and you need to add a new module to the site. Here is an example with Login Toboggan:

Let’s start by downloading the module to our local dev site

drush dl logintoboggan

Instead of enabling it outright, we’ll want to add it as a dependency to our deploy module, and define a hook_update_N() to enable it on existing environments.

; in your deploy module's .info file
dependencies[] = logintoboggan

/**
 * Enable logintoboggan in our deploy module's .install file
 */
function mysite_deploy_update_7001() {
  module_enable(array('logintoboggan'));
}

Deployable module settings need to be in code as well, so we can have the same configuration on all environments (local, development, stage, production). Here is an example with Login Toboggan’s user login form on Access Denied:

/**
 * Set logintoboggan to put login form on 403 pages
 */
function mysite_deploy_update_7002() {
  variable_set('site_403', 'toboggan/denied');
}

Make sure the above is also called during initial deployment, in your deployment module’s .install file (don’t forget: your continuous integration server and new members of the team will need to deploy new environments for your site):

/**
 * Implements hook_install().
 */
function mysite_deploy_install() {
  mysite_deploy_update_7002();
}

Now, you can enjoy this new functionality either on a new site environment by enabling mysite_deploy, or on an existing environment by updating your database:

drush updb -y

However, if you are running a multilingual site, the newly-added Login Toboggan module will not be translated. I avoid fetching the translations from localize.drupal.org in the update hook, because you can’t be sure it will work on all environments (for example, some clients disable internet access on their web servers for security reasons).

Rather, I prefer to download the translations myself and import them from the local repo in the update hook. Here’s how:

  • Start by visiting http://localize.drupal.org and finding your project
  • In the export tab, I like to check the Add Suggestions box, so, in the case of strings without “official” translations, at least I have something.
  • Click Export Gettext file, which will save a .po or .po.txt file to your computer; for example, for Login Toboggan in French, the file is logintoboggan-all.fr.po.txt
  • I put this file in my deployment module, for example sites/all/modules/custom/mysite_deploy/translations/ (if you have a .txt extension you can remove it)

Next, import it in an update hook:

/**
 * Import translations for Login Toboggan.
 */
function mysite_deploy_update_7003() {
  _mysite_deploy_import_po('logintoboggan-all.fr.po');
}

/**
 * Import translation po file
 *
 * @param $filename
 *   A filename in mysite_deploy/translations/
 */
function _mysite_deploy_import_po($filename) {
  try {
    // Figure out the full filepath.
    $filepath = DRUPAL_ROOT . '/' . drupal_get_path('module', 'mysite_deploy') . '/translations/' . $filename;
    // Move the contents of the file to the public stream. I can't get
    // _locale_import_po() to work with the file directly.
    $contents = file_get_contents($filepath);
    // In some cases the destination file might already exist, so I'll create a
    // random name; there probably is a better way of doing this...
    $random = md5($filepath) . rand(1000000000, 9999999999);
    $file = file_save_data($contents, 'public://po-import' . $random . '.po');
    // Finally import the file from the public stream.
    _locale_import_po($file, 'fr', LOCALE_IMPORT_OVERWRITE, 'default');
  }
  catch (Exception $e) {
    // Don't break the update process for translations.
    drupal_set_message(t('Oops, could not import the translation @f. Other updates will still take place (@r).', array('@f' => $filename, '@r' => $e->getMessage())));
  }
}

Don’t forget to call mysite_deploy_update_7003() from mysite_deploy_install(), to make sure that the translations are available to new deployments as well.
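Concretely, the install hook from earlier would now look something like this (a sketch based on the hooks above):

/**
 * Implements hook_install().
 */
function mysite_deploy_install() {
  // Login Toboggan itself is enabled on new installs via the .info
  // dependency, so update 7001 is not repeated here.
  mysite_deploy_update_7002();
  mysite_deploy_update_7003();
}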

There is room for improvement in the import helper above, I admit, but I like this approach because it is testable and does not depend on internet access during updates. Also, the same approach can be used to translate your custom modules.
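For instance, if you ship a po file for one of your own custom modules in the same translations folder, another update hook can import it the same way (the file name and update number below are hypothetical):

/**
 * Import translations for our custom mysite_feature module.
 */
function mysite_deploy_update_7004() {
  _mysite_deploy_import_po('mysite_feature-fr.po');
}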

Update: thanks to mikran for implementing a module (for now in sandbox mode) which expands on this idea.


Nov 11 2013
Nov 11

November 11, 2013

Examples like these are rampant throughout Drupal 7, in block_admin_display_form_submit(), for example:

/**
 * Form submission handler for block_admin_display_form().
 *
 * @see block_admin_display_form()
 */
function block_admin_display_form_submit($form, &$form_state) {
  $transaction = db_transaction();
  try {
    foreach ($form_state['values']['blocks'] as $block) {
      $block['status'] = (int) ($block['region'] != BLOCK_REGION_NONE);
      $block['region'] = $block['status'] ? $block['region'] : '';
      db_update('block')
        ->fields(array(
          'status' => $block['status'],
          'weight' => $block['weight'],
          'region' => $block['region'],
        ))
        ->condition('module', $block['module'])
        ->condition('delta', $block['delta'])
        ->condition('theme', $block['theme'])
        ->execute();
    }
  }
  catch (Exception $e) {
    $transaction->rollback();
    watchdog_exception('block', $e);
    throw $e;
  }
  drupal_set_message(t('The block settings have been updated.'));
  cache_clear_all();
}

What’s wrong with this is that the logic for performing the desired action (moving a block to a different region) is tied to the use of a form in the GUI. If you want a third-party module to move the navigation and powered-by blocks out of the theme xyz, you would expect the API to provide a function resembling block_move($blocks, $region, $theme), but there is no such function.

On a recent project where I needed to do exactly that, I installed and enabled devel, put dpm($form); dpm($form_state); at the top of the block_admin_display_form_submit() function, then went through the actions in the GUI to figure out what Drupal was doing, finally coming up with this code:

foreach (array('navigation', 'powered-by') as $block) {
  $num_updated = db_update('block') // Table name no longer needs {}
    ->fields(array(
      'region' => '-1',
    ))
    ->condition('delta', $block, '=')
    ->condition('theme', 'xyz', '=')
    ->execute();
}

That’s reverse engineering, and it’s time-consuming, error-prone, and developer-unfriendly.

Recently in one of my own modules I realized I had made the same mistake.

The correct approach is to define an API (for example, a function like block_move() in the above example) and call that API function from your form- and GUI-related functions like block_admin_display_form_submit().
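Such a function does not exist in core; here is a hedged sketch of what it might look like, mirroring the logic of block_admin_display_form_submit() above:

/**
 * Moves blocks to a given region for a given theme (illustrative sketch).
 *
 * @param array $blocks
 *   Block deltas to move, for example array('navigation', 'powered-by').
 * @param string $region
 *   The target region, or BLOCK_REGION_NONE to disable the blocks.
 * @param string $theme
 *   The theme whose block placement should change.
 */
function block_move(array $blocks, $region, $theme) {
  $status = (int) ($region != BLOCK_REGION_NONE);
  foreach ($blocks as $delta) {
    db_update('block')
      ->fields(array(
        'status' => $status,
        'region' => $status ? $region : '',
      ))
      ->condition('delta', $delta)
      ->condition('theme', $theme)
      ->execute();
  }
  // Make the change visible immediately.
  cache_clear_all();
}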

The result will be that developers will have as easy a time interacting with your module as human users. This will open the door to third-party interaction, adding value to your module.

Also, it will allow you to run more automated tests without actually loading pages, which is faster.

