
April 07, 2019

Accessibility tests can be automated to a degree, but not completely; succeeding at accessibility requires a mindset shared by developers, UX and front-end folks, business people and other stakeholders. In this article, we will run tests and produce meaningful metrics to help teams who are already committed to producing more accessible websites.

Premise

Say your team is developing a Drupal 8 site and you have decided that you want to reduce its accessibility issues by 50% over the course of six months.

In this article, we will look at a subset of accessibility issues which can be automatically checked – color contrast, placement of tags and HTML attributes, for example. Furthermore, we will only test the code itself with some dummy data, not an actual live environment or its data. Therefore, if you use the approach outlined in this article, it is best to do so within a global approach which includes stakeholder training, as well as automated and manual monitoring of live environments, all of which are outside the scope of this article.

Approach

Your team is probably perpetually “too busy” to fix accessibility issues; and therefore too busy to read and process reports with dozens, perhaps hundreds, of accessibility problems on thousands of pages.

Instead of expecting teams to process accessibility reports, we will use a threshold approach:

First, determine a standard towards which you’d like to work; for example, WCAG 2.0 AA is more stringent than WCAG 2.0 A. (If you’re working on a U.S. federal government website, WCAG 2.0 AA is mandated by Section 508.) Be realistic as to the level of effort your team is ready to deploy.

Next (we’ll see how to do this later), figure out which pages you’d like to test: perhaps an article, an event page, the home page, or an internal page for logged-in users.

In this article, to keep things simple, we’ll test:

  • the home page;
  • a public-facing internal page, /node/1;
  • the /user page for logged-in users;
  • the node editing form at /node/1/edit (for logged-in users, obviously).

Running accessibility checks on each of the above pages, we will end up with our baseline thresholds: the current number of errors per page. For example, this might be:

  • 6 for the home page
  • 6 for /node/1
  • 10 for /user
  • 10 for /node/1/edit

We will then make our tests fail if there are more errors on a given page than we allow for. The test should pass at first, and this approach meets several objectives:

  • First, have an idea of the state of your site: are there 10 accessibility errors on the home page, or 1000?
  • Fail immediately if a developer opens a pull request where the number of accessibility errors increases past the threshold for any given page. For example, if a widget is added to the /user page which makes the number of accessibility errors jump to 12 (in this example), we should see a failure in our continuous integration infrastructure because 12 > 10.
  • Provide your team with the tools to reduce the threshold over time. Concretely, a discussion with all stakeholders can be had once the initial metrics are in place; a decision might be made that we want to reduce thresholds for each page by 50% within 6 months. This allows your technical team to justify the prioritization of time spent on accessibility fixes vs. other tasks seen by able-bodied stakeholders as having a more direct business value.
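The gate we want in continuous integration is simply a comparison of a page’s error count against its threshold – the same logic Pa11y’s --threshold flag will implement for us later in this article. A minimal sketch in shell:

```shell
# Fail (non-zero exit) when a page's error count exceeds its threshold;
# pass otherwise. Purely illustrative: the real check is done by Pa11y.
a11y_gate() {
  errors="$1"
  threshold="$2"
  if [ "$errors" -gt "$threshold" ]; then
    echo "FAIL: $errors errors, threshold $threshold"
    return 1
  fi
  echo "PASS: $errors errors, threshold $threshold"
}
```

With a threshold of 10 for /user, `a11y_gate 10 10` passes and `a11y_gate 12 10` fails, which is exactly what we want our CI job to do.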

Principles

Principle #1: Docker for everything

Because we want to run tests on a continuous integration server, we want to avoid dependencies. Specifically, we want a system which does not require us to install specific versions of MySQL, PHP, headless browsers, accessibility checkers, etc. All our dependencies will be embedded into our project using Docker and Docker Compose. That way, all you need to install in order to run your project and test for accessibility (and indeed other tests) is Docker, which in most cases includes Docker Compose.

Principle #2: A starter database

In our continuous integration setup, we will be testing our code on every commit. Although it can be useful to test, or monitor, a remote environment such as the live or staging site, that is not what this article is about. This means we need some way to include dummy data in our codebase. We will do this by adding dummy data to a “starter database” committed to version control. (Be careful not to rely on this starter database to move configuration to the production site – use configuration management for that – we only want to store dummy data in our starter database; all configuration should be in code.) In our example, our starter database will contain node/1 with some realistic dummy data. This is required because, as part of our tests, we want to run accessibility checks against /node/1 and /node/1/edit.

A good practice during development would be that for new data types, say a new content type “sandwich”, a new version of the starter database be created with, say, node/2 of type “sandwich”, with realistic data in all its fields. This will allow us to add an accessibility test for /node/2, and /node/2/edit if we wish.
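A sketch of what such a starter-database refresh might look like, assuming a “drupal” service as in the Starterkit and a hypothetical ./db/starter.sql path (check your own project for the real location):

```shell
# Dump the current database (including the new dummy "sandwich" node)
# into the version-controlled starter database.
refresh_starter_db() {
  docker-compose exec -T drupal /bin/bash -c 'drush sql-dump' \
    > ./db/starter.sql
}
```

After running refresh_starter_db, commit ./db/starter.sql so every developer and every CI run gets node/2 with its realistic data.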

Tools

Don’t forget: as per principle #1, above, you will never need to install anything other than Docker on your computer or CI server, so don’t attempt to install these tools locally; they will run in Docker containers which will be built automatically for you.

  • Pa11y: There are dozens of tools to check for accessibility; in this article we’ve settled on Pa11y because it provides clear error reports and supports the concept of a threshold above which the script fails.
  • Chromium: In order to check a page for accessibility issues without actually having a browser open, a so-called headless browser is needed. Chromium is a fully functional browser which works on the command line and can be scripted. It works under the hood; you will have no need to install it or interact with it directly, it’s just good to know it’s there.
  • Puppeteer: Most accessibility tools, including Pa11y, are good at testing one page. If you point Pa11y at /node/1 or the home page, it will generate nice reports with thresholds. However, if you point Pa11y at /user or /node/1/edit, it will see those pages anonymously, which is not what we want to test. This is where Puppeteer, a browser scripting tool, comes into play. We will use Puppeteer later on to log into our site and save the markup of /user and /node/1/edit as /dom-captures/user.html and /dom-captures/node-1-edit.html, respectively, which will then allow Pa11y to access and test those paths anonymously.
  • And of course, Drupal 8, although you could apply the technique in this article to any web technology, because our accessibility checks are run against the web pages just like an end user would see them; there is no interaction with Drupal.

Setup

To follow along, you can install and start Docker Desktop and download the Dcycle Drupal 8 starterkit.

git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/deploy.sh

You are also welcome to fork the project and link it to a free CircleCI account, in which case continuous integration tests should start running immediately on every commit.

A few minutes after running ./scripts/deploy.sh, you should see a login link to a full Drupal installation on a random local port (for example http://0.0.0.0:32769) with some dummy data (/node/1). Deploying this site locally or on a CI server such as Circle CI is a one-step, one-dependency process.

In the rest of this article we will refer to this local environment as http://0.0.0.0:YOUR_PORT; always substitute your own port number (in our example 32769) for YOUR_PORT.

Introducing Pa11y

We will use a Dockerized version of Pa11y, dcycle/pa11y. Here is how it works against, say, amazon.com:

docker run --rm dcycle/pa11y:1 https://amazon.com

No site that I know of has zero accessibility issues; so you’ll see a bunch of issues in this format:

• Error: This element's role is "presentation" but contains child elements with semantic meaning.
  ├── WCAG2AA.Principle1.Guideline1_3.1_3_1.F92,ARIA4
  ├── #navFooter > div:nth-child(2)
  └── <div class="navFooterVerticalColumn navAccessibility" role="presentation"><div class="navFooterVerticalRo...</div>

Running Pa11y against a local site

Developers and continuous integration servers will need to run Pa11y against a local site. We would be tempted to run Pa11y on 0.0.0.0:YOUR_PORT, but that won’t work because Pa11y is being run inside its own container and will not have access to the host machine. You could give it access, but that raises another issue: the port is not guaranteed to be the same at every run, which requires ugly logic to figure out the port. Ugh! Instead, we will attach Pa11y to the Docker network used by our Starter site, in this case called starterkit_drupal8site_default (you can use docker network ls to list networks). Because our docker-compose.yml file defines the Drupal container as having the name drupal and port 80 (the default port), we can now run:

docker run --network starterkit_drupal8site_default \
  --rm dcycle/pa11y:1 http://drupal

This reports some errors, just as we expected. Before doing anything else, type echo $?; this will give a non-zero exit code, meaning this run would make your continuous integration script fail. However, because we decided earlier that we will tolerate, for now, 6 errors on the home page, let’s set a threshold of 6 (or however many errors you get – there were 6 at the time of this writing) instead of the default of zero:

docker run --network starterkit_drupal8site_default \
  --rm dcycle/pa11y:1 http://drupal --threshold 6

If you run echo $? right after, you should get the “passing” exit code of zero. There, we’ve met our threshold, so we will not have a failure!

How about pages where you need to be logged in?

The above solution breaks down, though, when you want to test http://drupal/node/1/edit. Although it will produce results, what we are actually checking here is the “Access denied” page, not /node/1/edit as seen by a logged-in user. We will approach this in the following way:

  • Set a random password for user 1;
  • Use Puppeteer (see “Tools”, above) to click around your local site with its dummy data, do whatever you want to, and, every step of the way, save the DOM (the document object model, that is, the current markup after it has been processed by JavaScript) as a temporary flat file named, say, http://drupal/dom-captures/user.html;
  • Use Pa11y to test the temporary file we just created.
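Putting the captured pages and the thresholds together, the heart of a CI accessibility script could look something like this (a sketch – the Starterkit’s real logic lives in ./scripts/a11y-tests.sh; the network name is the one we used above):

```shell
# Check one path against its error threshold using the Dockerized Pa11y;
# the function fails (non-zero exit) when the threshold is exceeded.
check_page() {
  path="$1"
  threshold="$2"
  docker run --network starterkit_drupal8site_default \
    --rm dcycle/pa11y:1 "http://drupal$path" --threshold "$threshold"
}
# Usage, with the example thresholds from this article:
# check_page ""                               6   # home page
# check_page "/node/1"                        6
# check_page "/dom-captures/user.html"        10
# check_page "/dom-captures/node-1-edit.html" 10
```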

Putting it all together

In our Drupal 8 Starterkit, we can test the entire process. Start by running the Puppeteer script:

./scripts/end-to-end-tests.sh


Astute readers will have realized that using Puppeteer to click through the site to create our DOM captures has the added benefit of confirming that our site functionality works as expected, which is why I called the script end-to-end-tests.sh.

To confirm this actually worked, you can visit, in an incognito window, http://0.0.0.0:YOUR_PORT/dom-captures/user.html and http://0.0.0.0:YOUR_PORT/dom-captures/node-1-edit.html. It looks like you’re logged in, but you are not: these are anonymous, static copies of the logged-in pages, which Pa11y can check.

So if this worked correctly (and it should, because we have it under continuous integration), we can run our Pa11y tests against all these pages:

./scripts/a11y-tests.sh
echo $?

You will see the errors, but because the number of errors is below our thresholds, the exit code will be zero, allowing our continuous integration tests to pass.

Conclusion

Making a site accessible is, in my opinion, akin to making a site secure: it is not something to add to a to-do list, but rather an approach including all site stakeholders. Neither is accessibility something which can be fully automated; it really is a team culture. However, approaches like the one outlined in this article, or whatever works in your organization, will give teams metrics to facilitate the integration of accessibility into their day-to-day operations.



March 14, 2019

Often, during local Drupal development (or if we’re really unlucky, in production), we get the dreaded message, “Unable to send e-mail. Contact the site administrator if the problem persists.”

This can make it hard to debug anything email-related during local development.

Enter Mailhog

Mailhog is a dummy SMTP server with a browser GUI, which means you can view all outgoing messages in a Gmail-type interface.

It is a major pain to install, but we can automate the entire process with the magic of Docker.

Let’s see how it works, and discuss after. Follow along by installing Docker Desktop – no other dependencies are required – and deploying a Drupal 8 starterkit:

git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/deploy.sh

This will install the following Docker containers: a MySQL server with a starter database, a configured Drupal site, and Mailhog. You will see something like this at the end of the output:

If all went well you can now access your site at:

=> Drupal: http://0.0.0.0:32791/user/reset/...
=> Dummy email client: http://0.0.0.0:32790

You might be seeing different port numbers instead of 32791 and 32790, so use your own instead of the example ports.

Now, the magic

To see the magic, trigger any outgoing email – for example, request a new password at http://0.0.0.0:DRUPAL_PORT/user/password – then open the dummy email client at http://0.0.0.0:MAILHOG_PORT. (In my example, DRUPAL_PORT is 32791 and MAILHOG_PORT is 32790; yours will probably differ.) As you can see, all emails produced by Drupal are now visible in a cool GUI!

So how does it work?

A dedicated “Mailhog” Docker container, based on the Mailhog Docker image, is defined in our docker-compose.yml file. It exposes port 8025 for public GUI access, which is mapped to a random unused port on the host computer (in the above example, 32790). Port 1025 is Mailhog’s SMTP port, as you can see in the Mailhog Dockerfile. We are not mapping port 1025 to a random port on the host computer because it is only needed by the Drupal container, not the host machine.

In the same docker-compose.yml, the “drupal” container (service) defines a link to the “mail” service; this means that from inside the Drupal container, the Mailhog SMTP server is reachable at hostname “mail”, port 1025.
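The relevant portion of docker-compose.yml might look something like this (a sketch: the service names follow this article; images, volumes and other keys are omitted):

```yaml
services:
  drupal:
    # ... build, volumes, etc. ...
    ports:
      - "80"    # container port 80, mapped to a random host port
    links:
      - mail    # makes the hostname "mail" resolve inside this container
  mail:
    image: mailhog/mailhog
    ports:
      - "8025"  # web GUI, mapped to a random host port
    # port 1025 (SMTP) is deliberately not published to the host
```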

In the Starterkit’s Dockerfile, we download the SMTP module, and in our configuration, we install SMTP (0, in this case, is the module’s weight; it doesn’t mean “disabled”!).

Next, configuration: because this is for local development, we are leaving SMTP off in the exported configuration; in production we don’t want SMTP to link to Mailhog. Then, in our overridden settings, we enable SMTP and set the server to “mail” and the port to 1025.
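For illustration, such an override might look like this in settings.local.php – a sketch assuming the contrib SMTP module’s configuration keys (smtp_on, smtp_host, smtp_port); check the Starterkit’s own settings override for the real thing:

```php
<?php
// Local development only: never export this to production configuration.
// Route outgoing mail to the Mailhog container: "mail" is the
// docker-compose service name, 1025 is Mailhog's SMTP port.
$config['smtp.settings']['smtp_on'] = TRUE;
$config['smtp.settings']['smtp_host'] = 'mail';
$config['smtp.settings']['smtp_port'] = '1025';
```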

Now, you can debug sent emails in a very realistic way!

You can remove the starterkit environment by running:

docker-compose down -v



October 27, 2018

This article discusses how to use HTTPS for local development if you use Docker and Docker Compose to develop Drupal 7 or Drupal 8 (indeed any other platform as well) projects. We’re assuming you already have a technique to deploy your code to production (either a build step, rsync, etc.).

In this article we will use the Drupal 8 site starterkit, a Docker Compose-based Drupal application that comes with everything you need to build a Drupal site with a few commands (including local HTTPS); we’ll then discuss how HTTPS works.

If you want to follow along, install and launch the latest version of Docker, make sure ports 80 and 443 are not used locally, and run these commands:

cd ~/Desktop
git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/https-deploy.sh

The script will prompt you for a domain (for example my-website.local) to access your local development environment. You might also be asked for your password if you want the script to add “127.0.0.1 my-website.local” to your /etc/hosts file. (If you do not want to supply your password, you can add that line to /etc/hosts before running ./scripts/https-deploy.sh).

After a few minutes you will be able to access a Drupal environment on http://my-website.local and https://my-website.local. For https, you will need to explicitly accept the certificate in the browser, because it’s self-signed.

Troubleshooting: if you get a connection error, try using an incognito (private) window in your browser, or a different browser.

Being a security-conscious developer, you probably read through ./scripts/https-deploy.sh before running it on your computer. If you haven’t, you are encouraged to do so now, as we will be explaining how it works in this article.

You cannot use Let’s Encrypt locally

I often see questions related to setting up Let’s Encrypt for local development. This is not possible because the idea behind Let’s Encrypt is to certify that you own the domain on which you’re working; because no one uniquely owns localhost, or my-project.local, no one can get a certificate for it.

For local development, the Let’s Encrypt folks suggest using trusted, self-signed certificates instead, which is what we are doing in our script.

(If you are interested in setting up Let’s Encrypt for a publicly-available domain, this article is not for you. You might be interested, instead, in Letsencrypt HTTPS for Drupal on Docker and Deploying Letsencrypt with Docker-Compose.)

Make sure your project works without https first

So let’s look at how the ./scripts/https-deploy.sh script we used above works.

Let’s start by making sure our project works without https, then add a https access in a separate container.

In our starterkit project, you can run:

./scripts/deploy.sh

At the end of that script, you will see something like:

If all went well you can now access your site at:

 => http://0.0.0.0:32780/user/reset/...

Docker is serving our application using a random non-secure port, in this case 32780, and mapping it to port 80 on our container.

If you use Docker Compose for local development, you might have several applications running at the same time on different host ports, all mapped to port 80 on their respective containers. At the end of this article, you should be able to access each of them on port 443.

The secret to all your local projects sharing port 443 is a reverse proxy container which receives requests to port 443 (and indeed port 80 also), and acts as a sort of traffic cop, directing traffic to the appropriate container.

That is why your individual projects should not directly use ports 80 and/or 443.

Adding an Nginx proxy container in front of your project’s container

An oft-seen approach to making your project available locally via HTTPS is to fiddle with your Dockerfile, installing openssl, setting up the certificate there, and rebuilding your container. This can work, but I would argue that it has significant drawbacks:

  • If you have several projects running on https port 443 locally, you could only develop one at a time because you only have one 443 port on your host machine.
  • You would need to maintain the SSL portion of your code for each of your projects.
  • It would go against the principle of separation of concerns which makes containers so robust.
  • You would be reinventing the wheel: there’s already a well-maintained Nginx proxy image which does exactly what you want.
  • Your job as a software developer is not to set up SSL.
  • If you decide to deploy your project to a production Kubernetes cluster, it would no longer make sense for each of your Apache containers to support SSL.

For all those reasons, we will loosely couple our project with the act of serving it via HTTPS; we’ll leave our project alone and place an Nginx proxy in front of it to deal with the SSL/HTTPS portion of our local deployment.

Local https for one or more running projects

In this example we set up only one starterkit application, but real-world developers often need HTTPS for more than one application. Because you only have one local 443 port, we need a way to differentiate between our running applications.

Our approach will be for each of our projects to have an assigned local domain. This is why the https script we used in our example asked you to choose a domain like starterkit-drupal8.local.

Our script stored this information in the .env file at the root of your project, and also made sure the domain resolves to localhost in your /etc/hosts file.

Launching the Nginx reverse proxy

To me the terms “proxy” and “reverse proxy” are not intuitive. I’ll try to demystify them here.

The term “proxy” means something which represents something else; the term is most often used for a proxy acting on behalf of the web client: a server delivers content to the proxy, which then delivers it to the end user, thereby hiding the end user from the server.

In our case we want to do the reverse: the client (you) is not placing a proxy in front of itself; rather, the application is placing a proxy in front of itself, thereby hiding the project’s server from the browser: the browser communicates with Nginx, and Nginx communicates with your project.

Hence, “reverse proxy”.

Our reverse proxy uses a widely used and well-maintained GitHub project. The script you used earlier in this article launched a container based on that image.
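Under the hood, launching such a reverse proxy boils down to a single docker run, along these lines (a sketch based on the nginx-proxy project’s documentation; the certificate path is an assumption, and our script takes care of the details):

```shell
# Listen on the host's ports 80 and 443; watch the Docker socket to
# discover containers that define a VIRTUAL_HOST environment variable;
# serve certificates found in ./certs.
start_reverse_proxy() {
  docker run -d -p 80:80 -p 443:443 \
    -v /var/run/docker.sock:/tmp/docker.sock:ro \
    -v "$PWD/certs":/etc/nginx/certs \
    jwilder/nginx-proxy
}
```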

Linking the reverse proxy to our application

With our starterkit application running on a random port (something like 32780) and our nginx proxy application running on ports 80 and 443, how are the two linked?

We now need to tell our Nginx proxy that when it receives a request for domain starterkit-drupal8.local, it should display our starterkit application.

There are a few steps to this, most handled by our script:

  • Your project’s docker-compose.yml file needs to contain the environment variable VIRTUAL_HOST=${VIRTUAL_HOST}. This takes the VIRTUAL_HOST environment variable that our script added to the ./.env file, and makes it available inside the container.
  • Our script assumes that your project contains a ./scripts/deploy.sh file, which deploys our project to a random, non-secure port.
  • Our script assumes that only the Nginx Proxy container is published on ports 80 and 443, so if these ports are already used by something else, you’ll get an error.
  • Our script appends VIRTUAL_HOST=starterkit-drupal8.local to the ./.env file.
  • Our script attempts to add 127.0.0.1 starterkit-drupal8.local to our /etc/hosts file, which might require a password.
  • Our script finds the network your project is running on locally (all Docker Compose projects run on their own local named network), and gives the reverse proxy access to it.
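For reference, the environment portion of a compatible docker-compose.yml is minimal (a sketch, other keys omitted):

```yaml
services:
  drupal:
    # ... image/build, etc. ...
    ports:
      - "80"    # a random host port; only the proxy owns ports 80 and 443
    environment:
      - VIRTUAL_HOST=${VIRTUAL_HOST}
```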

That’s it!

You should now be able to access your project locally with https://starterkit-drupal8.local (port 443) and http://starterkit-drupal8.local (port 80), and apply this technique to any number of Docker Compose projects.

Troubleshooting: if you get a connection error, try using an incognito (private) window in your browser, or a different browser; also note that you need to explicitly trust the certificate.

You can copy-paste the script into your own Docker Compose project at ./scripts/https-deploy.sh if:

  • Your ./docker-compose.yml contains the environment variable VIRTUAL_HOST=${VIRTUAL_HOST};
  • You have a script, ./scripts/deploy.sh, which launches a non-secure version of your application on a random port.

Happy coding!



October 05, 2018

I recently ran into a series of weird issues on my Acquia production environment which I traced back to some code I deployed which depended on my site being served securely using HTTPS.

Acquia staging environments don’t use HTTPS by default and require you to install SSL certificates using a tedious manual process, which in my opinion is outdated, because competitors such as Platform.sh, Pantheon and Aegir, and even GitHub Pages, support lots of automation around HTTPS using Let’s Encrypt.

Anyhow, because staging did not have HTTPS, I could not test some code I deployed, which ended up costing me an evening debugging an outage on a production environment. (Any difference between environments will eventually result in an outage.)

I found a great blog post which explains how to set up Let’s Encrypt on Acquia environments, Installing (FREE) Let’s Encrypt SSL Certificates on Acquia, by Chris at Redfin Solutions, May 2, 2017. Although the process is very well documented, I made some tweaks:

  • First, I prefer using Docker-based solutions rather than installing software on my computer. So, instead of installing certbot on my Mac, I opted to use the Certbot Docker image. This has two advantages for me: first, I don’t need to install certbot on every machine I use this script on; and second, I don’t need to worry about updating certbot, as the Docker image is updated automatically. Of course, this does require that you install Docker on your machine.
  • Second, I automated everything I could. This resulted in this gist (a “gist” is basically a single file hosted on GitHub), a script which you can install locally.
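The core trick from the gist is to wrap the Certbot Docker image in a shell function, so certbot itself never needs to be installed locally (a sketch; the volume path is an assumption, and the gist adds the Acquia-specific steps):

```shell
# Run certbot inside its official Docker image, persisting certificates
# and account data to ./letsencrypt on the host.
certbot() {
  docker run -it --rm \
    -v "$PWD/letsencrypt":/etc/letsencrypt \
    certbot/certbot "$@"
}
# Example: request a certificate using a manual http challenge.
# certbot certonly --manual --preferred-challenges http -d example.com
```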

Running the script

Put the script locally on your computer (I added it to my project code) at, say, ./scripts/set-up-letsencrypt-acquia-stage.sh, and run it:

  • the first time you run it, it will tell you where to put your environment information (in ./acquia-stage-letsencrypt-environments/environment-my-acquia-project-one.source, ./acquia-stage-letsencrypt-environments/environment-my-acquia-project-two.source, etc.), and what to put in those files.
  • the next time you run it, it will automate what it can and tell you exactly what you need to do manually.

I tried this and it works for creating new certs, and should work for renewals as well!



April 07, 2018

The documented process for setting up a local environment and running tests locally is, in my opinion, so complex that it can be a barrier to even determined developers.

For those wishing to locally test and develop core patches, I think it is possible to automate the process down to a few steps and few minutes; here is an example with a core issue, #2273889 Don’t use one language’s plural index formula with another language’s string in the case of untranslated strings using format_plural(), which, at the time of this writing, results in the number 0 being displayed as 1 in certain cases.

Is it possible to start useful local development on this within 10 minutes on a computer with nothing installed except Docker? Let’s try…

Step 1: install Docker

Install and launch Docker. Everything we need, Apache web server, MySql server, Drush, Drupal, will reside on Docker containers, so we won’t need to install anything locally except Docker.

Step 2: launch a dev environment

I have created a project hosted on GitHub which will help you set up everything you need in Docker containers, with no local dependencies other than Docker and no manual steps. Set it up by running:

git clone https://github.com/dcycle/drupal8_core_dev_helper.git && \
  cd drupal8_core_dev_helper && \
  ./scripts/deploy.sh

This will create everything you need: a webserver container and database container, and your Drupal core code which will be placed in ./drupal8_core_dev_helper/drupal; near the end of the output of ./scripts/deploy.sh, you will see a login link to your development environment. Confirm you can access that local development environment at an address like http://0.0.0.0:SOME-PORT. (The port is random.)

The first time you run this, it will have to download Docker images with Drupal, MySQL, and install everything you need for local development. Future runs will be a lot faster.

See the project’s README for more details.

In your dev environment, you can confirm that the problem exists (provided the issue has not yet been fixed) by following the instructions in the “To reproduce this problem:” section of the issue description on your local development environment.

Any calls to drush can be run on the Docker container like so:

docker-compose exec drupal /bin/bash -c 'drush ...'

For example:

docker-compose exec drupal /bin/bash -c 'drush en locale language -y'

If you want to run drush directly, you can connect to your container like so:

docker-compose exec drupal /bin/bash

This will result in the following prompt on the container:

root@<container-id>:/var/www/html#

Now you can run drush commands directly on the container:

drush eval "print_r(\Drupal::translation()->formatPlural(0, '1 whatever', '@count whatevers', array(), array('langcode' => 'fr')) . PHP_EOL);"

Because the drupal8_core_dev_helper project also pre-installs devel on your environment, you can also confirm the problem exists by visiting /devel/php and executing:

dpm((string) (\Drupal::translation()->formatPlural(0, '1 whatever', '@count whatevers', array(), array('langcode' => 'fr'))));

Whether you do this via Drush or /devel/php, the result should be the same if the issue has not been resolved: “1 whatever” instead of “0 whatevers”.

Step 3: get a local version of the patch and apply it

In this example, we’ll look at the patch in comment #32 of our formatPlural issue, referenced above. If the issue has been resolved since this blog post has been written, follow along with another patch.

cd drupal8_core_dev_helper
curl https://www.drupal.org/files/issues/2018-04-07/2273889-31-core-8.5.x-plural-index-no-test.patch -O
cd ./drupal && patch -p1 < ../2273889-31-core-8.5.x-plural-index-no-test.patch

You have now patched your local version of Drupal. You can try the “0 whatevers” test again and the bug should be fixed.

Running tests

Now the real fun begins… and the “fast-track” ends.

For any patch to be considered for inclusion in Drupal core, it will need to (a) not break existing tests; and (b) provide a test which, without the patch, confirms that the problem exists.

Let’s head back to comment #32 of issue #2273889 and see if our patch is breaking anything. Clicking on “PHP 7 & MySQL 5.5 23,209 pass, 17 fail” will bring us to the test results page, which at first glance seems indecipherable. You’ll notice that our seemingly simple change to the PluralTranslatableMarkup.php file is causing a number of tests to fail: HelpEmptyPageTest, EntityTypeTest…

Let’s start by finding the test which is most likely to be directly related to our change by searching the test results page for the string “PluralTranslatableMarkupTest” (this is the name of the class we changed, with the word Test appended), which shows that it is failing:

Testing Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest
.E

We need to figure out where that file resides by typing:

cd /path/to/drupal8_core_dev_helper/drupal/core
find . -name 'PluralTranslatableMarkupTest.php'

This tells us it is at ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php.

Because we have a predictable Docker container, we can relatively easily run this test locally:

cd /path/to/drupal8_core_dev_helper
docker-compose exec drupal /bin/bash -c 'cd core && \
  ../vendor/bin/phpunit \
  ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php'

You should now see the test results for only PluralTranslatableMarkupTest:

PHPUnit 6.5.7 by Sebastian Bergmann and contributors.

Testing Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest
.E                                                                  2 / 2 (100%)

Time: 16.48 seconds, Memory: 6.00MB

There was 1 error:

1) Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest::testPluralTranslatableMarkupSerialization with data set #1 (2, 'plural 2')
Error: Call to undefined method Mock_TranslationInterface_4be32af3::getStringTranslation()

/var/www/html/core/lib/Drupal/Core/StringTranslation/PluralTranslatableMarkup.php:150
/var/www/html/core/lib/Drupal/Core/StringTranslation/PluralTranslatableMarkup.php:121
/var/www/html/core/tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php:31

ERRORS!
Tests: 2, Assertions: 1, Errors: 1.
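To iterate on that single failure without re-running the whole class, we can use PHPUnit’s standard --filter option (a stock PHPUnit flag, nothing Drupal-specific) to run only the failing method:

```shell
cd /path/to/drupal8_core_dev_helper
docker-compose exec drupal /bin/bash -c 'cd core && \
  ../vendor/bin/phpunit \
  --filter testPluralTranslatableMarkupSerialization \
  ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php'
```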

How to fix this, indeed whether this will be fixed, is a whole nother story, a story fraught with dependency injection, mock objects, method stubs… More an adventure, really, than a story. An adventure which deserves to be told, just not right now.


January 24, 2018

Here are a few things I learned about caching for REST resources.

There are probably better ways to accomplish this, but here is what works for me.

Let’s say we have a REST resource that looks something like this in .../my_module/src/Plugin/rest/resource/MyRestResource.php and we have enabled it using the Rest UI module and given anonymous users permission to view it:

<?php

namespace Drupal\my_module\Plugin\rest\resource;

use Drupal\rest\ResourceResponse;

/**
 * This is just an example.
 *
 * @RestResource(
 *   id = "this_is_just_an_example",
 *   label = @Translation("Display the title of node 1"),
 *   uri_paths = {
 *     "canonical" = "/api/v1/get"
 *   }
 * )
 */
class MyRestResource extends ResourceBase {

  /**
   * {@inheritdoc}
   */
  public function get() {
    $node = node_load(1);
    $response = new ResourceResponse(
      [
        'title' => $node->getTitle(),
        'time' => time(),
      ]
    );
    return $response;
  }

}

Now, we can visit http://example.localhost/api/v1/get?_format=json and we will see something like:

{"title":"Some Title","time":1516803204}

Reloading the page, ‘time’ stays the same. That means caching is working; we are not re-computing our JSON output each time someone requests it.

How to invalidate the cache when the title changes

If we edit node 1 and change its title to, say, “Another title”, and reload http://example.localhost/api/v1/get?_format=json, we’ll see the old title. To make sure the cache is invalidated when this happens, we need to provide cacheability metadata to our response telling it when it needs to be recomputed.

Our node, when it’s loaded, contains within it all the caching metadata needed to describe when it should be recomputed: when the title changes, when new filters are added to the text format that’s being used, etc. We can add this information to our ResourceResponse like this:

...
$response->addCacheableDependency($node);
return $response;
...

When we clear our cache with drush cr and reload our page, we’ll see something like:

{"title":"Another title","time":1516804411}

Even more fun is changing the title of node 1 and reloading our JSON page, and seeing the title and time change without clearing the cache:

{"title":"Yet another title","time":1516804481}

How to set custom cache invalidation events

Let’s say you want to trigger a cache rebuild for some reason other than those defined by the node itself (title change, etc.).

A real-world example might be events: an “upcoming events” page should only display events which start later than now. If we invalidate the cache every day, then we’ll never show yesterday’s events in our events feed. Here, we need to add our custom cache invalidation event, in this case “rebuild events feed”.

For the purpose of this demo, we won’t actually build an events feed, but we’ll see how cron might be able to trigger cache invalidation.

Let’s add the following code to our response:

...
use Drupal\Core\Cache\CacheableMetadata;
...
$response->addCacheableDependency($node);
$response->addCacheableDependency(CacheableMetadata::createFromRenderArray([
  '#cache' => [
    'tags' => [
      'rebuild-events-feed',
    ],
  ],
]));
return $response;
...

This uses Drupal’s cache tags concept and tells Drupal that when the cache tag ‘rebuild-events-feed’ is invalidated, all cacheable responses which have that cache tag should be invalidated as well. I prefer this to a ‘max-age’ expiry because it allows us more fine-grained control over when to invalidate our caches.

On cron, we could only invalidate ‘rebuild-events-feed’ if events have passed since our last invalidation of that tag, for example.
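Sketched as code, that cron logic might look like this (a hypothetical hook_cron() implementation: the my_module name, the event content type and the field_event_date field are all assumptions for illustration, not part of the example above):

```php
<?php

/**
 * Implements hook_cron().
 *
 * Hypothetical sketch: only invalidate the events feed tag if at least one
 * event has started since the last time we invalidated it.
 */
function my_module_cron() {
  $last = \Drupal::state()->get('my_module.last_feed_rebuild', 0);
  $now = \Drupal::time()->getRequestTime();
  // Assumption: events are nodes of type "event" with a start timestamp
  // stored in field_event_date.
  $started = \Drupal::entityQuery('node')
    ->condition('type', 'event')
    ->condition('field_event_date', [$last, $now], 'BETWEEN')
    ->count()
    ->execute();
  if ($started) {
    \Drupal::service('cache_tags.invalidator')
      ->invalidateTags(['rebuild-events-feed']);
    \Drupal::state()->set('my_module.last_feed_rebuild', $now);
  }
}
```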

For this example, we’ll just invalidate it manually. Clear your cache to begin using the new code (drush cr), then load the page; you will see something like:

{"title":"Yet another title","time":1516805677}

As always, the time remains the same no matter how many times you reload the page.

Let’s say you are in the midst of a cron run and you have determined that you need to invalidate your cache for responses which have the cache tag ‘rebuild-events-feed’; you can run:

\Drupal::service('cache_tags.invalidator')->invalidateTags(['rebuild-events-feed'])

Let’s do it in Drush to see it in action:

drush ev "\Drupal::service('cache_tags.invalidator')->\
  invalidateTags(['rebuild-events-feed'])"

We’ve just invalidated our ‘rebuild-events-feed’ tag and, hence, Responses that use it.

This one is beyond my competence level, but I wanted to mention it anyway.

Let’s say you want to output your node’s URL to JSON; you might consider computing it using $node->toUrl()->toString(). This will give us “/node/1”.

Let’s add it to our code:

...
'title' => $node->getTitle(),
'url' => $node->toUrl()->toString(),
'time' => time(),
...

This results in a very ugly error which completely breaks your site (at least at the time of this writing): “The controller result claims to be providing relevant cache metadata, but leaked metadata was detected. Please ensure you are not rendering content too early.”

The problem, it seems, is that Drupal detects that the URL object, like the node we saw earlier, contains its own internal information which tells it when its cache should be invalidated. Converting it to a string prevents the Response from being informed about that information somehow (again, if someone can explain this better than me, please leave a comment), so an exception is thrown.

The ‘toString()’ function has an optional parameter, “$collect_bubbleable_metadata”, which can be used to get not just a string, but also information about when its cache should be invalidated. In Drush, this will look something like:

drush ev 'print_r(node_load(1)->toUrl()->toString(TRUE))'
Drupal\Core\GeneratedUrl Object
(
    [generatedUrl:protected] => /node/1
    [cacheContexts:protected] => Array
        (
        )

    [cacheTags:protected] => Array
        (
        )

    [cacheMaxAge:protected] => -1
    [attachments:protected] => Array
        (
        )

)

This changes the return type of toString(), though: toString() no longer returns a string but a GeneratedUrl, so this won’t work:

...
'title' => $node->getTitle(),
'url' => $node->toUrl()->toString(TRUE),
'time' => time(),
...

It gives us the error “Could not normalize object of type Drupal\Core\GeneratedUrl, no supporting normalizer found”.

ohthehugemanatee commented on Drupal.org on how to fix this. Integrating his suggestion, our code now looks like:

...
$url = $node->toUrl()->toString(TRUE);
$response = new ResourceResponse(
  [
    'title' => $node->getTitle(),
    'url' => $url->getGeneratedUrl(),
    'time' => time(),
  ]
);
$response->addCacheableDependency($node);
$response->addCacheableDependency($url);
...

This will now work as expected.

With all the fun we’re having, though, let’s take this a step further: let’s say we want to export the feed of frontpage items in our Response:

$url = $node->toUrl()->toString(TRUE);
$view = \Drupal\views\Views::getView("frontpage"); 
$view->setDisplay("feed_1");
$view_render_array = $view->render();
$rendered_view = render($view_render_array);

$response = new ResourceResponse(
  [
    'title' => $node->getTitle(),
    'url' => $url->getGeneratedUrl(),
    'view' => $rendered_view,
    'time' => time(),
  ]
);
$response->addCacheableDependency($node);
$response->addCacheableDependency($url);
$response->addCacheableDependency(CacheableMetadata::createFromRenderArray($view_render_array));

You will not be surprised to see the “leaked metadata was detected” error again… In fact you have come to love and expect this error at this point.

Here is where I’m completely out of my league; according to Crell, “[i]f you [use render() yourself], you’re wrong and you should fix your code”, but I’m not sure how to get a rendered view without using render() myself… I’ve implemented a variation on a comment on Drupal.org by mikejw suggesting using a different render context to prevent Drupal from complaining.

...
use Drupal\Core\Render\RenderContext;
...
$view_render_array = NULL;
$rendered_view = NULL;
\Drupal::service('renderer')->executeInRenderContext(new RenderContext(), function () use ($view, &$view_render_array, &$rendered_view) {
  $view_render_array = $view->render();
  $rendered_view = render($view_render_array);
});

If we check to make sure we have this line in our code:

$response->addCacheableDependency(CacheableMetadata::createFromRenderArray($view_render_array));

we’re telling our Response’s cache to invalidate whenever our view’s cache invalidates. So, for example, if we have several nodes promoted to the front page in our view, we can modify any one of them and our entire Response’s cache will be invalidated and rebuilt.


December 18, 2017

I recently needed to port hundreds of Drupal 7 webforms with thousands of submissions from Drupal 7 to Drupal 8.

My requirements were:

  • Node ids need to remain the same
  • Webforms need to be treated as data: they should be ignored by config export and import, just like nodes and taxonomy terms are. The reasoning is that in my setup, forms are managed by site editors, not developers. (This is not related to migration per se, but was a success criterion for my migration, so I’ll document my solution here)

Migration from Drupal 7

I could not find a reliable upgrade or migration path from Drupal 7 to Drupal 8. I found webform_migrate lacks documentation (I don’t know where to start) and migrate_webform is meant for Drupal 6, not Drupal 7 as a source.

I settled on my own combination of tools and workflows to perform the migration, all of them available on my GitHub account.

Using version 8.x-5.x of webform, I started by enabling webform, webform_node and webform_ui on my Drupal 8 site, this gives me an empty webform node type.

I then followed the instructions for a basic migration, which is outside the scope of this article. I have a project on GitHub which I use as a starting point for my Drupal 6 and 7 to 8 migrations. The blog post Custom Drupal-to-Drupal Migrations with Migrate Tools, Drupalize.me, April 26, 2016 by William Hetherington provides more information on performing a basic migration of data.

Once you have set up your migration configurations as per those instructions, you should be able to run:

drush migrate-import upgrade_d7_node_webform --execute-dependencies

And you should see something like:

Processed 25 items (25 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_node_type'
Processed 11 items (11 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user_role'
Processed 0 items (0 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user_role'
Processed 95 items (95 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user'
Processed 109 items (109 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_node_webform'

At this point I had all my webforms as nodes with the same node ids on Drupal 7 and Drupal 8; however, this does nothing to import the actual forms or submissions.

Importing the data itself

I found that the most efficient way of importing the data was to create my own Drupal 8 module, which I have published on Dcycle’s Github account, called webform_d7_to_d8. (I have decided against publishing this on Drupal.org because I don’t plan on maintaining it long-term, and I don’t have the resources to combine efforts with existing webform migration modules.)

I did my best to make that module self-explanatory, so you should be able to follow the steps in the README file, which I will summarize here:

Start by giving your Drupal 8 site access to your Drupal 7 database in ./sites/default/settings.php:

$databases['upgrade']['default'] = array (
  'database' => 'drupal7database',
  'username' => 'drupal7user',
  'password' => 'drupal7password',
  'prefix' => '',
  'host' => 'drupal7host',
  'port' => '3306',
  'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
  'driver' => 'mysql',
);

Run the migration with or without options:

drush ev 'webform_d7_to_d8()'

or

drush ev 'webform_d7_to_d8(["nid" => 123])'

or

drush ev 'webform_d7_to_d8(["simulate" => TRUE])'

More detailed information can be found in the module’s README file.

Treating webforms as data

Once you have imported your webforms to Drupal 8, they are treated as configuration; that is, the Webform module assumes that developers, not site builders, will be creating the forms. This may be fine in many cases; however, my use case is that site editors want to create and edit forms directly on the production site, and we don’t want them to be tracked by the configuration management system.

Jacob Rockowitz pointed me in the right direction for making sure webforms are not treated as configuration. For that purpose I am using Drush CMI tools by Previous Next and documented on their blog post, Introducing Drush CMI tools, 24 Aug. 2016.

Once you install Drush CMI tools in your ~/.drush folder and run drush cc drush, you can use drush cexy and drush cimy instead of drush cex and drush cim in your configuration management process. Here is how and why:

Normally, if you develop your site locally and, say, add a content type or field, or remove a content type or field, you can run drush cex to export your newly created configuration. Then, your colleagues can pull your code and run drush cim to pull your configuration. drush cim can also be used in continuous integration, preproduction, dev, and production environments.

The problem is that drush cex exports all configuration, and drush cim deletes everything in the database which is not in configuration. In our case, we don’t want to consider webforms as configuration but as data, just as nodes and taxonomy terms: we don’t want them to be exported along with other configuration; and if they exist on a target environment we want to leave them as they are.

Using Drush CMI tools, you can add a file such as the following to ~/.drush/config-ignore.yml:

# See http://blog.dcycle.com/blog/2017-12-18
ignore:
  - webform.webform.*

This has to be done on all developers’ machines or, if you use Docker, on a shared Docker container (which is outside the scope of this article).

Now, for exporting configuration, run:

drush cexy --destination='/path/to/config/folder'

Now, webforms will not be exported along with other configuration.

We also need to avoid erasing webforms on target environments: if you create a webform on a target environment, then run drush cim, you will see something like:

webform.webform.webform_9521   delete
webform.webform.webform_8996   delete
webform.webform.webform_8991   delete
webform.webform.webform_8986   delete

So, we need to avoid deleting webforms on the target environment when we import configuration. We could just do drush cim --partial, but this avoids deleting anything at all, not just webforms.

Drush CMI tools provides an alternative:

drush cimy --source=/path/to/config/folder

This works much like drush cim --partial, but it allows you to specify another parameter, --delete-list=/path/to/config-delete.yml

Then, in config-delete.yml, you can specify items that you actually want to delete on the target environment, for example content types, fields, and views which do not exist in code. This is dependent on your workflow and the way to set it up is documented on the Drush CMI tools project homepage.

With this in place, we’ll have our Drupal 7 webforms on our Drupal 8 site.


October 03, 2017

This article is about serving your Drupal Docker container, and/or any other container, via HTTPS with a valid Let’s Encrypt SSL certificate.

Edit: if you’re having trouble with Docker-Compose, read this follow-up post.

Step one: make sure you have a public VM

To follow along, create a new virtual machine (VM) with Docker, for example using the “Docker” distribution in the “One-click apps” section of Digital Ocean.

This will not work on localhost, because in order to use Let’s Encrypt, you need to demonstrate ownership over your domain(s) to the outside world.

In this tutorial we will serve two different sites, one simple HTML site and one Drupal site, each using standard ports, on the same Docker host, using a reverse proxy, a container which sits in front of your other containers and directs traffic.

Step two: Set up two domains or subdomains you own and point them to your server

Start by making sure you have two domains which point to your server, in this example we’ll use:

  • test-one.example.com will be a simple HTML site.
  • test-two.example.com will be a Drupal site.

Step three: create your sites

We do not want to map our containers’ ports directly to our host ports using -p 80:80 -p 443:443 because we will have more than one app using the same port (the secure 443). Port mapping will be the responsibility of the reverse proxy (more on that later). Replace example.com with your own domain:

DOMAIN=example.com
docker run -d \
  -e "VIRTUAL_HOST=test-one.$DOMAIN" \
  -e "LETSENCRYPT_HOST=test-one.$DOMAIN" \
  -e "LETSENCRYPT_EMAIL=you@$DOMAIN" \
  --expose 80 --name test-one \
  httpd
docker run -d \
  -e "VIRTUAL_HOST=test-two.$DOMAIN" \
  -e "LETSENCRYPT_HOST=test-two.$DOMAIN" \
  -e "LETSENCRYPT_EMAIL=you@$DOMAIN" \
  --expose 80 --name test-two \
  drupal

Now you have two running sites, but they’re not yet accessible to the outside world.

Step four: a reverse proxy and Let’s Encrypt

The term “proxy” refers to something which represents something else. In our case we want a webserver container which represents our Drupal and HTML containers; the Drupal and HTML containers are effectively hidden behind the proxy. Why “reverse”? A plain “proxy” usually hides the web user from the server. When it is the web servers that are hidden (in this case the Drupal and HTML containers), we use the term “reverse proxy”.

Let’s Encrypt is a free certificate authority which certifies that you are the owner of your domain.

We will use nginx-proxy as our reverse proxy. Because that does not take care of certificates, we will use LetsEncrypt companion container for nginx-proxy to set up and maintain Let’s Encrypt certificates.

Let’s start by creating an empty directory which will contain our certificates:

mkdir "$HOME"/certs

Now, following the instructions of the LetsEncrypt companion project, we can set up our reverse proxy:

docker run -d -p 80:80 -p 443:443 \
  --name nginx-proxy \
  -v "$HOME"/certs:/etc/nginx/certs:ro \
  -v /etc/nginx/vhost.d \
  -v /usr/share/nginx/html \
  -v /var/run/docker.sock:/tmp/docker.sock:ro \
  --label com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy \
  --restart=always \
  jwilder/nginx-proxy

And, finally, start the LetsEncrypt companion:

docker run -d \
  --name nginx-letsencrypt \
  -v "$HOME"/certs:/etc/nginx/certs:rw \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --volumes-from nginx-proxy \
  --restart=always \
  jrcs/letsencrypt-nginx-proxy-companion

Wait a few minutes for "$HOME"/certs to be populated with your certificate files, and you should now be able to access your sites.

A note about renewals

Let’s Encrypt certificates last 3 months, so we generally want to renew every two months. LetsEncrypt companion container for nginx-proxy states that it automatically renews certificates which are set to expire in less than a month, and it checks this hourly, although there are some renewal-related issues in the issue queue.

It seems to also be possible to force renewals by running:

docker exec nginx-letsencrypt /app/force_renew

So it might be worth being on the lookout for failed renewals and forcing them if necessary.

Edit: domain-specific configurations

I used this technique to create a Docker registry, and make it accessible securely:

docker run \
  --entrypoint htpasswd \
  registry:2 -Bbn username password > auth/htpasswd

docker run -d --expose 5000 \
  -e "VIRTUAL_HOST=mydomain.example.com" \
  -e "LETSENCRYPT_HOST=mydomain.example.com" \
  -e "LETSENCRYPT_EMAIL=you@example.com" \
  -e "REGISTRY_AUTH=htpasswd" \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  --restart=always -v "$PWD"/auth:/auth \
  --name registry registry:2

But when trying to push an image, I was getting “413 Request Entity Too Large”. This is an error with the nginx-proxy, not the Docker registry. To fix this, you can set domain-specific configurations, in this example we are allowing a maximum of 600M to be passed but only to the Docker registry at mydomain.example.com:

docker exec nginx-proxy /bin/bash -c 'cp /etc/nginx/vhost.d/default /etc/nginx/vhost.d/mydomain.example.com'
docker exec nginx-proxy /bin/bash -c 'echo "client_max_body_size 600M;" >> /etc/nginx/vhost.d/mydomain.example.com'
docker restart nginx-proxy

Enjoy!

You can now bask in the knowledge that your cooking blog will not be man-in-the-middled.


February 28, 2017

As the maintainer of Realistic Dummy Content, having procrastinated long and hard before releasing a Drupal 8 version, I decided to leave my (admittedly inelegant) logic intact and abstract away the Drupal 7 code, with the goal of plugging in Drupal 7 or 8 code at runtime.

Example original Drupal 7 code

// Some logic.
$updated_file = file_save($drupal_file);
// More logic.

Example updated code

Here is a simplified example of how the updated code might look:

// Some logic.
$updated_file = Framework::instance()->fileSave($drupal_file);
// More logic.

abstract class Framework {

  static private $instance;

  static function instance() {
    if (!self::$instance) {
      if (defined('VERSION')) {
        self::$instance = new Drupal7();
      }
      else {
        self::$instance = new Drupal8();
      }
    }
    return self::$instance;
  }

  abstract function fileSave($drupal_file);

}

class Drupal8 extends Framework {
  public function fileSave($drupal_file) {
    $drupal_file->save();
    return $drupal_file;
  }
}

class Drupal7 extends Framework {
  public function fileSave($drupal_file) {
    return file_save($drupal_file);
  }
}

Once I have defined fileSave(), I can simply replace every instance of file_save() in my legacy code with Framework::instance()->fileSave().

In theory, I can then identify all Drupal 7 code in my module and abstract it away.

Automated testing

As long as I surgically replace Drupal 7 code such as file_save() with “universal” code such as Framework::instance()->fileSave(), without doing anything else, and without giving in to the impulse to “improve” the code, I theoretically need to test only Framework::instance()->fileSave() itself on Drupal 7 and Drupal 8; as long as both versions behave the same, my underlying code should work. My approach to automated tests is: if it works and you’re not changing it, there is no need to test it.

Still, I want to make sure my framework-specific code works as expected. To set up my testing environment, I have used Docker Compose to set up three containers: Drupal 7, Drupal 8 and MySQL. I then have a script which builds the sites, installs my module on each, then runs a selftest() function which tests the abstracted functions such as fileSave() and makes sure they work.
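The selftest() function itself is not shown in this post; a minimal sketch of the idea (the function name and its failure behavior are hypothetical) could be:

```php
<?php

/**
 * Hypothetical sketch of a selftest: exercise an abstracted method on
 * whichever framework is active, and fail loudly if the contract is broken.
 */
function realistic_dummy_content_selftest() {
  // file_save_data() exists in both Drupal 7 and Drupal 8, so it can give
  // us a file object on either version.
  $file = file_save_data('selftest contents', 'public://selftest.txt');
  $saved = Framework::instance()->fileSave($file);
  if (!is_object($saved)) {
    throw new \Exception('fileSave() must return the saved file object.');
  }
}
```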

This can then be run on a continuous integration platform such as Circle CI which generates a cool badge:

CircleCI

Extending to Backdrop

Once your module is structured in this way, it is relatively easy to add new related frameworks, and I’m much more comfortable releasing a Drupal 9 update in 2021 (or whenever it’s ready).

I have included experimental Backdrop code in Realistic Dummy Content to prove the point. Backdrop is a fork of Drupal 7.

abstract class Framework {

  static private $instance;

  static function instance() {
    if (!self::$instance) {
      if (defined('BACKDROP_BOOTSTRAP_SESSION')) {
        self::$instance = new Backdrop();
      }
      elseif (defined('VERSION')) {
        self::$instance = new Drupal7();
      }
      else {
        self::$instance = new Drupal8();
      }
    }
    return self::$instance;
  }
}

// Most of Backdrop's API is identical to D7, so we need only override
// what differs, such as fileSave().
class Backdrop extends Drupal7 {
  public function fileSave($drupal_file) {
    file_save($drupal_file);
    // Unlike Drupal 7, Backdrop returns a result code, not the file itself,
    // in file_save(). We are expecting the file object.
    return $drupal_file;
  }
}

Disadvantages of this approach

Having just released Realistic Dummy Content 7.x-2.0-beta1 and 8.x-2.0-beta1 (which are identical), I can safely say that this approach was a lot more time-consuming than I initially thought.

Drupal 7 class autoloading is incompatible with Drupal 8 autoloading. In Drupal 7, classes cannot (to my knowledge) use namespaces, and must be added to the .info file, like this:

files[] = includes/MyClass.php

Once that is done, you can define MyClass in includes/MyClass.php, then use MyClass anywhere you want in your code.

Drupal 8 uses PSR-4 autoloading with namespaces, so I decided to create my own autoloader to use the same system in Drupal 7, something like:

spl_autoload_register(function ($class_name) {
  if (defined('VERSION')) {
    // We are in Drupal 7.
    $parts = explode('\\', $class_name);
    // Remove "Drupal" from the beginning of the class name.
    array_shift($parts);
    $module = array_shift($parts);
    $path = 'src/' . implode('/', $parts);
    if ($module == 'MY_MODULE_NAME') {
      module_load_include('php', $module, $path);
    }
  }
});

Hooks have different signatures in Drupal 7 and 8; in my case I was lucky and the only hook I need for Drupal 7 and 8 is hook_entity_presave() which has a similar signature and can be abstracted.
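A single implementation can therefore serve both versions; here is a hypothetical sketch (processEntity() is an invented name for whatever the abstraction layer does with the entity):

```php
<?php

/**
 * Implements hook_entity_presave().
 *
 * Drupal 7 invokes this hook with ($entity, $type) while Drupal 8 passes
 * only ($entity), so a default value on $type lets the same implementation
 * run on both.
 */
function my_module_entity_presave($entity, $type = NULL) {
  Framework::instance()->processEntity($entity);
}
```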

Deeply-nested associative arrays are a staple of Drupal 7, so a lot of legacy code expects this type of data. Shoehorning Drupal 8 to output something like Drupal 7’s field_info_fields(), for example, was a painful experience:

public function fieldInfoFields() {
  $return = array();
  $field_map = \Drupal::entityManager()->getFieldMap();
  foreach ($field_map as $entity_type => $fields) {
    foreach ($fields as $field => $field_info) {
      $return[$field]['entity_types'][$entity_type] = $entity_type;
      $return[$field]['field_name'] = $field;
      $return[$field]['type'] = $field_info['type'];
      $return[$field]['bundles'][$entity_type] = $field_info['bundles'];
    }
  }
  return $return;
}

Finally, making Drupal 8 work like Drupal 7 makes it hard to use Drupal 8’s advanced features such as Plugins. However, once your module is “universal”, adding Drupal 8-specific functionality might be an option.

Using this approach for website upgrades

This approach might remove a lot of the risk associated with complex site upgrades. Let’s say I have a Drupal 7 site with a few custom modules: each module can be made “universal” in this way. If automated tests are added for all subsequent development, migrating the functionality to Drupal 8 might be less painful.

A fun proof of concept, or real value?

I’ve been toying with this approach for some time, and had a good time (yes, that’s my definition of a good time!) implementing it, but it’s not for everyone or every project. If your use case includes preserving legacy functionality without leveraging Drupal 8’s modern features, while reducing risk, it can have value though. The jury is still out on whether maintaining a single universal branch will really be more efficient than maintaining two separate branches for Realistic Dummy Content, and whether the approach can reduce risk during site upgrades of legacy custom code, which I plan to try on my next upgrade project.


October 02, 2016

Unless you work exclusively with Drupal developers, you might be hearing some criticism of the Drupal community, among them:

  • We are almost cult-like in our devotion to Drupal;
  • maintenance and hosting are expensive;
  • Drupal is really complicated;
  • we tend to be biased toward Drupal as a solution to any problem (the law of the instrument).

It is true that Drupal is a great solution in many cases; and I love Drupal and the Drupal community.

But we can only grow by getting off the Drupal island, and being open to objectively assessing whether or not Drupal is the right solution for a given use case and a given client.

“if you love something, set it free” —Unknown origin.

Case study: the Dcycle blog

I have built my entire career on Drupal, and I have been accused (with reason) several times of being biased toward Drupal; in 2016 I am making a conscious effort to be open to other technologies and assess my commitment to Drupal more objectively.

The result has been that I now tend to use Drupal for what it’s good at, data-heavy web applications with user-supplied content. However, I have integrated other technologies to my toolbox: among them node.js for real-time websocket communication, and Jekyll for sites that don’t need to be dynamic on the server-side. In fact, these technologies can be used alongside Drupal to create a great ecosystem.

My blog has looked like this for quite some time:

Very ugly design.

It seemed to be time to refresh it. My goals were:

  • Keeping the same paths and path aliases to all posts, for example blog/96/catching-watchdog-errors-your-simpletests and blog/96 and node/96 should all redirect to the same page;
  • Keep comment functionality;
  • Apply an open-source theme with minimal changes;
  • It should be easy for me to add articles using the markdown syntax;
  • There should be a contact form.

My knee-jerk reaction would have been to build a Drupal 8 site, but looking at my requirements objectively, I realized that:

  • Comments can easily be exported to Disqus using the Disqus Migrate module;
  • For my contact form I can use formspree.io;
  • Other than the above, there is no user-generated content;
  • Upgrading my blog between major versions every few years is a problem with Drupal;
  • Security updates and hosting require a lot of resources;
  • Backups of the database and files need to be tested every so often, which also requires resources.

I eventually settled on moving this blog away from Drupal toward Jekyll, a static website generator which has the following advantages over Drupal for my use case:

  • What is actually publicly available is static HTML, ergo no security updates;
  • Because of its simplicity, testing backups is super easy;
  • My site can be hosted on GitHub using GitHub Pages for free (GitHub Pages now supports secure HTTPS for custom domain names via Let’s Encrypt);
  • All content and structure is stored in my git repo, so adding a blog post is as simple as adding a file to my git repo;
  • No PHP, no MySQL, just plain HTML and CSS: my blog now feels lightning fast;
  • Existing free and open-source templates are more plentiful for Jekyll than for Drupal, and if I can’t find what I want, it is easier to convert an HTML template to Jekyll than it is to convert it to Drupal (for me anyway).
  • Jekyll offers plugins for all of my project’s needs, including the Jekyll Redirect From gem to define several paths for a single piece of content, including a canonical URL (permalink).

In a nutshell, Jekyll works by regenerating an entirely new static website every time a change is made to underlying structured data, and putting the result in a subdirectory called _site. All content and layout is structured in the directory hierarchy, and no database is used.
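Concretely, each post lives as a Markdown file under _posts/, and with the Jekyll Redirect From gem mentioned above, the old Drupal paths can be declared directly in the post’s YAML front matter. Here is what that looks like for one of my own posts (the layout name is illustrative; redirect_from is the key that gem provides):

```yaml
# _posts/2015-07-06-catching-watchdog-errors.md (front matter only)
---
layout: post
title: "Catching watchdog errors in your Simpletests"
permalink: /blog/96/catching-watchdog-errors-your-simpletests
redirect_from:
  - /blog/96
  - /node/96
---
```

When the site is regenerated, Jekyll emits the post at the permalink and small redirect pages at each redirect_from path, which is how the old Drupal aliases keep working.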

Exporting content from Drupal to Jekyll

Depending on the complexity of your content, this will likely be the longest part of your migration, and will necessitate some trial and error. For the technical details of my own migration, see my blog post Migrating content from Drupal to Jekyll.

What I learned

I set out with the goal of performing the entire migration in less than a few days, and I managed to do so, all the while learning more about Jekyll. I decided to spend as little time as possible on the design, instead reusing brianmaierjr’s open-source Long Haul Jekyll theme. I estimate that I have managed to perform the migration to Jekyll in about 1/5th the time it would have taken me to migrate to Drupal 8, and I’m saving on hosting and maintenance as well. Some of my clients are interested in this approach as well, and are willing to trade an administrative backend for a large reduction in risk and cost.

So how do users enter content?

Being the only person who updates this blog, I am comfortable adding my content (text and images) as files in GitHub, but most non-technical users will prefer a backend. A few notes on this:

  • First, I have noticed that even though it is possible for clients to modify their Drupal site, many actually do not;
  • Many editors consider the Drupal backend to be very user-unfriendly to begin with, and may be willing to accept the more technical GitHub interface, plus a little training, if it saves them development time.
  • I see a big future for Jekyll frontends such as Prose.io which provide a neat editing interface (including image insertion) for editors of Jekyll sites hosted on GitHub.

Conclusion

I am not advocating replacing your Drupal sites with Jekyll, but in some cases we may benefit as a community by adding tools other than the proverbial hammer to our toolbox.

Static site generators such as Jekyll are one example of this, and with the interconnected web, making use of Drupal for what it’s good at will be, in the long term, good for Drupal, our community, our clients, and ourselves as developers.


Sep 19 2016
Sep 19

Docker is now available natively on Mac OS in addition to Linux. Docker is also included with CoreOS which you can run on remote Virtual Machines, or locally through Vagrant.

Once you have installed Docker and Git, locally or remotely, you don't need to install anything else.

In these examples we will leverage the official Drupal and MySQL Docker images. We will use the MySQL image as is, and we will add Drush to our Drupal image.

Docker is efficient with caching: these scripts will be slow the first time you run them, but very fast thereafter.

Here are a few scripts I often use to set up quick Drupal 7 or 8 environments for module evaluation and development.

Keep in mind that using Docker for deployment to production is another topic entirely and is not covered here; also, these scripts are meant to be quick and dirty; docker-compose might be useful for more advanced usage.

Port mapping

In all cases, using -p 80, I publish port 80 of the Drupal container to a random available port on my host, and in these examples I am using Docker for Mac OS, so my sites are available on localhost.

I use DRUPALPORT=$(docker ps|grep drupal7-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g') to figure out the current port of my running containers. When your containers are running, you can also just docker ps to see port mapping:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
f1bf6e7e51c9        drupal8-image       "apache2-foreground"     15 seconds ago      Up 11 seconds       0.0.0.0:32771->80/tcp   drupal8-container
...

In the above example (scroll right to see more output), http://localhost:32771 will show your Drupal 8 site.
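Incidentally, the sed pipeline can be sanity-checked against a captured line of docker ps output, without any running containers (the container id and port below are just sample values copied from the output above):

```shell
# Extract the published host port from a sample `docker ps` line.
SAMPLE='f1bf6e7e51c9  drupal8-image  "apache2-foreground"  15 seconds ago  Up 11 seconds  0.0.0.0:32771->80/tcp  drupal8-container'
DRUPALPORT=$(echo "$SAMPLE" | grep drupal8-container | sed 's/.*0.0.0.0://g' | sed 's/->.*//g')
echo "$DRUPALPORT"  # prints 32771
```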

Using Docker to evaluate, patch or develop Drupal 7 modules

I can set up a quick environment to evaluate one or more Drupal 7 modules. In this example I'll evaluate Views.

mkdir ~/drupal7-modules-to-evaluate
cd ~/drupal7-modules-to-evaluate
git clone --branch 7.x-3.x https://git.drupal.org/project/views.git
# add any other modules for evaluation here.

echo 'FROM drupal:7' > Dockerfile
echo 'RUN curl -sS https://getcomposer.org/installer | php' >> Dockerfile
echo 'RUN mv composer.phar /usr/local/bin/composer' >> Dockerfile
echo 'RUN composer global require drush/drush:8' >> Dockerfile
echo 'RUN ln -s /root/.composer/vendor/drush/drush/drush /bin/drush' >> Dockerfile
echo 'RUN apt-get update && apt-get upgrade -y' >> Dockerfile
echo 'RUN apt-get install -y mysql-client' >> Dockerfile
echo 'EXPOSE 80' >> Dockerfile

docker build -t drupal7-image .
docker run --name d7-mysql-container -e MYSQL_ROOT_PASSWORD=root -d mysql
docker run -v $(pwd):/var/www/html/sites/all/modules --name drupal7-container -p 80 --link d7-mysql-container:mysql -d drupal7-image

DRUPALPORT=$(docker ps|grep drupal7-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g')

# wait for mysql to fire up. There's probably a better way of doing this...
# See stackoverflow.com/questions/21183088
# See https://github.com/docker/compose/issues/374
sleep 6

docker exec drupal7-container /bin/bash -c "echo 'create database drupal'|mysql -uroot -proot -hmysql"
docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush si -y --db-url=mysql://root:root@mysql/drupal"
docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush en views_ui -y"
# enable any other modules here. Dependencies will be downloaded
# automatically

echo -e "Your site is ready, you can log in with the link below"

docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush uli -l http://localhost:$DRUPALPORT"

Note that we are linking (rather than adding) sites/all/modules as a volume, so any change we make to our local copy of views will quasi-immediately be reflected on the container, making this a good technique to develop modules or write patches to existing modules.

When you are finished you can destroy your containers, noting that all data will be lost:

docker kill drupal7-container d7-mysql-container
docker rm drupal7-container d7-mysql-container
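As the comment in the script admits, the fixed sleep is a hack. A small polling helper is more robust; here is a sketch, usable in both the Drupal 7 and Drupal 8 scripts (the commented-out usage line assumes the containers defined above, and mysqladmin ships with the mysql-client package installed in our Dockerfile):

```shell
# wait_for NUM_TRIES COMMAND...: retry COMMAND once per second until it
# succeeds, giving up (exit code 1) after NUM_TRIES attempts.
wait_for() {
  tries="$1"; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    sleep 1
  done
}

# Hypothetical usage in place of `sleep 6`:
# wait_for 30 docker exec drupal7-container \
#   mysqladmin ping -hmysql -uroot -proot --silent
```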

Using Docker to evaluate, patch or develop Drupal 8 modules

Our script for Drupal 8 modules is slightly different:

  • ./modules is used on the container instead of ./sites/all/modules;
  • Our Dockerfile is based on drupal:8, not drupal:7;
  • Unlike with Drupal 7, your database is not required to exist prior to installing Drupal with Drush;
  • In my tests I need to chown /var/www/html/sites/default/files to www-data:www-data to enable Drupal to write files.

Here is an example where we are evaluating the Token module for Drupal 8:

mkdir ~/drupal8-modules-to-evaluate
cd ~/drupal8-modules-to-evaluate
git clone --branch 8.x-1.x https://git.drupal.org/project/token.git
# add any other modules for evaluation here.

echo 'FROM drupal:8' > Dockerfile
echo 'RUN curl -sS https://getcomposer.org/installer | php' >> Dockerfile
echo 'RUN mv composer.phar /usr/local/bin/composer' >> Dockerfile
echo 'RUN composer global require drush/drush:8' >> Dockerfile
echo 'RUN ln -s /root/.composer/vendor/drush/drush/drush /bin/drush' >> Dockerfile
echo 'RUN apt-get update && apt-get upgrade -y' >> Dockerfile
echo 'RUN apt-get install -y mysql-client' >> Dockerfile
echo 'EXPOSE 80' >> Dockerfile

docker build -t drupal8-image .
docker run --name d8-mysql-container -e MYSQL_ROOT_PASSWORD=root -d mysql
docker run -v $(pwd):/var/www/html/modules --name drupal8-container -p 80 --link d8-mysql-container:mysql -d drupal8-image

DRUPALPORT=$(docker ps|grep drupal8-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g')

# wait for mysql to fire up. There's probably a better way of doing this...
# See stackoverflow.com/questions/21183088
# See https://github.com/docker/compose/issues/374
sleep 6

docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush si -y --db-url=mysql://root:root@mysql/drupal"
docker exec drupal8-container /bin/bash -c "chown -R www-data:www-data /var/www/html/sites/default/files"
docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush en token -y"
# enable any other modules here.

echo -e "Your site is ready, you can log in with the link below"

docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush uli -l http://localhost:$DRUPALPORT"

Again, when you are finished you can destroy your containers, noting that all data will be lost:

docker kill drupal8-container d8-mysql-container
docker rm drupal8-container d8-mysql-container


Jul 06 2015
Jul 06

If you are using a site deployment module, and running simpletests against it in your continuous integration server using drush test-run, you might come across Simpletest output like this in your Jenkins console output:

Starting test MyModuleTestCase.                                         [ok]
...
WD rules: Unable to get variable some_variable, it is not           [error]
defined.
...
MyModuleTestCase 9 passes, 0 fails, 0 exceptions, and 7 debug messages  [ok]
No leftover tables to remove.                                           [status]
No temporary directories to remove.                                     [status]
Removed 1 test result.                                                  [status]
 Group  Class  Name

In the above example, the Rules module is complaining that it is misconfigured. You will probably be able to confirm this by installing a local version of your site along with rules_ui and visiting the rules admin page.

Here, it is rules which is logging a watchdog error, but it could be any module.

However, this will not necessarily cause your test to fail (see 0 fails), and more importantly, your continuous integration script will not fail either.

At first you might find it strange that your console output shows [error], but that your script is still passing. Your script probably looks something like this:

set -e
drush test-run MyModuleTestCase

So: drush test-run outputs an [error] message, but is still exiting with the normal exit code of 0. How can that be?

Well, your test is doing exactly what you are asking of it: it is asserting that certain conditions are met, but you have never explicitly asked it to fail when a watchdog error is logged within the temporary testing environment. This is normal: consider a case where you want to assert that a given piece of code logs an error. In your test, you will create the necessary conditions for the error to be logged, and then you will assert that the error has in fact been logged. In this case your test will fail if the error has not been logged, but will succeed if the error has been logged. This is why the test script should not fail every time there is an error.
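To see for yourself that set -e reacts only to exit codes and never to error-looking console output, here is a minimal standalone illustration (the message is made up to resemble the watchdog output above):

```shell
set -e  # abort the script on any non-zero exit code, as in the CI script above

# This command prints an error-looking message but exits 0; `set -e`
# only inspects the exit code, so the script keeps going.
sh -c 'echo "WD rules: Unable to get variable some_variable [error]"; exit 0'

STATUS="still running"
echo "$STATUS"  # prints: still running
```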

But in our above example, we have no way of knowing when such an error is introduced; to ensure more robust testing, let's add a teardown function to our test which asserts that no errors were logged during any of our tests. To make sure that the tests don't fail when errors are expected, we will allow for that as well.

Add the following code to your Simpletest (if you have several tests, consider creating a base test for all of them to avoid reusing code):

/**
 * {@inheritdoc}
 */
function tearDown() {
  // See http://dcycleproject.org/blog/96/catching-watchdog-errors-your-simpletests
  $num_errors = $this->getNumWatchdogEntries(WATCHDOG_ERROR);
  $expected_errors = isset($this->expected_errors) ? $this->expected_errors : 0;
  $this->assertTrue($num_errors == $expected_errors, 'Expected ' . $expected_errors . ' watchdog errors and got ' . $num_errors . '.');

  parent::tearDown();
}

/**
 * Get the number of watchdog entries for a given severity or worse
 *
 * See http://dcycleproject.org/blog/96/catching-watchdog-errors-your-simpletests
 *
 * @param $severity = WATCHDOG_ERROR
 *   Severity codes are listed at https://api.drupal.org/api/drupal/includes%21bootstrap.inc/group/logging_severity_levels/7
 *   Lower numbers are worse severity messages, for example an emergency is 0, and an
 *   error is 3.
 *   Specify a threshold here, for example for the default WATCHDOG_ERROR, this function
 *   will return the number of watchdog entries which are 0, 1, 2, or 3.
 *
 * @return
 *   The number of watchdog errors logged during this test.
 */
function getNumWatchdogEntries($severity = WATCHDOG_ERROR) {
  $results = db_select('watchdog')
      ->fields(NULL, array('wid'))
      ->condition('severity', $severity, '<=')
      ->execute()
      ->fetchAll();
  return count($results);
}

Now, all your tests which have this code will fail if there are any watchdog errors in it. If you are actually expecting there to be errors, then at some point in your test you could use this code:

$this->expected_errors = 1; // for example


Jun 10 2015
Jun 10

To me, modern code must be tracked by a continuous integration server, and must have automated tests. Anything else is legacy code, even if it was rolled out this morning.

In the last year, I have adopted a policy of never modifying any legacy code, because even a one-line change can have unanticipated effects on functionality, plus there is no guarantee that you won't be re-fixing the same problem in 6 months.

This article will focus on a simple technique I use to bring legacy Drupal code under a test harness (hence transforming it into modern code), which is my first step before working on it.

Unit vs. functional testing

If you have already written automated tests for Drupal, you know about Simpletest and the concept of functional web-request tests with a temporary database: the vast majority of tests written for Drupal 7 code are based on the DrupalWebTestCase, which builds a Drupal site from scratch, often installing something like a site deployment module, using a temporary database, and then allows your test to make web requests to that interface. It's all automatic, and temporary environments are destroyed when tests are done.

It's great, it really simulates how your site is used, but it has some drawbacks: first, it's a bit of a pain to set up: your continuous integration server needs to have a LAMP stack or spin up Vagrant boxes or Docker containers, you need to set up virtual hosts for your code, and most importantly, it's very time-consuming, because each test case in each test class creates a brand new Drupal site, installs your modules, and destroys the environment.

(I even had to write a module, Simpletest Turbo, to perform some caching, or else my tests were taking hours to run (at which point everyone starts ignoring them) -- but that is just a stopgap measure.)

Unit tests, on the other hand, don't require a database, don't do web requests, and are lightning fast, often running in less than a second.

This article will detail how I use unit testing on legacy code.

Typical legacy code

Typically, you will be asked to make a "small change" to a function which is often 200+ lines long, and uses global variables, performs database requests, and REST calls to external services. But I'm not judging the authors of such code -- more often than not, git blame tells me that I wrote it myself.

For the purposes of our example, let's imagine that you are asked to make a change to a function which returns a "score" for the current user.

function mymodule_user_score() {
  global $user;
  $user = user_load($user->uid);
  $node = node_load($user->field_score_nid['und'][0]['value']);
  return $node->field_score['und'][0]['value'];
}

This example is not too menacing, but it's still not unit testable: the function calls the database, and uses global variables.

Now, the above function is not very elegant; our first task is to ignore our impulse to improve it. Remember: we're not going to even touch any code that's not under a test harness.

As mentioned above, we could write a subclass of DrupalWebTestCase which provisions a database, we could create a node, a user, populate it, and then run the function.

But we would rather write a unit test, which does not need externalities like the database or global variables.

But our function depends on externalities! How can we ignore them? We'll use a technique called dependency injection. There are several approaches to dependency injection, and Drupal 8 supports it very well with PHPUnit, but we'll use a simple implementation which requires the following steps:

  • Move the code to a class method
  • Move dependencies into their own methods
  • Write a subclass that replaces dependencies (not logic) with mock implementations
  • Write a test
  • Then, and only then, make the "small change" requested by the client

Let's get started!

Move the code to a class method

For dependency injection to work, we need to put the above code in a class, so our code will now look like this:

class MyModuleUserScore {
  function mymodule_user_score() {
    global $user;
    $user = user_load($user->uid);
    $node = node_load($user->field_score_nid['und'][0]['value']);
    return $node->field_score['und'][0]['value'];
  }
}

function mymodule_user_score() {
  $score = new MyModuleUserScore();
  return $score->mymodule_user_score();
}

That wasn't that hard, right? I like to keep each of my classes in its own file, but for simplicity's sake let's assume everything is in the same file.

Move dependencies into their own methods

There are a few dependencies in this function: global $user, user_load(), and node_load(). None of these are available to unit tests, so we need to move them out of the function, like this:

class MyModuleUserScore {
  function mymodule_user_score() {
    $user = $this->globalUser();
    $user = $this->user_load($user->uid);
    $node = $this->node_load($user->field_score_nid['und'][0]['value']);
    return $node->field_score['und'][0]['value'];
  }

  function globalUser() {
    global $user;
    return $user;
  }

  function user_load($uid) {
    return user_load($uid);
  }

  function node_load($nid) {
    return node_load($nid);
  }

}

Your dependency methods should generally only contain one line. The above code should behave in exactly the same way as the original.

Override dependencies in a subclass

Our next step will be to provide mock versions of our dependencies. The trick here is to make our mock versions return values which are expected by the main function. For example, we can surmise that our user is expected to have a field_score_nid, which is expected to contain a valid node id. We can also make similar assumptions about how our node is structured. Let's make mock responses with these assumptions:

class MyModuleUserScoreMock extends MyModuleUserScore {
  function globalUser() {
    return (object) array(
      'uid' => 123,
    );
  }

  function user_load($uid) {
    if ($uid == 123) {
      return (object) array(
        'field_score_nid' => array(
          LANGUAGE_NONE => array(
            array(
              'value' => 234,
            ),
          ),
        ),
      );
    }
  }

  function node_load($nid) {
    if ($nid == 234) {
      return (object) array(
        'field_score' => array(
          LANGUAGE_NONE => array(
            array(
              'value' => 3000,
            ),
          ),
        ),
      );
    }
  }

}

Notice that our return values are not meant to be complete: they contain only the minimal data expected by our function. The user object returned by our mock user_load() does not even contain a uid property! But that does not matter, because our function does not expect it.

Write a test

It is now possible to write a unit test for our logic without requiring the database. You can copy the contents of this sample unit test to your module folder as mymodule.test, add files[] = mymodule.test to your mymodule.info, enable the simpletest modules and clear your cache.

There remains the task of actually writing the test: in your testModule() function, the following lines will do:

public function testModule() {
  // load the file or files where your classes are located. This can
  // also be done in the setUp() function.
  module_load_include('module', 'mymodule');

  $score = new MyModuleUserScoreMock();
  $this->assertTrue($score->mymodule_user_score() == 3000, 'User score function returns the expected score');
}

Run your test

All that's left now is to run your test:

php ./scripts/run-tests.sh --class mymoduleTestCase

Then add the above line to your continuous integration server to make sure you're notified when someone breaks it.

Your code is now ready to be fixed

Now, when your client asks for a small or big change, you can use test-driven development to implement it. For example, let's say your client wants all scores to be multiplied by 10 (30000 should be the score when 3000 is the value in the node):

  • First, modify your unit test to make sure it fails: make the test expect 30000 instead of 3000
  • Next, change your code iteratively until your test passes.
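Concretely, the two steps above amount to a one-number change in the test, then a one-line change in the logic. A sketch, assuming the multiply-by-10 requirement (this is Drupal-dependent fragment code, not a standalone script):

```php
// Step 1: the test now expects the new behaviour, and fails against the old code.
$this->assertTrue($score->mymodule_user_score() == 30000, 'User score is multiplied by 10');

// Step 2: adjust the logic in MyModuleUserScore until the test passes, e.g.:
return $node->field_score['und'][0]['value'] * 10;
```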

What's next

This has been a very simple introduction to dependency injection and unit testing for legacy code: if you want to do even more, you can make your Mock subclass as complex as you wish, simulating corrupt data, nodes which don't load, and so on.

I highly recommend getting familiar with PHPUnit, which is part of Drupal 8, and which takes dependency injection to a whole new level: Juan Treminio's "Unit Testing Tutorial Part I: Introduction to PHPUnit", March 1, 2013 is the best introduction I've found.

I do not recommend doing away entirely with functional, database, and web tests, but a layered approach where most of your tests are unit tests, and you limit the use of functional tests, will allow you to keep your test runs below an acceptable duration, making them all the more useful, and increasing the overall quality of new and even legacy code.



Feb 23 2015
Feb 23

Continuous integration (CI) is the practice of running a series of checks on every push of your code, to make sure it is always in a potentially deployable state; and to make sure you are alerted as soon as possible if it is not.

Continuous integration and Drupal projects

This blog post is aimed at module maintainers, and we'll look at how to use CI for modules hosted on Drupal.org. I'll use as an example a project I'm maintaining, Realistic Dummy Content.

The good news is that Drupal.org has a built-in CI service for hosted modules: to use it, project maintainers need to click on the "Automated Testing" tab of their projects, enable automated testing, and make sure some tests are defined.

Once you have enabled automated testing, every submitted patch will be applied to the code and tested, and the main branches will be tested continually as well.

If you're not sure how to write tests, you can learn by example by looking at the test code of any module which has automated testing enabled.

Limitations of the Drupal.org QA system

The system described above is great, and in this blog post we'll explore how to take it a bit further. Drupal's CI service runs your code on a new Drupal site with PHP 5.3 enabled. We know this by looking at the log for a test on Realistic Dummy content, which contains:

[13:50:02] Database backend [mysql] loaded.
...
[simpletest.db] =>
[test.php.version] => 5.3
...

For the sake of this article, let's say we want to use SQLite with PHP 5.5, and we also want to run checks from the coder project's coder_review module. We can't achieve this within the Drupal.org infrastructure, but it is possible using Docker, CircleCI, and GitHub. Here is how.

Step 1: Get a local CoreOS+Docker environment

Let's start by setting up a local development environment on which we can run Docker. Docker is a system which uses Linux containers to run your software and all its dependencies in an isolated environment.

If you need a primer on Docker, check out Getting Started with Docker on Servers for Hackers (March 20, 2014), and A quick intro to Docker for a Drupal project.

Docker works best on CoreOS, which you can install quite easily on any computer using Vagrant and VirtualBox, as explained at Running CoreOS on Vagrant.

Step 2: Add a Dockerfile to your project

Because, in this example, we want to run tests which require changing things on the server, we'll use the Docker container management system to simulate an Ubuntu machine over which we have complete control.

To see how this works, download the latest dev version of realistic_dummy_content to your CoreOS VM, take a look at the included files ./Dockerfile and ./scripts/test.sh to see how they are structured, then run the test script:

./scripts/test.sh

Without any further configuration, you will see tests run on the desired environment: Ubuntu with the correct version of PHP, SQLite, and coder review. (You can also see the results on CircleCI on the project's CI dashboard if you unfold the "test" section -- we'll see how to set that up for your project later on.)

Setting up Docker for your own project is just a question of copy-pasting a few scripts.

Step 3: Make sure there is a mirror of your project on GitHub

Having test results on your command line is nice, but there is no reason to run them yourself. For that we use continuous integration (CI) servers, which run the tests every time someone commits something to your codebase.

Some of you might be familiar with Jenkins, which I use myself and which is great, but for open source projects, there are free CI services out there: the two I know of, CircleCI and Travis CI, synchronize with GitHub, not with Drupal.org, so you need a mirror of your project on GitHub.

Note that it is possible, using the tool HubDrop, to mirror your project on GitHub, but it's not on your account, whereas the CI tools sync only with projects on your own account. My solution has been to add a ./scripts/mirror.sh script to Realistic Dummy Content, and call it once every ten minutes via a Jenkins job on my personal Jenkins server. If you don't have access to a Jenkins server you can also use a cron job on any server to do this.
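If you go the cron route, the job can be as simple as a single crontab entry; for example (the paths and log location here are hypothetical, only scripts/mirror.sh comes from the project):

```
# Hypothetical crontab entry: run the mirror script every ten minutes.
*/10 * * * * /home/me/realistic_dummy_content/scripts/mirror.sh >> /var/log/mirror.log 2>&1
```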

The mirror of Realistic Dummy Content on GitHub is here.

Step 4: Open a CircleCI account and link it to your GitHub account

As mentioned above, two of the CI tools out there are CircleCI and Travis CI. One of my requirements is that the CI tool integrate well with Docker, because that's my DevOps tool of choice.

As mentioned in Faster Builds with Container-Based Infrastructure and Docker (Mathias Meyer, Travis CI blog, 17 Dec. 2014), Travis CI is moving towards Docker: its new infrastructure is based on Docker, but it does not let you run your own Docker containers.

Circle CI, on the other hand, seems to provide more flexibility with regards to Docker, as explained in the article Continuous Integration and Delivery with Docker on CircleCI's website.

Although Travis is a great, widely-used tool (Drush uses it), we'll use CircleCI because I found it easier to set up with Docker.

Once you open a CircleCI account and link it to your GitHub account, you will be able to turn on CI for your mirrored project, in my case Realistic Dummy Content.

Step 5: Add a circle.yml file to your project

In order for Circle CI to know what to do with your project, it needs a circle.yml file at the root of your project. If you look at the circle.yml file at the root of Realistic Dummy Content, it is actually quite simple:

machine:
  services:
    - docker

test:
  override:
    - ./scripts/test.sh

That's it! Commit your circle.yml file, and if mirroring with GitHub works correctly, Circle CI will test your build. Debug any errors you may have, and voilà!

Here is the result of a recent Realistic Dummy Content build on CircleCI: unfold the "test" section to see the complete output: PHP version, SQLite database, coder review...

Conclusion

We have seen how you can easily add Docker support to make sure the tests and checks you run on your code are in a controlled environment, with the extensions you need (one could imagine a module which requires some external system like ApacheSolr installed on the server -- Docker allows this too). This is one concrete application of DevOps: reducing the risk of glitches where "tests pass on my dev machine but not on my CI server".



Feb 18 2015
Feb 18

I recently added Docker support to Realistic Dummy Content, a project I maintain on Drupal.org. It is now possible (with Docker installed, preferably on a CoreOS VM) to run ./scripts/dev.sh directly from the project directory (use the latest dev version if you try this), and have a development environment, sans MAMP.

I don't consider myself an expert in Docker, virtualization, DevOps and config management, but here, nonetheless, is my experience. If I'm wrong about something, please leave a comment!

Intro: Docker and DevOps

The DevOps movement, popularized starting in about 2010, promises to include environment information along with application information in the same git repo for smoother development, testing, and production environments. For example, if your Drupal module requires version 5.4 of PHP, along with a given library, then that information should be somewhere in your Git repo. Building an environment for testing, development or production should then use that information and not be dependent on anything which is unversioned. Docker is a tool which is anchored in the DevOps movement.

DevOps: the Config management approach

The family of tools which has been around for a while now includes Puppet, Chef, and Ansible. These tools are configuration management tools: they define environment information (the PHP version should be 5.3, Apache mod_rewrite should be on, etc.) and make sure a given environment conforms to that information.

I have used Puppet, along with Vagrant, to deliver applications, including my Jenkins server hosted on GitHub.

Virtualization and containers

Using Puppet and Vagrant, you need to use virtualization: you create a virtual machine on your host machine.

Docker works on a different principle: instead of creating a VM on top of your host OS, Docker uses containers, so resources are shared. The article Getting Started with Docker (Servers for Hackers, 2014/03/20) contains some graphics which demonstrate how much more efficient containers are compared to virtualization.

Puppet and Vagrant are slow; Docker is fast

Puppet and Vagrant together work for packaging software and environment configuration, but it is excruciatingly slow: it can take several minutes to launch an environment. My reaction to this has been to cringe every time I have to do it.

Docker, on the other hand, uses caching aggressively: if a server was already in a given state, Docker uses a cached version of it to move along faster. So, when building a container, Docker goes through a series of steps, and caches each step to make it lightning fast.

One example: launching a dev environment of the Jenkins Vagrant project on Mac OS takes over five minutes, but launching a dev environment of my Drupal project Realistic Dummy Content (which uses Docker), takes less than 15 seconds the first time it is run once the server code has been downloaded, and, because of caching, less than one (1) second subsequent times if no changes have been made. Less than one second to fire up a full-fledged development environment which is functionally independent from your host. That's huge to me.

Configuration management is idempotent, Docker is not

Before we move on, note that Docker is not incompatible with config management tools, but Docker does not require them. Here is why I think, in many cases, config management tools are not necessary.

The config management tools such as Puppet are idempotent: you define how an environment should be, and the tools run whatever steps are necessary to make it that way. This sounds like a good idea in theory, but it looks like this in practice. I have come to the conclusion that this is not the way I think, and it forces me to relearn how to think of my environments. I suspect that many developers have a hard time wrapping their heads around idempotence.

Docker is not idempotent; it defines a series of steps to get to a given state. If you like idempotence, one of the steps can be to run a puppet manifest; but if, like me, you think idempotence is overrated, then you don't need to use it. Here is what a Dockerfile looks like: I understood it at first glance; it doesn't require me to learn a new way of thinking.
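To give a flavour of that, here is a minimal, hypothetical Dockerfile sketch (the base image, package names and paths are assumptions for illustration, not taken from a real project); each instruction is an imperative step, and each step becomes a cached layer:

```
# Hypothetical Dockerfile sketch: imperative steps, one cached layer each.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y apache2 php5 php5-sqlite
ADD . /var/www/mymodule
CMD ["apache2ctl", "-D", "FOREGROUND"]
```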

The CoreOS project

The CoreOS project has seen the promise of Docker and containers. It is an OS which ships with Docker, Git, and a few other tools, but is designed so that everything you do happens within containers (using the included Docker, and eventually Rocket, a tool they are building). The result is that CoreOS is tiny: it takes 10 seconds to build a CoreOS instance on DigitalOcean, for example, but almost a minute to set up a CentOS instance.

Because Docker does not work on Mac OS without going through hoops, I decided to use Vagrant to set up a CoreOS VM on my Mac, which is speedy and works great.

Docker for deploying to production

We have seen that Docker can work for quickly setting up dev and testing environments. Can it be used to deploy to production? I don't see why not, especially if used with CoreOS. For an example see the blog post Building an Internal Cloud with Docker and CoreOS (Shopify, Oct. 15, 2014).

In conclusion, I am just beginning to play with Docker, and it just feels right to me. I remember working with Joomla in 2006, when I discovered Drupal, and it just felt right, and I have made a career of it since then. I am having the same feeling now discovering Docker and CoreOS.

I am looking forward to your comments explaining why I am wrong about not liking idempotence, how to make config management and virtualization faster, and how and why to integrate config management tools with Docker!

Feb 18 2015
Feb 18

February 18, 2015

I recently added Docker support to Realistic Dummy Content, a project I maintain on Drupal.org. It is now possible (with Docker installed, preferably on a CoreOS VM) to run ./scripts/dev.sh directly from the project directory (use the latest dev version if you try this), and have a development environment, sans MAMP.

I don’t consider myself an expert in Docker, virtualization, DevOps and config management, but here, nonetheless, is my experience. If I’m wrong about something, please leave a comment!

Intro: Docker and DevOps

The DevOps movement, popularized starting in about 2010, promises to include environment information along with application information in the same git repo for smoother development, testing, and production environments. For example, if your Drupal module requires version 5.4 of PHP, along with a given library, then that information should be somewhere in your Git repo. Building an environment for testing, development or production should then use that information and not be dependent on anything which is unversioned. Docker is a tool which is anchored in the DevOps movement.

DevOps: the Config management approach

The family of tools which has been around for awhile now includes Puppet, Chef, and Ansible. These tools are configuration management tools: they define environment information (PHP version should be 5.3, Apache mod_rewrite should be on, etc.) and make sure a given environment conforms to that information.

I have used Puppet, along with Vagrant, to deliver applications, including my Jenkins server hosted on GitHub.

Virtualization and containers

Using Puppet and Vagrant, you need to use Virtualization: create a Virtual Machine on your host machine.

Docker works with a different principle: instead of creating a VM on top of your host OS, Docker uses containers, so resources are shared. The article Getting Started with Docker (Servers for Hackers, 2014/03/20) contains some graphics which demonstrate how much more efficient containers are as opposed to virtualization.

Puppet and Vagrant are slow; Docker is fast

Puppet and Vagrant together work for packaging software and environment configuration, but it is excruciatingly slow: it can take several minutes to launch an environment. My reaction to this has been to cringe every time I have to do it.

Docker, on the other hand, uses caching agressively: if a server was already in a given state, Docker uses a cached version of it to move along faster. So, when building a container, Docker goes through a series of steps, and caches each step to make it lightning fast.

One example: launching a dev environment of the Jenkins Vagrant project on Mac OS takes over five minutes, but launching a dev environment of my Drupal project Realistic Dummy Content (which uses Docker), takes less than 15 seconds the first time it is run once the server code has been downloaded, and, because of caching, less than one (1) second subsequent times if no changes have been made. Less than one second to fire up a full-fledged development environment which is functionally independent from your host. That’s huge to me.

Configuration management is idempotent, Docker is not

Before we move on, note that Docker is not incompatible with config management tools, but Docker does not require them. Here is why I think, in many cases, config management tools are not necessary.

Config management tools such as Puppet are idempotent: you define how an environment should be, and the tools run whatever steps are necessary to make it that way. This sounds like a good idea in theory, but in practice the resulting manifests can be hard to follow. I have come to the conclusion that this is not the way I think, and it forces me to relearn how to think of my environments. I suspect that many developers have a hard time wrapping their heads around idempotence.

Docker is not idempotent; it defines a series of steps to get to a given state. If you like idempotence, one of the steps can be to run a Puppet manifest; but if, like me, you think idempotence is overrated, then you don't need to use it. A Dockerfile is just an ordered list of steps: I understood it at first glance, and it didn't require me to learn a new way of thinking.
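
For illustration, a minimal Dockerfile is nothing more than an ordered list of steps; the one below is a hypothetical sketch, not from a real project:

```dockerfile
# Start from an existing image on the Docker registry.
FROM ubuntu:14.04

# Each instruction creates (and caches) an intermediate image,
# so unchanged steps are nearly instantaneous on rebuild.
RUN apt-get update
RUN apt-get -y install apache2 php5

# Copy the application code into the image.
COPY . /var/www/html
```

Reading it top to bottom tells you exactly how the environment is built, with no hidden state to reason about.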

The CoreOS project

The CoreOS project has seen the promise of Docker and containers. It is an OS which ships with Docker, Git, and a few other tools, but is designed so that everything you do happens within containers (using the included Docker, and eventually Rocket, a tool they are building). The result is that CoreOS is tiny: it takes 10 seconds to build a CoreOS instance on DigitalOcean, for example, but almost a minute to set up a CentOS instance.

Because Docker does not work on Mac OS without jumping through hoops, I decided to use Vagrant to set up a CoreOS VM on my Mac, which is speedy and works great.
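
The CoreOS team maintains a coreos-vagrant project for exactly this purpose; a stripped-down Vagrantfile along those lines might look like the sketch below (the box name and IP are assumptions on my part, check the official repo for a maintained version):

```ruby
# Minimal sketch of a CoreOS Vagrantfile; box name and IP are assumptions.
Vagrant.configure("2") do |config|
  config.vm.box = "coreos-stable"
  # Use a fixed private IP so you can reach containers from the host.
  config.vm.network "private_network", ip: "172.17.8.100"
end
```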

Docker for deploying to production

We have seen that Docker can work for quickly setting up dev and testing environments. Can it be used to deploy to production? I don’t see why not, especially if used with CoreOS. For an example see the blog post Building an Internal Cloud with Docker and CoreOS (Shopify, Oct. 15, 2014).

In conclusion, I am just beginning to play with Docker, and it just feels right to me. I remember working with Joomla in 2006, when I discovered Drupal, and it just felt right, and I have made a career of it since then. I am having the same feeling now discovering Docker and CoreOS.

I am looking forward to your comments explaining why I am wrong about not liking idempotence, how to make config management and virtualization faster, and how and why to integrate config management tools with Docker!


Feb 09 2015
Feb 09

To get the most of this blog post, please read and understand Getting Started with Docker (Servers for Hackers, 2014/03/20). Also, all the steps outlined here have been done on a Vagrant CoreOS virtual machine (VM).

I recently needed a really simple non-production Drupal Docker image on which I could run tests. b7alt/drupal (which you can find by typing docker search drupal, or on GitHub) worked for my needs, except that it did not have the cURL PHP library installed, so drush en simpletest -y was throwing an error.

Therefore, I decided to create a new Docker image which is based on b7alt/drupal, but with the php5-curl library installed.

I started by creating a new local directory (on my CoreOS VM), which I called docker-drupal:

mkdir docker-drupal

In that directory, I created a Dockerfile which takes b7alt/drupal as its base, and runs apt-get install curl.

FROM b7alt/drupal

RUN apt-get update
RUN apt-get -y install curl

(You can find this code at my GitHub account at alberto56/docker-drupal.)

When you run this you will get:

docker build .
...
Successfully built 55a8c8999520

That hash is a Docker image ID, and your hash might be different. You can run it and see if it works as expected:

docker run -d 55a8c8999520
c9a98bdcab4e027e8571bde71ee92b4380247a44ef9314749ef5680864de2928

In the above, we are telling Docker to create a container based on the image we just created (55a8c8999520). The resulting container hash is displayed (yours might be different). We are using -d so that our container runs in the background. You can see that the container is actually running by typing:

docker ps
CONTAINER ID        IMAGE               COMMAND...
c9a98bdcab4e        55a8c8999520        "/usr/bin/supervisor...

This tells you that there is a running container (c9a98bdcab4e) based on the image 55a8c8999520. Again, your hashes will be different. Let's log into that container now:

docker exec -it c9a98bdcab4e bash
root@c9a98bdcab4e:/#

To make sure that cURL is successfully installed, I will figure out where Drupal resides on this container, and then try to enable Simpletest. If that works, I will consider my image a success, and exit from my container:

root@c9a98bdcab4e:/# find / -name 'index.php'
/srv/drupal/www/index.php
root@c9a98bdcab4e:/# cd /srv/drupal/www
root@c9a98bdcab4e:/srv/drupal/www# drush en simpletest -y
The following extensions will be enabled: simpletest
Do you really want to continue? (y/n): y
simpletest was enabled successfully.                   [ok]
root@c9a98bdcab4e:/srv/drupal/www# exit
exit

Now I know that my 55a8c8999520 image is good for now and for my purposes; I can create an account on Docker.com and push it to my account for later use:

docker build -t alberto56/docker-drupal .
docker push alberto56/docker-drupal

Anyone can now run this Docker image by simply typing:

docker run alberto56/docker-drupal

One thing I had a hard time getting my head around was having a GitHub project and a Docker project, which are different but linked. The GitHub project is the recipe for creating an image, whereas the Docker project is the image itself.

Once we start thinking of our environments like this (as entities which should be versioned and shared), the risk of differences between environments is greatly reduced. I was used to running simpletests for my projects on an environment which was managed by hand; when I got a strange permissions error on the test environment, I decided to start using Docker and version control to manage the container where tests are run.


Feb 06 2015
Feb 06

I have been using Simpletest on Drupal 7 for several years, and, used well, it can greatly enhance the quality of your code. I like to practice test-driven development: writing a failing test first, then run it multiple times, each time tweaking the code, until the test passes.

Simpletest works by spawning a completely new Drupal site (ignoring your current database), running tests, and destroying the database. Sometimes, a test will fail and you're not quite sure why. Here are two tips to help you debug why your tests are failing:

Tip #1: debug()

The Drupal debug() function can be placed anywhere in your test or your source code, and the result will appear on the test results page in the GUI.

For example, if things work fine when you are playing around with the dev version of your site, but in the test a specific node contains invalid data, you can add this line anywhere in your test or source code which is called during your test:

...
debug($node);
...

This will provide formatted output of your $node variable, alongside your test results.

Tip #2: die()

Sometimes the temporary test environment's behaviour seems to make no sense. And it can be frustrating to not be able to simply log into it and play around with it, because it is destroyed after the test is over.

To understand this technique, here is a quick primer on how Simpletest works:

  • In Drupal 7, running a test requires a host site and database. This is basically an installed Drupal site with Simpletest enabled, and your module somewhere in the modules directory (the module you are testing does not have to be enabled).
  • When you run a test, Simpletest creates a brand-new installation of Drupal using a special prefix simpletest123456 where 123456 is a random number. This allows Simpletest to have an isolated environment in which to run tests, but on the same database and with the same credentials as the host.
  • When your test does something, like call a function, or load a page with, for example, $this->drupalGet('user'), the host environment is ignored and the temporary environment (which uses the prefixed database tables) is used. In the previous example, the test loads the "user" page using a real HTTP call. Simpletest knows to use the temporary environment because the call is made using a specially-crafted user agent.
  • When the test is over, all tables with the prefix simpletest123456 are destroyed.

If you have ever tried to run a test on a host environment which already contains a prefix, you will understand why you can get "table name too long" errors in certain cases: Simpletest is trying to add a prefix to another prefix. That's one reason to avoid prefixes when you can, but I digress.

Now you can try this: somewhere in your test code, add die(); this will kill Simpletest, leaving the temporary database intact.

Here is an example: a colleague recently was testing a feature which exported a view. In the dev environment, the view was available to users with the role manager, as was expected. However when the test logged in as a manager user and attempted to access the view, the result was an "Access denied" page.

Because we couldn't easily figure it out, I suggested adding die() to play around in the environment:

...
$this->drupalLogin($manager);
$this->drupalGet('inventory');
die();
$this->assertNoText('denied', 'A manager accessing the inventory page does not see "access denied"');
...

Now, when the test was run, we could:

  • wait for it to crash,
  • then examine our database to figure out which prefix the test was using,
  • change the database prefix in sites/default/settings.php from '' to (for example) 'simpletest73845'.
  • run drush uli to get a one-time login.
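
The prefix change in the third step goes in the standard D7 $databases array in sites/default/settings.php; here is a sketch with placeholder credentials (only the prefix line matters here):

```php
<?php
// sites/default/settings.php
// Point Drupal at the tables left behind by the crashed test.
// 'simpletest73845' is an example: use the prefix you found in the
// previous step, and revert it to '' when you are done debugging.
$databases = array(
  'default' => array(
    'default' => array(
      'driver' => 'mysql',
      'database' => 'mydatabase',   // placeholder credentials
      'username' => 'myuser',
      'password' => 'mypassword',
      'host' => 'localhost',
      'prefix' => 'simpletest73845',
    ),
  ),
);
```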

Now, it was easier to debug the source of the problem by visiting the views configuration for inventory: it turns out that features exports views with access by role using the role ID, not the role name (the role ID can be different for each environment). Simply changing the access method for the view from "by role" to "by permission" made the test pass, and prevented a potential security flaw in the code.

(Another reason to avoid "by role" access in views is that User 1 often does not have the role required, and it is often disconcerting to be user 1 and have "access denied" to a view.)

So in conclusion, Simpletest is great when it works as expected and when you understand what it does, but when you don't, it is always good to know a few techniques for further investigation.


Jan 20 2015
Jan 20

When building a Drupal 7 site, one oft-used technique is to keep the entire Drupal root under git (for Drupal 8 sites, I favor having the Drupal root one level up).

Starting a new project can be done by downloading an unversioned copy of D7, and initializing a git repo, like this:

Approach #1

drush dl
cd drupal*
git init
git add .
git commit -am 'initial project commit'
git remote add origin ssh://[email protected]/myproject

Another trick I learned from my colleagues at the Linux Foundation is to get Drupal via git and track two remotes, like this:

Approach #2

git clone --branch 7.x http://git.drupal.org/project/drupal.git drupal
cd drupal
git remote rename origin drupal
git remote add origin ssh://[email protected]/myproject

This second approach lets you push changes to your own repo, and pull changes from the Drupal git repo. This has the advantage of keeping track of Drupal project commits, and your own project commits, in a unified git history.

git push origin 7.x
git pull drupal 7.x

If you are tight for space though, there might be one inconvenience: Approach #2 keeps track of the entire Drupal 7.x commit history, for example we are now tracking in our own repo commit e829881 by natrak, on June 2, 2000:

git log |grep e829881 --after-context=4
commit e8298816587f79e090cb6e78ea17b00fae705deb
Author: natrak <>
Date:   Fri Jun 2 18:43:11 2000 +0000

    CVS drives me nuts *G*

All of this information takes disk space: Approach #2 takes 156 MB, vs. 23 MB for Approach #1. This may add up if you are working on several projects, especially if each project has several environments for feature branches. If you have a continuous integration server tracking multiple projects and spawning new environments for each feature branch, several gigabytes of disk space can be used.

If you want to streamline the size of your git repos, you might want to try the --depth option of git clone, like this:

Approach #3

git clone --branch 7.x --depth 1 http://git.drupal.org/project/drupal.git drupal
cd drupal
git remote rename origin drupal
git remote add origin ssh://[email protected]/myproject

Adding the --depth parameter here reduces the initial size of your repo to 18 MB in my test, which interestingly is even less than Approach #1. Even though your repo is now linked to the Drupal git repo, running git log will show that the entire history is not being stored.


Dec 03 2014
Dec 03

What is content? What is configuration? At first glance, the question seems simple, almost quaint, the kind one finds oneself patiently answering for the benefit of Drupal novices: content is usually information like nodes and taxonomy terms, while content types, views and taxonomy vocabularies are usually configuration.

Content lives in the database of each environment, we say, while configuration is exportable via Features or other mechanisms and should live in the Git repo (this has been called code-driven development).

Still, a definition of content and configuration is naggingly elusive: why "usually"? Why are there so many edge cases? We're engineers, we need precision! I often feel like I'm trying to define what a bird is: every child knows what a bird is, but it's hard to define it. Ostriches can't fly; platypuses lay eggs but aren't birds.

Why the distinction?

I recently saw an interesting comment titled "A heretic speaks" on a blog post about code-driven development. It sums up some of the uneasiness about the place of configuration in Drupal: "Drupal was built primarily with site builders in mind, and this is one reason [configuration] is in the database".

In effect, the primary distinction in Drupal is between code (Drupal core and contrib), and the database, which contains content types, nodes, and everything else.

As more complex sites were being built, a new distinction had to be made between two types of information in the database: configuration and content. This was required to allow development in a dev-stage-production workflow where features being developed outside of a production site could be deployed to production without squashing the database (and existing comments, nodes, and the like). We needed to move those features into code and we called them "configuration".

Thus the features module was born, allowing views, content types, and vocabularies (but not nodes and taxonomy terms) to be developed outside of the database, and then deployed into production.

Drupal 8's config management system takes that one step further by providing a mature, central API to deal with this.

The devil is in the details

This is all fine and good, but edge cases soon begin to arise:

  • What about an "About us" page? It's a menu item (deployable) linking to a node (content). Is it config? Is it content?
  • What about a "Social media" menu and its menu items? We want a Facebook link to be deployable, but we don't want to hard-code the actual link to our client's Facebook page (which feels like content) -- we probably don't even know what that link is during development.
  • What about a block whose placement is known, but whose content is not? Is this content? Is it configuration?
  • What about a view which references a taxonomy term ID in a hard-coded filter? We can export the view, but the taxonomy term has an incremental ID and is not guaranteed to work on all environments.

The wrong answer to any of these questions can lead to a misguided development approach which will come back to haunt you afterward. You might wind up using incremental IDs in your code or deploying something as configuration which is, in fact, content.

Defining our terms

At the risk of irking you, dear reader, I will suggest doing away with the terms "content" and "configuration" for our purposes: they are just too vague. Because we want a formal definition with no edge cases, I propose that we use these terms instead (we'll look at each in detail a bit further on):

  • Code: this is what our deliverable is for a given project. It should be testable, versioned, and deployable to any number of environments.
  • Data: this is whatever is potentially different on each environment to which our code is deployed. One example is comments: On a dev environment, we might generate thousands of dummy comments for theming purposes, but on prod there might be a few dozen only.
  • Placeholder content: this is any data which should be created as part of the installation process, meant to be changed later on.

Code

This is what our deliverable is for a given project. This is important. There is no single answer. Let's take the following examples:

  • If I am a contributor to the Views contrib project, my deliverable is a system which allows users to create views in the database. In this case I will not export any particular views.

  • For another project, my deliverable may be a website which contains a set number of lists (views). In this case I may use features (D7) or config management (D8) to export all the views my client asked for. Furthermore, I may enable views_ui (the Views User interface) only on my development box, and disable it on production.

  • For a third project, my deliverable may be a website with a number of set views, plus the ability for the client to add new ones. In this case only certain views will be in code, and I will enable the views UI as a dependency of my site deployment module. The views my client creates on production will be data.

Data

A few years ago, I took a step back from my day-to-day Drupal work and thought about what my main pain points were and how to do away with them. After consulting with colleagues, looking at bugs which took longest to fix, and looking at major sources of regressions, I realized that the one thing all major pain points had in common were our deployment techniques.

It struck me that cloning the database from production to development was wrong. Relying on production data to do development is sloppy and will cause problems. It is better to invest in realistic dummy content and a good site deployment module, allowing the standardized deployment of an environment in a few minutes from any commit.

Once we remove data from the development equation in this way, it is easier to define what data is: anything which can differ from one environment to the next without overriding a feature.

Furthermore, I like to think of production as just another environment; there is nothing special about it.

A new view or content type created on production outside of our development cycle resides on the database, is never used during the course of development, and is therefore data.

Nodes and taxonomy terms are data.

What about a view which is deployed through features and later changed on another environment? That's a tough one; I'll get to it (see Overridden features, below).

Placeholder content

Let's get back to our "About us" page. Three components are involved here:

  • The menu which contains the "About us" menu item. These types of menus are generally deployable, so let's call them code.
  • The "About us" node itself which has an incremental nid which can be different on each environment. On some environments it might not even exist.
  • The "About us" menu item, which should link to the node.

Remember: we are not cloning the production database, so the "About us" node does not exist anywhere. For situations such as this, I will suggest the use of placeholder content.

For sake of argument, let's define our deliverable for this sample project as follows:

"Define an _About us_ page which is modifiable".

We might be tempted to figure out a way to assign a unique ID to our "About us" node to make it deployable, and devise all kinds of techniques to make sure it cannot be deleted or overridden.

I have an approach which I consider more logical for these situations:

First, in my site deployment module's hook_update_N(), create the node and the menu item, bypassing features entirely. Something like:

function mysite_deploy_update_7023() {
  $node = new stdClass();
  $node->title = 'About us';
  $node->body[LANGUAGE_NONE][0]['format'] = 'filtered_html';
  $node->body[LANGUAGE_NONE][0]['value'] = 'Lorem ipsum...';
  $node->type = 'page';
  node_object_prepare($node);
  $node->uid = 1;
  $node->status = 1;
  $node->promote = 0;
  node_save($node);

  $menu_item = array(
    'link_path' => 'node/' . $node->nid,
    'link_title' => 'About us',
    'menu_name' => 'my-existing-menu-exported-via-features',
  );

  menu_link_save($item);
}

If you wish, you can also implement hook_requirements() in your custom module, to check that the About us page has not been accidentally deleted, that the menu item exists and points to a valid path.

What are the advantages of placeholder content?

  • It is deployable in a standard manner: any environment can simply run drush updb -y and the placeholder content will be deployed.
  • It can be changed without rendering your features (D7) or configuration (D8) overriden. This is a good thing: if our incremental deployment script calls features_revert() or drush fra -y (D7) or drush cim -y (D8), all changes to features are deleted. We do not want changes made to our placeholder content to be deleted.
  • It can be easily tested. All we need to do is make sure our site deployment module's hook_install() calls all hook_update_N()s; then we can enable our site deployment module within our simpletest, and run any tests we want against a known good starting point.

Overriden features

Although it is easy to override features on production, I would not recommend it. It is important to define with your client and your team what is code and what is data. Again, this depends on the project.

When a feature gets overridden, it is a symptom that someone does not understand the process. Here are a few ways to mitigate this:

  • Make sure your features are reverted (D7) or your configuration is imported (D8) as part of your deployment process, and automate that process with a continuous integration server. That way, if anyone overrides a feature on a production, it won't stay overridden long.
  • Limit administrator permissions so that only user 1 can override features (this can be more trouble than it's worth though).
  • Implement hook_requirements() to check for overridden features, warning you on the environment's dashboard if a feature has been overridden.

Some edge cases

Now, with our more rigorous approach, how do our edge cases fare?

Social media menu and items: Our deliverable here is the existence of a social media menu with two items (twitter and facebook), but whose links can be changed at any time on production without triggering an overridden feature. For this I would use placeholder content. Still, we need to theme each button separately, and our css does not know the incremental IDs of the menu items we are creating. I have successfully used the menu attributes module to associate classes to menu items, allowing easy theming. Here is an example, assuming menu_attributes exists and menu-social has been exported as a feature.

/**
 * Add facebook and twitter menu items
 */
function mysite_deploy_update_7117() {
  $item = array(
    'link_path' => 'http://twitter.com',
    'link_title' => 'Twitter',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'twitter',
      )
    )
  );
  menu_link_save($item);
  $item = array(
    'link_path' => 'http://facebook.com',
    'link_title' => 'Facebook',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'facebook',
      )
    )
  );
  menu_link_save($item);
}

The above code creates the menu items linking to Facebook and Twitter home pages, so that content editors can put in the correct links directly on production when they have them.

Placeholder content is just like regular data but it's created as part of the deployment process, as a service to the webmaster.

A block whose placement is known, but whose content is not. It may be tempting to use the box module which makes blocks exportable with feature. But in this case the block is more like placeholder content, so it should be deployed outside of features. And if you create your block programmatically, its id is incremental and it cannot be deployed with context, but should be placed in a region directly, again, programmatically in a hook_update_N().

Another approach here is to create a content type and a view with a block display, fetching the last published node of that content type and displaying it at the right place. If you go that route (which seems a bit overengineered to me), you can then place your block with the context module and export it via features.

A view which references a taxonomy term id in its filter: If a view requires access to a taxonomy term nid, then perhaps taxonomy is the wrong tool here. Taxonomy terms are data, they can be deleted, their names can be changed. It is not a good idea for a view to reference a specific taxonomy term. (Your view can use taxonomy terms for contextual filters without a problem, but we don't want to hard-code a specific term in a non-contextual filter -- See this issue for an example of how I learned this the hard way, I'll get around to fixing that soon...).

For this problem I would suggest rethinking our use of a taxonomy term. Rather I would define a select field with a set number of options (with defined keys and values). These are deployable and guaranteed to not change without triggering a features override. Thus, our views can safely use them. If you are implementing this change on an existing site, you will need to update all nodes from the old to the new technique in a hook_update_N() -- and probably add an automated test to make sure you're updating the data correctly. This is one more reason to think things through properly at the onset of your project, not midway through.

In conclusion

Content and configuration are hard to define, I prefer the following definitions:

  • Code: deployable, deliverable, versioned, tested piece of software.
  • Data: anything which can differ from one environment to the next.
  • Placeholder content: any data which should be created as part of the deployment process.

In my experience, what fits in each category depends on each project. Defining these with your team as part of your sprint planning will allow you create a system with less edge cases.

Dec 03 2014
Dec 03

December 03, 2014

What is content? What is configuration? At first glance, the question seems simple, almost quaint, the kind one finds oneself patiently answering for the benefit of Drupal novices: content is usually information like nodes and taxonomy terms, while content types, views and taxonomy vocabularies are usually configuration.

Content lives in the database of each environment, we say, while configuration is exportable via Features or other mechanisms and should live in the Git repo (this has been called code-driven development).

Still, a definition of content and configuration is naggingly elusive: why “usually”? Why are there so many edge cases? We’re engineers, we need precision! I often feel like I’m trying to define what a bird is: every child knows what a bird is, but it’s hard to define it. Ostriches can’t fly; platypuses lay eggs but aren’t birds.

Why the distinction?

I recently saw an interesting comment titled “A heretic speaks” on a blog post about code-driven development. It sums up some of the uneasiness about the place of configuration in Drupal: “Drupal was built primarily with site builders in mind, and this is one reason [configuration] is in the database”.

In effect, the primary distinction in Drupal is between code (Drupal core and config), and the database, which contains content types, nodes, and everything else.

As more complex sites were being built, a new distinction had to be made between two types of information in the database: configuration and content. This was required to allow development in a dev-stage-production workflow where features being developed outside of a production site could be deployed to production without squashing the database (and existing comments, nodes, and the like). We needed to move those features into code and we called them “configuration”.

Thus the features module was born, allowing views, content types, and vocabularies (but not nodes and taxonomy terms) to be developed outside of the database, and then deployed into production.

Drupal 8’s config management system takes that one step further by providing a mature, central API to deal with this.

The devil is in the details

This is all fine and good, but edge cases soon begin to arise:

  • What about an “About us” page? It’s a menu item (deployable) linking to a node (content). Is it config? Is it content?
  • What about a “Social media” menu and its menu items? We want a Facebook link to be deployable, but we don’t want to hard-code the actual link to our client’s Facebook page (which feels like content) – we probably don’t even know what that link is during development.
  • What about a block whose placement is known, but whose content is not? Is this content? Is it configuration?
  • What about a view which references a taxonomy term ID in a hard-coded filter? We can export the view, but the taxonomy term has an incremental ID and is not guaranteed to exist on all environments.

The wrong answer to any of these questions can lead to a misguided development approach which will come back to haunt you afterward. You might wind up using incremental IDs in your code or deploying something as configuration which is, in fact, content.

Defining our terms

At the risk of irking you, dear reader, I will suggest doing away with the terms “content” and “configuration” for our purposes: they are just too vague. Because we want a formal definition with no edge cases, I propose that we use these terms instead (we’ll look at each in detail a bit further on):

  • Code: this is what our deliverable is for a given project. It should be testable, versioned, and deployable to any number of environments.
  • Data: this is whatever is potentially different on each environment to which our code is deployed. One example is comments: On a dev environment, we might generate thousands of dummy comments for theming purposes, but on prod there might be a few dozen only.
  • Placeholder content: this is any data which should be created as part of the installation process, meant to be changed later on.

Code

This is what our deliverable is for a given project. This is important. There is no single answer. Let’s take the following examples:

  • If I am a contributor to the Views contrib project, my deliverable is a system which allows users to create views in the database. In this case I will not export many particular views.

  • For another project, my deliverable may be a website which contains a set number of lists (views). In this case I may use features (D7) or config management (D8) to export all the views my client asked for. Furthermore, I may enable views_ui (the Views User interface) only on my development box, and disable it on production.

  • For a third project, my deliverable may be a website with a number of set views, plus the ability for the client to add new ones. In this case, only certain views will be in code, and I will enable the views UI as a dependency of my site deployment module. The views my client creates on production will be data.

Data

A few years ago, I took a step back from my day-to-day Drupal work and thought about what my main pain points were and how to do away with them. After consulting with colleagues, looking at bugs which took longest to fix, and looking at major sources of regressions, I realized that the one thing all major pain points had in common were our deployment techniques.

It struck me that cloning the database from production to development was wrong. Relying on production data to do development is sloppy and will cause problems. It is better to invest in realistic dummy content and a good site deployment module, allowing the standardized deployment of an environment in a few minutes from any commit.

Once we remove data from the development equation in this way, it is easier to define what data is: anything which can differ from one environment to the next without overriding a feature.

Furthermore, I like to think of production as just another environment, there is nothing special about it.

A new view or content type created on production outside of our development cycle resides on the database, is never used during the course of development, and is therefore data.

Nodes and taxonomy terms are data.

What about a view which is deployed through features and later changed on another environment? That’s a tough one; I’ll get to it (see Overridden features, below).

Placeholder content

Let’s get back to our “About us” page. Three components are involved here:

  • The menu which contains the “About us” menu item. These types of menus are generally deployable, so let’s call them code.
  • The “About us” node itself which has an incremental nid which can be different on each environment. On some environments it might not even exist.
  • The “About us” menu item, which should link to the node.

Remember: we are not cloning the production database, so the “About us” does not exist anywhere. For situations such as this, I will suggest the use of Placeholder content.

For the sake of argument, let’s define our deliverable for this sample project as follows:

"Define an _About us_ page which is modifiable".

We might be tempted to figure out a way to assign a unique ID to our “About us” node to make it deployable, and devise all kinds of techniques to make sure it cannot be deleted or overridden.

I have an approach which I consider more logical for these situations:

First, in my site deployment module’s hook_update_N(), create the node and the menu item, bypassing features entirely. Something like:

function mysite_deploy_update_7023() {
  // Create the "About us" placeholder node.
  $node = new stdClass();
  $node->type = 'page';
  node_object_prepare($node);
  $node->title = 'About us';
  $node->language = LANGUAGE_NONE;
  $node->body[LANGUAGE_NONE][0]['format'] = 'filtered_html';
  $node->body[LANGUAGE_NONE][0]['value'] = 'Lorem ipsum...';
  $node->uid = 1;
  $node->status = 1;
  $node->promote = 0;
  node_save($node);

  // Link to the node from the existing menu.
  $menu_item = array(
    'link_path' => 'node/' . $node->nid,
    'link_title' => 'About us',
    'menu_name' => 'my-existing-menu-exported-via-features',
  );

  menu_link_save($menu_item);
}

If you wish, you can also implement hook_requirements() in your custom module, to check that the About us page has not been accidentally deleted, that the menu item exists and points to a valid path.
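Such a check might look something like the following Drupal 7 sketch (it assumes the node’s nid was recorded with variable_set() when the placeholder was created; the variable name is made up for illustration):

```php
/**
 * Implements hook_requirements().
 *
 * Sketch only: assumes the update hook that created the About us node
 * also saved its nid via variable_set('mysite_about_us_nid', $node->nid).
 */
function mysite_deploy_requirements($phase) {
  $requirements = array();
  if ($phase == 'runtime') {
    // Warn on the status report page if the placeholder has been deleted.
    $node = node_load(variable_get('mysite_about_us_nid', 0));
    $requirements['mysite_about_us'] = array(
      'title' => t('About us placeholder'),
      'value' => $node ? t('Present') : t('Missing'),
      'severity' => $node ? REQUIREMENT_OK : REQUIREMENT_ERROR,
    );
  }
  return $requirements;
}
```

A similar runtime check on the menu item could use menu_link_get_preferred() to confirm the link still points to a valid path.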

What are the advantages of placeholder content?

  • It is deployable in a standard manner: any environment can simply run drush updb -y and the placeholder content will be deployed.
  • It can be changed without rendering your features (D7) or configuration (D8) overridden. This is a good thing: if our incremental deployment script calls features_revert() or drush fra -y (D7) or drush cim -y (D8), all changes to features are deleted. We do not want changes made to our placeholder content to be deleted.
  • It can be easily tested. All we need to do is make sure our site deployment module’s hook_install() calls all hook_update_N()s; then we can enable our site deployment module within our simpletest, and run any tests we want against a known good starting point.
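In Drupal 7, the “hook_install() calls all hook_update_N()s” trick can be sketched as follows (the module name is illustrative):

```php
/**
 * Implements hook_install().
 *
 * Run all of this module's update hooks so that a freshly installed
 * site ends up identical to an incrementally updated one.
 */
function mysite_deploy_install() {
  for ($i = 7001; $i < 8000; $i++) {
    $candidate = 'mysite_deploy_update_' . $i;
    if (function_exists($candidate)) {
      $candidate();
    }
  }
}
```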

Overridden features

Although it is easy to override features on production, I would not recommend it. It is important to define with your client and your team what is code and what is data. Again, this depends on the project.

When a feature gets overridden, it is a symptom that someone does not understand the process. Here are a few ways to mitigate this:

  • Make sure your features are reverted (D7) or your configuration is imported (D8) as part of your deployment process, and automate that process with a continuous integration server. That way, if anyone overrides a feature on production, it won’t stay overridden for long.
  • Limit administrator permissions so that only user 1 can override features (this can be more trouble than it’s worth though).
  • Implement hook_requirements() to check for overridden features, warning you on the environment’s dashboard if a feature has been overridden.
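The continuous integration check can be as simple as grepping the output of drush features-list for overridden features and failing the build if any are found. A sketch (the drush call itself is assumed; sample output is inlined here so the logic can be seen in isolation):

```shell
# Fail the build if any feature is reported as overridden.
check_overridden() {
  if grep -qi 'overridden'; then
    echo 'FAIL: overridden features detected'
    return 1
  fi
  echo 'OK: no overridden features'
}

# In a real job: drush features-list | check_overridden
printf 'mysite_views   Enabled   Overridden\n' | check_overridden || true
```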

Some edge cases

Now, with our more rigorous approach, how do our edge cases fare?

Social media menu and items: Our deliverable here is the existence of a social media menu with two items (Twitter and Facebook), but whose links can be changed at any time on production without triggering an overridden feature. For this I would use placeholder content. Still, we need to theme each button separately, and our CSS does not know the incremental IDs of the menu items we are creating. I have successfully used the menu attributes module to associate classes with menu items, allowing easy theming. Here is an example, assuming menu_attributes exists and menu-social has been exported as a feature.

/**
 * Add facebook and twitter menu items
 */
function mysite_deploy_update_7117() {
  $item = array(
    'link_path' => 'http://twitter.com',
    'link_title' => 'Twitter',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'twitter',
      )
    )
  );
  menu_link_save($item);
  $item = array(
    'link_path' => 'http://facebook.com',
    'link_title' => 'Facebook',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'facebook',
      )
    )
  );
  menu_link_save($item);
}

The above code creates the menu items linking to Facebook and Twitter home pages, so that content editors can put in the correct links directly on production when they have them.

Placeholder content is just like regular data but it’s created as part of the deployment process, as a service to the webmaster.

A block whose placement is known, but whose content is not: It may be tempting to use the boxes module, which makes blocks exportable via features. But in this case the block is more like placeholder content, so it should be deployed outside of features. And if you create your block programmatically, its ID is incremental and it cannot be deployed with context; it should instead be placed in a region directly, again programmatically, in a hook_update_N().

Another approach here is to create a content type and a view with a block display, fetching the last published node of that content type and displaying it at the right place. If you go that route (which seems a bit overengineered to me), you can then place your block with the context module and export it via features.

A view which references a taxonomy term id in its filter: If a view requires access to a taxonomy term tid, then perhaps taxonomy is the wrong tool here. Taxonomy terms are data: they can be deleted, and their names can be changed. It is not a good idea for a view to reference a specific taxonomy term. (Your view can use taxonomy terms for contextual filters without a problem, but we don’t want to hard-code a specific term in a non-contextual filter – See this issue for an example of how I learned this the hard way, I’ll get around to fixing that soon…).

For this problem I would suggest rethinking our use of a taxonomy term. Rather, I would define a select field with a set number of options (with defined keys and values). These are deployable and guaranteed not to change without triggering a features override. Thus, our views can safely use them. If you are implementing this change on an existing site, you will need to update all nodes from the old to the new technique in a hook_update_N() – and probably add an automated test to make sure you’re updating the data correctly. This is one more reason to think things through properly at the outset of your project, not midway through.
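Whether such a field lives in a feature or in your site deployment module depends on your workflow; as a standalone Drupal 7 sketch (the field name and its options are hypothetical):

```php
/**
 * Create a deployable select field with fixed keys and values.
 *
 * Sketch only: field_category and its allowed values are made up
 * for illustration.
 */
function mysite_deploy_update_7030() {
  field_create_field(array(
    'field_name' => 'field_category',
    'type' => 'list_text',
    'settings' => array(
      'allowed_values' => array(
        'news' => 'News',
        'events' => 'Events',
      ),
    ),
  ));
  field_create_instance(array(
    'field_name' => 'field_category',
    'entity_type' => 'node',
    'bundle' => 'page',
    'label' => 'Category',
  ));
}
```

Views can then filter on the stable keys (news, events) rather than on incremental term IDs.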

In conclusion

Content and configuration are hard to define; I prefer the following definitions:

  • Code: deployable, deliverable, versioned, tested piece of software.
  • Data: anything which can differ from one environment to the next.
  • Placeholder content: any data which should be created as part of the deployment process.

In my experience, what fits in each category depends on each project. Defining these with your team as part of your sprint planning will allow you to create a system with fewer edge cases.


Sep 10 2014
Sep 10

September 10, 2014

What is code-driven development and why is it done?

Code-driven development is the practice of placing all development in code. How can development not be in code, you ask?

In Drupal, what makes your site unique is often configuration which resides in the database: the current theme, active modules, module-specific configuration, content types, and so on.

For the purpose of this article, our goal will be for all configuration (the current theme, the content types, module-specific config, the active module list...) to be in code, and only content to be in the database. There are several advantages to this approach:

  • Because all our configuration is in code, we can package all of it into a single module, which we'll call a site deployment module. When enabled, this module should provide a fully workable site without any content.
  • When a site deployment module is combined with generated content, it becomes possible to create new instances of a website without cloning the database. Devel's devel_generate module, and Realistic Dummy Content can be used to create realistic dummy content. This makes on-ramping new developers easy and consistent.
  • Because unversioned databases are not required to be cloned to set up new environments, your continuous integration server can set up new instances of your site based on a known good starting point, making tests more robust.

Code-driven development for Drupal 7

Before moving on to D8, let's look at a typical D7 workflow. The technique I use for developing in Drupal 7 is to make sure I have one or more features containing my content types, views, contexts, and so on, as well as a site deployment module which contains, in its .install file, update hooks which revert my features when needed, enable new modules, and programmatically set configuration which can't be exported via features. That way,

  • incrementally deploying sites is as simple as calling drush updb -y (to run new update hooks).
  • deploying a site for the first time (or redeploying it from scratch) requires creating the database, enabling our site deployment module (which runs all of our update hooks), and optionally generating dummy content if required. For example: drush si -y && drush en mysite_deploy -y && drush en devel_generate && drush generate-content 50.

I have been using this technique for a few years on all my D7 projects and, in this article, I will explore how something similar can be done in D8.

New in Drupal 8: configuration management

If, like me, you are using features exclusively to deploy websites (as opposed to using it to bundle generic functionality, for example having a "blog" feature, or a "calendar" feature you can add to any site), config management will replace features in D8. In D7, context is used to provide the ability to export block placement to features, and strongarm exports variables. In D8, variables no longer exist, and block placement is now exportable. All of these modules are thus no longer needed.

They are replaced by the concept of configuration management, a central API for importing and exporting configuration as yml files.

Configuration management and site UUIDs

In Drupal 8, sites are now assigned a UUID on install and configuration can only be synchronized between sites having the same UUID. This is fine if the site has been cloned at some point from one environment to another, but as mentioned above, we are avoiding database cloning: we want it to be possible to install a brand new instance of a site at any time.

We thus need a mechanism to assign the same UUID to all instances of our site, but still allow us to reinstall it without cloning the database.

The solution I am using is to assign a site UUID in the site deployment module. Thus, in Drupal 8, my site deployment module's .module file looks like this:

/**
 * @file
 * site deployment functions
 */
use Drupal\Core\Extension\InfoParser;

/**
 * Updates dependencies based on the site deployment's info file.
 *
 * If during the course of development, you add a dependency to your
 * site deployment module's .info file, increment the update hook
 * (see the .install module) and this function will be called, making
 * sure dependencies are enabled.
 */
function mysite_deploy_update_dependencies() {
  $parser = new InfoParser;
  $info_file = $parser->parse(drupal_get_path('module', 'mysite_deploy') . '/mysite_deploy.info.yml');
  if (isset($info_file['dependencies'])) {
    \Drupal::service('module_installer')->install($info_file['dependencies'], TRUE);
  }
}

/**
 * Set the UUID of this website.
 *
 * By default, reinstalling a site will assign it a new random UUID, making
 * it impossible to sync configuration with other instances. This function
 * is called by site deployment module's .install hook.
 *
 * @param $uuid
 *   A uuid string, for example 'e732b460-add4-47a7-8c00-e4dedbb42900'.
 */
function mysite_deploy_set_uuid($uuid) {
  \Drupal::configFactory()->getEditable('system.site')
    ->set('uuid', $uuid)
    ->save();
}

And the site deployment module's .install file looks like this:

/**
 * @file
 * site deployment install functions
 */

/**
 * Implements hook_install().
 */
function mysite_deploy_install() {
  // This module is designed to be enabled on a brand new instance of
  // Drupal. Setting its uuid here will tell this instance that it is
  // in fact the same site as any other instance. Therefore, all local
  // instances, continuous integration, testing, dev, and production
  // instances of a codebase will have the same uuid, enabling us to
  // sync these instances via the config management system.
  // See also https://www.drupal.org/node/2133325
  mysite_deploy_set_uuid('e732b460-add4-47a7-8c00-e4dedbb42900');
  for ($i = 8001; $i < 9000; $i++) {
    $candidate = 'mysite_deploy_update_' . $i;
    if (function_exists($candidate)) {
      $candidate();
    }
  }
}

/**
 * Update dependencies and revert features
 */
function mysite_deploy_update_8003() {
  // If you add a new dependency during your development:
  // (1) add your dependency to your .info file
  // (2) increment the number in this function name (example: change
  //     8003 to 8004)
  // (3) now, on each target environment, running drush updb -y
  //     will call the mysite_deploy_update_dependencies() function
  //     which in turn will enable all new dependencies.
  mysite_deploy_update_dependencies();
}

The only real difference between a site deployment module for D7 and D8, thus, is that the D8 version must define a UUID common to all instances of a website (local, dev, prod, testing...).

Configuration management directories: active, staging, deploy

Out of the box, there are two directories which can contain config management yml files:

  • The active directory, which is always empty and unused. It used to be there to store your active configuration, and it is still possible to do so, but I'm not sure how. We can ignore this directory for our purposes.
  • The staging directory, which can contain .yml files to be imported into a target site. (For this to work, as mentioned above, the .yml files will need to have been generated by a site having the same UUID as the target site, or else you will get an error message -- on the GUI the error message makes sense, but on the command line you will get the cryptic "There were errors validating the config synchronization.").

I will propose a workflow which ignores the staging directory as well, for the following reasons:

  • First, the staging directory is placed in sites/default/files/, a directory which contains user data and is explicitly ignored in Drupal's example.gitignore file (which makes sense). In our case, we want this information to reside in our git directory.
  • Second, my team has come to rely heavily on reinstalling Drupal and our site deployment module when things get corrupted locally. When you reinstall Drupal using drush si, the staging directory is deleted, so even if we did have the staging directory in git, we would be prevented from running drush si -y && drush en mysite_deploy -y, which we don't want.
  • Finally, you might want your config directory to be outside of your Drupal root, for security reasons.

For all of these reasons, we will add a new "deploy" configuration directory and put it in our git repo, but outside of our Drupal root.

Our directory hierarchy will now look like this:

mysite
  .git
  deploy
    README.txt
    ...
  drupal_root
    CHANGELOG.txt
    core
    ...

You can also have your deploy directory inside your Drupal root, but keep in mind that certain configuration information is sensitive, containing email addresses and the like. We'll see later on how to tell Drupal how it can find your "deploy" directory.

Getting started: creating your Drupal instance

Let's get started. Make sure you have version 7.x of Drush (compatible with Drupal 8), and create your git repo:

mkdir mysite
cd mysite
mkdir deploy
echo "Contains config meant to be deployed, see http://dcycleproject.org/blog/68" >> deploy/README.txt 
drush dl drupal-8.0.x
mv drupal* drupal_root
cp drupal_root/example.gitignore drupal_root/.gitignore
git init
git add .
git commit -am 'initial commit'

Now let's install our first instance of the site:

cd drupal_root
echo 'create database mysite'|mysql -uroot -proot
drush si --db-url=mysql://root:[email protected]/mysite -y

Now create a site deployment module: here is the code that works for me. We'll set the correct site UUID in mysite_deploy.install later. Add this to git:

git add drupal_root/modules/custom
git commit -am 'added site deployment module'

Now let's tell Drupal where our "deploy" config directory is:

  • Open sites/default/settings.php
  • Find the lines beginning with $config_directories
  • Add $config_directories['deploy'] = '../deploy';

We can now perform our first export of our site configuration:

cd drupal_root
drush config-export deploy -y

You will now notice that your "deploy" directory is filled with your site's configuration files, and you can add them to git.

git add .
git commit -am 'added config files'

Now we need to sync the site UUID from the database to the code, to make sure all subsequent instances of this site have the same UUID. Open deploy/system.site.yml and find the uuid property, for example:

uuid: 03821007-701a-4231-8107-7abac53907b1
...

Now add this same value to your site deployment module's .install file, for example:

...
function mysite_deploy_install() {
  mysite_deploy_set_uuid('03821007-701a-4231-8107-7abac53907b1');
...
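Rather than copying the value by hand, you can pull it out of the exported file with a one-liner. A sketch (sample yml content is inlined here; in practice you would read deploy/system.site.yml):

```shell
# Extract the uuid value from an exported system.site.yml.
# In practice: awk '/^uuid:/ {print $2}' deploy/system.site.yml
uuid=$(printf 'uuid: 03821007-701a-4231-8107-7abac53907b1\nname: mysite\n' \
  | awk '/^uuid:/ {print $2}')
echo "$uuid"
```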

Let's create a view! A content type! Position a block!

To see how to export configuration, create some views and content types, position some blocks, and change the default theme.

Now let's export our changes

cd drupal_root
drush config-export deploy -y

Your git repo will be changed accordingly

cd ..
git status
git add .
git commit -am 'changed theme, blocks, content types, views'

Deploying your Drupal 8 site

At this point you can push your code to a git server, and clone it to a dev server. For testing purposes, we will simply clone it directly

cd ../
git clone mysite mysite_destination
cd mysite_destination/drupal_root
echo 'create database mysite_destination'|mysql -uroot -proot
drush si --db-url=mysql://root:[email protected]/mysite_destination -y

If you visit mysite_destination/drupal_root with a browser, you will see a plain new Drupal 8 site.

Before continuing, we need to open sites/default/settings.php on mysite_destination and add $config_directories['deploy'] = '../deploy';, as we did on the source site.

Now let the magic happen. Let's enable our site deployment module (to make sure our instance UUID is synched with our source site), and import our configuration from our "deploy" directory:

drush en mysite_deploy -y
drush config-import deploy -y

Now, on your destination site, you will see all your views, content types, block placements, and the default theme.

This deployment technique, which can be combined with generated dummy content, allows one to create new instances very quickly for new developers, testing, demos, continuous integration, and for production.

Incrementally deploying your Drupal 8 site

What about changes you make to the codebase once everything is already deployed. Let's change a view and run:

cd drupal_root
drush config-export deploy -y
cd ..
git commit -am 'more fields in view'

Let's deploy this now:

cd ../mysite_destination
git pull origin master
cd drupal_root
drush config-import deploy -y

As you can see, incremental deployments are as easy and standardized as initial deployments, reducing the risk of errors, and allowing incremental deployments to be run automatically by a continuous integration server.

Next steps and conclusion

Some aspects of your site's configuration (what makes your site unique) still can't be exported via the config management system, for example enabling new modules; for that we'll use update hooks as in Drupal 7. As of this writing Drupal 8 update hooks can't be run with Drush on the command line due to this issue.

Also, although a great GUI exists for importing and exporting configuration, I chose to do it on the command line so that I could easily create a Jenkins continuous integration job to deploy code to dev and run tests on each push.

For Drupal projects developed with a dev-stage-prod continuous integration workflow, the new config management system is a great productivity boost.

Sep 10 2014
Sep 10

September 10, 2014

What is code-driven development and why is it done?

Code-driven development is the practice of placing all development in code. How can development not be in code, you ask?

In Drupal, what makes your site unique is often configuration which resides in the database: the current theme, active modules, module-specific configuration, content types, and so on.

For the purpose of this article, our goal will be for all configuration (the current theme, the content types, module-specific config, the active module list…) to be in code, and only content to be in the database. There are several advantages to this approach:

  • Because all our configuration is in code, we can package all of it into a single module, which we’ll call a site deployment module. When enabled, this module should provide a fully workable site without any content.
  • When a site deployment module is combined with generated content, it becomes possible to create new instances of a website without cloning the database. Devel’s devel_generate module, and Realistic Dummy Content can be used to create realistic dummy content. This makes on-ramping new developers easy and consistent.
  • Because unversioned databases are not required to be cloned to set up new environments, your continuous integration server can set up new instances of your site based on a known good starting point, making tests more robust.

Code-driven development for Drupal 7

Before moving on to D8, let’s look at a typical D7 workflow: The technique I use for developing in Drupal 7 is making sure I have one or more features with my content types, views, contexts, and so on; as well as a site deployment module which contains, in its .install file, update hooks which revert my features when needed, enable new modules, and programmatically set configuration which can’t be exported via features. That way,

  • incrementally deploying sites is as simple as calling drush updb -y (to run new update hooks).
  • deploying a site for the first time (or redeploying it from scratch) requires creating the database, enabling our site deployment module (which runs all our update hooks), and optionally generating dummy content if required. For example: drush si -y && drush en mysite_deploy -y && drush en devel_generate && drush generate-content 50.

I have been using this technique for a few years on all my D7 projects and, in this article, I will explore how something similar can be done in D8.

New in Drupal 8: configuration management

If, like me, you are using features exclusively to deploy websites (as opposed to using it to bundle generic functionality, for example having a “blog” feature, or a “calendar” feature you can add to any site), config management will replace features in D8. In D7, context is used to provide the ability to export block placement to features, and strongarm exports variables. In D8, variables no longer exist, and block placement is now exportable. All of these modules are thus no longer needed.

They are replaced by the concept of configuration management, a central API for importing and exporting configuration as yml files.
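As a concrete illustration (file contents are hypothetical and will vary by site), the default theme, stored in the theme_default variable in D7 and exported via strongarm, becomes a plain yml file in D8 once exported:

```yaml
# deploy/system.theme.yml (sketch of an exported configuration file;
# actual theme names depend on your site)
admin: seven
default: bartik
```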

Configuration management and site UUIDs

In Drupal 8, sites are now assigned a UUID on install and configuration can only be synchronized between sites having the same UUID. This is fine if the site has been cloned at some point from one environment to another, but as mentioned above, we are avoiding database cloning: we want it to be possible to install a brand new instance of a site at any time.

We thus need a mechanism to assign the same UUID to all instances of our site, but still allow us to reinstall it without cloning the database.

The solution I am using is to assign a site UUID in the site deployment module. Thus, in Drupal 8, my site deployment module’s .module file looks like this:

/**
 * @file
 * site deployment functions
 */
use Drupal\Core\Extension\InfoParser;

/**
 * Updates dependencies based on the site deployment's info file.
 *
 * If during the course of development, you add a dependency to your
 * site deployment module's .info file, increment the update hook
 * (see the .install module) and this function will be called, making
 * sure dependencies are enabled.
 */
function mysite_deploy_update_dependencies() {
  $parser = new InfoParser;
  $info_file = $parser->parse(drupal_get_path('module', 'mysite_deploy') . '/mysite_deploy.info.yml');
  if (isset($info_file['dependencies'])) {
    \Drupal::service('module_installer')->install($info_file['dependencies'], TRUE);
  }
}
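For reference, the .info.yml file parsed by this function might look something like this (the module name and dependencies are examples):

```yaml
# mysite_deploy.info.yml (sketch)
name: Mysite deployment module
type: module
description: Deployment module for mysite.
core: 8.x
dependencies:
  - views
  - node
```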

/**
 * Set the UUID of this website.
 *
 * By default, reinstalling a site will assign it a new random UUID, making
 * it impossible to sync configuration with other instances. This function
 * is called by site deployment module's .install hook.
 *
 * @param $uuid
 *   A uuid string, for example 'e732b460-add4-47a7-8c00-e4dedbb42900'.
 */
function mysite_deploy_set_uuid($uuid) {
  \Drupal::configFactory()->getEditable('system.site')
    ->set('uuid', $uuid)
    ->save();
}    

And the site deployment module’s .install file looks like this:

/**
 * @file
 * site deployment install functions
 */

/**
 * Implements hook_install().
 */
function mysite_deploy_install() {
  // This module is designed to be enabled on a brand new instance of
  // Drupal. Setting its uuid here will tell this instance that it is
  // in fact the same site as any other instance. Therefore, all local
  // instances, continuous integration, testing, dev, and production
  // instances of a codebase will have the same uuid, enabling us to
  // sync these instances via the config management system.
  // See also https://www.drupal.org/node/2133325
  mysite_deploy_set_uuid('e732b460-add4-47a7-8c00-e4dedbb42900');
  for ($i = 8001; $i < 9000; $i++) {
    $candidate = 'mysite_deploy_update_' . $i;
    if (function_exists($candidate)) {
      $candidate();
    }
  }
}

/**
 * Update dependencies and revert features
 */
function mysite_deploy_update_8003() {
  // If you add a new dependency during your development:
  // (1) add your dependency to your .info file
  // (2) increment the number in this function name (for example, change
  //     8003 to 8004)
  // (3) now, on each target environment, running drush updb -y
  //     will call the mysite_deploy_update_dependencies() function
  //     which in turn will enable all new dependencies.
  mysite_deploy_update_dependencies();
}

The only real difference between a site deployment module for D7 and D8, thus, is that the D8 version must define a UUID common to all instances of a website (local, dev, prod, testing…).

Configuration management directories: active, staging, deploy

Out of the box, there are two directories which can contain config management yml files:

  • The active directory, which is always empty and unused: active configuration is now stored in the database rather than in this directory (it is still possible to configure Drupal to store it here, but that is not the default). We can ignore this directory for our purposes.
  • The staging directory, which can contain .yml files to be imported into a target site. (For this to work, as mentioned above, the .yml files will need to have been generated by a site having the same UUID as the target site, or else you will get an error message – on the GUI the error message makes sense, but on the command line you will get the cryptic “There were errors validating the config synchronization.”).

I will propose a workflow which ignores the staging directory as well, for the following reasons:

  • First, the staging directory is placed in sites/default/files/, a directory which contains user data and is explicitly ignored in Drupal’s example.gitignore file (which makes sense). In our case, we want this information to reside in our git directory.
  • Second, my team has come to rely heavily on reinstalling Drupal and our site deployment module when things get corrupted locally. When you reinstall Drupal using drush si, the staging directory is deleted, so even if we did have the staging directory in git, we would be prevented from running drush si -y && drush en mysite_deploy -y, which we don’t want.
  • Finally, you might want your config directory to be outside of your Drupal root, for security reasons.

For all of these reasons, we will add a new “deploy” configuration directory and put it in our git repo, but outside of our Drupal root.

Our directory hierarchy will now look like this:

mysite
  .git
  deploy
    README.txt
    ...
  drupal_root
    CHANGELOG.txt
    core
    ...

You can also have your deploy directory inside your Drupal root, but keep in mind that certain configuration information is sensitive, containing email addresses and the like. We’ll see later on how to tell Drupal how it can find your “deploy” directory.

Getting started: creating your Drupal instance

Let’s get started. Make sure you have version 7.x of Drush (compatible with Drupal 8), and create your git repo:

mkdir mysite
cd mysite
mkdir deploy
echo "Contains config meant to be deployed, see http://dcycleproject.org/blog/68" >> deploy/README.txt
drush dl drupal-8.0.x
mv drupal* drupal_root
cp drupal_root/example.gitignore drupal_root/.gitignore
git init
git add .
git commit -am 'initial commit'

Now let’s install our first instance of the site:

cd drupal_root
echo 'create database mysite'|mysql -uroot -proot
drush si --db-url=mysql://root:[email protected]/mysite -y

Now create a site deployment module: here is the code that works for me. We’ll set the correct site UUID in mysite_deploy.install later. Add this to git:

git add drupal_root/modules/custom
git commit -am 'added site deployment module'

Now let’s tell Drupal where our “deploy” config directory is:

  • Open sites/default/settings.php
  • Find the lines beginning with $config_directories
  • Add $config_directories['deploy'] = '../deploy';

Edit: using a config directory name other than ‘sync’ will cause an issue with Config Split at the time of this writing.

We can now perform our first export of our site configuration:

cd drupal_root
drush config-export deploy -y

You will now notice that your “deploy” directory is filled with your site’s configuration files, and you can add them to git.

git add .
git commit -am 'added config files'

Now we need to sync the site UUID from the database to the code, to make sure all subsequent instances of this site have the same UUID. Open deploy/system.site.yml and find UUID property, for example:

uuid: 03821007-701a-4231-8107-7abac53907b1
...

Now add this same value to your site deployment module’s .install file, for example:

...
function mysite_deploy_install() {
  mysite_deploy_set_uuid('03821007-701a-4231-8107-7abac53907b1');
...

Let’s create a view! A content type! Position a block!

To see how to export configuration, create some views and content types, position some blocks, and change the default theme.

Now let’s export our changes:

cd drupal_root
drush config-export deploy -y

Your git repo will be changed accordingly:

cd ..
git status
git add .
git commit -am 'changed theme, blocks, content types, views'

Deploying your Drupal 8 site

At this point you can push your code to a git server, and clone it to a dev server. For testing purposes, we will simply clone it directly:

cd ../
git clone mysite mysite_destination
cd mysite_destination/drupal_root
echo 'create database mysite_destination'|mysql -uroot -proot
drush si --db-url=mysql://root:[email protected]/mysite_destination -y

If you visit mysite_destination/drupal_root with a browser, you will see a plain new Drupal 8 site.

Before continuing, we need to open sites/default/settings.php on mysite_destination and add $config_directories['deploy'] = '../deploy';, as we did on the source site.

Now let the magic happen. Let’s enable our site deployment module (to make sure our instance UUID is synched with our source site), and import our configuration from our “deploy” directory:

drush en mysite_deploy -y
drush config-import deploy -y

Now, on your destination site, you will see all your views, content types, block placements, and the default theme.

This deployment technique, which can be combined with generated dummy content, allows one to create new instances very quickly for new developers, testing, demos, continuous integration, and for production.
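For example, once the $config_directories['deploy'] line is in the new instance’s settings.php, spinning up a complete environment might look like this (a sketch; the database name and credentials are placeholders, and Drush and the devel module are assumed to be available):

```sh
cd drupal_root
echo 'create database mysite_new'|mysql -uroot -proot
drush si --db-url=mysql://root:[email protected]/mysite_new -y
drush en mysite_deploy -y      # sets the shared site UUID
drush config-import deploy -y  # imports views, content types, block placements...
drush en devel_generate -y     # optional: generate dummy content
drush generate-content 50
```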

Incrementally deploying your Drupal 8 site

What about changes you make to the codebase once everything is already deployed? Let’s change a view and run:

cd drupal_root
drush config-export deploy -y
cd ..
git commit -am 'more fields in view'

Let’s deploy this now:

cd ../mysite_destination
git pull origin master
cd drupal_root
drush config-import deploy -y

As you can see, incremental deployments are as easy and standardized as initial deployments, reducing the risk of errors, and allowing incremental deployments to be run automatically by a continuous integration server.

Next steps and conclusion

Some aspects of your site’s configuration (what makes your site unique) still can’t be exported via the config management system, for example enabling new modules; for that we’ll use update hooks as in Drupal 7. As of this writing Drupal 8 update hooks can’t be run with Drush on the command line due to this issue.

Also, although a great GUI exists for importing and exporting configuration, I chose to do it on the command line so that I could easily create a Jenkins continuous integration job to deploy code to dev and run tests on each push.

For Drupal projects developed with a dev-stage-prod continuous integration workflow, the new config management system is a great productivity boost.

Jul 30 2014
Jul 30

July 30, 2014

I had this checklist documented internally, but I keep referring back to it so I’ll make it available here in case anyone else needs it. The idea here is to document a minimum (not an ideal) set of modules and tasks which I do for almost all projects.

Questions to ask of a client at the project launch

  • Is your site bilingual? If so is there more than one domain? (if so, and you are exporting your languages as Features, your domain is exported with it. If your domains are different on different environments, you might want to use language_domain to override the domains per environment)
  • What type of compatibility do you need: tablet, mobile, which versions of IE?
  • How do you see your post-launch support and core/module update contract?
  • Do you need SSL support?
  • What is your hosting arrangement?
  • Do you have a contact form?
  • What is your anti-spam method? Note that CAPTCHA is no longer useful; I like Mollom, but it’s giving me more and more false positives with time. Honeypot has given me good results as well.
  • Is WYSIWYG required? I strongly suggest using Markdown instead.
  • Confirm that all emails are sent in plain text, not HTML. If you’re sending out HTML mail, do it right.
  • Do you need an on-site search utility? If so, some thought, and resources, need to go into it or it will be frustrating.
  • What kind of load do you expect on your site (anonymous and admin users)? This information can be used for load testing.
  • If you already have a site, should old paths of critical content map to paths on the new site?
  • Should users be allowed to create accounts? (Consider spam, and whether an admin should approve new accounts.)

Here is what should get done in the first Agile sprint, aka Sprint Zero:

  • If you are using continuous integration, a Jenkins job for tracking the master branch: this job should fail if any test fails on the codebase, or if quality metrics (code review, for example, or pdepend metrics) reach predefined thresholds.
  • A Jenkins job for pushing to dev. This is triggered by the first job if tests pass. It pushes the new code to the dev environment, and updates the dev environment’s database. The database is never cloned; rather, a site deployment module is used.
  • An issue queue is set up and the client is given access to it, and training on how to use it.
  • A wiki is set up.
  • A dev environment is set up. This is where the code gets pushed automatically if all tests pass.
  • A prod environment is set up. This environment is normally updated manually after each end of sprint demo.
  • A git repo is set up with a basic Drupal site.
  • A custom module is set up in sites/*/modules/custom: this is where custom functions go.
  • A site deployment module is set up in sites/all/modules/custom. All deployment-related code and dependencies go here. A .test file and an .install file should be included.
  • A site development module is set up in sites/*/modules/custom, which is meant to contain all modules required or useful for development, as dependencies.
  • A custom theme is created.
  • An initial feature is created in sites/*/modules/features. This is where all your features will be added.
  • A “sites/*/modules/patches” folder is created (with a README.txt file, to make sure it goes into git). This is where core and contrib patches should go. Your site’s maintainers should apply these patches when core or contrib modules are updated. Patch names here should include the node id and comment number on Drupal.org.
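The custom-code layout above can be scaffolded with a few commands; a sketch, run from the Drupal root (the README wording is an example, but committing an otherwise-empty patches folder via a README follows the checklist above):

```shell
# Sketch: scaffold the custom-code directories described in the checklist (D7 layout).
mkdir -p sites/all/modules/custom \
         sites/all/modules/features \
         sites/all/modules/patches
# A README ensures the otherwise-empty patches folder makes it into git.
echo "Core and contrib patches go here; include the Drupal.org node id and comment number in each patch filename." > sites/all/modules/patches/README.txt
```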

Basic module list (always used)

Development modules (not enabled on production)

I normally create a custom development module with these as dependencies:

I make sure this module is in my repo but it is not enabled unless used:

Experimental modules

  • dcycle, this is a module that is in active development, not ready for prime time yet, but where I try to add all my code to help with testing, etc.

Multilingual modules

  • i18n
  • potx
  • l10n_update
  • entity_translation if you need the same node id to display in several languages. This is useful if you have references to nodes which should be translated.
  • title if you are using entity translations and your titles can be multilingual.

Launch checklist

  • Design a custom 404, error and maintenance page.
  • Path, alias and permalink strategy. (Might require pathauto.)
  • Think of adding revisions to content types to avoid clients losing their data.
  • Don’t display errors on production.
  • Optimize CSS, JS and page caching.
  • Views should be cached.
  • System messages are properly themed.
  • Prevent very simple passwords.
  • Use syslog instead of dblog on prod.

In conclusion

Most shops, and most developers, have some sort of checklist like this. Mine is not any better or worse than most, but can be a good starting point. Another note: I’ve seen at least three Drupal teams try, and fail, to implement a “Drupal Starter kit for Company XYZ” and keep it under version control. The problem with that approach, as opposed to a checklist, is that it’s not lightweight enough: it is a software product which needs maintenance, and after a while no one maintains it.


April 22, 2014

My development team is using a site deployment module which, when enabled, deploys our entire website (with translations, views, content types, the default theme, etc.).

We defined about 30 tests (and counting) which are linked to Agile user stories and confirm that the site is doing what it's supposed to do. These tests are defined in Drupal's own Simpletest framework and work as follows: for every test, our site deployment module is enabled on a new database (the database is never cloned), which can take about two minutes; the test is run, and then the temporary database is destroyed.

This created the following problem: because we were deploying our site 30 times during our test run, a single test run was taking over 90 minutes. Furthermore, we are halfway into the project, and we anticipate doubling, perhaps tripling our test coverage, which would mean our tests would take over four hours to run.

Now, we have a Jenkins server which performs all the tests every time a change is detected in Git, but even so, when several people are pushing to the git repo, test results which are 90 minutes old tend to be harder to debug, and developers tend to ignore, subvert and resent the whole testing process.

We could combine tests so the site would be deployed less often during the testing process, but this causes another problem: tests which are hundreds of lines long, and which validate unrelated functionality, are harder to debug than short tests, so it is not a satisfactory solution.

When we look at what is taking so long, we notice that a majority of the processing power goes to install (deploy) our testing environment for each test, which is then destroyed after a very short test.

Enter Simpletest Turbo, which provides very simple code to cache your database once the setUp() function is run, so the next test can simply reuse the same database starting point rather than recreate everything from scratch.

Although Simpletest Turbo is in early stages of development, I have used it to almost quadruple the speed of my tests, as you can see from this Jenkins trend chart:

I know: my tests are failing more than I would like them to, but now I'm getting feedback every 25 minutes instead of every 95 minutes, so failures are easier to pinpoint and fix.

Furthermore, fairly little time is spent deploying the site: this is done once, and the following tests use a cached deployment, so we are not merely speeding up our tests (as we would if we were adding hardware): we are streamlining duplicate effort. It thus becomes relatively cheap to add new independent tests, because they are using a cached site setup.
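Simpletest Turbo does this inside the test framework itself, but the underlying idea (deploy once, cache the result, restore the cache on subsequent runs) can be sketched at the shell level. The database name, dump path, module and test names below are illustrative assumptions, not Simpletest Turbo's actual implementation:

```shell
# Sketch only: cache the deployed test database once, then restore
# it before each run instead of reinstalling Drupal from scratch.
CACHE=/tmp/deployed-site.sql
if [ ! -f "$CACHE" ]; then
  drush site-install minimal -y      # fresh install, takes about two minutes
  drush en mysite_deploy -y          # the site deployment module does the rest
  mysqldump drupal_test > "$CACHE"   # save the known-good starting point
else
  mysql drupal_test < "$CACHE"       # restoring is much faster than reinstalling
fi
drush test-run MysiteDonate
```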


February 26, 2014

Many Drupal projects now under maintenance suffer from technical debt: a lot of the functionality is in the database and outside of git, and the code lacks automated testing. Furthermore, the functionality is often brittle: a change to one feature breaks something seemingly unrelated.

As our community and our industry mature, teams are increasingly interested in automated testing. Having worked on several Drupal projects with and without automated testing, I've come to the conclusion that any line of code which is not subject to automated testing is legacy code; and I agree with Michael Feathers who stated in his book Working Effectively with Legacy Code[1] that a site with zero automated tests is a legacy site from the moment you deliver it.

But the road to automated testing for Drupal is, as I've learned the hard way, strewn with obstacles, and first-time implementations of automated testing tend to fail. Here are a few tips to keep in mind if your team is willing to implement automated testing.

Tip #1: Use a continuous integration server

Tests are only useful if someone actually runs them. If you don't automate running the test suite on each push to your git repo, no one will run your tests, however good their intentions are.

The absolute first thing you need to do is set up a continuous integration (CI) server which runs a script every time your git repo changes. To make this easier I've set up a project on GitHub which uses Vagrant and Puppet to set up a quick Jenkins server tailored for use with Drupal.

Even before starting to write tests, make sure your continuous integration job actually runs on your master branch. When your project passes tests (which is easy at first because you won't have tests), your project will be marked as stable.

Notice that I mentioned the master branch: although git has advanced branching features, the only branch you should track in your CI server is your stable branch (often master, although for projects with more than one stable release, like Drupal itself, you may have two or three stable branches).

It is important at this point to get the team (including the client) used to seeing the continuous integration dashboard, ideally by having a monitor in a visible place (this team even plugged Jenkins into a stop light, which really grabs attention in case of a failure). If your code is flagged as failed by your CI server, you want it to be known as soon as possible, and you want the entire team to have responsibility for fixing it immediately. Your main enemy here is failure fatigue: if your master branch is broken, and no one is working at fixing it, you will get used to seeing failures and you will fail at implementing automated testing.

Eventually, you will want to add value to your continuous integration job by running Code Review tests, and other code analysis tools like Pdepend. With these kinds of tools, you can get a historical perspective on metrics like adherence to Drupal coding standards, the number of lines of code per function, code abstraction, and the like. I even like to have my Jenkins job take a screenshot of my site on every push (using PhantomJS), comparing the latest screenshot to the previous one using ImageMagick's compare utility.

Basically, any testing and analysis you can do on the command line should be done within your continuous integration job.
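As a sketch, a Jenkins "Execute shell" build step combining these command-line checks might look like the following. The module names, paths and thresholds are illustrative assumptions; coder-review assumes the Coder module's drush integration, and rasterize.js is the example script shipped with PhantomJS:

```shell
set -e                                     # fail the build on the first error
drush site-install minimal -y              # known-good starting point
drush en mysite_deploy -y                  # deploy the site from the git repo
drush test-run mysite                      # run the Simpletest suite
drush coder-review mysite_deploy           # Drupal coding standards report
pdepend --summary-xml=pdepend.xml sites/all/modules/custom   # code metrics
phantomjs rasterize.js http://localhost/ after.png           # screenshot on every push
compare -metric AE before.png after.png diff.png || true     # ImageMagick visual diff
```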

If done right, and if you have high confidence in your test suite, you can eventually use your CI server to deploy continuously to preproduction, but let's not get ahead of ourselves.

Tip #2: Test your code, not the database

Most Drupal developers I've talked to create their local development environment by bringing their git repo up to date, and cloning the production database.

They also tend to clone the production or preproduction database back to Jenkins in their continuous integration.

For me, this is the wrong approach, as I've documented in this blog post.

Basically, any tests you write should reside in your git repo and be limited to testing what's in the git repo. If you try to test the production database, here is a typical scenario:

  • Someone will do something to your database which will break a test.

  • Your Jenkins job will clone the database, run the test, and fail.

  • Another person will make another change to the database, and your test will now pass.

You will now see a history of failures which will indicate problems outside of your code. These will be very hard to reproduce and fix.

Keep in mind that the tests you write should depend on a known good starting point: you should be able to consistently reproduce an environment leading to a success or a failure. Drupal's Simpletests completely ignore the current host database and create a new database from scratch just for testing, then destroy that database.

How to do this? First, I always use a site deployment module whose job it is to populate the database with everything that makes your site unique: enabling the site deployment module should enable all modules used by your site, and, using Features and related modules, deploy all views, content types, and the like, set all variables and set the default theme. The site deployment module can then be used by new developers on your team who need a development environment, and also by the CI server, all without cloning the database. If you need dummy content for development, you can use Devel's devel_generate utility, along with this trick to make your generated content more realistic.
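As a sketch, a new developer (or the CI job) might build a complete environment from the git repo alone, without cloning any database; the module name and node count here are illustrative:

```shell
drush site-install minimal -y   # brand new database, nothing cloned
drush en mysite_deploy -y       # enables all modules, reverts features,
                                # sets variables and the default theme
drush generate-content 50       # dummy nodes via Devel's devel_generate
```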

When a bug is reported on your production site, you should reproduce it consistently in your dummy content, and then run your test against the simulation, not the real data. An example of this is the use of Wysiwyg: often, lorem ipsum works fine, but once the client starts copy-pasting from Word, all kinds of problems arise. Simulated word-generated markup is the kind of thing your test should set up, and then test against.

If you are involved in a highly-critical project, you might eventually want to run certain tests on a clone of your production database, but this, in my opinion, should not be attempted until you have proper test coverage and metrics for your code itself. If you do test a clone of your production database and a bug is found, reproduce the bug in a simulation, add a test to confirm the bug, and fix your code. Fixing your code to deal with a problem in production without simulating the problem first, and testing the simulation, just results in more legacy code.

Tip #3: Understand the effort involved

Testing is time-consuming. If your client or employer asks for it, that desire needs to come with the appropriate resources. Near the beginning of a project, you can easily double all time estimates, and the payoff will come later on.

Stakeholders cannot expect the same velocity for a project with and without automated testing: if you are implementing testing correctly, your end-of-sprint demos will contain less features. On the other hand, once you have reached your sweet spot (see chart, above), the more manageable number of bugs will mean you can continue working on features.

Tip #4: Start gradually

Don't try to test everything at once. If your team is called upon to "implement automated testing" on a project, you are very likely to succumb to test paralysis if you try to implement it all at once.

When working with legacy sites, or even new sites for which there is pressure to deliver fast, I have seen many teams never deliver a single test, instead delivering excuses such as "it's really simple, we don't need to test it", or "we absolutely had to deliver it this week". In reality, we tend to see "automated testing" as insurmountable and try to weasel our way out of it.

To overcome this, I often start a project with a single test: find a function in your code which you can run against a unit test (no database required), and write your first test. In Drupal, you can use a Simpletest Unit test (as in this example) and then run it straight from the browser.

Once you're satisfied, add this line to your CI job so the test is run on every push:

drush test-run mytestgroup

Once that is done, it becomes easier for developers to write their own tests by adding it to the test file already present.
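The same group can also be run with Drupal core's own test runner (a sketch, assuming a Drupal 7 docroot and a local URL):

```shell
php scripts/run-tests.sh --url http://localhost mytestgroup
```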

Tip #5: Don't overestimate how good a developer you are

We all think we're good developers, and really we can't imagine anything ever going wrong with our code, I mean, it's so elegant! Well, we're wrong.

I've seen really intelligent people write code which looks really elegant, but still breaks.

I've seen developers never write tests for the simple stuff because it's too simple, and never write tests for the more complex stuff because they never practiced with the simple stuff.

Even though you're positive your code is so robust it will never break, just test it.

Tip #6: Start with the low-hanging fruit

This is an error I made myself and which proved very painful. Consider a system with three possible use cases for the end user. Each use case uses the same underlying calls to the database, and the same underlying pure functions.

Now, let's say you are using a high-level testing framework like Behat and Selenium to test the rich user interface and you write three tests, one for each use case. You think (wrongly, as we'll see) that you don't need unit tests, because whatever it is you want to test with your unit tests is already tested by your high-level rich user interface tests.

Don't forget, your specs also call for you to support IE8, IE9, Webkit (Safari) and Firefox. You can set up Jenkins to run the rich GUI tests via Selenium Grid on a Windows VM, and other fancy stuff.

This approach is wrong, because when you start having 5, 8, 10, 20 use cases, you will be tempted to just keep implementing dozens of new, expensive rich GUI tests, and your tests will end up taking hours.

In my experience, if your entire test suite takes more than two hours to run, developers will start resenting the process and ignoring the test results, and you are back to square one.

In his book Succeeding with Agile, Mike Cohn came up with the idea of a test pyramid, as shown in the diagram below (you can learn more about the concept in this blog post).

Based on this concept, we quickly realize that:

  • Several steps are redundant among the GUI use cases.
  • The exact same underlying functionality is tested several times over.

Thinking of this from a different angle, we can start by testing our pure functions using unit tests. This will make for lightning-fast tests, and will get the team into the habit of not mixing UI functions, database functions and pure functions (for an example of what not to do, see Drupal's own block_admin_display_form_submit).

Once you have built up a suite of unit tests which actually has value, move on to the next step: tests which require the database. This requires some variation of a site deployment module or another technique to bring the database to a known-good starting point before you run the test; it is harder to grasp and setting up a CI job for these types of tests is difficult too. However, your team will more likely be willing to work hard to overcome these obstacles because of the success they achieved with unit tests.

All of the above can be done with Drupal's core simpletest.

Finally, when you are satisfied with your unit test suites and your database tests, you can move outside of Drupal and on to targeted tests (not all use cases, only a few to make sure your widgets work) with Behat, Mink, Selenium, and Windows/IE VMs. If you start with the fancy stuff, though, or have too much of it, the risk of failure is much greater.

Tip #7: Don't underestimate developers' ability to avoid writing tests

If you implement all the tips you've seen until now in this article, something curious will happen: no one will write any tests. Not even you.

Here's the psychology behind not writing tests:

  • You really have the intention of writing tests, you just want to get your feature working first.
  • You work hard at getting your feature ready for the end-of-sprint demo.
  • You show off your feature to the team and they like it.
  • You don't write any tests.

The above will happen to you. And keep in mind, you're actually very interested in automated testing (enough to have read this article until now!). Now imagine your teammates, who are less interested in automated testing. They don't stand a chance.

These are some techniques to get people to write tests:

The first is used by the Drupal project itself and is based on peer review of patches. If you submit a patch to core and it does not contain tests, it will not make it in. This requires that all code be reviewed before making it into your git repo's stable branch. There are tools for this, like Phabricator, but I've never successfully implemented this approach (if you have, let me know!).

The second approach is to write your tests before writing a new feature or fixing a bug. This is known as test-driven development (TDD) and it generally requires people to see things from a different angle. Here is a typical scenario of TDD:

  • A bug comes in for project xyz, and you are assigned to it.

  • You write a test for it. If you don't know something (no function exists yet, so you don't know what it's called; no field exists yet, so you don't know how to target it), just put something feasible. If you're dealing with the body field in your test, just use body. Try to test all conceivable happy paths and sad paths.

  • Now switch modes: your goal is to make the test pass. This is an iterative process which entails writing code and changing your test as well (your test is code too, don't forget!). For example, perhaps the body field's machine name is not body but something like field_body[und][0]. If such is the case, change the test, as long as the spirit of the test remains.

The above techniques, and code coverage tools like code_coverage or the experimental cover, which I like, will help you write tests, but changing a team's approach can only be achieved through hard work, evangelizing, presentations, blogging, and the like.

Tip #8: Don't subvert your process

When it becomes challenging to write tests, you might figure that, just this once, you'll not test something. A typical example I've seen of this, in project after project, is communication with outside systems and outside APIs. Because we're not controlling the outside system, it's hard to test it, right? True, but not impossible. If you've set aside enough time in your estimates to do things right, you will be able to implement mock objects, making sure you test everything.

For example, in this blog post, I demonstrate how I used the Mockable module to define mock objects to test integration between Drupal and a content deployment system.

You will come across situations where implementing testing seems very hard, but however much effort I put into implementing automated testing for something, I have never regretted it.

Bonus tip: the entire team should own the tests

Your tests cannot be imposed by any one member of the team if they are to succeed. Instead, agree on what should be tested during your sprint planning.

For example, some developers (myself included) like to have close to zero Drupal styling errors. Others don't really see the point of using two spaces instead of a tab. Unless you agree on what defines a failure (more than 100 minor styling errors? 1000? No threshold at all?), developers will feel resentful of having to fix it.

Because in Agile, your client is part of the team as well, it is a good idea to involve them in defining what you are testing, providing them with the costs and benefits of each test. Perhaps your client doesn't know what a MySQL query is, but if told that keeping the number of queries to less than 100 on the home page (something that can be tracked automatically) will keep performance up, they will be more likely to accept the associated extra cost.

Conclusion

Automated testing is about much more than tools (often the tools are quite simple to set up). The human aspect and the methodology are much more important to get your automated testing project off the ground.

[1] See Jez Humble and David Farley's Continuous Delivery, Addison Wesley.

Feb 26 2014
Feb 26

February 26, 2014

Many Drupal projects now under maintenance suffer from technical debt: a lot of the functionality is in the database and outside of git, and the code lacks automated testing. Furthermore, the functionality is often brittle: a change to one feature breaks something seemingly unrelated.

As our community and our industry mature, teams are increasingly interested in automated testing. Having worked on several Drupal projects with and without automated testing, I’ve come to the conclusion that any line of code which is not subject to automated testing is legacy code; and I agree with Michael Feathers who stated in his book Working Effectively with Legacy Code[1] that a site with zero automated tests is a legacy site from the moment you deliver it.

But the road to automatic testing for Drupal is, as I’ve learned the hard way, strewn with obstacles, and first-time implementations of automated testing tend to fail. Here are a few tips to keep in mind if your team is willing to implement automated testing.

Tip #1: Use a continuous integration server

Tests are only useful if someone actually runs them. If you don’t automate running the test suite on each push to your git repo, no one will run your tests, however good their intentions are.

The absolute first thing you need to do is set up a continuous integration (CI) server which runs a script every time your git repo changes. To make this easier I’ve set up a project on GitHub which uses Vagrant and Puppet to set up a quick Jenkins server tailored for use with Drupal.

Even before starting to write tests, make sure your continuous integration job actually runs on your master branch. When your project passes tests (which is easy at first because you won’t have tests), your project will be marked as stable.

Notice that I mentioned the master branch: although git has advanced branching features, the only branch you should track in your CI server is your stable branch (often master, although for projects with more than one stable release, like Drupal itself, you may have two or three stable branches).

It is important at this point to get the team (including the client) used to seeing the continuous integration dashboard, ideally by having a monitor in a visible place (this team even plugged Jenkins into a stop light, which really grabs attention in case of a failure). If your code is flagged as failed by your CI server, you want it to be known as soon as possible, and you want the entire team to have responsibility for fixing it immediately. Your main enemy here is failure fatigue: if your master branch is broken, and no one is working at fixing it, you will get used to seeing failures and you will fail at implementing automated testing.

Eventually, you will want to add value to your continuous integration job by running Code Review tests, and other code analysis tools like Pdepend. With these kinds of tools, you can get a historical perspective on metrics like adherence to Drupal coding standards, the number of lines of code per function, code abstraction, and the like. I even like to have my Jenkins job take a screenshot of my site on every push (using PhantomJS), comparing the latest screenshot to the previous one using ImageMagick’s compare utility.

Basically, any testing and analysis you can do on the command line should be done within your continuous integration job.
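Put together, such a job might be sketched as the shell script below, assuming a Jenkins “execute shell” build step. Every project-specific command is a hypothetical example and is commented out; the one firm idea is that the first failing step must fail the whole build:

```shell
#!/bin/sh
# Sketch of a CI job script run on every push. All tool invocations below are
# hypothetical examples; uncomment and adapt them for your project.
set -e  # the first failing command aborts the job and marks the build failed

# drush test-run mysite                                       # the test suite
# drush coder-review mysite                                   # coding standards
# pdepend --summary-xml=metrics.xml sites/all/modules/custom  # code metrics
# phantomjs capture.js http://ci.example.com/ new.png         # screenshot
# compare -metric AE old.png new.png diff.png                 # ImageMagick diff

STATUS=passed
echo "build $STATUS"
```

Because of `set -e`, there is no need for explicit error handling: any nonzero exit code anywhere in the script marks the Jenkins build as failed.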

If done right, and if you have high confidence in your test suite, you can eventually use your CI server to deploy continuously to preproduction, but let’s not get ahead of ourselves.

Tip #2: Test your code, not the database

Most Drupal developers I’ve talked to create their local development environment by bringing their git repo up to date, and cloning the production database.

They also tend to clone the production or preproduction database back to Jenkins in their continuous integration jobs.

For me, this is the wrong approach, as I’ve documented in this blog post.

Basically, any tests you write should reside in your git repo and be limited to testing what’s in the git repo. If you try to test the production database, here is a typical scenario:

  • Someone will do something to your database which will break a test.

  • Your Jenkins job will clone the database, run the test, and fail.

  • Another person will make another change to the database, and your test will now pass.

You will now see a history of failures which will indicate problems outside of your code. These will be very hard to reproduce and fix.

Keep in mind that the tests you write should depend on a known good starting point: you should be able to consistently reproduce an environment leading to a success or a failure. Drupal’s Simpletests completely ignore the current host database and create a new database from scratch just for testing, then destroy that database.

How to do this? First, I always use a site deployment module whose job is to populate the database with everything that makes your site unique: enabling the site deployment module should enable all the modules used by your site; deploy all views, content types, and the like (using Features and related modules); set all variables; and set the default theme. The site deployment module can then be used by new developers on your team who need a development environment, and also by the CI server, all without cloning the database. If you need dummy content for development, you can use Devel’s devel_generate utility, along with this trick to make your generated content more realistic.
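Under these assumptions, spawning a complete new environment is reduced to a short, repeatable sequence. In the sketch below, “mysite_deploy” is a hypothetical site deployment module name, and the `run` stub only prints each command so the sketch is self-contained; in a real script you would execute the commands directly:

```shell
#!/bin/sh
# Sketch: spawning a complete environment from code alone, no database clone.
# "mysite_deploy" is a hypothetical site deployment module name.
run() { echo "would run: $*"; }  # stub; in a real script, execute directly

run drush site-install minimal -y   # fresh, empty database
run drush en -y mysite_deploy       # modules, views, variables, default theme
run drush en -y devel
run drush generate-content 50       # dummy content for development
```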

When a bug is reported on your production site, you should reproduce it consistently in your dummy content, and then run your test against the simulation, not the real data. An example of this is the use of Wysiwyg: often, lorem ipsum works fine, but once the client starts copy-pasting from Word, all kinds of problems arise. Simulated Word-generated markup is the kind of thing your test should set up, and then test against.

If you are involved in a highly-critical project, you might eventually want to run certain tests on a clone of your production database, but this, in my opinion, should not be attempted until you have proper test coverage and metrics for your code itself. If you do test a clone of your production database and a bug is found, reproduce the bug in a simulation, add a test to confirm the bug, and fix your code. Fixing your code to deal with a problem in production without simulating the problem first, and testing the simulation, just results in more legacy code.

Tip #3: Understand the effort involved

Testing is time-consuming. If your client or employer asks for it, that desire needs to come with the appropriate resources. Near the beginning of a project, you can easily double all time estimates, and the payoff will come later on.

Stakeholders cannot expect the same velocity for a project with and without automated testing: if you are implementing testing correctly, your end-of-sprint demos will contain fewer features. On the other hand, once you have reached your sweet spot, the more manageable number of bugs will mean you can continue working on features.

Tip #4: Start gradually

Don’t try to test everything at once. If your team is called upon to “implement automated testing” on a project, you are very likely to succumb to test paralysis if you try to implement it all at once.

When working with legacy sites, or even new sites for which there is pressure to deliver fast, I have seen many teams never deliver a single test, instead delivering excuses such as “it’s really simple, we don’t need to test it”, or “we absolutely had to deliver it this week”. In reality, we tend to see “automated testing” as insurmountable and try to weasel our way out of it.

To overcome this, I often start a project with a single test: find a function in your code which you can run against a unit test (no database required), and write your first test. In Drupal, you can use a Simpletest Unit test (as in this example) and then run it straight from the browser.
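As a sketch, that first unit test might look as follows. In a real module the class would live in a .test file and extend core’s DrupalUnitTestCase; the stub base class and the mysite_add() function below are hypothetical stand-ins so the example is self-contained:

```php
<?php
// Hypothetical pure function from a custom module: no database, no UI.
function mysite_add($a, $b) {
  return $a + $b;
}

// Stub standing in for Drupal core's DrupalUnitTestCase, so this sketch runs
// outside Drupal; in a real .test file you would not define this yourself.
class DrupalUnitTestCase {
  public function assertEqual($first, $second) {
    return $first == $second;
  }
}

class MySiteUnitTestCase extends DrupalUnitTestCase {
  public static function getInfo() {
    return array(
      'name' => 'MySite unit tests',
      'description' => 'Test pure functions without touching the database.',
      'group' => 'mysite',
    );
  }

  public function testAdd() {
    $this->assertEqual(mysite_add(2, 2), 4);   // happy path
    $this->assertEqual(mysite_add(-2, 2), 0);  // edge case
  }
}
```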

Once you’re satisfied, add this line to your CI job so the test is run on every push:

drush test-run mytestgroup

Once that is done, it becomes easier for developers to write their own tests by adding it to the test file already present.

Tip #5: Don’t overestimate how good a developer you are

We all think we’re good developers, and really we can’t imagine anything ever going wrong with our code, I mean, it’s so elegant! Well, we’re wrong.

I’ve seen really intelligent people write code which looks really elegant, but still breaks.

I’ve seen developers never write tests for the simple stuff because it’s too simple, and never write tests for the more complex stuff because they never practiced with the simple stuff.

Even though you’re positive your code is so robust it will never break, just test it.

Tip #6: Start with the low-hanging fruit

This is an error I made myself and which proved very painful. Consider a system with three possible use cases for the end user. Each use case uses the same underlying calls to the database, and the same underlying pure functions.

Now, let’s say you are using a high-level testing framework like Behat and Selenium to test the rich user interface and you write three tests, one for each use case. You think (wrongly, as we’ll see) that you don’t need unit tests, because whatever it is you want to test with your unit tests is already tested by your high-level rich user interface tests.

Don’t forget, your specs also call for you to support IE8, IE9, Webkit (Safari) and Firefox. You can set up Jenkins to run the rich GUI tests via Selenium Grid on a Windows VM, and other fancy stuff.

This approach is wrong, because when you start having 5, 8, 10, 20 use cases, you will be tempted to just continue implementing dozens of new, expensive rich GUI tests, and your tests will end up taking hours.

In my experience, if your entire test suite takes more than two hours to run, developers will start resenting the process and ignoring the test results, and you are back to square one.

In his book Succeeding with Agile, Mike Cohn came up with the idea of a test pyramid: a broad base of many fast unit tests, fewer service-level (database) tests above them, and only a handful of slow GUI tests at the top (you can learn more about the concept in this blog post).

Based on this concept, we quickly realize that:

  • Several steps are redundant among the GUI use cases.
  • The exact same underlying functionality is tested several times over.

Thinking of this from a different angle, we can start by testing our pure functions using unit tests. This will make for lightning-fast tests, and will get the team into the habit of not mixing UI functions, database functions and pure functions (for an example of what not to do, see Drupal’s own block_admin_display_form_submit).

Once you have built up a suite of unit tests which actually has value, move on to the next step: tests which require the database. This requires some variation of a site deployment module or another technique to bring the database to a known-good starting point before you run the test; it is harder to grasp and setting up a CI job for these types of tests is difficult too. However, your team will more likely be willing to work hard to overcome these obstacles because of the success they achieved with unit tests.

All of the above can be done with Drupal’s core simpletest.

Finally, when you are satisfied with your unit test suites and your database tests, you can move outside of Drupal and on to targeted tests (not all use cases, only a few to make sure your widgets work) with Behat, Mink, Selenium, and Windows/IE VMs. If you start with the fancy stuff, though, or have too much of it, the risk of failure is much greater.
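For instance, a single targeted Behat scenario, using standard Mink steps, might look like the sketch below (the path and text are hypothetical):

```gherkin
# One targeted scenario per critical widget, not one per use case.
Feature: Blog
  Scenario: An anonymous visitor can read the blog
    Given I am on "/blog"
    Then I should see "Recent posts"
```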

Tip #7: Don’t underestimate developers’ ability to avoid writing tests

If you implement all the tips you’ve seen until now in this article, something curious will happen: no one will write any tests. Not even you.

Here’s the psychology behind not writing tests:

  • You really have the intention of writing tests, you just want to get your feature working first.
  • You work hard at getting your feature ready for the end-of-sprint demo.
  • You show off your feature to the team and they like it.
  • You don’t write any tests.

The above will happen to you. And keep in mind, you’re actually very interested in automated testing (enough to have read this article until now!). Now imagine your teammates, who are less interested in automated testing. They don’t stand a chance.

These are some techniques to get people to write tests:

The first is used by the Drupal project itself and is based on peer review of patches. If you submit a patch to core and it does not contain tests, it will not make it in. This requires that all code be reviewed before making it into your git repo’s stable branch. There are tools for this, like Phabricator, but I’ve never successfully implemented this approach (if you have, let me know!).

The second approach is to write your tests before writing a new feature or fixing a bug. This is known as test-driven development (TDD) and it generally requires people to see things from a different angle. Here is a typical scenario of TDD:

  • A bug comes in for project xyz, and you are assigned to it.

  • You write a test for it. If you don’t know something (no function exists yet, so you don’t know what it’s called; no field exists yet, so you don’t know how to target it), just put something feasible. If you’re dealing with the body field in your test, just use body. Try to test all conceivable happy paths and sad paths.

  • Now switch modes: your goal is to make the test pass. This is an iterative process which entails writing code and changing your test as well (your test is code too, don’t forget!). For example, perhaps the body field’s machine name is not body but something like field_body[und][0]. If such is the case, change the test, as long as the spirit of the test remains.

The above techniques, and code coverage tools like code_coverage or the experimental cover, which I like, will help you write tests, but changing a team’s approach can only be achieved through hard work, evangelizing, presentations, blogging, and the like.

Tip #8: Don’t subvert your process

When it becomes challenging to write tests, you might figure that, just this once, you’ll not test something. A typical example I’ve seen of this, in project after project, is communication with outside systems and outside APIs. Because we’re not controlling the outside system, it’s hard to test it, right? True, but not impossible. If you’ve set aside enough time in your estimates to do things right, you will be able to implement mock objects, making sure you test everything.

For example, in this blog post, I demonstrate how I used the Mockable module to define mock objects to test integration between Drupal and a content deployment system.

You will come across situations where implementing testing seems very hard, but however much effort I put into implementing automated testing for something, I have never regretted it.

Bonus tip: the entire team should own the tests

Your tests cannot be imposed by any one member of the team if they are to succeed. Instead, agree on what should be tested during your sprint planning.

For example, some developers (myself included) like to have close to zero Drupal styling errors. Others don’t really see the point of using two spaces instead of a tab. Unless you agree on what defines a failure (more than 100 minor styling errors? 1,000? No threshold at all?), developers will feel resentful of having to fix them.

Because in Agile, your client is part of the team as well, it is a good idea to involve them in defining what you are testing, providing them with the costs and benefits of each test. Perhaps your client doesn’t know what a MySQL query is, but if told that keeping the number of queries on the home page under 100 (something that can be tracked automatically) will keep performance up, they will be more likely to accept the associated extra cost.

Conclusion

Automated testing is about much more than tools (often the tools are quite simple to set up). The human aspect and the methodology are much more important to get your automated testing project off the ground.

[1] Michael C. Feathers, Working Effectively with Legacy Code, Prentice Hall, 2004.


Jan 20 2014

Drupal uses incremental IDs for such data as taxonomy terms and nodes, but not content types or vocabularies. If, like me, you believe your site's codebase should work with different environments and different databases, your incremental IDs can be different on each environment, causing your code to break.

But wait, you are thinking, I have only one environment: my production environment.

Even if such is the case, there are advantages to be able to spawn new environments independently of the production environment without cloning the database upstream:

  • Everything you need to create your website, minus the content, is under version control. The production database, being outside version control, should not be needed to install a new environment. See also "what is a deployment module?".
  • New developers can be up and running with a predictable environment and dummy content.
  • Your automated tests, using Drupal's Simpletest, by default deploy a new environment without cloning the database.
  • For predictable results in your continuous integration server, it is best to deploy a new environment. The production database is unpredictable and unversioned. If you test it, your test results will be unpredictable as well.
  • Maybe in the future you'll need a separate version of your site with different data (for a new market, perhaps).

Even if you choose to clone the database upstream for development, testing and continuous integration, it is still a good idea to avoid referencing incremental IDs of a particular database, because at some point you might decide that it is important to be able to have environments with different databases.

Example #1: using node IDs in CSS and in template files

I have often seen this: particular pages (say, nodes 56 and 400) require particular markup, so we see template files like page--node--56.tpl.php and css like this:

.page-node-56 #content,
.page-node-400 #content {
   ...
}

When, as developers, we decide to use this type of code on a website, we are tightly coupling our code, which is under version control, to our database, which is not under version control. In other words our project as a whole can no longer be said to be versioned as it requires a database clone to work correctly.

Also, this creates all sorts of problems: if, for example, a new node needs to be created which has the same characteristics as nodes 56 and 400, one must fiddle with the database (to create the node) and the code. Also, creating automatic tests for something like this is hard because the approach is not based on underlying logic.

A better approach to this problem might be to figure out why nodes 56 and 400 are somehow different than the others. The solution will depend on your answer to that question, and maybe these nodes need to be of a different content type; or maybe some other mechanism should be used. In all cases, though, their ID should be irrelevant to their specificity.

Example #2: filtering a view by taxonomy tag

You might have a website which uses Drupal's default implementation of articles, with a tag taxonomy field. You might decide that all articles tagged with "blog" should appear in your blog, and you might create a new view, filtered to display all articles with the "blog" tag.

Now, you might export your view into a feature and, perhaps, make your feature a dependency of a site deployment module (so that enabling this module on a new environment will deploy your blog feature, and do everything else necessary to make your site unique, such as enabling the default theme, etc.).

It is important to understand that with this approach, you are in effect putting an incremental ID into code. Your view is in fact filtering by the ID of the "blog" taxonomy term as it happens to exist on the site used to create the view. When creating the view, we have no idea what this ID is, but we are saying that in order for our view to work, the "blog" taxonomy term needs to be identical on all environments.

Here is an example of how this bug will play out:

  • This being the most important feature of your site, when creating new environments, the "blog" taxonomy term might always have the ID 1 because it is the first taxonomy term created; you might also be in the habit of cloning your database for new environments, in which case the problem will remain latent.
  • You might decide that such a feature is too "simple" to warrant automated testing; but even if you do define an automated test, your test will run on a new database and will need to create the "blog" taxonomy term in order to validate. Because your tests are separate and simple, the "blog" taxonomy term is probably the only term created during testing, so it, too will have ID 1, and thus your test will pass.
  • Your continuous integration server which monitors changes to your versioned code will run tests against every push, but, again, on a new database, so your tests will pass and your code will be fine.

This might go on for quite some time until, on a given environment, someone decides to create another term before creating the "blog" term. Now the "blog" term will have ID #2 which will break your feature.

Consider, furthermore, that your client decides to create a new view for "jobs" and use the same tag mechanism as for the blog; and perhaps other tags as well. Before long, your entire development cycle becomes dependent on database cloning to work properly.

To come up with a better approach, it is important to understand what we are trying to accomplish; and what taxonomy terms are meant to be used for:

  • The "blog" category here is somehow, logically, immutable and means something very specific. Furthermore, the existence of the blog category is required for our site. Even if its name changes, the key (or underlying identity) of the blog category should always be the same.
  • Taxonomy terms are referenced with incremental IDs (like nodes) and thus, when writing our code, their IDs (and even their existence) cannot be counted upon.

In this case, we are using taxonomy terms for the wrong purpose. Taxonomy terms, like nodes, are meant to be potentially different for each environment: our code should not depend on them.

A potential solution in this case would be to create a new field for articles, perhaps a multiple selection field, with "blog" as one of the possible values. Now, when we create a view filtered by the value "blog" in our new field, we are no longer referencing an incremental ID in our code.
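As a sketch, the allowed values of such a field can live in code (the function name and values below are hypothetical), so the machine names are guaranteed to be identical in every environment:

```php
<?php
// Hypothetical allowed-values callback for the new "section" list field.
// The keys are stable machine names your views can safely filter on;
// only the human-readable labels may change between environments.
function mysite_section_allowed_values() {
  return array(
    'blog' => t('Blog'),
    'jobs' => t('Jobs'),
  );
}
```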

I myself made this very mistake with my own website code without realizing it. The code for this website (the one you are reading) is available on Github and the issue for this problem is documented here (I'll try to get around to fixing it soon!).

Deploying a fix to an existing site

If you apply these practices from the start of a project, it is relatively straightforward. However, what if a site is already in production with several articles already labelled "blog" (as is the case on the Dcycle website itself)? In this case we need to incrementally deploy the fix. For this, a site deployment module can be of use: in your site deployment module's .install file, you can add a new update hook to update all your existing articles labelled "blog", something like:

/**
 * Use a machine name rather than an incremental ID to display blog items.
 */
function mysite_deploy_update_7010() {
  // deploy the new version of the view to the target site
  features_revert(array('mysite_feature' => array('views_view')));
  ...
  // cycle through your nodes and add "blog" to your new field for any
  // content labelled "blog".
}

Of course, you need to test this first with a clone of your production site, perhaps even adding an automatic test to make sure your function works as expected. Also, if you have a lot of nodes, you might need to use the "sandbox" feature of hook_update_n(), to avoid timeouts.
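The "sandbox" pattern mentioned above might be sketched like this (the batch size and query are hypothetical, and the real per-node work is left as a comment):

```php
<?php
/**
 * Sketch of a batched update hook using the $sandbox parameter, so large
 * node tables can be processed without timing out.
 */
function mysite_deploy_update_7011(&$sandbox) {
  if (!isset($sandbox['progress'])) {
    // First pass: initialize the counters.
    $sandbox['progress'] = 0;
    $sandbox['max'] = db_query('SELECT COUNT(nid) FROM {node}')->fetchField();
  }
  // Process the next batch of 50 nodes here, then record the progress.
  $sandbox['progress'] += 50;
  // A value of 1 (or more) tells Drupal the update is finished; anything
  // less causes the hook to be called again with the same $sandbox.
  $sandbox['#finished'] = $sandbox['max'] ? $sandbox['progress'] / $sandbox['max'] : 1;
}
```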

Once all is tested, all that needs to be done, on each environment (production, every developer's laptop, etc.), is run drush updb -y on the command line.

Conclusion

Drupal makes it very easy to mix incremental IDs into views and code, and this will work well if you always use the same database on every environment. However, you will quickly run into problems if you want to write automated tests or deploy new sites without cloning the database. Being aware of this can help you write more logical, consistent and predictable code.

