April 07, 2019

Accessibility tests can be automated to a degree, but not completely; to succeed at accessibility, it needs to become a mindset shared by developers, UX and front-end folks, business people and other stakeholders. In this article, we will run tests and produce meaningful metrics which can help teams who are already committed to accessibility produce more accessible websites.

Premise

Say your team is developing a Drupal 8 site and you have decided that you want to reduce its accessibility issues by 50% over the course of six months.

In this article, we will look at a subset of accessibility issues which can be automatically checked – color contrast, placement of tags and HTML attributes, for example. Furthermore, we will only test the code itself with some dummy data, not an actual live data set or environment. Therefore, if you use the approach outlined in this article, it is best to do so within a global approach which includes stakeholder training, and automated and manual monitoring of live environments, all of which are outside the scope of this article.

Approach

Your team is probably perpetually “too busy” to fix accessibility issues; and therefore too busy to read and process reports with dozens, perhaps hundreds, of accessibility problems on thousands of pages.

Instead of expecting teams to process accessibility reports, we will use a threshold approach:

First, determine a standard towards which you’d like to work: for example, WCAG 2.0 AA is more stringent than WCAG 2.0 A (and if you’re working on a U.S. federal government website, WCAG 2.0 AA is mandated by Section 508). Be realistic about the level of effort your team is ready to deploy.

Next (we’ll see how to do this later), figure out which pages you’d like to test against: perhaps one article, one event page, the home page, perhaps an internal page for logged in users.

In this article, to keep things simple, we’ll test for:

  • the home page;
  • a public-facing internal page, /node/1;
  • the /user page for users who are logged in;
  • the node editing form at /node/1/edit (for users who are logged in, obviously).

Running accessibility checks on each of the above pages, we will end up with our baseline thresholds: the current number of errors for each page. For example, this might be:

  • 6 for the home page
  • 6 for /node/1
  • 10 for /user
  • 10 for /node/1/edit

We will then make our tests fail if there are more errors on a given page than we allow for. The test should pass at first, and this approach meets several objectives:

  • First, have an idea of the state of your site: are there 10 accessibility errors on the home page, or 1000?
  • Fail immediately if a developer opens a pull request where the number of accessibility errors increases past the threshold for any given page. For example, if a widget is added to the /user page which makes the number of accessibility errors jump to 12 (in this example), we should see a failure in our continuous integration infrastructure because 12 is above our threshold of 10.
  • Provide your team with the tools to reduce the threshold over time. Concretely, a discussion with all stakeholders can be had once the initial metrics are in place; a decision might be made that we want to reduce thresholds for each page by 50% within 6 months. This allows your technical team to justify the prioritization of time spent on accessibility fixes vs. other tasks seen by able-bodied stakeholders as having a more direct business value.

Principles

Principle #1: Docker for everything

Because we want to run tests on a continuous integration server, we want to avoid dependencies. Specifically, we want a system which does not require us to install specific versions of MySQL, PHP, headless browsers, accessibility checkers, etc. All our dependencies will be embedded into our project using Docker and Docker Compose. That way, all you need to install in order to run your project and test for accessibility (and indeed other tests) is Docker, which in most cases includes Docker Compose.

Principle #2: A starter database

In our continuous integration setup, we will be testing our code on every commit. Although it can be useful to test, or monitor, a remote environment such as the live or staging site, this is not what this article is about. This means we need some way to include dummy data in our codebase. We will do this by adding dummy data into a “starter database” committed to version control. (Be careful not to rely on this starter database to move configuration to the production site – use configuration management for that – we only want to store dummy data in our starter database; all configuration should be in code.) In our example, our starter database will contain node/1 with some realistic dummy data. This is required because as part of our test we want to run accessibility checks against /node/1 and /node/1/edit.

A good practice during development is that for new data types, say a new content type “sandwich”, a new version of the starter database be created with, say, node/2 of type “sandwich”, with realistic data in all its fields. This will allow us to add accessibility tests for /node/2 and /node/2/edit if we wish.
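
One way to produce such a new version of the starter database is to dump the database from the running Drupal container after creating the dummy content. Here is a rough sketch; the starterkit may ship its own script for this, and the dump path is an assumption:

# Assumes the Drupal service is named "drupal" in docker-compose.yml, as in the
# starterkit, and that the starter database lives at ./db/starter.sql.gz.
docker-compose exec -T drupal /bin/bash -c 'drush sql-dump' | gzip > ./db/starter.sql.gz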

Tools

Don’t forget, as per principle #1 above, you will never need to install anything other than Docker on your computer or CI server; don’t attempt to install the following tools locally, as they will run in Docker containers which are built automatically for you.

  • Pa11y: There are dozens of tools to check for accessibility; in this article we’ve settled on Pa11y because it provides clear error reports and supports the concept of a threshold above which the script fails.
  • Chromium: In order to check a page for accessibility issues without actually having a browser open, a so-called headless browser is needed. Chromium is a fully functional browser which works on the command line and can be scripted. This works under the hood; you will have no need to install it or interact with it directly, it’s just good to know it’s there.
  • Puppeteer: most accessibility tools, including Pa11y, are good at testing one page. If you point Pa11y to /node/1 or the home page, it will generate nice reports with thresholds. However, if you point Pa11y to /user or /node/1/edit, it will see those pages anonymously, which is not what we want to test. This is where Puppeteer, a browser scripting tool, comes into play. We will use Puppeteer later on to log into our site and save the markup of /user and /node/1/edit as /dom-captures/user.html and /dom-captures/node-1-edit.html, respectively, which will then allow Pa11y to access and test those paths anonymously.
  • And of course, Drupal 8, although you could apply the technique in this article to any web technology, because our accessibility checks are run against the web pages just like an end user would see them; there is no interaction with Drupal.

Setup

To follow along, you can install and start Docker Desktop and download the Dcycle Drupal 8 starterkit.

git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/deploy.sh

You are also welcome to fork the project and link it to a free CircleCI account, in which case continuous integration tests should start running immediately on every commit.

A few minutes after running ./scripts/deploy.sh, you should see a login link to a full Drupal installation on a random local port (for example http://0.0.0.0:32769) with some dummy data (/node/1). Deploying this site locally or on a CI server such as Circle CI is a one-step, one-dependency process.

In the rest of this article we will refer to this local environment as http://0.0.0.0:YOUR_PORT; always substitute your own port number (in our example 32769) for YOUR_PORT.

Introducing Pa11y

We will use a Dockerized version of Pa11y, dcycle/pa11y. Here is how it works against, say, amazon.com:

docker run --rm dcycle/pa11y:1 https://amazon.com

No site that I know of has zero accessibility issues; so you’ll see a bunch of issues in this format:

• Error: This element's role is "presentation" but contains child elements with semantic meaning.
  ├── WCAG2AA.Principle1.Guideline1_3.1_3_1.F92,ARIA4
  ├── #navFooter > div:nth-child(2)
  └── <div class="navFooterVerticalColumn navAccessibility" role="presentation"><div class="navFooterVerticalRo...</div>

Running Pa11y against a local site

Developers and continuous integration servers will need to run Pa11y against a local site. We would be tempted to run Pa11y on 0.0.0.0:YOUR_PORT, but that won’t work because Pa11y is being run inside its own container and will not have access to the host machine. You could give it access, but that raises another issue: the port is not guaranteed to be the same at every run, which requires ugly logic to figure out the port. Ugh! Instead, we will attach Pa11y to the Docker network used by our Starter site, in this case called starterkit_drupal8site_default (you can use docker network ls to list networks). Because our docker-compose.yml file defines the Drupal container as having the name drupal and port 80 (the default port), we can now run:

docker run --network starterkit_drupal8site_default \
  --rm dcycle/pa11y:1 http://drupal

This reports some errors, just as we expected. Before doing anything else, type echo $?; this will give a non-zero exit code, meaning that running this as-is would make your continuous integration script fail. However, because we decided earlier that we will tolerate, for now, 6 errors on the home page, let’s set a threshold of 6 (or however many errors you get – there are 6 at the time of this writing) instead of the default of zero:

docker run --network starterkit_drupal8site_default \
  --rm dcycle/pa11y:1 http://drupal --threshold 6

If you run echo $? right after, you should get the “passing” exit code of zero. There, we’ve met our threshold, so we will not have a failure!

How about pages where you need to be logged in?

The above solution breaks down, though, when you want to test http://drupal/node/1/edit. Although it will produce results, what we are actually checking against here is the “Access denied” page, not /node/1/edit when we are logged in. We will approach this in the following way:

  • Set a random password for user 1 (a rough sketch of this step follows this list);
  • Use Puppeteer (see “Tools”, above) to click around your local site with its dummy data, do whatever you want to, and, every step of the way, save the DOM (the document object model, or the current markup after it has been processed by Javascript) as a temporary flat file accessible at, say, http://drupal/dom-captures/user.html;
  • Use Pa11y to test the temporary file we just created.
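
For the first step, a minimal sketch could look like the following; the user name admin for user 1 is an assumption, the exact syntax depends on your Drush version, and the starterkit’s own scripts may do this differently:

# Generate a throwaway password and assign it to user 1 so that Puppeteer can
# log in with it; "drupal" is the container name from docker-compose.yml.
PASS="$(openssl rand -hex 16)"
# Drush 9+; on Drush 8, use: drush user-password admin --password="$PASS"
docker-compose exec drupal /bin/bash -c "drush user:password admin '$PASS'"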

Putting it all together

In our Drupal 8 Starterkit, we can test the entire process. Start by running the Puppeteer script:

./scripts/end-to-end-tests.sh


Astute readers have realized that using Puppeteer to click through the site to create our dom captures has the added benefit of confirming that our site functionality works as expected, which is why I called the script end-to-end-tests.sh.

To confirm this actually worked, you can visit, in an incognito window, http://0.0.0.0:YOUR_PORT/dom-captures/user.html and http://0.0.0.0:YOUR_PORT/dom-captures/node-1-edit.html.

Yes, it looks like you’re logged in, but you are not: these are anonymous webpages which Pa11y can check.

So if this worked correctly (and it should, because we have it under continuous integration), we can run our Pa11y tests against all these pages:

./scripts/a11y-tests.sh
echo $?

You will see the errors, but because the number of errors is below our thresholds, the exit code will be zero, allowing our continuous integration tests to pass.
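
Under the hood, such a script boils down to one Pa11y run per page, each page with its own threshold. Here is a simplified sketch; the actual script in the starterkit may differ:

#!/bin/bash
set -e
# The Docker network created by Docker Compose for the starterkit.
NETWORK=starterkit_drupal8site_default

check () {
  # $1 is the path to test; $2 is the number of errors we tolerate for it.
  docker run --network "$NETWORK" --rm dcycle/pa11y:1 "http://drupal$1" --threshold "$2"
}

check "/" 6
check "/node/1" 6
check "/dom-captures/user.html" 10
check "/dom-captures/node-1-edit.html" 10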

Conclusion

Making a site accessible is, in my opinion, akin to making a site secure: it is not something to add to a to-do list, but rather an approach including all site stakeholders. Neither is accessibility something which can be automated; it really is a team culture. However, approaches like the one outlined in this article, or whatever works in your organization, will give teams metrics to facilitate the integration of accessibility into their day-to-day operations.


March 14, 2019

Often, during local Drupal development (or if we’re really unlucky, in production), we get the dreaded message, “Unable to send e-mail. Contact the site administrator if the problem persists.”

This can make it hard to debug anything email-related during local development.

Enter Mailhog

Mailhog is a dummy SMTP server with a browser GUI, which means you can view all outgoing messages with a Gmail-type interface.

It is a major pain to install, but we can automate the entire process with the magic of Docker.

Let’s see how it works, and discuss after. Follow along by installing Docker Desktop – no other dependencies are required – and installing a Drupal 8 starterkit:

git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/deploy.sh

This will install the following Docker containers: a MySQL server with a starter database, a configured Drupal site, and Mailhog. You will see something like this at the end of the output:

If all went well you can now access your site at:

=> Drupal: http://0.0.0.0:32791/user/reset/...
=> Dummy email client: http://0.0.0.0:32790

You might be seeing different port numbers instead of 32791 and 32790, so use your own instead of the example ports.

Now, the magic

(In my example, DRUPAL_PORT is 32791 and MAILHOG_PORT is 32790. In your case it will probably be different.)
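
One quick way to see this in action is to make Drupal send an email, for example by requesting a new password at http://0.0.0.0:DRUPAL_PORT/user/password; or, as a rough command-line sketch (assuming user 1 has an email address in the starter database):

# Trigger Drupal's own password-reset email for user 1, then check the result
# in the Mailhog GUI.
docker-compose exec drupal /bin/bash -c \
  "drush ev '_user_mail_notify(\"password_reset\", \Drupal\user\Entity\User::load(1));'"
# Now open http://0.0.0.0:MAILHOG_PORT in your browser.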

As you can see, all emails produced by Drupal are now visible on a cool GUI!

So how does it work?

A dedicated “Mailhog” Docker container, based on the Mailhog Docker image, is defined in our docker-compose.yml file. It exposes port 8025 for public GUI access, which is mapped to a random unused port on the host computer (in the above example, 32790). Port 1025 is the SMTP Mailhog port, as you can see in the Mailhog Dockerfile. We are not mapping port 1025 to a random port on the host computer because it’s only needed in the Drupal container, not the host machine.
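
If you lose track of which random host port Mailhog ended up on, Docker Compose can tell you; this assumes the Mailhog service is named mail in docker-compose.yml, as described below:

# Show which host port is mapped to Mailhog's GUI port 8025.
docker-compose port mail 8025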

In the same docker-compose.yml, the “drupal” container (service) defines a link to the “mail” service; this means that when you are inside the Drupal container, you can access the Mailhog SMTP server “mail” at port 1025.

In the Starterkit’s Dockerfile, we download the SMTP modules, and in our configuration, we install SMTP (0, in this case, is the module’s weight; it doesn’t mean “disabled”!).

Next, configuration: because this is for local development, we are leaving SMTP off in the exported configuration; in production we don’t want SMTP to link to Mailhog. Then, in our overridden settings, we enable SMTP and set the server to “mail” and the port to 1025.

Now, you can debug sent emails in a very realistic way!

You can remove the starterkit environment by running:

docker-compose down -v


October 27, 2018

This article discusses how to use HTTPS for local development if you use Docker and Docker Compose to develop Drupal 7 or Drupal 8 (indeed any other platform as well) projects. We’re assuming you already have a technique to deploy your code to production (either a build step, rsync, etc.).

In this article we will use the Drupal 8 site starterkit, a Docker Compose-based Drupal application that comes with everything you need to build a Drupal site with a few commands (including local HTTPS); we’ll then discuss how HTTPS works.

If you want to follow along, install and launch the latest version of Docker, make sure ports 80 and 443 are not used locally, and run these commands:

cd ~/Desktop
git clone https://github.com/dcycle/starterkit-drupal8site.git
cd starterkit-drupal8site
./scripts/https-deploy.sh

The script will prompt you for a domain (for example my-website.local) to access your local development environment. You might also be asked for your password if you want the script to add “127.0.0.1 my-website.local” to your /etc/hosts file. (If you do not want to supply your password, you can add that line to /etc/hosts before running ./scripts/https-deploy.sh).

After a few minutes you will be able to access a Drupal environment on http://my-website.local and https://my-website.local. For https, you will need to explicitly accept the certificate in the browser, because it’s self-signed.

Troubleshooting: if you get a connection error, try using an incognito (private) window in your browser, or a different browser.

Being a security-conscious developer, you probably read through ./scripts/https-deploy.sh before running it on your computer. If you haven’t, you are encouraged to do so now, as we will be explaining how it works in this article.

You cannot use Let’s Encrypt locally

I often see questions related to setting up Let’s Encrypt for local development. This is not possible because the idea behind Let’s Encrypt is to certify that you own the domain on which you’re working; because no one uniquely owns localhost, or my-project.local, no one can get a certificate for it.

For local development, the Let’s Encrypt folks suggest using trusted, self-signed certificates instead, which is what we are doing in our script.

(If you are interested in setting up Let’s Encrypt for a publicly-available domain, this article is not for you. You might be interested, instead, in Letsencrypt HTTPS for Drupal on Docker and Deploying Letsencrypt with Docker-Compose.)

Make sure your project works without https first

So let’s look at how the ./scripts/https-deploy.sh script we used above works.

Let’s start by making sure our project works without https, then add https access via a separate container.

In our starterkit project, you can run:

./scripts/deploy.sh

At the end of that script, you will see something like:

If all went well you can now access your site at:

 => http://0.0.0.0:32780/user/reset/...

Docker is serving our application using a random non-secure port, in this case 32780, and mapping it to port 80 on our container.

If you use Docker Compose for local development, you might have several applications running at the same time on different host ports, all mapped to port 80 on their respective containers. At the end of this article you should be able to access each of them on port 443 via its own local domain.

The secret to all your local projects sharing port 443 is a reverse proxy container which receives requests to port 443, and indeed port 80 also, and acts as a sort of traffic cop to direct traffic to the appropriate container.

That is why your individual projects should not directly use ports 80 and/or 443.

Adding an Nginx proxy container in front of your project’s container

An oft-seen approach to making your project available locally via HTTPS is to fiddle with your Dockerfile, installing openssl, setting up the certificate there; and rebuilding your container. This can work, but I would argue that it has significant drawbacks:

  • If you have several projects running on https port 443 locally, you could only develop one at a time because you only have one 443 port on your host machine.
  • You would need to maintain the SSL portion of your code for each of your projects.
  • It would go against the principle of separation of concerns which makes containers so robust.
  • You would be reinventing the wheel: there’s already a well-maintained Nginx proxy image which does exactly what you want.
  • Your job as a software developer is not to set up SSL.
  • If you decide to deploy your project to a production Kubernetes cluster, it would no longer make sense for each of your Apache containers to support SSL.

For all those reasons, we will loosely couple our project with the act of serving it via HTTPS; we’ll leave our project alone and place an Nginx proxy in front of it to deal with the SSL/HTTPS portion of our local deployment.

Local https for one or more running projects

In this example we set up only one starterkit application, but real-world developers often need HTTPS with more than one application. Because you only have one local 443 port for HTTPS, we need a way to differentiate between our running applications.

Our approach will be for each of our projects to have an assigned local domain. This is why the https script we used in our example asked you to choose a domain like starterkit-drupal8.local.

Our script stored this information in the .env file at the root of your project, and also made sure it resolves to localhost in your /etc/hosts file.
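
For reference, the end result looks roughly like this; the exact contents depend on the domain you chose:

# ./.env at the root of your project, read by docker-compose:
VIRTUAL_HOST=starterkit-drupal8.local

# Line added to /etc/hosts so the domain resolves to your own machine:
127.0.0.1 starterkit-drupal8.local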

Launching the Nginx reverse proxy

To me the terms “proxy” and “reverse proxy” are not intuitive. I’ll try to demystify them here.

The term “proxy” means something which represents something else; that term is already widely used to denote a web client being hidden from the server. So, a server might deliver content to a proxy which then delivers it to the end user, thereby hiding the end user from the server.

In our case we want to do the reverse: the client (you) is not placing a proxy in front of it; rather the application is placing a proxy in front of it, thereby hiding the project server from the browser: the browser communicates with Nginx, and Nginx communicates with your project.

Hence, “reverse proxy”.

Our reverse proxy uses a widely used and well-maintained GitHub project. The script you used earlier in this article launched a container based on that image.

Linking the reverse proxy to our application

With our starterkit application running on a random port (something like 32780) and our nginx proxy application running on ports 80 and 443, how are the two linked?

We now need to tell our Nginx proxy that when it receives a request for domain starterkit-drupal8.local, it should display our starterkit application.

There are a few steps to this, most handled by our script:

  • Your project’s docker-compose.yml file should look something like this: it needs to contain the environment variable VIRTUAL_HOST=${VIRTUAL_HOST}. This takes the VIRTUAL_HOST environment variable that our script added to the ./.env file, and makes it available inside the container.
  • Our script assumes that your project contains a ./scripts/deploy.sh file, which deploys our project to a random, non-secure port.
  • Our script assumes that only the Nginx Proxy container is published on ports 80 and 443, so if these ports are already used by something else, you’ll get an error.
  • Our script appends VIRTUAL_HOST=starterkit-drupal8.local to the ./.env file.
  • Our script attempts to add 127.0.0.1 starterkit-drupal8.local to our /etc/hosts file, which might require a password.
  • Our script finds the network your project is running on locally (all Docker Compose projects run on their own local named network), and gives the reverse proxy access to it; a sketch of doing this step by hand follows this list.
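
If you ever need to do that last step by hand, for example after recreating the proxy, the command looks something like this, assuming the proxy container is named nginx-proxy; substitute your own project’s network name:

# Give the reverse proxy access to the project's network so it can reach the
# project's web container; list existing networks with "docker network ls".
docker network connect starterkit_drupal8site_default nginx-proxy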

That’s it!

You should now be able to access your project locally with https://starterkit-drupal8.local (port 443) and http://starterkit-drupal8.local (port 80), and apply this technique to any number of Docker Compose projects.

Troubleshooting: if you get a connection error, try using an incognito (private) window in your browser, or a different browser; also note that you need to explicitly trust the certificate.

You can copy paste the script to your Docker Compose project at ./scripts/https-deploy.sh if:

  • Your ./docker-compose.yml contains the environment variable VIRTUAL_HOST=${VIRTUAL_HOST};
  • You have a script, ./scripts/deploy.sh, which launches a non-secure version of your application on a random port.

Happy coding!


October 05, 2018

I recently ran into a series of weird issues on my Acquia production environment which I traced back to some code I deployed which depended on my site being served securely using HTTPS.

Acquia Staging environments don’t use HTTPS by default and require you to install SSL certificates using a tedious manual process, which in my opinion is outdated, because competitors such as Platform.sh, Pantheon, Aegir, and even GitHub Pages support lots of automation around HTTPS using Let’s Encrypt.

Anyhow, because staging did not have HTTPS, I could not test some code I deployed, which ended up costing me an evening debugging an outage on a production environment. (Any difference between environments will eventually result in an outage.)

I found a great blog post which explains how to set up Let’s Encrypt on Acquia environments, Installing (FREE) Let’s Encrypt SSL Certificates on Acquia, by Chris at Redfin solutions, May 2, 2017. Although the process is very well documented, I made some tweaks:

  • First, I prefer using Docker-based solutions rather than installing software on my computer. So, instead of installing certbot on my Mac, I opted to use the Certbot Docker image (see the sketch after this list); this has two advantages for me: first, I don’t need to install certbot on every machine I use this script on; and second, I don’t need to worry about updating certbot, as the Docker image is updated automatically. Of course, this does require that you install Docker on your machine.
  • Second, I automated everything I could. This resulted in this gist (a “gist” is basically a single file hosted on GitHub), a script which you can install locally.
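
As a rough illustration of the Docker-based approach (the exact flags, volumes and challenge type depend on your setup and on the gist itself), running certbot through Docker looks something like this:

# Run certbot from its official Docker image instead of installing it locally.
docker run -it --rm \
  -v "$HOME/letsencrypt/etc:/etc/letsencrypt" \
  -v "$HOME/letsencrypt/lib:/var/lib/letsencrypt" \
  certbot/certbot certonly --manual --preferred-challenges dns \
  -d mystagingsite.example.com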

Running the script

Once you have put the script locally on your computer (I added it to my project code) at, say, ./scripts/set-up-letsencrypt-acquia-stage.sh, and run it:

  • the first time you run it, it will tell you where to put your environment information (in ./acquia-stage-letsencrypt-environments/environment-my-acquia-project-one.source, ./acquia-stage-letsencrypt-environments/environment-my-acquia-project-two.source, etc.), and what to put in those files.
  • the next time you run it, it will automate what it can and tell you exactly what you need to do manually.

I tried this and it works for creating new certs, and should work for renewals as well!


April 07, 2018

The documented process for setting up a local environment and running tests locally is, in my opinion, so complex that it can be a barrier even to determined developers.

For those wishing to locally test and develop core patches, I think it is possible to automate the process down to a few steps and a few minutes; here is an example with a core issue, #2273889 Don’t use one language’s plural index formula with another language’s string in the case of untranslated strings using format_plural(), which, at the time of this writing, results in the number 0 being displayed as 1 in certain cases.

Is it possible to start useful local development on this within 10 minutes on a computer with nothing installed except Docker? Let’s try…

Step 1: install Docker

Install and launch Docker. Everything we need (the Apache web server, MySQL server, Drush, and Drupal) will reside in Docker containers, so we won’t need to install anything locally except Docker.

Step 2: launch a dev environment

I have created a project hosted on GitHub which will help you set up everything you need in Docker containers, without local dependencies other than Docker or any manual steps. Set it up by running:

git clone https://github.com/dcycle/drupal8_core_dev_helper.git && \
  cd drupal8_core_dev_helper && \
  ./scripts/deploy.sh

This will create everything you need: a webserver container and database container, and your Drupal core code which will be placed in ./drupal8_core_dev_helper/drupal; near the end of the output of ./scripts/deploy.sh, you will see a login link to your development environment. Confirm you can access that local development environment at an address like http://0.0.0.0:SOME-PORT. (The port is random.)

The first time you run this, it will have to download the Docker images for Drupal and MySQL and install everything you need for local development. Future runs will be a lot faster.

See the project’s README for more details.

In your dev environment, you can confirm that the problem exists (provided the issue has not yet been fixed) by following the instructions in the “To reproduce this problem:” section of the issue description on your local development environment.

Any calls to drush can be run on the Docker container like so:

docker-compose exec drupal /bin/bash -c 'drush ...'

For example:

docker-compose exec drupal /bin/bash -c 'drush en locale language -y'

If you want to run drush directly, you can connect to your container like so:

docker-compose exec drupal /bin/bash

This will result in the following prompt on the container:

[email protected]:/var/www/html#

Now you can run drush commands directly on the container:

drush eval "print_r(\Drupal::translation()->formatPlural(0, '1 whatever', '@count whatevers', array(), array('langcode' => 'fr')) . PHP_EOL);"

Because the drupal8_core_dev_helper project also pre-installs devel on your environment, you can also confirm the problem exists by visiting /devel/php and executing:

dpm((string) (\Drupal::translation()->formatPlural(0, '1 whatever', '@count whatevers', array(), array('langcode' => 'fr'))));

Whether you do this by Drush or /devel/php, the result should be the same if the issue has not been resolved: 1 whatever instead of 0 whatevers.

Step 3: get a local version of the patch and apply it

In this example, we’ll look at the patch in comment #32 of our formatPlural issue, referenced above. If the issue has been resolved since this blog post was written, follow along with another patch.

cd drupal8_core_dev_helper
curl https://www.drupal.org/files/issues/2018-04-07/2273889-31-core-8.5.x-plural-index-no-test.patch -O
cd ./drupal && patch -p1 < ../2273889-31-core-8.5.x-plural-index-no-test.patch

You have now patched your local version of Drupal. You can try the “0 whatevers” test again and the bug should be fixed.
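
If you later want to return to a pristine Drupal, the same patch can be reverted from the ./drupal directory:

patch -R -p1 < ../2273889-31-core-8.5.x-plural-index-no-test.patch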

Running tests

Now the real fun begins… and the “fast-track” ends.

For any patch to be considered for inclusion in Drupal core, it will need to (a) not break existing tests; and (b) provide a test which, without the patch, confirms that the problem exists.

Let’s head back to comment #32 of issue #2273889 and see if our patch is breaking anything. Clicking on “PHP 7 & MySQL 5.5 23,209 pass, 17 fail” will bring us to the test results page, which at first glance seems indecipherable. You’ll notice that our seemingly simple change to the PluralTranslatableMarkup.php file is causing a number of tests to fail: HelpEmptyPageTest, EntityTypeTest…

Let’s start by finding the test which is most likely to be directly related to our change by searching on the test results page for the string “PluralTranslatableMarkupTest” (this is the name of the class we changed, with the word Test appended), which shows that it is failing:

Testing Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest
.E

We need to figure out where that file resides, by typing:

cd /path/to/drupal8_core_dev_helper/drupal/core
find . -name 'PluralTranslatableMarkupTest.php'

This tells us it is at ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php.

Because we have a predictable Docker container, we can relatively easily run this test locally:

cd /path/to/drupal8_core_dev_helper
docker-compose exec drupal /bin/bash -c 'cd core && \
  ../vendor/bin/phpunit \
  ./tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php'

You should now see the test results for only PluralTranslatableMarkupTest:

PHPUnit 6.5.7 by Sebastian Bergmann and contributors.

Testing Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest
.E                                                                  2 / 2 (100%)

Time: 16.48 seconds, Memory: 6.00MB

There was 1 error:

1) Drupal\Tests\Core\StringTranslation\PluralTranslatableMarkupTest::testPluralTranslatableMarkupSerialization with data set #1 (2, 'plural 2')
Error: Call to undefined method Mock_TranslationInterface_4be32af3::getStringTranslation()

/var/www/html/core/lib/Drupal/Core/StringTranslation/PluralTranslatableMarkup.php:150
/var/www/html/core/lib/Drupal/Core/StringTranslation/PluralTranslatableMarkup.php:121
/var/www/html/core/tests/Drupal/Tests/Core/StringTranslation/PluralTranslatableMarkupTest.php:31

ERRORS!
Tests: 2, Assertions: 1, Errors: 1.

How to fix this, indeed whether this will be fixed, is a whole nother story, a story fraught with dependency injection, mock objects, method stubs… More an adventure, really, than a story. An adventure which deserves to be told, just not right now.


January 24, 2018

Here are a few things I learned about caching for REST resources.

There are probably better ways to accomplish this, but here is what works for me.

Let’s say we have a REST resource that looks something like this in .../my_module/src/Plugin/rest/resource/MyRestResource.php and we have enabled it using the Rest UI module and given anonymous users permission to view it:

<?php

namespace Drupal\my_module\Plugin\rest\resource;

use Drupal\rest\Plugin\ResourceBase;
use Drupal\rest\ResourceResponse;

/**
 * This is just an example.
 *
 * @RestResource(
 *   id = "this_is_just_an_example",
 *   label = @Translation("Display the title of node 1"),
 *   uri_paths = {
 *     "canonical" = "/api/v1/get"
 *   }
 * )
 */
class MyRestResource extends ResourceBase {

  /**
   * {@inheritdoc}
   */
  public function get() {
    $node = node_load(1);
    $response = new ResourceResponse(
      [
        'title' => $node->getTitle(),
        'time' => time(),
      ]
    );
    return $response;
  }

}

Now, we can visit http://example.localhost/api/v1/get?_format=json and we will see something like:

{"title":"Some Title","time":1516803204}

Reloading the page, ‘time’ stays the same. That means caching is working; we are not re-computing our Json output each time someone requests it.
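
On the command line, the same check looks like this; run it twice and note that the time value does not change while the response is cached:

curl -s "http://example.localhost/api/v1/get?_format=json"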

How to invalidate the cache when the title changes

If we edit node 1 and change its title to, say, “Another title”, and reload http://example.localhost/api/v1/get?_format=json, we’ll see the old title. To make sure the cache is invalidated when this happens, we need to provide cacheability metadata to our response telling it when it needs to be recomputed.

Our node, when it’s loaded, contains within it all the caching metadata needed to describe when it should be recomputed: when the title changes, when new filters are added to the text format that’s being used, etc. We can add this information to our ResourceResponse like this:

...
$response->addCacheableDependency($node);
return $response;
...

When we clear our cache with drush cr and reload our page, we’ll see something like:

{"title":"Another title","time":1516804411}

Even more fun is changing the title of node 1 and reloading our Json page, and seeing the title and time change without clearing the cache:

{"title":"Yet another title","time":1516804481}

How to set custom cache invalidation events

Let’s say you want to trigger a cache rebuild for some reason other than those defined by the node itself (title change, etc.).

A real-world example might be events: an “upcoming events” page should only display events which start later than now. If we invalidate the cache every day, then we’ll never show yesterday’s events in our events feed. Here, we need to add our custom cache invalidation event, in this case “rebuild events feed”.

For the purpose of this demo, we won’t actually build an events feed, but we’ll see how cron might be able to trigger cache invalidation.

Let’s add the following code to our response:

...
use Drupal\Core\Cache\CacheableMetadata;
...
$response->addCacheableDependency($node);
$response->addCacheableDependency(CacheableMetadata::createFromRenderArray([
  '#cache' => [
    'tags' => [
      'rebuild-events-feed',
    ],
  ],
]));
return $response;
...

This uses Drupal’s cache tags concept and tells Drupal that when the cache tag ‘rebuild-events-feed’ is invalidated, all cacheable responses which have that cache tag should be invalidated as well. I prefer this to ‘max-age’ caching because it allows us more fine-grained control over when to invalidate our caches.

On cron, we could only invalidate ‘rebuild-events-feed’ if events have passed since our last invalidation of that tag, for example.

For this example, we’ll just invalidate it manually. Clear your cache to begin using the new code (drush cr), then load the page; you will see something like:

{"hello":"Yet another title","time":1516805677}

As always, the time remains the same no matter how many times you reload the page.

Let’s say you are in the midst of a cron run and you have determined that you need to invalidate your cache for responses which have the cache tag ‘rebuild-events-feed’; you can run:

\Drupal::service('cache_tags.invalidator')->invalidateTags(['rebuild-events-feed'])

Let’s do it in Drush to see it in action:

drush ev "\Drupal::service('cache_tags.invalidator')->\
  invalidateTags(['rebuild-events-feed'])"

We’ve just invalidated our ‘rebuild-events-feed’ tag and, hence, Responses that use it.

This one is beyond my competence level, but I wanted to mention it anyway.

Let’s say you want to output your node’s URL to Json, you might consider computing it using $node->toUrl()->toString(). This will give us “/node/1”.

Let’s add it to our code:

...
'title' => $node->getTitle(),
'url' => $node->toUrl()->toString(),
'time' => time(),
...

This results in a very ugly error which completely breaks your site (at least at the time of this writing): “The controller result claims to be providing relevant cache metadata, but leaked metadata was detected. Please ensure you are not rendering content too early.”.

The problem, it seems, is that Drupal detects that the URL object, like the node we saw earlier, contains its own internal information which tells it when its cache should be invalidated. Converting it to a string prevents the Response from being informed about that information somehow (again, if someone can explain this better than me, please leave a comment), so an exception is thrown.

The ‘toString()’ function has an optional parameter, “$collect_bubbleable_metadata”, which can be used to get not just a string, but also information about when its cache should be invalidated. In Drush, this will look something like:

drush ev 'print_r(node_load(1)->toUrl()->toString(TRUE))'
Drupal\Core\GeneratedUrl Object
(
    [generatedUrl:protected] => /node/1
    [cacheContexts:protected] => Array
        (
        )

    [cacheTags:protected] => Array
        (
        )

    [cacheMaxAge:protected] => -1
    [attachments:protected] => Array
        (
        )

)

This changes the return type of toString(), though: toString(TRUE) no longer returns a string but a GeneratedUrl object, so this won’t work:

...
'title' => $node->getTitle(),
'url' => $node->toUrl()->toString(TRUE),
'time' => time(),
...

It gives us the error “Could not normalize object of type Drupal\Core\GeneratedUrl, no supporting normalizer found”.

ohthehugemanatee commented on Drupal.org on how to fix this. Integrating his suggestion, our code now looks like:

...
$url = $node->toUrl()->toString(TRUE);
$response = new ResourceResponse(
  [
    'title' => $node->getTitle(),
    'url' => $url->getGeneratedUrl(),
    'time' => time(),
  ]
);
$response->addCacheableDependency($node);
$response->addCacheableDependency($url);
...

This will now work as expected.

With all the fun we’re having, though, let’s take this a step further: let’s say we want to export the feed of frontpage items in our Response:

$url = $node->toUrl()->toString(TRUE);
$view = \Drupal\views\Views::getView("frontpage"); 
$view->setDisplay("feed_1");
$view_render_array = $view->render();
$rendered_view = render($view_render_array);

$response = new ResourceResponse(
  [
    'title' => $node->getTitle(),
    'url' => $url->getGeneratedUrl(),
    'view' => $rendered_view,
    'time' => time(),
  ]
);
$response->addCacheableDependency($node);
$response->addCacheableDependency($url);
$response->addCacheableDependency(CacheableMetadata::createFromRenderArray($view_render_array));

You will not be surprised to see the “leaked metadata was detected” error again… In fact you have come to love and expect this error at this point.

Here is where I’m completely out of my league; according to Crell, “[i]f you [use render() yourself], you’re wrong and you should fix your code”, but I’m not sure how to get a rendered view without using render() myself… I’ve implemented a variation on a comment on Drupal.org by mikejw suggesting using a different render context to prevent Drupal from complaining.

...
use Drupal\Core\Render\RenderContext;
...
$view_render_array = NULL;
$rendered_view = NULL;
\Drupal::service('renderer')->executeInRenderContext(new RenderContext(), function () use ($view, &$view_render_array, &$rendered_view) {
  $view_render_array = $view->render();
  $rendered_view = render($view_render_array);
});

If we check to make sure we have this line in our code:

$response->addCacheableDependency(CacheableMetadata::createFromRenderArray($view_render_array));

we’re telling our Response’s cache to invalidate whenever our view’s cache invalidates. So, for example, if we have several nodes promoted to the front page in our view, we can modify any one of them and our entire Response’s cache will be invalidated and rebuilt.



December 18, 2017

I recently needed to port hundreds of Drupal 7 webforms with thousands of submissions from Drupal 7 to Drupal 8.

My requirements were:

  • Node ids need to remain the same
  • Webforms need to be treated as data: they should be ignored by config export and import, just like nodes and taxonomy terms are. The reasoning is that in my setup, forms are managed by site editors, not developers. (This is not related to migration per se, but was a success criterion for my migration, so I’ll document my solution here.)

Migration from Drupal 7

I could not find a reliable upgrade or migration path from Drupal 7 to Drupal 8. I found webform_migrate lacks documentation (I don’t know where to start) and migrate_webform is meant for Drupal 6, not Drupal 7 as a source.

I settled on my own combination of tools and workflows to perform the migration, all of them available on my GitHub account.

Using version 8.x-5.x of Webform, I started by enabling webform, webform_node and webform_ui on my Drupal 8 site; this gives me an empty webform node type.

I then followed the instructions for a basic migration, which is outside the scope of this article. I have a project on GitHub which I use as a starting point for my Drupal 6 and 7 to 8 migrations. The blog post Custom Drupal-to-Drupal Migrations with Migrate Tools, Drupalize.me, April 26, 2016 by William Hetherington provides more information on performing a basic migration of data.

Once you have set up your migration configurations as per those instructions, you should be able to run:

drush migrate-import upgrade_d7_node_webform --execute-dependencies

And you should see something like:

Processed 25 items (25 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_node_type'
Processed 11 items (11 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user_role'
Processed 0 items (0 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user_role'
Processed 95 items (95 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_user'
Processed 109 items (109 created, 0 updated, 0 failed, 0 ignored) - done with 'upgrade_d7_node_webform'

At this point I had all my webforms as nodes with the same node ids on Drupal 7 and Drupal 8, however this does nothing to import the actual forms or submissions.

Importing the data itself

I found that the most efficient way of importing the data was to create my own Drupal 8 module, which I have published on Dcycle’s Github account, called webform_d7_to_d8. (I have decided against publishing this on Drupal.org because I don’t plan on maintaining it long-term, and I don’t have the resources to combine efforts with existing webform migration modules.)

I did my best to make that module self-explanatory, so you should be able to follow the steps in the README file, which I will summarize here:

Start by giving your Drupal 8 site access to your Drupal 7 database in ./sites/default/settings.php:

$databases['upgrade']['default'] = array (
  'database' => 'drupal7database',
  'username' => 'drupal7user',
  'password' => 'drupal7password',
  'prefix' => '',
  'host' => 'drupal7host',
  'port' => '3306',
  'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
  'driver' => 'mysql',
);

Run the migration with or without options:

drush ev 'webform_d7_to_d8()'

or

drush ev 'webform_d7_to_d8(["nid" => 123])'

or

drush ev 'webform_d7_to_d8(["simulate" => TRUE])'

More detailed information can be found in the module’s README file.

Treating webforms as data

Once you have imported your webforms to Drupal 8, they are treated as configuration, that is, the Webform module assumes that developers, not site builders, will be creating the forms. This may be fine in many cases, however my usecase is that site editors want to create and edit forms directly on the production site, and we don’t want them to be tracked by the configuration management system.

Jacob Rockowitz pointed me in the right direction for making sure webforms are not treated as configuration. For that purpose I am using Drush CMI tools by PreviousNext, documented in their blog post, Introducing Drush CMI tools, 24 Aug. 2016.

Once you install Drush CMI tools in your ~/.drush folder and run drush cc drush, you can use drush cexy and drush cimy instead of drush cex and drush cim in your configuration management process. Here is how and why:

Normally, if you develop your site locally and, say, add a content type or field, or remove a content type or field, you can run drush cex to export your newly created configuration. Then, your colleagues can pull your code and run drush cim to pull your configuration. drush cim can also be used in continuous integration, preproduction, dev, and production environments.

The problem is that drush cex exports all configuration, and drush cim deletes everything in the database which is not in configuration. In our case, we don’t want to consider webforms as configuration but as data, just like nodes and taxonomy terms: we don’t want them to be exported along with other configuration; and if they exist on a target environment we want to leave them as they are.

Using Drush CMI tools, you can add a file such as the following to ~/.drush/config-ignore.yml:

# See http://blog.dcycle.com/blog/2017-12-18
ignore:
  - webform.webform.*

This has to be done on all developers’ machines or, if you use Docker, on a shared Docker container (which is outside the scope of this article).

Now, for exporting configuration, run:

drush cexy --destination='/path/to/config/folder'

Now, webforms will not be exported along with other configuration.

We also need to avoid erasing webforms on target environments: if you create a webform on a target environment, then run drush cim, you will see something like:

webform.webform.webform_9521   delete
webform.webform.webform_8996   delete
webform.webform.webform_8991   delete
webform.webform.webform_8986   delete

So, we need to avoid deleting webforms on the target environment when we import configuration. We could just do drush cim --partial, but that avoids deleting anything at all, not just webforms.

Drush CMI tools provides an alternative:

drush cimy --source=/path/to/config/folder

This works much like drush cim --partial, but it allows you to specify another parameter, --delete-list=/path/to/config-delete.yml

Then, in config-delete.yml, you can specify items that you actually want to delete on the target environment, for example content types, fields, and views which do not exist in code. This is dependent on your workflow, and the way to set it up is documented on the Drush CMI tools project homepage.

With this in place, we’ll have our Drupal 7 webforms on our Drupal 8 site.


October 03, 2017

This article is about serving your Drupal Docker container, and/or any other container, via https with a valid Let’s encrypt SSL certificate.

Edit: if you’re having trouble with Docker-Compose, read this follow-up post.

Step one: make sure you have a public VM

To follow along, create a new virtual machine (VM) with Docker, for example using the “Docker” distribution in the “One-click apps” section of Digital Ocean.

This will not work on localhost, because in order to use Let’s Encrypt, you need to demonstrate ownership over your domain(s) to the outside world.

In this tutorial we will serve two different sites, one simple HTML site and one Drupal site, each using standard ports, on the same Docker host, using a reverse proxy, a container which sits in front of your other containers and directs traffic.

Step two: Set up two domains or subdomains you own and point them to your server

Start by making sure you have two domains which point to your server, in this example we’ll use:

  • test-one.example.com will be a simple HTML site.
  • test-two.example.com will be a Drupal site.

Step three: create your sites

We do not want to map our containers’ ports directly to our host ports using -p 80:80 -p 443:443 because we will have more than one app using the same port (the secure 443). Port mapping will be the responsibility of the reverse proxy (more on that later). Replace example.com with your own domain:

DOMAIN=example.com
docker run -d \
  -e "VIRTUAL_HOST=test-one.$DOMAIN" \
  -e "LETSENCRYPT_HOST=test-one.$DOMAIN" \
  -e "[email protected]$DOMAIN" \
  --expose 80 --name test-one \
  httpd
docker run -d \
  -e "VIRTUAL_HOST=test-two.$DOMAIN" \
  -e "LETSENCRYPT_HOST=test-two.$DOMAIN" \
  -e "[email protected]$DOMAIN" \
  --expose 80 --name test-two \
  drupal

Now you have two running sites, but they’re not yet accessible to the outside world.

Step four: a reverse proxy and Let’s Encrypt

The term “proxy” means something which represents something else. In our case we want to have a webserver container which represents our Drupal and html containers. The Drupal and html containers are effectively hidden behind a proxy. Why “reverse”? The term “proxy” is already used and means that the web user is hidden from the server. If it is the web servers that are hidden (in this case the Drupal or html containers), we use the term “reverse proxy”.

Let’s encrypt is a free certificate authority which certifies that you are the owner of your domain.

We will use nginx-proxy as our reverse proxy. Because that does not take care of certificates, we will use the LetsEncrypt companion container for nginx-proxy to set up and maintain Let’s Encrypt certificates.

Let’s start by creating an empty directory which will contain our certificates:

mkdir "$HOME"/certs

Now, following the instructions of the LetsEncrypt companion project, we can set up our reverse proxy:

docker run -d -p 80:80 -p 443:443 \
  --name nginx-proxy \
  -v "$HOME"/certs:/etc/nginx/certs:ro \
  -v /etc/nginx/vhost.d \
  -v /usr/share/nginx/html \
  -v /var/run/docker.sock:/tmp/docker.sock:ro \
  --label com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy \
  --restart=always \
  jwilder/nginx-proxy

And, finally, start the LetsEncrypt companion:

docker run -d \
  --name nginx-letsencrypt \
  -v "$HOME"/certs:/etc/nginx/certs:rw \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --volumes-from nginx-proxy \
  --restart=always \
  jrcs/letsencrypt-nginx-proxy-companion

Wait a few minutes for "$HOME"/certs to be populated with your certificate files, and you should now be able to access your sites at https://test-one.example.com and https://test-two.example.com.

A note about renewals

Let’s Encrypt certificates last 3 months, so we generally want to renew every two months. LetsEncrypt companion container for nginx-proxy states that it automatically renews certificates which are set to expire in less than a month, and it checks this hourly, although there are some renewal-related issues in the issue queue.

It seems to also be possible to force renewals by running:

docker exec nginx-letsencrypt /app/force_renew

So it might be worth being on the lookout for failed renewals and forcing them if necessary.

Edit: domain-specific configurations

I used this technique to create a Docker registry, and make it accessible securely:

docker run \
  --entrypoint htpasswd \
  registry:2 -Bbn username password > auth/htpasswd

docker run -d --expose 5000 \
  -e "VIRTUAL_HOST=mydomain.example.com" \
  -e "LETSENCRYPT_HOST=mydomain.example.com" \
  -e "[email protected]" \
  -e "REGISTRY_AUTH=htpasswd" \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  --restart=always -v "$PWD"/auth:/auth \
  --name registry registry:2

But when trying to push an image, I was getting “413 Request Entity Too Large”. This is an error with the nginx-proxy, not the Docker registry. To fix this, you can set domain-specific configurations; in this example we are allowing a maximum of 600M to be passed, but only to the Docker registry at mydomain.example.com:

docker exec nginx-proxy /bin/bash -c 'cp /etc/nginx/vhost.d/default /etc/nginx/vhost.d/mydomain.example.com'
docker exec nginx-proxy /bin/bash -c 'echo "client_max_body_size 600M;" >> /etc/nginx/vhost.d/mydomain.example.com'
docker restart nginx-proxy

Enjoy!

You can now bask in the knowledge that your cooking blog will not be man-in-the-middled.


February 28, 2017

As the maintainer of Realistic Dummy Content, having procrastinated long and hard before releasing a Drupal 8 version, I decided to leave my (admittedly inelegant) logic intact and abstract away the Drupal 7 code, with the goal of plugging in Drupal 7 or 8 code at runtime.

Example original Drupal 7 code

// Some logic.
$updated_file = file_save($drupal_file);
// More logic.

Example updated code

Here is a simplified example of how the updated code might look:

// Some logic.
$updated_file = Framework::instance()->fileSave($drupal_file);
// More logic.

abstract class Framework {

  static private $instance;

  static function instance() {
    if (!self::$instance) {
      if (defined('VERSION')) {
        self::$instance = new Drupal7();
      }
      else {
        self::$instance = new Drupal8();
      }
    }
    return self::$instance;
  }

  abstract function fileSave($drupal_file);

}

class Drupal8 extends Framework {
  public function fileSave($drupal_file) {
    $drupal_file->save();
    return $drupal_file;
  }
}

class Drupal7 extends Framework {
  public function fileSave($drupal_file) {
    return file_save($drupal_file);
  }
}

Once I have defined fileSave(), I can simply replace every instance of file_save() in my legacy code with Framework::instance()->fileSave().

In theory, I can then identify all Drupal 7 code in my module and abstract it away.

Automated testing

As long as I surgically replace Drupal 7 code such as file_save() with “universal” code such as Framework::instance()->fileSave(), without doing anything else, without giving in to the impulse of “improving” the code, I can theoretically test only Framework::instance()->fileSave() itself on Drupal 7 and Drupal 8, and as long as both versions behave the same, my underlying code should work. My approach to automated tests is: if it works and you’re not changing it, there is no need to test it.

Still, I want to make sure my framework-specific code works as expected. To set up my testing environment, I have used Docker Compose to set up three containers: Drupal 7, Drupal 8 and MySQL. I then have a script which builds the sites, installs my module on each, then runs a selftest() function which can test the abstracted functions such as fileSave() and make sure they work.
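
The script itself is not much more than a loop over the two frameworks. Here is a simplified sketch; the service names, module machine name and selftest function are placeholders for illustration, and the real script also installs Drupal on each container first:

#!/bin/bash
set -e
# Bring up the Drupal 7, Drupal 8 and MySQL containers defined in docker-compose.yml.
docker-compose up -d
# Hypothetical service names; use the ones from your own docker-compose.yml.
for SERVICE in drupal7 drupal8; do
  # Enable the module on each framework, then run its selftest() function.
  docker-compose exec "$SERVICE" /bin/bash -c \
    'drush en -y my_module && drush ev "my_module_selftest();"'
done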

This can then be run on a continuous integration platform such as Circle CI, which generates a cool badge.

Extending to Backdrop

Once your module is structured in this way, it is relatively easy to add new related frameworks, and I’m much more comfortable releasing a Drupal 9 update in 2021 (or whenever it’s ready).

I have included experimental Backdrop code in Realistic Dummy Content to prove the point. Backdrop is a fork of Drupal 7.

abstract class Framework {

  static private $instance;

  static function instance() {
    if (!self::$instance) {
      if (defined('BACKDROP_BOOTSTRAP_SESSION')) {
        self::$instance = new Backdrop();
      }
      elseif (defined('VERSION')) {
        self::$instance = new Drupal7();
      }
      else {
        self::$instance = new Drupal8();
      }
    }
    return self::$instance;
  }
}

// Most of Backdrop's API is identical to Drupal 7, so we only need to
// override what differs, such as fileSave().
class Backdrop extends Drupal7 {
  public function fileSave($drupal_file) {
    file_save($drupal_file);
    // Unlike Drupal 7, Backdrop returns a result code, not the file itself,
    // in file_save(). We are expecting the file object.
    return $drupal_file;
  }
}

Disadvantages of this approach

Having just released Realistic Dummy Content 7.x-2.0-beta1 and 8.x-2.0-beta1 (which are identical), I can safely say that this approach was a lot more time-consuming than I initially thought.

Drupal 7 class autoloading is incompatible with Drupal 8 autoloading. In Drupal 7, classes cannot (to my knowledge) use namespaces, and must be added to the .info file, like this:

files[] = includes/MyClass.php

Once that is done, you can define MyClass in includes/MyClass.php, then use MyClass anywhere you want in your code.

Drupal 8 uses PSR-4 autoloading with namespaces, so I decided to create my own autoloader to use the same system in Drupal 7, something like:

spl_autoload_register(function ($class_name) {
  if (defined('VERSION')) {
    // We are in Drupal 7.
    $parts = explode('\\', $class_name);
    // Remove "Drupal" from the beginning of the class name.
    array_shift($parts);
    $module = array_shift($parts);
    $path = 'src/' . implode('/', $parts);
    if ($module == 'MY_MODULE_NAME') {
      module_load_include('php', $module, $path);
    }
  }
});

Hooks have different signatures in Drupal 7 and 8; in my case I was lucky and the only hook I need for Drupal 7 and 8 is hook_entity_presave() which has a similar signature and can be abstracted.
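
To illustrate, a single hook implementation can delegate to the abstraction layer; the function and method names below are hypothetical, not the module’s actual code:

/**
 * Implements hook_entity_presave().
 *
 * Drupal 7 passes ($entity, $type) and Drupal 8 passes only ($entity), so a
 * default value for the second argument lets one implementation serve both.
 */
function mymodule_entity_presave($entity, $type = NULL) {
  Framework::instance()->hookEntityPresave($entity, $type);
}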

Deeply-nested associative arrays are a staple of Drupal 7, so a lot of legacy code expects this type of data. Shoehorning Drupal 8 into outputting something like Drupal 7’s field_info_fields(), for example, was a painful experience:

public function fieldInfoFields() {
  $return = array();
  $field_map = \Drupal::entityManager()->getFieldMap();
  foreach ($field_map as $entity_type => $fields) {
    foreach ($fields as $field => $field_info) {
      $return[$field]['entity_types'][$entity_type] = $entity_type;
      $return[$field]['field_name'] = $field;
      $return[$field]['type'] = $field_info['type'];
      $return[$field]['bundles'][$entity_type] = $field_info['bundles'];
    }
  }
  return $return;
}

Finally, making Drupal 8 work like Drupal 7 makes it hard to use Drupal 8’s advanced features such as Plugins. However, once your module is “universal”, adding Drupal 8-specific functionality might be an option.

Using this approach for website upgrades

This approach might remove a lot of the risk associated with complex site upgrades. Let’s say I have a Drupal 7 site with a few custom modules: each module can be made “universal” in this way. If automated tests are added for all subsequent development, migrating the functionality to Drupal 8 might be less painful.

A fun proof of concept, or real value?

I’ve been toying with this approach for some time, and had a good time (yes, that’s my definition of a good time!) implementing it, but it’s not for everyone or every project. It can have value, though, if your use case involves preserving legacy functionality without leveraging Drupal 8’s modern features, while reducing risk. The jury is still out on whether maintaining a single universal branch will really be more efficient than maintaining two separate branches for Realistic Dummy Content, and whether the approach can reduce risk during site upgrades of legacy custom code, which I plan to try on my next upgrade project.


Jan 10 2017
Jan 10

You might have heard about the MongoDB scare with titles like: MongoDB Apocalypse Is Here as Ransom Attacks Hit 10,000 Servers!

Rest assured, your MongoDB instances are safe and sound if they are running on Platform.sh. And this is a very strong argument for why our architecture is superior to that of other PaaS providers.

Unlike other providers, with Platform.sh all the services you use are inside the managed cluster and included in the plan’s price. These are not outside services that expose application ports on the internet. This is what allows us to clone entire clusters, this is what allows us to offer a 99.99% SLA on the entire stack for our enterprise offering, but this is also a security feature.

Each cluster has only two ways in: HTTP or SSH. Our entrypoints simply will not answer anything else.

Your application containers in the cluster have direct connectivity to the service containers, but this happens on a non-routable IP range. There is simply no possible way for the outside world to access a service directly. And if you are running (in micro-service style) multiple applications in the cluster, you can even control which one has access to which services through the relationships key in your .platform.app.yaml file. Because secure by default makes sense to us.

If you want to connect to a MongoDB instance from the outside (for example, to run an admin interface) you can still do it! But the only way in is through an SSH tunnel that relies on your private SSH key (platform tunnel:open on the command line will do the trick). You get all the benefits and all the ease of use of running a modern stack, but none of the hassle and risks of running a patchwork of services.
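
In practice the workflow could look like this (the local port below is only an illustration; the CLI tells you which port it actually picked):

# Open SSH tunnels to the services of the current environment.
platform tunnel:open

# Suppose the CLI reports the MongoDB relationship on 127.0.0.1:30000;
# any admin tool can then be pointed at that local port, for example:
mongo --host 127.0.0.1 --port 30000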

With Platform.sh you can be WebScale and secure!

Ori Pekelman Jan 10, 2017
Jan 09 2017
Jan 09

Platformsh 2017

First, a joyous and productive 2017 to you all. 2016 was really great for us as a growing company and the new year is a great time to look back and share with you, our dear clients and community, our journey.

The title of the post is audacious, very possibly a hyperbole. There are bigger players than us out there. We don’t claim the highest market share. We claim we have become an obvious choice for ambitious projects. Let me make the case.

Over the course of last year, the leading vendors in the PHP enterprise space (Magento, eZ Platform, TYPO3, and most recently Symfony, the PHP framework of frameworks) announced that their cloud platforms would run on Platform.sh. Since its inception two and a half years ago, Platform.sh has already become a leader in the whole PHP space. How did this come about?

Some technologies were born to be great, some have had greatness thrust upon them.

We set out working on Platform.sh with humble ambitions. As a company we were going to solve eCommerce. We believed that Open Source was the way and we believed that the best Open Source platform we could leverage to make an eCommerce solution was Drupal, with its correct mix of wide-spread adoption, code quality and extensibility. This was how Drupal Commerce was born.

We originally built Platform.sh to be the hosted version of this project, with a bunch of unique features that would make it the killer eCommerce service: built-in high-availability, and unmatched development to production workflow. We had to go deep and low for that (when we started the project no one was talking about containers, micro-services or hybrid cloud infrastructures, but we knew it was the way to go.)

To cut a long story short a few short months after presenting Platform.sh to the world the reaction was tremendous. Our clients loved it. But they also quickly asked us… “why can’t we use this for our non-eCommerce Drupal site, what about our Symfony based projects, and Wordpress? And Magento? We use the Akeneo PIM alongside the Magento, and there is a NodeJs based notification service…”.

The 2016 pivot

So like startups do, we pivoted. Commerce Guys has become its own company. And Platform.sh as an independent entity set out to conquer the PaaS market. This happened in the beginning of 2016. We have more than doubled our team since then. We now have people in 10 time zones from the West Coast to the East Coast, from Europe to Asia.

Why keep the PHP focus?

The technology we built was runtime-agnostic. Setting out as an independent company we could very well have shifted our focus from Drupal and PHP. We chose not to.

First a couple of words on the PHP space. There was a moment three or four years ago when there was a widespread perception that PHP was faltering. That it belonged to the realm of legacy, soon to be replaced. Of course, that was before the likes of Slack were born. Before PHP 7.0 went out of the gate. Before Composer took hold. Before Drupal 8.0 was finally released. Before this world started standardizing on Symfony. Today, we know PHP is here to stay, with both its great advantages and its weaknesses. It is powering much of the internet, from Facebook and Wikipedia to the millions and millions of sites running Wordpress and Drupal. It is powering most of online commerce. It is chosen by startups and enterprises alike.

We understood this from the beginning. We understood its importance.

Of course this does not mean we dislike other programming languages and environments. Our team is composed of true polyglots, and within it you will find many people who love functional programming, from Lisp addicts to Elixir fans. Both Python and Ruby are loved. Rust is a passion. Go is highly regarded for what it does best. Then there is the herd of C nerds. We even have people that like NodeJS. We really do.

But at the time when PHP seemed to lose its lustre, everybody in the new shiny tools department started building for the new shiny languages. This happened for probably two reasons:

  1. Shiny people like shiny stuff (and who cares if 80% of the web works with something else).
  2. Doing PHP right is hard. Harder than the other stuff.

Why is PHP hard? Because of its immense popularity, PHP is more diverse. It is diverse in the number of frameworks, in the number of versions people run, in the quality of code. And because of its execution model, the topologies in which PHP applications may run can vary wildly.

As such we built a lot of flexibility in. We made it build-oriented so we can support any form of project. Unlike all other PaaS providers we added support for non-ephemeral local storage, so you could run even legacy PHP applications and still benefit from the cloud.

We also built it for highly-available micro-services architectures. You can get RabbitMQ, MongoDB, Redis, Postgres, Elasticsearch, Solr and of course MySQL in every cluster. Doing PHP right meant that we also built it so that you can easily migrate from your “Legacy PHP” to this “Modern PHP” world. A world where no one has root access to a production server. A world of automated, consistent deployments.

PHP Leadership

It was our mission to make it easy to do PHP right. That is why we built Platform.sh for “Modern PHP” from the beginning. This is also why early on we added NodeJS, Python, Ruby and Java (modern PHP is no longer an island). And we will keep adding more services and runtimes, which won’t make us less of a PHP platform; on the contrary, it makes us a better one. Those that have built their systems specifically to run Drupal 7.0 with PHP 5.6 find themselves with an aging platform, ill-equipped for new requirements, less performant and less agile. By going wide, we have better and more up-to-date support not only for legacy Drupal and PHP, but also for everything new that is coming. Count on us to be the best Drupal 9.0 hosting service; the best Symfony 4.0 one. The coolest Magento 3.0.

Appreciating this mindset and impressed by our technology, major PHP figures also joined us. We announced the arrival of Larry Garfield, AKA Crell, as our DevEx director, and Sara Golemon, of HHVM fame, left Facebook in San Francisco to join our R&D team. Sandro Groganz, a true PHP community veteran, joined us just last week to shore up our marketing team. These people complement our founding team, which includes people like Robert Douglass and Damien Tournoud. That is how serious we are about investing in PHP: recruiting the best talent.

In return, we saw how seriously the PHP world is taking us. As early as February 2016, Magento announced their flagship product, Magento Enterprise Cloud Edition, as a white-label of Platform.sh. In early December, it was announced that the Symfony cloud platform, Sensio.Cloud, is using Platform.sh as well. In between, we signed deals with the TYPO3 community and eZ Systems.

All the while, hundreds of Drupal and Drupal Commerce, Wordpress and custom PHP sites launch every week on Platform.sh. And we are getting more and more people that deploy multi-app and micro-services oriented architectures (with more and more NodeJS, Python and Ruby apps in there as well).

PHP is here to stay, and we are here to make it run

Over the last days of 2016 and the first of 2017 we announced PHP 7.1 support as well as Private Packagist support, and today we can announce HTTP/2 active by default on all projects. Making all the fastness even faster. You can fully expect even more incredible features to be coming your way. We mean to keep on being the best Drupal and the best PHP hosting platform. Stay posted.

Ori Pekelman Jan 9, 2017
Dec 23 2016
Dec 23

In my former life I was a Drupal consultant and architect with Palantir.net. One of my former colleagues there, Michelle Krejci, had a saying she liked to repeat: “Production is an artifact of development.” She even presented on the topic a few times at conferences. What she was saying made sense, but I didn’t fully grok it at the time.

Now I do.

As a PHP developer of 15 years, I, like many developers, had gotten into the habit of deployment being a question of “put code on server, run”. That’s PHP’s sales pitch; it’s the sales pitch of any scripting language, really. The same could be said just as easily of Python or Ruby. What is an artifact of development other than code?

Quite a lot, of course. Developers in compiled languages are used to that; source code needs to be compiled to produce something that can actually run. And as long as you have a build process to compile code, throwing other tasks in there to pre-process the source code, or generate other files, is a pretty short leap. For scripting language users, though, that is a weird concept. Isn’t the whole point of a scripting language that I don’t need to compile it?

Well, yes and no. While PHP or Python execute “on the fly” as needed, that doesn’t mean other parts of the system do the same. Sass files need to be converted to CSS. Javascript is frequently written in a non-runnable form these days (TypeScript, ES6, CoffeeScript, JSX, or a dozen other forms) and compiled into ES5-compatible Javascript. Even if you’re not using a code generator or transpiler or whatever is in vogue these days, almost any serious project is (or should be) using CSS and JS compression to reduce download size.

And let’s not forget 3rd party dependencies. Any mainstream PHP project these days is built with Composer. Python uses pip, but the effect is the same. 3rd party dependencies are not part of your code base, and do not belong in your Git repository, but are pulled in externally.

On top of that, many systems today do have a PHP code generation step, too. Symfony’s Dependency Injection Container and Routing systems, Doctrine ORM’s generated classes, and various others all entail turning easy-to-edit-but-slow code into uneditable-but-fast code.

For years I’ve been largely avoiding such tools, because I worked mostly with either heavily-managed hosts that had no support for such steps (or their support was far too hard-coded) or client-hosted servers that still believed in hand-crafted artisanal server management. Short of checking the generated CSS, JS, and PHP code into my repository (which we did with Sass/CSS for years), there wasn’t much way to square the clear shift toward even scripting languages having a compile step with the 2005-era thinking of most of the servers I used.

And then I found Platform.sh.

From the very beginning, Platform.sh has been built on the “production is an artifact of development” model. Your application doesn’t consist of just your code. It’s your code, plus 3rd party code, plus your server configuration, plus some CI scripts to generate CSS, compressed JS, optimized PHP, and so forth. Platform.sh was built specifically to address that modern reality. Your git repository may only contain your code, but that gets turned into, repeatably and reliably, a set of application containers, service containers, and “output code” that will actually get run. What that process looks like is up to you and your application; it could involve Sass, or not; compiling TypeScript, or not; dumping a dependency container or routes, or not.
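
On Platform.sh, that build process lives in your .platform.app.yaml; here is a minimal sketch, with the understanding that the tools and paths are assumptions that will differ from project to project:

hooks:
    build: |
        # Compile Sass to CSS as part of the build.
        npm install
        node_modules/.bin/node-sass scss/main.scss web/css/main.css
        # Dump an optimized Composer autoloader for production.
        composer dump-autoload --optimize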

The compiled artifact of development isn’t just your code; in fact it’s not even an application container that includes your code. It’s the entire constellation of tools that form your application — your code, your database, your cache server, your search index, etc. That’s exactly how Platform.sh works, and why it offers far better support for modern web applications than any other managed host I’ve used. (Give it a spin.) And no, I’m not just saying that because I work here. :-)

So thank you, Michelle, for convincing me of what modern web hosting should be. And thank you Platform.sh for making it a reality.

Larry Garfield Dec 23, 2016
Oct 02 2016
Oct 02

October 02, 2016

Unless you work exclusively with Drupal developers, you might be hearing some criticism of the Drupal community, among them:

  • We are almost cult-like in our devotion to Drupal;
  • maintenance and hosting are expensive;
  • Drupal is really complicated;
  • we tend to be biased toward Drupal as a solution to any problem (the law of the instrument).

It is true that Drupal is a great solution in many cases; and I love Drupal and the Drupal community.

But we can only grow by getting off the Drupal island, and by being open to objectively assessing whether or not Drupal is the right solution for a given use case and a given client.

“if you love something, set it free” —Unknown origin.

Case study: the Dcycle blog

I have built my entire career on Drupal, and I have been accused (with reason) several times of being biased toward Drupal; in 2016 I am making a conscious effort to be open to other technologies and assess my commitment to Drupal more objectively.

The result has been that I now tend to use Drupal for what it’s good at, data-heavy web applications with user-supplied content. However, I have integrated other technologies to my toolbox: among them node.js for real-time websocket communication, and Jekyll for sites that don’t need to be dynamic on the server-side. In fact, these technologies can be used alongside Drupal to create a great ecosystem.

My blog has looked like this for quite some time:

Very ugly design.

It seemed to be time to refresh it. My goals were:

  • Keeping the same paths and path aliases to all posts, for example blog/96/catching-watchdog-errors-your-simpletests and blog/96 and node/96 should all redirect to the same page;
  • Keep comment functionality;
  • Apply an open-source theme with minimal changes;
  • It should be easy for me to add articles using the markdown syntax;
  • There should be a contact form.

My knee-jerk reaction would have been to build a Drupal 8 site, but looking at my requirements objectively, I realized that:

  • Comments can easily be exported to Disqus using the Disqus Migrate module;
  • For my contact form I can use formspree.io;
  • Other than the above, there is no user-generated content;
  • Upgrading my blog between major versions every few years is a problem with Drupal;
  • Security updates and hosting require a lot of resources;
  • Backups of the database and files need to be tested every so often, which also requires resources.

I eventually settled on moving this blog away from Drupal toward Jekyll, a static website generator which has the following advantages over Drupal for my use case:

  • What is actually publicly available is static HTML, ergo no security updates;
  • Because of its simplicity, testing backups is super easy;
  • My site can be hosted on GitHub for free using GitHub Pages (when I first migrated, HTTPS was not supported for custom domain names, but GitHub Pages now supports HTTPS via Let’s Encrypt);
  • All content and structure is stored in my git repo, so adding a blog post is as simple as adding a file to my git repo;
  • No PHP, no MySQL, just plain HTML and CSS: my blog now feels lightning fast;
  • Existing free and open-source templates are more plentiful for Jekyll than for Drupal, and if I can’t find what I want, it is easier to convert an HTML template to Jekyll than it is to convert it to Drupal (for me anyway).
  • Jekyll offers plugins for all of my project’s needs, including the jekyll-redirect-from gem to define several paths for a single piece of content, including a canonical URL (permalink); see the sketch below.

In a nutshell, Jekyll works by regenerating an entirely new static website every time a change is made to underlying structured data, and putting the result in a subdirectory called _site. All content and layout is structured in the directory hierarchy, and no database is used.
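
For example, the front matter of a migrated post could look like the sketch below, assuming the jekyll-redirect-from plugin (the title and paths are taken from the example above):

---
layout: post
title: "Catching watchdog errors in your Simpletests"
permalink: /blog/96/catching-watchdog-errors-your-simpletests/
redirect_from:
  - /blog/96/
  - /node/96/
---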

Exporting content from Drupal to Jekyll

Depending on the complexity of your content, this will likely be the longest part of your migration, and will necessitate some trial and error. For the technical details of my own migration, see my blog post Migrating content from Drupal to Jekyll.

What I learned

I set out with the goal of performing the entire migration in less than a few days, and I managed to do so, all the while learning more about Jekyll. I decided to spend as little time as possible on the design, instead reusing brianmaierjr’s open-source Long Haul Jekyll theme. I estimate that I have managed to perform the migration to Jekyll in about 1/5th the time it would have taken me to migrate to Drupal 8, and I’m saving on hosting and maintenance as well. Some of my clients are interested in this approach as well, and are willing to trade an administrative backend for a large reduction in risk and cost.

So how do users enter content?

Being the only person who updates this blog, I am comfortable adding my content (text and images) as files on GitHub, but most non-technical users will prefer a backend. A few notes on this:

  • First, I have noticed that even though it is possible for clients to modify their Drupal site, many actually do not;
  • Many editors consider the Drupal backend to be very user-unfriendly to begin with, and may be willing to accept the more technical GitHub interface and a little training instead, if it saves them development time.
  • I see a big future for Jekyll frontends such as Prose.io which provide a neat editing interface (including image insertion) for editors of Jekyll sites hosted on GitHub.

Conclusion

I am not advocating replacing your Drupal sites with Jekyll, but in some cases we may benefit as a community by adding tools other than the proverbial hammer to our toolbox.

Static site generators such as Jekyll are one example of this, and with the interconnected web, making use of Drupal for what it’s good at will be, in the long term, good for Drupal, our community, our clients, and ourselves as developers.


Sep 19 2016
Sep 19

September 19, 2016

Docker is now available natively on Mac OS in addition to Linux. Docker is also included with CoreOS which you can run on remote Virtual Machines, or locally through Vagrant.

Once you have installed Docker and Git, locally or remotely, you don’t need to install anything else.

In these examples we will leverage the official Drupal and MySQL Docker images. We will use the MySQL image as is, and we will add Drush to our Drupal image.

Docker is efficient with caching: these scripts will be slow the first time you run them, but very fast thereafter.

Here are a few scripts I often use to set up quick Drupal 7 or 8 environments for module evaluation and development.

Keep in mind that using Docker for deployment to production is another topic entirely and is not covered here; also, these scripts are meant to be quick and dirty; docker-compose might be useful for more advanced usage.

Port mapping

In all cases, using -p 80, I map port 80 of Drupal to any port that happens to be available on my host, and in these examples I am using Docker for Mac OS, so my sites are available on localhost.

I use DRUPALPORT=$(docker ps|grep drupal7-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g') to figure out the current port of my running containers. When your containers are running, you can also just docker ps to see port mapping:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
f1bf6e7e51c9        drupal8-image       "apache2-foreground"     15 seconds ago      Up 11 seconds       0.0.0.0:32771->80/tcp   drupal8-container
...

In the above example (scroll right to see more output), http://localhost:32771 will show your Drupal 8 site.

Using Docker to evaluate, patch or develop Drupal 7 modules

I can set up a quick environment to evaluate one or more Drupal 7 modules. In this example I’ll evaluate Views.

mkdir ~/drupal7-modules-to-evaluate
cd ~/drupal7-modules-to-evaluate
git clone --branch 7.x-3.x https://git.drupal.org/project/views.git
# add any other modules for evaluation here.

echo 'FROM drupal:7' > Dockerfile
echo 'RUN curl -sS https://getcomposer.org/installer | php' >> Dockerfile
echo 'RUN mv composer.phar /usr/local/bin/composer' >> Dockerfile
echo 'RUN composer global require drush/drush:8' >> Dockerfile
echo 'RUN ln -s /root/.composer/vendor/drush/drush/drush /bin/drush' >> Dockerfile
echo 'RUN apt-get update && apt-get upgrade -y' >> Dockerfile
echo 'RUN apt-get install -y mysql-client' >> Dockerfile
echo 'EXPOSE 80' >> Dockerfile

docker build -t drupal7-image .
docker run --name d7-mysql-container -e MYSQL_ROOT_PASSWORD=root -d mysql
docker run -v $(pwd):/var/www/html/sites/all/modules --name drupal7-container -p 80 --link d7-mysql-container:mysql -d drupal7-image

DRUPALPORT=$(docker ps|grep drupal7-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g')

# wait for mysql to fire up. There's probably a better way of doing this...
# See stackoverflow.com/questions/21183088
# See https://github.com/docker/compose/issues/374
sleep 15

docker exec drupal7-container /bin/bash -c "echo 'create database drupal'|mysql -uroot -proot -hmysql"
docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush si -y --db-url=mysql://root:root@mysql/drupal"
docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush en views_ui -y"
# enable any other modules here. Dependencies will be downloaded
# automatically

echo -e "Your site is ready, you can log in with the link below"

docker exec drupal7-container /bin/bash -c "cd /var/www/html && drush uli -l http://localhost:$DRUPALPORT"

Note that we are mounting sites/all/modules as a volume (rather than copying it into the image), so any change we make to our local copy of Views is reflected on the container almost immediately, making this a good technique for developing modules or writing patches against existing modules.

When you are finished you can destroy your containers, noting that all data will be lost:

docker kill drupal7-container d7-mysql-container
docker rm drupal7-container d7-mysql-container

Using Docker to evaluate, patch or develop Drupal 8 modules

Our script for Drupal 8 modules is slightly different:

  • ./modules is used on the container instead of ./sites/all/modules;
  • Our Dockerfile is based on drupal:8, not drupal:7;
  • Unlike with Drupal 7, your database is not required to exist prior to installing Drupal with Drush;
  • In my tests I need to chown /var/www/html/sites/default/files to www-data:www-data to enable Drupal to write files.

Here is an example where we are evaluating the Token module for Drupal 8:

mkdir ~/drupal8-modules-to-evaluate
cd ~/drupal8-modules-to-evaluate
git clone --branch 8.x-1.x https://git.drupal.org/project/token.git
# add any other modules for evaluation here.

echo 'FROM drupal:8' > Dockerfile
echo 'RUN curl -sS https://getcomposer.org/installer | php' >> Dockerfile
echo 'RUN mv composer.phar /usr/local/bin/composer' >> Dockerfile
echo 'RUN composer global require drush/drush:8' >> Dockerfile
echo 'RUN ln -s /root/.composer/vendor/drush/drush/drush /bin/drush' >> Dockerfile
echo 'RUN apt-get update && apt-get upgrade -y' >> Dockerfile
echo 'RUN apt-get install -y mysql-client' >> Dockerfile
echo 'EXPOSE 80' >> Dockerfile

docker build -t drupal8-image .
docker run --name d8-mysql-container -e MYSQL_ROOT_PASSWORD=root -d mysql
docker run -v $(pwd):/var/www/html/modules --name drupal8-container -p 80 --link d8-mysql-container:mysql -d drupal8-image

DRUPALPORT=$(docker ps|grep drupal8-container|sed 's/.*0.0.0.0://g'|sed 's/->.*//g')

# wait for mysql to fire up. There's probably a better way of doing this...
# See stackoverflow.com/questions/21183088
# See https://github.com/docker/compose/issues/374
sleep 15

docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush si -y --db-url=mysql://root:root@mysql/drupal"
docker exec drupal8-container /bin/bash -c "chown -R www-data:www-data /var/www/html/sites/default/files"
docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush en token -y"
# enable any other modules here.

echo -e "Your site is ready, you can log in with the link below"

docker exec drupal8-container /bin/bash -c "cd /var/www/html && drush uli -l http://localhost:$DRUPALPORT"

Again, when you are finished you can destroy your containers, noting that all data will be lost:

docker kill drupal8-container d8-mysql-container
docker rm drupal8-container d8-mysql-container


Jul 05 2016
Jul 05

When Platform.sh launched, the majority of our business was Drupal 7 sites running Drupal Commerce. While we still host many of those, our business has expanded to cover many application stacks and languages. Drupal 8 has been out for 8 months now, Symfony’s market is growing, and we support both PHP and NodeJs with more languages on the way (stay tuned!).

As a result, some assumptions we baked into the system no longer make sense for the majority of our users. We are therefore removing the default Platform.sh configuration files that were previously used if your project didn’t include its own.

Wait, but what about my existing sites!

If you already have an existing project with Platform.sh, it is completely unaffected. This change only affects newly created projects as of Monday 25 July 2016.

We still recommend that all projects ensure they have the appropriate configuration files committed to Git, but only new projects are technically required to do so.

Whew. OK, so what’s the problem?

There are three files that drive your entire cluster with Platform.sh:

  • .platform.app.yaml defines your application container, where your code runs.
  • .platform/routes.yaml defines your routing container, and how it maps and caches incoming requests to your application container.
  • .platform/services.yaml defines what other services should be included in each cluster, such as MySQL, Redis, or Elasticsearch.

(No, really, that’s it. That’s your entire server cluster definition. Neat, eh?)

Previously, if one of those files was missing we would create a default file automatically. Those default files were designed around a specific use case: Drupal 7 running on PHP 5.4 with Redis caching and Solr for search. However, that is increasingly not the typical case; Drupal 8 is growing fast, PHP 5.4 is no longer supported by the PHP team, the various services have new versions available, and Platform.sh offers a lot more than just Drupal and PHP. (A default PHP container with drush make makes little sense if your application is written in Node.js…) That makes those defaults less and less useful to keep around.

It also meant that entirely disabling additional services, say for a statically generated site (like the Platform.sh site itself), required adding a blank file to the repository to override the three default services. That’s just silly.

So what changes?

We no longer add default files. No file, no behavior. That means you must provide, at least, a .platform.app.yaml file and a .platform/routes.yaml file for a site to work. If you don’t provide those, trying to push a branch to our Git repository will fail as the code cannot be deployed. (The .platform/services.yaml is optional; if you don’t need any services, skipping this file will simply not create any.)
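
To give a rough idea of what “at least” means, a minimal set of files could look like the sketch below; the names, versions and paths are illustrative only, not defaults we provide:

# .platform.app.yaml
name: app
type: "php:7.0"
build:
    flavor: composer
relationships:
    database: "mydatabase:mysql"
web:
    locations:
        "/":
            root: "web"
            passthru: "/index.php"
disk: 2048

# .platform/routes.yaml
"https://{default}/":
    type: upstream
    upstream: "app:http"

# .platform/services.yaml (optional)
mydatabase:
    type: "mysql:5.5"
    disk: 1024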

If you’re already in the habit of adding those files to your Git repository for a new project, congratulations, nothing changes for you. :-)

We are also dropping version-defaults for the app container and services. That is, if you ask for a mysql service you must also specify the version; we won’t magically pick one for you if not specified, for the same reason: The defaults would be old-forever. We want you to be able to move your site to the latest and greatest version of your language and services of choice on your schedule, not ours.

If you want to see the old defaults that were created, in case you want to use them yourself, they’re listed on our documentation site.

For more information on those configuration files, see the documentation.

Larry Garfield Jul 5, 2016
Jun 07 2016
Jun 07

Drupal 8.1 has made a significant shift toward embracing Composer as the standard way to install and manage a Drupal site. Starting today, with Drupal 8.1.2, Platform.sh’s Drupal 8 templates are also Composer-based, and default to using PHP 7.

Wait, what about my existing sites?

Absolutely nothing changes for sites already hosted with Platform.sh. If you’re using Drush Make or just checking your entire Drupal site into Git, you can continue to do so. Nothing in this post applies to you (unless you want it to).

Oh good. So what actually changes?

When you create a new site with Platform.sh, you’re given the opportunity to select a “template” for a site. The template is really just a starter Git repository, complete with a recommended .platform.app.yaml file and .platform directory for a given application. Until now, the template for Drupal 7 and Drupal 8 used Drush Make as their build script. The Drupal 8 template now uses Composer, just like our Symfony template and most other PHP applications.

The Composer template is closely based on the (excellent) Drupal-Composer project built by the Drupal community. It adds only two patches to make Drupal install cleanly on a read-only file system, both of which have already gone through the Drupal issue queues and are just waiting to be committed. Once they’ve been incorporated into a point release we’ll drop those patches from our composer.json file.

As Drupal 8 is also fully tested with PHP 7, we’ve defaulted to PHP 7.0 for all newly created Drupal 8 sites.

As Platform.sh containers are always “production ready”, the composer command we use is optimized for production. Specifically, we run:

composer install --no-progress --prefer-dist --optimize-autoloader

Neat. But wait, which Composer repository are you using for Drupal?

Drupal currently has two different Composer services, a legacy one hosted at https://packagist.drupal-composer.org and a new, experimental one at https://packages.drupal.org/. We’ve been in contact with the Drupal.org Infrastructure team, and they’ve given us the go-ahead to default to the new, official service.

If you want to switch back to the legacy service, be sure to check the Drupal.org documentation page for notes on the different way it handles module versions.

But, but, I have legacy code that doesn’t work with PHP 7 yet!

Not to worry! If you need to start a new Drupal 8 site but want to run it on PHP 5.6 instead, simply edit your .platform.app.yaml file and change the line

type: "php:7.0"

to

type: "php:5.6"

Then git push. Yes, it really is that easy.

(PHP versions before 5.6 are not supported by the PHP development team. We only provide those images to support legacy projects. Please use PHP 5.6 or, preferably, PHP 7 for all new projects. Security experts around the world thank you.)

I already have a Drupal 8 project using Composer. Will it still work?

Absolutely! Simply go to the template Git repository and copy the .platform.app.yaml file and .platform directory, then stick those in your project’s Git root. If you used the Drupal-Composer project to create it initially, all of the paths should still work fine. You will also need the settings.php and settings.platformsh.php files to ensure your site gets the correct database credentials and such. You can tweak the .platform.app.yaml file if needed, such as if your files are organized differently.

You can also tweak those files as needed to configure your cluster exactly how you need. See the documentation for more details.

What about Drupal 7?

We’re still investigating whether we want to switch our Drupal 7 template over to Composer. (If you have thoughts on the matter, let us know.) Currently, Drupal 7’s test suite doesn’t fully pass under PHP 7 as there’s still a few edge case bugs, and a number of contrib modules need minor tweaks. We may default Drupal 7 to PHP 7 in the future when we feel it’s safe to do so. For now, we recommend PHP 5.6 for Drupal 7 sites.

Wow, thanks, this is great!

Happy to help! See you in the Cloud…

Larry Garfield Jun 7, 2016
May 19 2016
May 19

You’re developing your site on Platform.sh and you love the fact that you get exact copies of your production site for every Git branch that you push.

But now that you think about it, you realize that all those copies used by your development team to implement new features or fixes contain production data (like user emails, user passwords…). And that all the people working on the project will have access to that sensitive data.

So you come up with the idea to write a custom script to automatically sanitize the production data every time you copy the production site or synchronize your development environments. Next you think of a way to automatically run that script. Possibly a custom Jenkins job that you will maintain yourself. But, of course, you will need to update this Jenkins job for every new project you work on. Plus, you will have to figure out the permissions for this script to give proper access to your site.

So Simple

But wait, what if I told you that all this hassle can be handled in a simple deployment hook that Platform.sh provides?

Indeed, with Platform.sh, every action will trigger specific hooks where you can interact either with the build phase or the deployment phase of the process.

For example with Drupal, you can use the drush sql-sanitize command to sanitize your database and get rid of sensitive live information.

Also you need to make sure that the sanitization only runs on the development environments and never on the master environment (you will hate me if that happens):

type: php:7.0
build:
    flavor: drupal
hooks:
    build: |
        # Whatever you want to do during the build phase.
    deploy: |
        cd /app/public
        if [ $PLATFORM_ENVIRONMENT = "master" ]; then
            # Do whatever you want on the production site.
        else
            drush -y sql-sanitize --sanitize-email=user_%uid@example.com --sanitize-password=custompassword
        fi
        drush -y updatedb

If you are not working with Drupal, you can even run your own sanitization script. Read more about build and deployment hooks on our public documentation.

To access the deploy hook logs on the server:

$ platform ssh
$ cat /var/log/deploy.log

[2016-05-18 10:14:13.872085] Launching hook 'cd /app/public
if [ $PLATFORM_ENVIRONMENT = "master" ]; then
    # Do whatever you want on the production site. 
else
    drush -y sql-sanitize --sanitize-email=user_%uid@example.com --sanitize-password=custompassword
fi
drush -y updatedb

The following operations will be done on the target database:
* Reset passwords and email addresses in users table
* Truncate Drupal's sessions table

Do you really want to sanitize the current database? (y/n): y
No database updates required                                           [success]

That’s it! Sleep well now ;)

Augustin Delaporte May 19, 2016
May 05 2016
May 05

Are you joining us at DrupalCon New Orleans next week? It’s going to be a blast!

Those who have attended a DrupalCon before know how intense they can be. For first-timers, a DrupalCon can be overwhelming. The Drupal Community is an amazing and welcoming group of people, almost unnervingly so at times. The energy around a DrupalCon is palpable, but that means it can be a shock to those used to a calmer event.

So how do you get the most out of a DrupalCon? Glad you asked…

Water

No, really. DrupalCons are big and you’ll be walking a lot and talking to a lot of people. Have a water bottle on you. A shoulder-sling or belt-clip bottle is best because it’s easier to keep with you, but if your laptop bag or backpack has a bottle holder, that works well too.

Sure, there will be coffee breaks. But the lines may be long, you don’t want to wait in them just to get a drink, and water is healthier for you anyway (which matters, really).

Bring a notebook

It doesn’t have to be paper, of course. A tablet, phone, Chromebook, laptop, or whatever else lets you take notes is fine. You’ll be exposed to a million new ideas this week, and your odds of remembering everything you found really-cool-I-need-to-use-that-this-will-change-my-life are slim. Write it down! At least write down key terms, phrases, tools, and links to Google later.

Have lunch with strangers

What good is hanging out with a conference of thousands of people if you only talk to the people you know? Take advantage of the general friendliness of the Drupal community to meet new people. Break away from your usual team and talk to someone else’s team. Maybe it’s developers you don’t know. Maybe it’s a vendor you’re considering hiring and want to get to know better. (Yep, we’ll be there!) Maybe it’s the marketing director for another institution like yours. Or all of them at the same table. Spend time with new people and come away with new friends.

One caveat, though: most Drupal developers are very friendly, but please don’t fawn. Yes, you may be casually chatting with the person who wrote the module that runs your entire business, but they’re still just a (really smart!) person hanging around, learning stuff, and eating lunch. Please treat them as such.

Mix in the Hallway Track

DrupalCon New Orleans has 130 sessions across 13 tracks, with 11 concurrent sessions. That’s a lot of content. Fortunately, it’s also all recorded. DrupalCon has one of the best session recording programs of any conference I’ve been to, so if there are too many simultaneous sessions you really want to attend, worry not! The Drupal Association has you covered. (Unless it’s my session on PHP 7 on Wednesday at 1pm. Then just go to the session.)

So well covered, in fact, that you shouldn’t try to pack a session into every time slot. Take some time to just talk and mingle with people. Get into heated (but polite) debates about technical issues with someone you just met. Stop by the expo hall to chat with the Platform.sh team (and the other sponsors, too). Stop by the Business Showcase sessions in the vendor hall, especially at 2:00 on Wednesday to see Platform.sh’s resident astronaut. :-) Collect swag from the vendor hall. (That’s why you go to a conference, right? All the free swag?)

Pace yourself

There’s so much to do at DrupalCon, including the after-parties, that it’s easy to lose track of time, or have one too many beers with those new friends you just met. Be sure to pace yourself. DrupalCon is a week-long event; don’t spend all your energy on day one. In addition to hydrating, get a good night’s sleep every night. (Note: “Good night’s sleep” is relative. A full 6 hours is generally considered a lot during DrupalCon week.)

Also, eat healthy! Although the conference lunch tries to be reasonably healthy, it’s very easy to fall into the “pizza and beer and beer” trap at the after parties with all of your new friends. Be careful to mix in plenty of protein and vegetables while you’re at it, so that you can stay upright for the next night. You want to be awake and coherent for Thursday night’s Trivia Night.

Come for the sessions, stay for the sprints

DrupalCon doesn’t end with the closing ceremony! Drupal is all about contributing and giving back. That’s how you pay for Open Source. And the best place to do that at DrupalCon is at the Sprints on Friday. You do not need to be an accomplished developer, or any developer, in order to help out. There are sprint areas for coders, for front-end devs, for documentation, for UX testing, for marketing, you name it. If there’s not yet a planned sprint for a topic you’re interested in… guess what, you’re now organizing it. (Hat tip to Cal Evans…)

Not sure what to do or where to start? There’s even a First-time Sprinter Workshop, where people will be on hand to help you get started. Even if that means starting from “So, who’s this Git I keep hearing about?” someone will be able to get you onboarded and on your way.

Go to the Prenote

Most importantly, of course, plan to get up early enough on Tuesday to attend the DrupalCon Prenote. The Prenote is a DrupalCon tradition, and a great way to break the ice, whether you’re a new attendee or a seasoned DrupalCon veteran. Past years have included sketch comedy, superheroes, sacrificing Dries for Christmas dinner, crustaceans, and musical comedy. I can’t give away too much about this year’s plans, but I will leak out… it will definitely sound better in person than on the recording. ;-)

Always start the Con with Dries’ favorite session.

We’ll see you in NOLA!

Larry Garfield May 5, 2016
Jul 30 2015
Jul 30

Making themes for Drupal, especially advanced ones, has never been an easy task: it requires a considerable amount of Drupal knowledge and, in most cases, at least a bit of programming. So it comes as no surprise that, despite the popularity of Drupal, web designers are reluctant to create themes for it. Hopefully with the release of Drupal 8 it will become a bit easier, but there is still a lot of work to do. The module I'm going to introduce can considerably simplify theming and eliminate or reduce the programming required for making almost any sort of Drupal theme.

When we create themes in Drupal, there are a great number of recurring tasks we have to do, like adding IE conditional comments, removing or replacing some core or contributed modules' CSS/JS files to prevent conflicts with the theme, putting some JavaScript at the bottom of the page, or adding inline CSS or JS.

Unfortunately we can't do any of these common tasks using Drupal's theme .info file. Surprisingly, however, we can do most of them using Drupal 7's JS/CSS API, but not easily and not without programming. So as a themer with no knowledge of programming or of Drupal's API, we have no choice but to work around Drupal and directly modify the HTML (as most Drupal themers do). By doing so we not only lose all the great features that Drupal's modularity brings, like CSS/JS optimizations, CDN support, etc., but we also have to manually resolve the problems this causes for core and contributed modules' UI and functionality.

Wouldn't it be great if we had total control over CSS/JS files via the theme .info file, without having to know programming? That's exactly the purpose of the CSS JS Total Control module. It extends Drupal's theme .info file, adds loads of new features for handling JavaScript and stylesheets, and is fully compatible with core and all the related contributed modules. No more programming or working around Drupal for handling JavaScript and stylesheets.

Download this module from [here], and start using it right away :) Don't forget to send feedback.

So let's have a look at the supported features:

  • Full support for drupal_add_css and drupal_add_js parameters and even more!
    • Adding external files
    • Defining where to include it : header / footer
    • Adding inline css/js
    • Whether to display on all pages or not
    • Defining style/script group : theme / module / core
    • Weight (the order of adding to the page)
    • Supporting Defer parameter
    • Enable/Disable caching per style/script
    • Enable/Disable preprocessing
    • Enable/Disable using core
    • Adding attributes like id to stylesheet/javascript include tags
    • Support for IE conditional comments for both styles and scripts
    • Defining style media : print/all/screen
  • Manipulating existing styles/scripts
    • Creating a white-list or blacklist to decide which style/scripts should be added to the page
    • Possibility of replacing and overriding core and contributed modules styles and scripts using only the info file
  • Possibility of altering the scripts and styles (hook_js_alter and hook_css_alter support for Drupal 6)
  • Compatible with most style and script manipulation modules
  • Adds a theme_path variable for use by template files, and a css_js_total_control_get_theme_path() function

Some examples for demonstration. You can read the full documentation plus practical examples [here].

Replacing core jQuery!

scripts-settings[filter][rules][0][function] = regular_expression
scripts-settings[filter][rules][0][pattern] = %misc/jquery|jquery_update%
scripts-settings[filter][type] = blacklist

scripts-extended[js/vendor/jquery.min.js][scope] = header
scripts-extended[js/vendor/jquery.min.js][weight] = 0
scripts-extended[js/vendor/jquery.min.js][group] = core

Adding an inline script at the bottom of the HTML!

scripts-extended[js/menu-effect.inline.js][scope] = footer
scripts-extended[js/menu-effect.inline.js][type] = inline

Adding a stylesheet only for IE 7

stylesheets-extended[css/font-awesome-ie7.min.css][condition-string] = if IE 7

Adding an id to a stylesheet's include HTML tag (mostly useful for dynamically changing the theme style via JavaScript)

stylesheets-extended[css/menu/styles/skins/lblack.css][media] = all
stylesheets-extended[css/menu/styles/skins/lblack.css][attributes][id] = custom_menu

Moving a script before all other scripts

scripts-extended[js/vendor/jquery.min.js][scope] = header
scripts-extended[js/vendor/jquery.min.js][weight] = 0
scripts-extended[js/vendor/jquery.min.js][group] = core

Adding an inline script at the bottom of the page (prints the content of the file)

scripts-extended[js/menu-effect.inline.js][scope] = footer
scripts-extended[js/menu-effect.inline.js][type] = inline

Adds a JavaScript library (relies on the Libraries module's API to load it)

scripts-extended[easing][type] = library
scripts-extended[easing][version] = default

Add some settings to the Drupal.settings JS variable (we can use these settings later in our custom JS files)

scripts-extended[mythemename][type] = setting
scripts-extended[mythemename][setting][name] = special

Allowing only necessary stylesheets and removing the rest to prevent conflict with theme styles

stylesheets-settings[filter][rules][0][function] = regular_expression
stylesheets-settings[filter][rules][0][pattern] = %settings|admin|misc|jquery_update%
stylesheets-settings[filter][type] = whitelist

The END.

Jul 06 2015
Jul 06

July 06, 2015

If you are using a site deployment module, and running simpletests against it in your continuous integration server using drush test-run, you might come across Simpletest output like this in your Jenkins console output:

Starting test MyModuleTestCase.                                         [ok]
...
WD rules: Unable to get variable some_variable, it is not           [error]
defined.
...
MyModuleTestCase 9 passes, 0 fails, 0 exceptions, and 7 debug messages  [ok]
No leftover tables to remove.                                           [status]
No temporary directories to remove.                                     [status]
Removed 1 test result.                                                  [status]
 Group  Class  Name

In the above example, the Rules module is complaining that it is misconfigured. You will probably be able to confirm this by installing a local version of your site along with rules_ui and visiting the rules admin page.

Here, it is Rules which is logging a watchdog error, but it could be any module.

However, this will not necessarily cause your test to fail (see 0 fails), and more importantly, your continuous integration script will not fail either.

At first you might find it strange that your console output shows [error], but that your script is still passing. Your script probably looks something like this:

set -e
drush test-run MyModuleTestCase

So: drush test-run outputs an [error] message, but is still exiting with the normal exit code of 0. How can that be?

Well, your test is doing exactly what you are asking of it: it is asserting that certain conditions are met, but you have never explicitly asked it to fail when a watchdog error is logged within the temporary testing environment. This is normal: consider a case where you want to assert that a given piece of code logs an error. In your test, you will create the necessary conditions for the error to be logged, and then you will assert that the error has in fact been logged. In this case your test will fail if the error has not been logged, but will succeed if the error has been logged. This is why the test script should not fail every time there is an error.

But in our above example, we have no way of knowing when such an error is introduced; to ensure more robust testing, let’s add a teardown function to our test which asserts that no errors were logged during any of our tests. To make sure that the tests don’t fail when errors are expected, we will allow for that as well.

Add the following code to your Simpletest (if you have several tests, consider creating a base test for all of them to avoid reusing code):

/**
 * {@inheritdoc}
 */
function tearDown() {
  // See http://dcycleproject.org/blog/96/catching-watchdog-errors-your-simpletests
  $num_errors = $this->getNumWatchdogEntries(WATCHDOG_ERROR);
  $expected_errors = isset($this->expected_errors) ? $this->expected_errors : 0;
  $this->assertTrue($num_errors == $expected_errors, 'Expected ' . $expected_errors . ' watchdog errors and got ' . $num_errors . '.');

  parent::tearDown();
}

/**
 * Get the number of watchdog entries for a given severity or worse
 *
 * See http://dcycleproject.org/blog/96/catching-watchdog-errors-your-simpletests
 *
 * @param $severity = WATCHDOG_ERROR
 *   Severity codes are listed at https://api.drupal.org/api/drupal/includes%21bootstrap.inc/group/logging_severity_levels/7
 *   Lower numbers are worse severity messages, for example an emergency is 0, and an
 *   error is 3.
 *   Specify a threshold here, for example for the default WATCHDOG_ERROR, this function
 *   will return the number of watchdog entries which are 0, 1, 2, or 3.
 *
 * @return
 *   The number of watchdog errors logged during this test.
 */
function getNumWatchdogEntries($severity = WATCHDOG_ERROR) {
  $results = db_select('watchdog')
      ->fields(NULL, array('wid'))
      ->condition('severity', $severity, '<=')
      ->execute()
      ->fetchAll();
  return count($results);
}

Now, all your tests which have this code will fail if there are any watchdog errors in it. If you are actually expecting there to be errors, then at some point in your test you could use this code:

$this->expected_errors = 1; // for example


Jun 30 2015
Jun 30

Queries are the centerpiece of MySQL and they have high optimization potential (in conjunction with indexes). This is especially true for big databases (whatever big means). Modern PHP frameworks tend to execute dozens of queries per request. Thus, as a first step, you need to know what the slow queries are. A built-in solution for this is the MySQL slow query log. It can either be activated in my.cnf or dynamically with the --slow_query_log option. In both cases, long_query_time should be reduced to an appropriate value. Most Linux distributions ship with a default value of 1 second or more, but this is too slow for web applications, where you want to achieve an overall response time of a few hundred milliseconds. So depending on your performance needs, choose a value of 0.1 or 0.01 seconds.
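
For example, a minimal my.cnf snippet could look like this (the log file path and the exact threshold are up to you):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 0.1

The same can be achieved at runtime, without a restart, with SET GLOBAL slow_query_log = 'ON'; and SET GLOBAL long_query_time = 0.1; (the latter only affects new connections).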

SQL consists of two different types of queries: those that belong to the data definition language (DDL) and those that work with data (data manipulation language, DML). DDL queries usually have no performance implications, but there is an exception to this rule of thumb: ALTER TABLE statements can be very time-consuming if a table contains millions of records and uses (unique) indexes. We will cover a practical approach in a minute. DML queries can in turn be divided into INSERT statements on the one hand and the other CRUD statements (SELECT, UPDATE and DELETE) on the other. These statements can be optimized with several techniques, and most of this blog post will address them.

Optimizing ALTER TABLE statements

Imagine you have an accounts table with millions of records and you want to extend it with a field for a phone number. Executing ALTER TABLE directly would certainly cause major load. The trick is to avoid ad-hoc index re-calculation: we drop all indexes, copy the table to a new table, and perform the structural changes there.

  1. Set innodb_buffer_pool_size appropriately (be aware: for performing structural changes, a high buffer pool size can speed things up; on a live system, however, a size that is too high will lead to memory shortages)
  2. (Optional) Back up the database
  3. Drop all indexes except the primary key and foreign keys
    DROP INDEX ...
  4. Copy the table and apply the structural changes. Use a similar name, for example with the suffix '_new'.

    CREATE TABLE IF NOT EXISTS `Accounts_new` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      `city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
    ALTER TABLE `Accounts_new` ADD `phone` VARCHAR(255) NOT NULL;

  5. Copy the data with INSERT INTO ... SELECT. Select only the columns that are used in the new table.
    INSERT INTO Accounts_new SELECT `id`, `email`, `city`, '' FROM Accounts;
  6. Rename the table. If foreign keys are used, disable the foreign key check.

    SET foreign_key_checks = 0;
    DROP TABLE Accounts;
    ALTER TABLE Accounts_new RENAME Accounts;
    SET foreign_key_checks = 1;

  7. Re-create all indexes, including foreign keys.
    CREATE INDEX ...

Two steps require major effort. First, copying all the data to the new table will take some time; second, rebuilding all indexes can take a long time (depending on the number of indexes and whether they are unique or not).


Optimizing insertions

INSERT queries should be merged where possible: a single query that creates 10 rows is faster than 10 separate queries. However, this technique has its limits, especially if MySQL runs out of memory. If you want to import a whole database, you can also switch off some consistency checks, for example foreign_key_checks=0 and unique_checks=0. Moreover, autocommit=0 can help.
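
A minimal sketch of both ideas, assuming the Accounts table from the ALTER TABLE example above (the column values are made up):

# One INSERT creating several rows instead of three separate INSERTs
INSERT INTO Accounts (`email`, `city`, `phone`)
VALUES ('a@example.com', 'Berlin', '111'),
       ('b@example.com', 'Hamburg', '222'),
       ('c@example.com', 'Munich', '333');

# For a large import, temporarily relax the checks (re-enable them afterwards!)
SET foreign_key_checks = 0;
SET unique_checks = 0;
SET autocommit = 0;
# ... run the import here ...
COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;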


Optimizing SELECT statements

SELECT, UPDATE and DELETE statements have one thing in common: the way they filter results (the WHERE clause). This can be a complex task, especially for big tables. Big means tables with a row count of 100,000 or more; tables with more than one million rows should definitely be included in query optimization. For the sake of simplicity, we will concentrate on SELECT queries, which are the most frequent case anyway.


1) Use EXPLAIN

If you want to optimize a query, you should know how MySQL executes it. You can use EXPLAIN to get the query execution plan. As of MySQL 5.6 it is also possible to use EXPLAIN for INSERT, UPDATE and DELETE statements.

EXPLAIN SELECT * FROM Users WHERE uid = 1;

The result contains several useful pieces of information about the query:


  • select_type: Is the query a simple query (primary) or a compound query (join or subquery)?
  • type: This is extremely important for joins or subqueries: how is this query joined? The best types are const, ref and eq_ref. Worse types are range, index and all. Attention: do not mix up index with ref/eq_ref! For further information, please visit the MySQL docs.
  • possible_keys: A list of indexes which could be used to optimize the speed of the query.
  • key: The index actually used.
  • key_len: The length of the index. Shorter indexes tend to perform better.
  • ref: Which column is used for the index scan?
  • rows: Estimated number of rows that have to be compared with the filter criteria. This number should be as low as possible.
  • Extra: Additional information about the query. Attention: do not mix up Using index with ref/eq_ref!

MySQL docs: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html

If the query is a simple query (i.e. no joins or subqueries are used), EXPLAIN will return a single line where select_type is set to SIMPLE. To get good performance, it is important that an existing index is used. This is the case when type is equal to ref and possible_keys and key name an index.

If joins are used, the result will contain one line per table. Joining tables should always be done via a foreign key comparison; in this case the type in the EXPLAIN output is eq_ref. Avoid leaving out foreign keys, and try to avoid joins on columns of different types, for instance a varchar field and an integer field: this forces MySQL to do a lot of type conversions, which is slow and can prevent indexes from being used.


2) Use existing indexes

Indexes are, by design, ordered by (at least) one attribute. Thus, they can be applied to queries which filter by this attribute, either as an exact filter (WHERE x = 'y') or as a range query (WHERE timestamp >= 123). Indexes are not applicable if you use a function on the column in the WHERE clause, for instance WHERE SUBSTR(name, 1, 4) = 'Alex' (a possible rewrite of this case is sketched after the list below). The following list shows which WHERE clauses can be handled by indexes:

WHERE x = 'y': index can be used

WHERE timestamp >= 123: index can be used

WHERE timestamp BETWEEN 123 AND 456: index can be used

WHERE name LIKE ('Ale%'): index can be used

WHERE name LIKE ('%Ale%'): index cannot be used

WHERE SUBSTR(name, 1, 4) = 'Alex': index cannot be used
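
As a rough sketch based on the last list entry (table and column names are taken from the examples in this post), the function call can often be rewritten into an index-friendly prefix search:

# Not index-friendly: the function hides the indexed column from the optimizer
SELECT * FROM Users WHERE SUBSTR(name, 1, 4) = 'Alex';

# Index-friendly rewrite: a left-anchored LIKE can use an index on `name`
SELECT * FROM Users WHERE name LIKE 'Alex%';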

If you have more than one filter criterion in the query, your index should include all used columns as well. Imagine you have the following indexes: name_IDX, firstname_IDX, firstname_name_IDX and name_firstname_IDX. Then the query

# Using composite indexes
SELECT * FROM Users WHERE firstname = 'Alex' AND name = 'Saal'

... could be optimized with firstname_IDX or firstname_name_IDX, but not with name_firstname_IDX, because of the order of the columns! The order has to match between the query and the index. It is like using a telephone book: a telephone book is ordered by last name, then by first name. It is much easier to first look up all persons with the desired last name and end up with a short list of only a few persons; it makes no sense at all to browse the whole telephone book looking for people with a wanted first name and then comparing the last names in a second step.

Keeping this image in mind: it is always good to have a selective index. You could use an index on the gender of a customer, but that reduces the data set only by about half. It is much more effective to index something like the e-mail address, or to have a unique index like the Social Security Number. Be selective! As a rule of thumb, there are three levels of selectivity:

  • Primary key or unique key (best; those clauses will return a single row immediately)
  • An index matching the WHERE clause, or a prefix index (useful for text fields)
  • No key or index is applicable (worst)

Furthermore, firstname_name_IDX matches better than firstname_IDX and will be preferred by MySQL. Note that firstname_name_IDX can also be used for queries like

# Filtering the first name
SELECT * FROM Users WHERE firstname = 'Alex'

It is therefore neither necessary nor recommended to create both indexes.

Indexes are always read from left to right. If you have an index containing multiple columns, for example an index on (firstname, familyname), the query must filter on the leftmost column(s) for the index to be usable. If you filter only on the second column (familyname) without the first (firstname), the index cannot be used; in that case it is sometimes better to add a second index on just the second column. Check your statements with EXPLAIN to see which index is used (a short sketch follows below).
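
Here is a small sketch of the left-to-right rule, assuming a hypothetical composite index on (firstname, name) as in the telephone book example above:

# Hypothetical composite index, as in the telephone book example
CREATE INDEX firstname_name_IDX ON Users (firstname, name);

# Can use the index: the leftmost column appears in the filter
EXPLAIN SELECT * FROM Users WHERE firstname = 'Alex' AND name = 'Saal';
EXPLAIN SELECT * FROM Users WHERE firstname = 'Alex';

# Cannot use this index: the leftmost column is missing from the filter
EXPLAIN SELECT * FROM Users WHERE name = 'Saal';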


3) Or Statements

The MySQL query optimizer often cannot use indexes efficiently when an OR condition spans different columns, so try to avoid OR conditions where possible; a UNION of two indexed queries is sketched below as an alternative.
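
One common workaround, sketched here under the assumption that both columns have their own index, is to split the OR into a UNION of two indexed queries:

# OR across two columns: often ends in a full table scan
SELECT * FROM Users WHERE firstname = 'Alex' OR name = 'Saal';

# Rewrite as a UNION: each branch can use its own index, duplicates are removed
SELECT * FROM Users WHERE firstname = 'Alex'
UNION
SELECT * FROM Users WHERE name = 'Saal';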


4) Optimization of GROUP BY/ORDER BY queries

Sometimes you are facing queries that aggregate or sort rows:

# GROUP BY / ORDER BY
SELECT role, count(*) FROM Users WHERE registration_ts > 140000000 GROUP BY role;
SELECT id, username FROM Users WHERE registration_ts > 140000000 ORDER BY username;

What MySQL does is:

  1. Selection of Users by WHERE registration_ts > 140000000,
  2. Order results of step 1 (no matter if GROUP BY role or ORDER BY username is used)
  3. Projection to the desired columns by SELECT role or SELECT id, username

The hardest step is sorting; this is where indexes can help a lot, since they contain a sorted list of records according to their definition. This is particularly helpful if you have a lot of data in the table (the complexity of sorting algorithms is O(n*log(n))). How do you define the index to optimize such a query? Choose first the column filtered in the WHERE clause, then those in GROUP BY/ORDER BY (in the same order as in the query!). If possible, also add the columns of the SELECT to the index (after the columns of GROUP BY/ORDER BY) to gain some performance; this technique is called a covering index. It is not always reasonable to use covering indexes: if the whole index gets too big, you probably won't gain any time.

Extending the example of a telephone book: It is helpful, if you have requests like "Tell me how many persons have the last name 'Smith'" (This is a GROUP BY) or "Give me a list of all persons ordered by last name and first name" (ORDER BY).

For the previous example, the following indexes could be used (possible definitions are sketched below):

  • registration_role_IDX for the GROUP BY statement
  • registration_username_IDX for the ORDER BY statement
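
A possible definition of these two indexes, following the rule stated above (the index and column names are taken from the example queries and are assumptions):

# Possible index definitions for the example queries above
CREATE INDEX registration_role_IDX ON Users (registration_ts, role);
CREATE INDEX registration_username_IDX ON Users (registration_ts, username);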

5) Usage of subqueries

When it comes to complex queries, MySQL (especially before 5.6) is optimized for JOIN statements. However, in some cases a subquery can be more efficient, for example if you use GROUP BY and ORDER BY on different tables: in that case an index cannot be used when the tables are joined. Defining a main query and a subquery avoids this problem, as each query acts on its own table and can use any available index.

# Case A: Query as INNER JOIN
SELECT
    a.id AS account_id,
    p.id AS product_id,
    TRIM(SUBSTRING(p.name, 1, 30)) AS product_name,
    COUNT(*) AS count
FROM Accounts a
INNER JOIN Orders o ON a.id = o.account_id
INNER JOIN Products p ON p.id = o.product_id
GROUP BY p.id
ORDER BY a.id

# Case B: Subquery
SELECT account_id, product_id, product_name, count
FROM (SELECT
    a.id AS account_id,
    p.id AS product_id,
    TRIM(SUBSTRING(p.name, 1, 30)) AS product_name,
    COUNT(*) AS count
  FROM Accounts a
  INNER JOIN Orders o ON a.id = o.account_id
  INNER JOIN Products p ON p.id = o.product_id
  GROUP BY p.id) as product
ORDER BY account_id

In case B, the query has been split into an outer query and a subquery (the inner SELECT). Case A would make MySQL create a temporary table and use filesort; case B can avoid that. Which variant is faster depends on the size of each table.


Jun 10 2015
Jun 10

June 10, 2015

To me, modern code must be tracked by a continuous integration server, and must have automated tests. Anything else is legacy code, even if it was rolled out this morning.

In the last year, I have adopted a policy of never modifying any legacy code, because even a one-line change can have unanticipated effects on functionality, plus there is no guarantee that you won’t be re-fixing the same problem in 6 months.

This article will focus on a simple technique I use to bring legacy Drupal code under a test harness (hence transforming it into modern code), which is my first step before working on it.

Unit vs. functional testing

If you have already written automated tests for Drupal, you know about Simpletest and the concept of functional web-request tests with a temporary database: the vast majority of tests written for Drupal 7 code are based on the DrupalWebTestCase, which builds a Drupal site from scratch, often installing something like a site deployment module, using a temporary database, and then allows your test to make web requests to that interface. It’s all automatic and temporary environments are destroyed when tests are done.

It’s great, it really simulates how your site is used, but it has some drawbacks: first, it’s a bit of a pain to set up: your continuous integration server needs to have a LAMP stack or spin up Vagrant boxes or Docker containers, you need to set up virtual hosts for your code, and most importantly, it’s very time-consuming, because each test case in each test class creates a brand new Drupal site, installs your modules, and destroys the environment.

(I even had to write a module, Simpletest Turbo, to perform some caching, or else my tests were taking hours to run (at which point everyone starts ignoring them) – but that is just a stopgap measure.)

Unit tests, on the other hand, don’t require a database, don’t do web requests, and are lightning fast, often running in less than a second.

This article will detail how I use unit testing on legacy code.

Typical legacy code

Typically, you will be asked to make a “small change” to a function which is often 200+ lines long, and uses global variables, performs database requests, and REST calls to external services. But I’m not judging the authors of such code – more often than not, git blame tells me that I wrote it myself.

For the purposes of our example, let’s imagine that you are asked to make a change to a function which returns a “score” for the current user.

function mymodule_user_score() {
  global $user;
  $user = user_load($user->uid);
  $node = node_load($user->field_score_nid['und'][0]['value']);
  return $node->field_score['und'][0]['value'];
}

This example is not too menacing, but it’s still not unit testable: the function calls the database, and uses global variables.

Now, the above function is not very elegant; our first task is to ignore our impulse to improve it. Remember: we’re not going to even touch any code that’s not under a test harness.

As mentioned above, we could write a subclass of DrupalWebTestCase which provisions a database, we could create a node, a user, populate it, and then run the function.

But we would rather write a unit test, which does not need externalities like the database or global variables.

But our function depends on externalities! How can we ignore them? We’ll use a technique called dependency injection. There are several approaches to dependency injection, and Drupal 8 supports it very well with PHPUnit, but we’ll use a simple implementation which requires the following steps:

  • Move the code to a class method
  • Move dependencies into their own methods
  • Write a subclass that replaces dependencies (not logic) with mock implementations
  • Write a test
  • Then, and only then, make the “small change” requested by the client

Let’s get started!

Move the code to a class method

For dependency injection to work, we need to put the above code in a class, so our code will now look like this:

class MyModuleUserScore {
  function mymodule_user_score() {
    global $user;
    $user = user_load($user->uid);
    $node = node_load($user->field_score_nid['und'][0]['value']);
    return $node->field_score['und'][0]['value'];
  }
}

function mymodule_user_score() {
  $score = new MyModuleUserScore();
  return $score->mymodule_user_score();
}

That wasn’t that hard, right? I like to keep each of my classes in its own file, but for simplicity’s sake let’s assume everything is in the same file.

Move dependencies into their own methods

There are a few dependencies in this function: global $user, user_load(), and node_load(). None of these are available to unit tests, so we need to move them out of the function, like this:

class MyModuleUserScore {
  function mymodule_user_score() {
    $user = $this->globalUser();
    $user = $this->user_load($user->uid);
    $node = $this->node_load($user->field_score_nid['und'][0]['value']);
    return $node->field_score['und'][0]['value'];
  }

  function globalUser() {
    global $user;
    return $user;
  }

  function user_load($uid) {
    return user_load($uid);
  }

  function node_load($nid) {
    return node_load($nid);
  }

}

Your dependency methods should generally only contain one line. The above code should behave in exactly the same way as the original.

Override dependencies in a subclass

Our next step will be to provide mock versions of our dependencies. The trick here is to make our mock versions return values which are expected by the main function. For example, we can surmise that our user is expected to have a field_score_nid, which is expected to contain a valid node id. We can also make similar assumptions about how our node is structured. Let’s make mock responses with these assumptions:

class MyModuleUserScoreMock extends MyModuleUserScore {
  function globalUser() {
    return (object) array(
      'uid' => 123,
    );
  }

  function user_load($uid) {
    if ($uid == 123) {
      return (object) array(
        'field_score_nid' => array(
          LANGUAGE_NONE => array(
            array(
              'value' => 234,
            ),
          ),
        ),
      );
    }
  }

  function node_load($nid) {
    if ($nid == 234) {
      return (object) array(
        'field_score' => array(
          LANGUAGE_NONE => array(
            array(
              'value' => 3000,
            ),
          ),
        ),
      );
    }
  }

}

Notice that our return values are not meant to be complete: they only contain the minimal data expected by our function. The user object returned by our mock user_load() does not even contain a uid property! But that does not matter, because our function is not expecting it.

Write a test

It is now possible to write a unit test for our logic without requiring the database. You can copy the contents of this sample unit test to your module folder as mymodule.test, add files[] = mymodule.test to your mymodule.info, enable the simpletest modules and clear your cache.

There remains the task of actually writing the test: in your testModule() function, the following lines will do:

public function testModule() {
  // load the file or files where your classes are located. This can
  // also be done in the setUp() function.
  module_load_include('module', 'mymodule');

  $score = new MyModuleUserScoreMock();
  $this->assertTrue($score->mymodule_user_score() == 3000, 'User score function returns the expected score');
}

Run your test

All that’s left now is to run your test:

php ./scripts/run-tests.sh --class mymoduleTestCase

Then add the above line to your continuous integration server to make sure you’re notified when someone breaks it.

Your code is now ready to be fixed

Now, when your client asks for a small or big change, you can use test-driven development to implement it. For example, let’s say your client wants all scores to be multiplied by 10 (30000 should be the score when 3000 is the value in the node):

  • First, modify your unit test to make sure it fails: make the test expect 30000 instead of 3000
  • Next, change your code iteratively until your test passes.

What’s next

This has been a very simple introduction to dependency injection and unit testing for legacy code: if you want to do even more, you can make your Mock subclass as complex as you wish, simulating corrupt data, nodes which don’t load, and so on.

I highly recommend getting familiar with PHPUnit, which is part of Drupal 8, and which takes dependency injection to a whole new level: Juan Treminio’s “Unit Testing Tutorial Part I: Introduction to PHPUnit”, March 1, 2013 is the best introduction I’ve found.

I do not recommend doing away entirely with functional, database, and web tests, but a layered approach where most of your tests are unit tests, and you limit the use of functional tests, will allow you to keep your test runs below an acceptable duration, making them all the more useful, and increasing the overall quality of new and even legacy code.


Jun 05 2015
Jun 05

Backups are very important for every application, especially if a lot of data is stored in your database. For a website with few updates, regular backups are less critical: you can restore last week’s backup and, if there were just one or two updates since then, add them manually afterwards. But if you run a community site with user-generated content and a lot of input, backup and recovery become much more important, and also more complex. If the last backup is from last night, you have to consider all the updates made in the meantime; because you don’t know what the users have entered, it is impossible to re-add these changes manually. That is why you need a backup strategy that also covers storing all updates made between two full backups.

There are four basic methods for backing up and recovering a MySQL database; all other methods are based on these four. They fall into two categories: physical and logical backups.

Physical backups

Storing the binary MySQL files

With this method the actual MySQL data files, in which all table data is physically stored on disk, are copied to a safe location. If a restore is needed, the files can be copied back to the server.

Backup

service mysql stop;
cp -R /var/lib/mysql/database_name target
service mysql start;


Recovery

service mysql stop;
cp -R /path/to/backup/database_name /var/lib/mysql/database_name
service mysql start;

Advantages:

  • fast
  • easy for backup
  • multiple files, if one is broken, hopefully just this table is lost, not the whole database

Disadvantages:
  • takes a lot of disk space, since all the indexes etc. are copied too
  • the database has to be switched off for a certain time during the backup
  • restoring can become a little complex
  • you need special permissions on the operating system

LVM snapshot

Linux provides a Logical Volume Manager (LVM) (http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)), a layer for managing the filesystem. LVM provides the ability to create snapshots of any logical volume, so you can create a snapshot of the volume which can easily be used for recovery in the future. This is one of the best and easiest solutions: it is very fast, very easy and the potential for errors is very low. You don't have to take your database or application offline, there are no locks on the tables, and you get a stable snapshot of the current state.

Backup

# create the snapshot
lvcreate -l100%FREE -s -n mysql-backup /data/databases

# Create a mount-point and mount the volume/snapshot
mkdir -p /mnt/snapshot
mount /data/databases/mysql-backup /mnt/snapshot

# Do the backup and copy it to a separate system
tar -cf /tmp/dbbackup_YYYMMDD_His.tar /mnt/snapshot
cp /tmp/dbbackup_YYYMMDD_His.tar ip:/path/to/backups/

# Now remove the mount and the snapshot
umount /mnt/snapshot
lvremove /data/databases/mysql-backup


Recovery

# copy back the backup to your server
cp ip:/path/to/backups/dbbackup_YYYMMDD_His.tar /tmp/

# stop the database
service mysql stop;
# remove the old database files
rm -R /var/lib/mysql/database_name

# copy the backup
cp /tmp/dbbackup_YYYMMDD_His.tar .

# unpack the files
tar xvf dbbackup_YYYMMDD_His.tar

# restart the database
service mysql start;

Advantages:

  • very fast
  • easy
  • no stop of the database, no lock on tables

Disadvantages:
  • LVM needed
  • root access to the operating system may be needed

Logical backups

mysqldump

A MySQL dump is a common backup strategy. This is a logical backup: the structure and the content of the database are exported into a file in a special format, using MySQL syntax, which stores all the information needed to rebuild the database. Normally there is a statement to create the database again, statements to rebuild the tables and their structure, and then statements to import the data into the tables. All this information is stored in one file which can be copied to a safe location. When a restore is needed, the file can be imported and the database will be rebuilt from the information stored in it.

Backup

# create the dump
mysqldump --add-drop-database -u Username -pPassword database_name > dump_database_name_YYYMMDD_His.mysql


Recovery

# drop the old table and insert the backup dump data
mysql -u Username -pPassword database_name < dump_database_name_YYYMMDD_His.mysql


Advantages:
  • Very easy
  • Can be done by users without special permissions on the operation system, esp. root permissions
  • The indexes are not stored, so this backup does not use as much disk space as a physical file backup
  • You can have a look into the backup and also search in it, data manipulation is also possible if something has to be changed

Disadvantages:
  • Slower than a physical backup
  • Slower in recovery, because everything has to be imported first and then the indexes have to be built again
  • A single file: if it is broken, the whole backup is lost

Hints:

  • --lock-all-tables: locks all tables while the dump is created, so the application cannot access them; this avoids data inconsistencies!
  • --single-transaction: the whole dump is executed within a single transaction, so the application can still read and write the database while the dump is made
  • --master-data: records the master's binary log position in the dump, so a slave in a replication setup knows where to start replicating
  • --add-drop-database: add a DROP DATABASE statement before each CREATE DATABASE statement

Innobackup

There are special tools to create an InnoDB backup, see also http://dev.mysql.com/doc/mysql-enterprise-backup/3.8/en/ihb-meb-compatib....
This is a special case for databases where the InnoDB storage engine is used. As InnoDB has become the default storage engine and MyISAM will be removed in the future, this is also a very common way to create a database backup. It is nearly the same as a normal MySQL dump but also takes the special features of InnoDB, like foreign key constraints, into account. MySQL Enterprise Backup (MEB) (http://www.mysql.com/products/enterprise/backup.html) can create InnoDB backups, but it costs money.

There is also an open source tool from Percona named xtrabackup: http://www.percona.com/software/percona-xtrabackup. There is a free version as well as an enterprise edition which costs money. As Percona (http://www.percona.com) offers a lot of useful tools around MySQL, this may also be a good addition to your MySQL toolkit; other Percona tools can help improve your daily life with MySQL as well.

Hints

Master-Slave replication for backups and avoiding downtimes

A special hint at this point: if possible, use a master-slave replication and build the backups from the slave, so the main system is not affected and application performance does not suffer. It is also a good setup to avoid long downtimes of your application: if one server crashes you can point your application at the other one and stay online; in the meantime you can repair the broken system and then restore the original setup. So, if the master breaks, you can switch to the slave and it becomes the master. If the slave fails, only the read requests of your application have to be routed to the master. A sketch of pausing replication on the slave while the backup runs follows below.
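
A minimal sketch of pausing replication on the slave while a backup is taken (assuming a classic MySQL 5.5/5.6 master-slave setup; the actual backup command is whatever tool you normally use):

# On the slave: stop applying replicated changes while the backup runs
STOP SLAVE SQL_THREAD;
# ... take the backup here (mysqldump, file copy, LVM snapshot, ...) ...
START SLAVE SQL_THREAD;
# Verify that the slave catches up again (Seconds_Behind_Master goes back to 0)
SHOW SLAVE STATUS\G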

Fromdual Bman

Fromdual.com offers the tool fromdual_bman (http://fromdual.com/mysql-backup-manager-mysql_bman) which is a wrapper for seven different backup methods (the ones mentioned above and combinations of them).

Backup location

A backup is always better than no backup. But if the backup is stored on the same logical disk of the server your website runs on, you may get into trouble: if the disk crashes, your website is offline and you also lose your backup, and it becomes impossible to restore the website on another server and bring it back online. So always store your backups on another logical volume or, better, on another server. If the data is very important, also consider saving your backups in multiple locations, possibly in other data centers: in case of fire or something similar, your backup can be fetched from somewhere else and recovery can run in another data center.

Uuuups Queries - Accident on database

So-called uuuups queries are queries where a wrong query was accidentally executed on a production system. This mostly happens because somebody executes a query manually by accident; there are multiple reasons why this can happen, for example mixing up console windows.

Time is the key, so you have to act immediately!

Stop the database and your application immediately! Set your application to maintenance mode!

Two possibilities:
  1. no database replication

    You can only re-import the last backup, whenever it was made. Hopefully it was made not long ago, maybe last night. You lose all changes between your last backup and the time of the uuuups query. Or, if you know which changes were made in that time, you can redo them manually or write a script which does it for you.

  2. a running database replication
    You can re-import the last backup. With replication you automatically get the binary logs, in which MySQL records the database changes to be executed on the slave server. You can use these binary logs for a point-in-time recovery, which means you can recover everything up to the execution of the wrong query. After importing the dump, you replay the binary logs, which contain either the statements (statement-based replication) or the changed rows (row-based replication). Because all changes are stored in these files, you recover everything that happened between your last backup and the time the uuuups query was executed. Do not forget to prevent the uuuups query from being executed again: it is also in the binary logs! You can inspect and edit the binary logs using the mysqlbinlog tool (http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) and remove the uuuups query from the replayed statements. This has to be done on both servers, the master and the slave. Once one of the servers (use the master first) is recovered, you can enable your application again using just that server. After that you can recover the slave server and restart the replication. If you are an experienced user you can also recover the master and the slave together, so both systems are back at nearly the same time; but do not mix up the systems, or you will have to start from the beginning and your application stays offline even longer.

We hope these hints help you in your daily life with MySQL. There are other posts about MySQL as well; since this post mentioned database replication a lot, the post about MySQL setup may also be interesting for you.


May 29 2015
May 29

Last week, some colleagues from Cocomore and I attended DrupalCamp Spain 2015. The Spanish Drupal community is awesome, and they put all their effort into making an unforgettable event again in this 6th edition (the 5th I have attended).

The event was divided into different activities for the three days: Business Day and Sprints on Friday, and sessions on Saturday and Sunday.

Starting my session. Photo: pakmanlh (https://twitter.com/pakmanlh/status/602105515745910786)

I participated as a speaker, talking about dos and don’ts when building a Drupal 8 site. We looked at our experience managing the project structure, the different ways of using Composer to manage the project and different merging strategies, evaluated the status of contrib, and discussed how we managed to reduce the risk of using betas by writing Behat tests and doing continuous integration.

The topic is quite relevant, so I got a lot of questions at the end and throughout the weekend. Keep them coming if you think I can help you!

Recording: https://vimeo.com/129005035
Slides: https://docs.google.com/presentation/d/1FPABmI1GVOUJzmS09JCXzObQuYhzU3rg...

DrupalCamp Spain is a well-established event already, so it’s attracting more and more international participants every year. Social events are well planned and attractive, and the event has more English sessions every year. I’m looking forward to the next edition!

Visiting the tabancos. Photo: juampy (https://twitter.com/juampynr/status/602192531636559872)

That’s a wrap! DrupalCamp Spain 2015 was an amazing event and I for sure will be there again next year. Thanks to the organization for their hospitality, it was real fun sharing those days with you!

May 12 2015
May 12

When setting up a MySQL Server there are a lot of things to consider. Most requirements depend on the intended usage of the system.

Beginning with the hardware requirements, this series of blog posts draws on our own experience operating several hundred MySQL instances, as well as on a recent workshop our team attended. Before you start optimizing your MySQL instance, consider the following: standards are more important in multi-instance setups than individual performance tuning. Standards allow for easier debugging and give you a better handle on security-related incidents. We strongly recommend the use of configuration management tools like Puppet or Chef for environments that are supposed to be consistent across all your systems.

The MySQL default settings as of version 5.6 are great, and apart from larger systems, hardly any changes are necessary. For a standard LAMP stack hosting a blog or a small website, not a lot of optimization is needed: today's CMSs come with sufficient configuration, and tampering with the back end itself can cause more harm than it helps performance. These articles are mostly interesting if you are developing your own applications. This first article focuses on the design of the underlying platform, which accounts for an estimated 10%-20% of the possible performance tuning. Nevertheless, if your application is poorly designed or not optimized, it will become quite expensive to try to fix the system with hardware alone. So the first step is always to ensure a well-structured and optimized application; we will address this in a future article.


Hardware / Operation system

Here are some general hardware considerations, viable for most setups:

CPU

MySQL does NOT scale a single query across multiple cores (one query = one core). So it is better for your instance to have fewer but faster cores than the other way around. Try to avoid multi-CPU systems (note: multiple physical CPUs, not cores) with a lot of RAM: CPUs usually have a direct bus to their own RAM banks, and if RAM attached to another CPU needs to be addressed, there is overhead. In high-performance setups, rather go for a hexa- or octa-core CPU than two quad-core CPUs.

RAM

RAM has become a cheap resource these days. Use as much RAM as possible; it should match the database size, with some extra space for the operating system and the application if it lives on the same machine. If the database is served mostly or completely from RAM, reads are fast and the server mostly only needs to write to disk, not read from it. This is a huge performance win, because operations in RAM are easily 100 times faster than on an SSD.
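
To get a rough idea of how much RAM the database needs, a sketch like the following (a standard information_schema query; sizes are approximate) can help when sizing innodb_buffer_pool_size:

# Approximate size of data plus indexes per schema, in MB
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
GROUP BY table_schema;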

HDD

If you are using conventional magnetic storage, faster-spinning disks are preferred; try to go for 15k RPM or more. SSDs (solid state drives) are obviously even better and outperform every other storage medium at random IOPS (input/output operations per second). For the RAID setup, go for RAID10 (better) or RAID1 (cheaper), but avoid RAID5/6 etc. because their random write performance is bad. The fewer read/write operations hit the disk, the better the performance.

File system

If you are not sure about the file system, go with the type recommended for your operating system; in general that means: stay with ext4. In the unlikely event of a file system problem, using the most common file system will make debugging a lot easier, and in the long term this outweighs the possible performance gain of other file systems by far.

I/O scheduler

The standard scheduler on a lot of operating systems is CFQ (Completely Fair Queuing). For databases, consider switching to NOOP or Deadline. Find out which scheduler your system is using:

cat /sys/block/sda/queue/scheduler

This will give you an output like

noop [deadline] cfq

where the used scheduler is shown in brackets. As of kernel 2.6.10 it is possible to change the scheduler at runtime; you will need root permissions to do so. Run

echo noop > /sys/block/sda/queue/scheduler

to set your scheduler to noop. This setting is not persistent and will be reset at reboot. To make a setting permanent, it needs to be given to the kernel as a parameter at boot time.

elevator=<scheduler>

needs to be added, where <scheduler> needs to be noop or deadline.


Storage engine

If you are starting a new database, especially with MySQL 5.6, InnoDB will be the default engine. For already running projects, there is a way to convert MyISAM tables to InnoDB (a sketch follows below). This is usually not recommended, as the performance gain only matters for very high-performance databases, and there is always a risk involved.
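
If you do decide to convert, a minimal sketch looks like this (the table name is an example; the statement rebuilds the table and can take a while on large tables):

# Convert one table from MyISAM to InnoDB
ALTER TABLE Accounts ENGINE=InnoDB;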

General Setup

Operation system

The choice of operating system is one of the less important decisions. At Cocomore we stay with the Ubuntu LTS versions. As mentioned in the beginning, systems that are set up the same way make configuration and debugging much easier and improve stability. The problem with LTS releases is that they often only offer relatively old packages: the default on Ubuntu 14.04, released in April 2014, is MySQL 5.5.

There are a couple of ways to install newer versions. As of February 2015, the newest stable MySQL version is 5.6, which can be installed from the Ubuntu universe repository. Oracle also offers a repository for MySQL, though you might need to log in first. We recommend using the latest stable release; as of February 2015 this is version 5.6.


Database Replication

There are two possibilities:
  • statement based - In MySQL 5.1.4 and earlier, binary logging and replication used this format exclusively.
  • row based

Statement-based replication stores the SQL statement in the binary log, and the statement is then executed on the slave server. This has the disadvantage that non-deterministic parts of a statement, for example time or random values, are re-evaluated on the slave, so there is a chance of inconsistency between master and slave because they can lead to different results on the different servers.

Row-based replication is superior because only the results of a statement (the changed rows) are stored in the binary logs, and the corresponding rows are changed to the new values on the slave. However, this requires much more disk space, because it is no longer just the SQL statement that is stored but all the rows that were changed.

The replication format can be changed on the fly, but we recommend row-based replication to avoid inconsistencies between master and slave. Disk space has become very cheap, so there is no need to use statement-based replication. A sketch of switching the format is shown below.
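
A small sketch of checking and switching the format at runtime (SUPER privilege assumed; only new sessions pick up the global change, and a permanent change also belongs in my.cnf):

# Check the current binary log format
SHOW VARIABLES LIKE 'binlog_format';
# Switch new sessions to row-based logging
SET GLOBAL binlog_format = 'ROW';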

As of MySQL 5.1.8, the server can change the binary logging format in real time according to the type of event using mixed-format logging. When the mixed format is in effect, statement-based logging is used by default, but automatically switches to row-based logging in particular cases.

High availability solutions

If your database goes down, your application will not work anymore. Therefore we recommend a high-availability setup if your application should stay online even when there is trouble in the background. There are several ways to ensure high availability for your database; the simple ones are easier and less expensive than, for example, a Galera cluster. How much time and money you want to spend to keep your application always online depends on what the application is worth to you.


Master - Slave setup

  • see also Master - Slave setup
  • simple asynchronous replication
  • in case of failure, switch to the slave and recover the master
  • switching can be done manually or automatically for example using Heartbeat

Active / passive cluster

  • same as before using a master - slave replication
  • using a virtual IP address which points to the master
  • underlying system is a SAN or DRBD
  • Heartbeat is switching the virtual IP address to the other system (slave) in case of error

[Diagram: MySQL high-availability active/passive cluster]

Galera cluster

  • see also Galera cluster
  • Galera is a cluster of multiple nodes with multiple masters which are all kept synchronized
  • a load balancer distributes the load across the different master databases
  • Galera synchronizes the master databases whenever statements are executed
  • it works as long as the majority of the masters is online and connected, therefore there should always be an odd number of master databases
  • if one master database crashes, the others are still online and the crashed one can be repaired

[Diagram: MySQL high-availability Galera cluster]

Conclusion

The first step in optimization is always the application itself. But there are also other ways to tune your system: choosing the right hardware, the right operating system (OS), the correct file system and the best MySQL storage engine for your application. Newer MySQL versions bring new features as well as performance improvements; as of February 2015, MySQL 5.6 is a very good version to work with.

To ensure that your application does not go down when your database does, there are ways to ensure high availability using database replication and different setups. Which solution you can use depends on your budget, but always weigh all possible improvements (hardware, DB tuning, replication or application tuning) against their estimated effect, so you can choose the improvement with the best cost-value ratio.

If you have questions or you need help with your application, please do not hesitate to contact us.

In our next post we will provide some information on how to improve the MySQL settings in my.cnf itself.


Apr 13 2015
Apr 13

This YouTube video doesn’t need any further explanation besides its title: The Drupal Rap song – Everyday I’m Drupalin’

Lyrics:

Chorus
Everyday I’m drupalin

Verse
Where them forms you gettin fapi with I’m the fapi boss/ hookin into edit form and webforms is my specialty sauce/ I’ll hook form alter by form id’s or entities/ put a list on Ajax/ just to keep it callin back/

I got them distrobutions, I’m like acqia/
Check my public repos, I didn’t copy nuttin/ I know dries n webchick, I kno Ryan szrama/ all the commerce guys we hipchat when they got some drama/
Might not be pretty code but it gets me paid/ I’m using rules like php loopin through arrays/ I put it all in features, so the code is stable/ it might take longer, but next time I just click enable/ These dudes clearin caches, on every hook init/ queries by thousands, page loads by the minutes

Verse
No matter the language we compress it hard/ drugs cc all, we just drugs cc all/
Where’s all of the changes, you never saw/ so drush cc all, we just drugs cc all/ I lean heavy on smacss, compass compilin my sass/ you just installed flexslider now you teachin a class/
I seen your content types, I don’t need to kno you/ to know that we ain’t even in the same nodequeue/
I’m on drupal answers, check my reputation/ I’m on my tablet earnin karma while I’m on vacation/ ya girl like a module, she stay hookin n/ you couldn’t code an info file, without lookin in/
Mo scrums, equals better sprints, break the huddle, n the work begins

Mar 16 2015
Mar 16

Security is an important aspect to keep an eye on, and this time it’s about preventing clickjacking on Drupal and other Apache-served web applications.

Edit Apache’s configuration file, which may be your declared vhost or similar, usually at a location like /etc/httpd/conf.d/default.conf, and make sure the following is present:

<IfModule mod_headers.c>
Header always append X-Frame-Options SAMEORIGIN
</IfModule>

This prevents other sites from embedding your website in an iframe.


Mar 09 2015
Mar 09

Apache obfuscation can be achieved very easily and the benefits are great: it stops the server from disclosing information such as software versions and OS, and from outputting verbose errors when ‘bad things happen’, and they do happen.


Edit apache configuration, usually available here for RedHat based distributions: /etc/httpd/conf/httpd.conf

Make sure the following settings are present, save, and restart apache:

TraceEnable Off
ServerSignature Off
ServerTokens Prod

How do we test that this is actually working?

How to test TraceEnable:
1. curl -v -X TRACE http://…
2. Confirm you get a forbidden response

How to test ServerTokens:
1. Make a request to the website and check the response headers
2. Confirm the Server header contains only “Apache”, with no version information

How to test ServerSignature:
1. Make a request to the website for a URL that should respond with Apache server error
2. Confirm you don’t see information about the apache server software version, OS, etc.

Feb 23 2015
Feb 23

February 23, 2015

Continuous integration (CI) is the practice of running a series of checks on every push of your code, to make sure it is always in a potentially deployable state; and to make sure you are alerted as soon as possible if it is not.

Continuous integration and Drupal projects

This blog post is aimed at module maintainers, and we’ll look at how to use CI for modules hosted on Drupal.org. I’ll use as an example a project I’m maintaining, Realistic Dummy Content.

The good news is that Drupal.org has a built-in CI service for hosted modules: to use it, project maintainers need to click on the “Automated Testing” tab of their projects, enable automated testing, and make sure some tests are defined.

Once you have enabled automated testing, every submitted patch will be applied to the code and tested, and the main branches will be tested continually as well.

If you’re not sure how to write tests, you can learn by example by looking at the test code of any module which has automated testing enabled.

Limitations of the Drupal.org QA system

The system described above is great, and in this blog post we’ll explore how to take it a bit further. Drupal’s CI service runs your code on a new Drupal site with PHP 5.3 enabled. We know this by looking at the log for a test on Realistic Dummy content, which contains:

[13:50:02] Database backend [mysql] loaded.
...
[simpletest.db] =>
[test.php.version] => 5.3
...

For the sake of this article, let’s say we want to use SQLite with PHP 5.5, and we also want to run checks from the coder project’s coder_review module. We can’t achieve this within the Drupal.org infrastructure, but it is possible using Docker, CircleCI, and GitHub. Here is how.

Step 1: get a local CoreOS+Docker environment

Let’s start by setting up a local development environment on which we can run Docker. Docker is a system which uses Linux containers to run your software and all its dependencies in an isolated environment.

If you need a primer on Docker, check out Getting Started with Docker on Servers for Hackers (March 20, 2014), and A quick intro to Docker for a Drupal project.

Docker works best on CoreOS, which you can install quite easily on any computer using Vagrant and VirtualBox, as explained at Running CoreOS on Vagrant.

Step 2: Add a Dockerfile to your project

Because, in this example, we want to run tests which require changing things on the server, we’ll use the Docker container management system to simulate a Ubuntu machine over which we have complete control.

To see how this works, download the latest dev version of realistic_dummy_content to your CoreOS VM, take a look at the included files ./Dockerfile and ./scripts/test.sh to see how they are structured, then run the test script:

./scripts/test.sh

Without any further configuration, you will see tests run in the desired environment: Ubuntu with the correct version of PHP, SQLite, and coder review. (You can also see the results on CircleCI on the project’s CI dashboard if you unfold the “test” section; we’ll see how to set that up for your project later on.)

Setting up Docker for your own project is just a question of copy-pasting a few scripts.

Step 3: Make sure there is a mirror of your project on GitHub

Having test results on your command line is nice, but there is no reason to run them yourself. For that we use continuous integration (CI) servers, which run the tests every time someone commits something to your codebase.

Some of you might be familiar with Jenkins, which I use myself and which is great, but for open source projects, there are free CI services out there: the two I know of, CircleCI and Travis CI, synchronize with GitHub, not with Drupal.org, so you need a mirror of your project on GitHub.

Note that it is possible, using the tool HubDrop, to mirror your project on GitHub, but it’s not on your account, whereas the CI tools sync only with projects on your own account. My solution has been to add a ./scripts/mirror.sh script to Realistic Dummy Content, and call it once every ten minutes via a Jenkins job on my personal Jenkins server. If you don’t have access to a Jenkins server you can also use a cron job on any server to do this.

The mirror of Realistic Dummy Content on GitHub is here.

Step 4: Choose a CI service

As mentioned above, two of the CI tools out there are CircleCI and Travis CI. One of my requirements is that the CI tool integrate well with Docker, because that’s my DevOps tool of choice.

As mentioned in Faster Builds with Container-Based Infrastructure and Docker (Mathias Meyer, Travis CI blog, 17 Dec. 2014), Travis CI is moving towards Docker: its new infrastructure is based on Docker, but it does not seem to let you run your own Docker containers.

Circle CI, on the other hand, seems to provide more flexibility with regards to Docker, as explained in the article Continuous Integration and Delivery with Docker on CircleCI’s website.

Although Travis is a great, widely-used tool (Drush uses it), we’ll use CircleCI because I found it easier to set up with Docker.

Once you open a CircleCI account and link it to your GitHub account, you will be able to turn on CI for your mirrored project, in my case Realistic Dummy Content.

Step 5: Add a circle.yml file to your project

In order for Circle CI to know what to do with your project, it needs a circle.yml file at the root of your project. If you look at the circle.yml file at the root of Realistic Dummy Content, it is actually quite simple:

machine:
  services:
    - docker

test:
  override:
    - ./scripts/test.sh

That’s it! Commit your circle.yml file, and if mirroring with GitHub works correctly, Circle CI will test your build. Debug any errors you may have, and voilà!

Here is the result of a recent Realistic Dummy Content build on CircleCI: unfold the “test” section to see the complete output: PHP version, SQLite database, coder review…

Conclusion

We have seen how you can easily add Docker support to make sure the tests and checks you run on your code are in a controlled environment, with the extensions you need (one could imagine a module which requires some external system like ApacheSolr installed on the server – Docker allows this too). This is one concrete application of DevOps: reducing the risk of glitches where “tests pass on my dev machine but not on my CI server”.


Feb 18 2015
Feb 18

February 18, 2015

I recently added Docker support to Realistic Dummy Content, a project I maintain on Drupal.org. It is now possible (with Docker installed, preferably on a CoreOS VM) to run ./scripts/dev.sh directly from the project directory (use the latest dev version if you try this), and have a development environment, sans MAMP.

I don’t consider myself an expert in Docker, virtualization, DevOps and config management, but here, nonetheless, is my experience. If I’m wrong about something, please leave a comment!

Intro: Docker and DevOps

The DevOps movement, popularized starting in about 2010, promises to include environment information along with application information in the same git repo for smoother development, testing, and production environments. For example, if your Drupal module requires version 5.4 of PHP, along with a given library, then that information should be somewhere in your Git repo. Building an environment for testing, development or production should then use that information and not be dependent on anything which is unversioned. Docker is a tool which is anchored in the DevOps movement.

DevOps: the Config management approach

The family of tools which has been around for a while now includes Puppet, Chef, and Ansible. These are configuration management tools: they define environment information (the PHP version should be 5.3, Apache mod_rewrite should be on, etc.) and make sure a given environment conforms to that information.

I have used Puppet, along with Vagrant, to deliver applications, including my Jenkins server hosted on GitHub.

Virtualization and containers

Using Puppet and Vagrant, you need to use Virtualization: create a Virtual Machine on your host machine.

Docker works with a different principle: instead of creating a VM on top of your host OS, Docker uses containers, so resources are shared. The article Getting Started with Docker (Servers for Hackers, 2014/03/20) contains some graphics which demonstrate how much more efficient containers are as opposed to virtualization.

Puppet and Vagrant are slow; Docker is fast

Puppet and Vagrant together work for packaging software and environment configuration, but it is excruciatingly slow: it can take several minutes to launch an environment. My reaction to this has been to cringe every time I have to do it.

Docker, on the other hand, uses caching aggressively: if a server was already in a given state, Docker uses a cached version of it to move along faster. So, when building a container, Docker goes through a series of steps, and caches each step to make it lightning fast.

One example: launching a dev environment of the Jenkins Vagrant project on Mac OS takes over five minutes, but launching a dev environment of my Drupal project Realistic Dummy Content (which uses Docker), takes less than 15 seconds the first time it is run once the server code has been downloaded, and, because of caching, less than one (1) second subsequent times if no changes have been made. Less than one second to fire up a full-fledged development environment which is functionally independent from your host. That’s huge to me.

Configuration management is idempotent, Docker is not

Before we move on, note that Docker is not incompatible with config management tools, but Docker does not require them. Here is why I think, in many cases, config management tools are not necessary.

The config management tools such as Puppet are idempotent: you define how an environment should be, and the tools run whatever steps are necessary to make it that way. This sounds like a good idea in theory, but it looks like this in practice. I have come to the conclusion that this is not the way I think, and it forces me to relearn how to think of my environments. I suspect that many developers have a hard time wrapping their heads around idempotence.

Docker is not idempotent; it defines a series of steps to get to a given state. If you like idempotence, one of the steps can be to run a Puppet manifest; but if, like me, you think idempotence is overrated, then you don’t need to use it. Here is what a Dockerfile looks like: I understood it at first glance, and it doesn’t require me to learn a new way of thinking.

The CoreOS project

The CoreOS project has seen the promise of Docker and containers. It is an OS which ships with Docker, Git, and a few other tools, but is designed so that everything you do happens within containers (using the included Docker, and eventually Rocket, a tool they are building). The result is that CoreOS is tiny: it takes 10 seconds to build a CoreOS instance on DigitalOcean, for example, but almost a minute to set up a CentOS instance.

Because Docker does not work on Mac OS without jumping through some hoops, I decided to use Vagrant to set up a CoreOS VM on my Mac, which is speedy and works great.

Docker for deploying to production

We have seen that Docker can work for quickly setting up dev and testing environments. Can it be used to deploy to production? I don’t see why not, especially if used with CoreOS. For an example see the blog post Building an Internal Cloud with Docker and CoreOS (Shopify, Oct. 15, 2014).

In conclusion, I am just beginning to play with Docker, and it just feels right to me. I remember working with Joomla in 2006 when I discovered Drupal: it just felt right, and I have made a career of it since then. I am having the same feeling now discovering Docker and CoreOS.

I am looking forward to your comments explaining why I am wrong about not liking idempotence, how to make config management and virtualization faster, and how and why to integrate config management tools with Docker!


Feb 09 2015
Feb 09

February 09, 2015

To get the most of this blog post, please read and understand Getting Started with Docker (Servers for Hackers, 2014/03/20). Also, all the steps outlined here have been done on a Vagrant CoreOS virtual machine (VM).

I recently needed a really simple non-production Drupal Docker image on which I could run tests. b7alt/drupal (which you can find by typing docker search drupal, or on GitHub) worked for my needs, except that it did not have the cURL PHP library installed, so drush en simpletest -y was throwing an error.

Therefore, I decided to create a new Docker image based on b7alt/drupal, but with cURL installed.

I started by creating a new local directory (on my CoreOS VM), which I called docker-drupal:

mkdir docker-drupal

In that directory, I created a Dockerfile which takes b7alt/drupal as its base, and runs apt-get install curl.

FROM b7alt/drupal

RUN apt-get update
RUN apt-get -y install curl

(You can find this code at my GitHub account at alberto56/docker-drupal.)

When you run this you will get:

docker build .
...
Successfully built 55a8c8999520

That hash is a Docker image ID, and your hash might be different. You can run it and see if it works as expected:

docker run -d 55a8c8999520
c9a98bdcab4e027e8571bde71ee92b4380247a44ef9314749ef5680864de2928

In the above, we are telling Docker to create a container based on the image we just created (55a8c8999520). The resulting container hash is displayed (yours might be different). We are using -d so that our container runs in the background. You can see that the container is actually running by typing:

docker ps
CONTAINER ID        IMAGE               COMMAND...
c9a98bdcab4e        55a8c8999520        "/usr/bin/supervisor...

This tells you that there is a running container (c9a98bdcab4e) based on the image 55a8c8999520. Again, your hashes will be different. Let’s log into that container now:

docker exec -it c9a98bdcab4e bash
root@c9a98bdcab4e:/#

To make sure that cUrl is successfully installed, I will figure out where Drupal resides on this container, and then try to enable Simpletest. If that works, I will consider my image a success, and exit from my container:

root@c9a98bdcab4e:/# find / -name 'index.php'
/srv/drupal/www/index.php
root@c9a98bdcab4e:/# cd /srv/drupal/www
root@c9a98bdcab4e:/srv/drupal/www# drush en simpletest -y
The following extensions will be enabled: simpletest
Do you really want to continue? (y/n): y
simpletest was enabled successfully.                   [ok]
root@c9a98bdcab4e:/srv/drupal/www# exit
exit

Now I know that my 55a8c8999520 image is good for now and for my purposes; I can create an account on Docker.com and push it to my account for later use:

docker build -t alberto56/docker-drupal .
docker push alberto56/docker-drupal

Anyone can now run this Docker image by simply typing:

docker run alberto56/docker-drupal

One thing I had a hard time getting my head around was having a GitHub project and a Docker project which are different but linked. The GitHub project is the recipe for creating an image, whereas the Docker project is the image itself.

Once we start thinking of our environments like this (as entities which should be versioned and shared), the risk of differences between environments is greatly reduced. I was used to running Simpletest for my projects on an environment managed by hand; when I got a strange permissions error on the test environment, I decided to start using Docker and version control to manage the container where tests are run.


Feb 06 2015
Feb 06

February 06, 2015

I have been using Simpletest on Drupal 7 for several years and, used well, it can greatly enhance the quality of your code. I like to practice test-driven development: writing a failing test first, then running it multiple times, tweaking the code each time, until the test passes.

Simpletest works by spawning a completely new Drupal site (ignoring your current database), running tests, and destroying the database. Sometimes, a test will fail and you’re not quite sure why. Here are two tips to help you debug why your tests are failing:

Tip #1: debug()

The Drupal debug() function can be placed anywhere in your test or your source code, and the result will appear on the test results page in the GUI.

For example, if things work fine when you are playing around with the dev version of your site, but in the test a specific node contains invalid data, you can add this line anywhere in your test, or in the source code which is being called during your test:

...
debug($node);
...

This will provide formatted output of your $node variable, alongside your test results.

Tip #2: die()

Sometimes the temporary test environment’s behaviour seems to make no sense. And it can be frustrating to not be able to simply log into it and play around with it, because it is destroyed after the test is over.

To understand this technique, here is a quick primer on how Simpletest works:

  • In Drupal 7, running a test requires a host site and database. This is basically an installed Drupal site with Simpletest enabled, and your module somewhere in the modules directory (the module you are testing does not have to be enabled).
  • When you run a test, Simpletest creates a brand-new installation of Drupal using a special prefix simpletest123456, where 123456 is a random number. This gives Simpletest an isolated environment in which to run tests, but on the same database and with the same credentials as the host.
  • When your test does something, like calling a function or loading a page with, for example, $this->drupalGet('user'), the host environment is ignored and the temporary environment (which uses the prefixed database tables) is used. In the previous example, the test loads the “user” page using a real HTTP call; Simpletest knows to use the temporary environment because the call is made with a specially crafted user agent.
  • When the test is over, all tables with the prefix simpletest123456 are destroyed.

If you have ever tried to run a test on a host environment which already contains a prefix, you will understand why you can get “table name too long” errors in certain cases: Simpletest is trying to add a prefix to another prefix. That’s one reason to avoid prefixes when you can, but I digress.

Now you can try this: somewhere in your test code, add die(). This will kill Simpletest, leaving the temporary database tables intact.

Here is an example: a colleague was recently testing a feature which exported a view. In the dev environment, the view was available to users with the role manager, as expected. However, when the test logged in as a manager user and attempted to access the view, the result was an “Access denied” page.

Because we couldn’t easily figure it out, I suggested adding die() to play around in the environment:

...
$this->drupalLogin($manager);
$this->drupalGet('inventory');
die();
$this->assertNoText('denied', 'A manager accessing the inventory page does not see "access denied"');
...

Now, when the test was run, we could:

  • wait for it to crash,
  • then examine our database to figure out which prefix the test was using,
  • change the database prefix in sites/default/settings.php from '' to (for example) 'simpletest73845' (see the snippet below),
  • run drush uli to get a one-time login link.
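
In Drupal 7 the prefix lives in the $databases array; a minimal sketch of the change (the prefix value is whatever your crashed test left behind):

// In sites/default/settings.php: point the site at the tables left
// behind by the crashed test run (the prefix value is an example).
$databases['default']['default']['prefix'] = 'simpletest73845';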

Now, it was easier to debug the source of the problem by visiting the views configuration for inventory: it turns out that features exports views with access by role using the role ID, not the role name (the role ID can be different for each environment). Simply changing the access method for the view from “by role” to “by permission” made the test pass, and prevented a potential security flaw in the code.

(Another reason to avoid “by role” access in views is that User 1 often does not have the role required, and it is often disconcerting to be user 1 and have “access denied” to a view.)

So in conclusion, Simpletest is great when it works as expected and when you understand what it does, but when you don’t, it is always good to know a few techniques for further investigation.


Jan 20 2015
Jan 20

January 20, 2015

When building a Drupal 7 site, one oft-used technique is to keep the entire Drupal root under git (for Drupal 8 sites, I favor having the Drupal root one level up).

Starting a new project can be done by downloading an unversioned copy of D7, and initializing a git repo, like this:

Approach #1

drush dl
cd drupal*
git init
git add .
git commit -am 'initial project commit'
git remote add origin ssh://git@example.com/myproject

Another trick I learned from my colleagues at the Linux Foundation is to get Drupal via git and have two remotes, like this:

Approach #2

git clone --branch 7.x http://git.drupal.org/project/drupal.git drupal
cd drupal
git remote rename origin drupal
git remote add origin ssh://git@example.com/myproject

This second approach lets you push changes to your own repo, and pull changes from the Drupal git repo. This has the advantage of keeping track of Drupal project commits, and your own project commits, in a unified git history.

git push origin 7.x
git pull drupal 7.x

There might be one inconvenience if you are tight on space, though: Approach #2 keeps track of the entire Drupal 7.x commit history. For example, we are now tracking in our own repo commit e829881 by natrak, from June 2, 2000:

git log |grep e829881 --after-context=4
commit e8298816587f79e090cb6e78ea17b00fae705deb
Author: natrak <>
Date:   Fri Jun 2 18:43:11 2000 +0000

    CVS drives me nuts *G*

All of this information takes disk space: Approach #2 takes 156MB, vs. 23MB for Approach #1. This may add up if you are working on several projects, and especially if for each project you have several environments for feature branches. If you have a continuous integration server tracking multiple projects and spawning new environments for each feature branch, several gigs of disk space can be used.

If you want to streamline the size of your git repos, you might want to try the --depth option of git clone, like this:

Approach #3

git clone --branch 7.x --depth 1 http://git.drupal.org/project/drupal.git drupal
cd drupal
git remote rename origin drupal
git remote add origin ssh://[email protected]/myproject

Adding the --depth parameter here reduces the initial size of your repo to 18MB in my test, which interestingly is even less than Approach #1. Even though your repo is now linked to the Drupal git repo, by running git log you will see that the entire history is not being stored.


Jan 12 2015
Jan 12

In the spirit of the computer video game Doom and its skill levels, we’ll review a few ways you can improve your Drupal site’s performance and optimize for better results and server response time. These tips may at times be specific to Drupal 6, although you can always learn the best practices from these examples and apply them to your own code base.


Doom skill levels: (easiest first)

1. I’m too young to die

2. Hey, not too rough

3. Hurt me plenty

4. Ultra-violence

5. Nightmare!

  This post is rated “I’m too young to die” difficulty level.

Drupal is known for its plethora of hooks, and modules use them abundantly to plug into the way Drupal works. That’s fine, but once you’ve decided to move forward with Drupal for your live web application or website and you’re using modules from the ecosystem, that is when you need to spend some more time reviewing those modules a little more closely than just their download counts or issue queues on drupal.org.

hook_init() runs on every page load. If you have a few modules implementing this hook, you already have an impact on server response time for every page access in Drupal. Maybe those modules only add a slight overhead there, maybe that’s part of what they do, and that’s fine, but it can pay off to review and investigate whether the code there (including code your own team may have added) would be better refactored somewhere else, rather than running on every page load.

There is another perspective, of course: maybe some things do need to happen on every page load, but their implementation might be faulty. Imagine doing expensive I/O on every page load, like calling an external API or querying a heavy table. Maybe you can refactor the code to cache this information?
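
As a hedged sketch of that refactoring (the module and function names are hypothetical; cache_get() and cache_set() are part of the Drupal 6/7 core cache API):

/**
 * Implements hook_init().
 */
function mymodule_init() {
  $cache = cache_get('mymodule_expensive_data');
  if ($cache && isset($cache->data)) {
    $data = $cache->data;
  }
  else {
    // The expensive call (external API, heavy query, ...) only runs when the
    // cache is cold; mymodule_expensive_lookup() is a hypothetical helper.
    $data = mymodule_expensive_lookup();
    // Keep the result for five minutes.
    cache_set('mymodule_expensive_data', $data, 'cache', time() + 300);
  }
  // ... use $data for whatever hook_init() needed it for ...
}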


Jan 02 2015
Jan 02
Drupal Commerce Delivery Partner

For eight years now, Cocomore has been building websites with Drupal, both large and small. Over that time the team, the number of projects and the possibilities have grown. In the past there were specialized solutions for specialized requirements: content management systems (CMS) for managing website content (WordPress, Joomla, Drupal) or e-commerce systems such as Magento for managing online shops. But those times are gone. In the future, the challenge will be to integrate these standalone systems into “rich content systems” which offer all of the above functions within one full-service solution.

Integrating these systems and making them interact with each other makes it possible to combine the convenient functions of the different components and to offer users a much better online solution with more, and better, functionality. Simply displaying a photo and some text for a product in a shop is no longer sufficient. The modern user expects more, especially when it comes to interacting with the shop itself: videos, custom reviews, ratings or a discussion board where they can find much more information and exchange opinions about the products with others. A pioneer in this sector is Amazon, where users can comment on and rate a product and thus get feedback from other users, and where videos and further information are shown. This is today’s standard, this is what users expect, and every other shop on the web needs to live up to it.

Drupal Commerce builds on these means of interaction. Drupal has never been just a content management system; it is also a framework with far more possibilities than managing content. Drupal Commerce takes these advantages of Drupal and integrates an online shop on top of them. Consequently, all aspects of Drupal as a content management system, as a framework and as a shop are combined: all features in one system. Interfaces for migrating content or data from one system to another are no longer needed.

Since November 2014, Cocomore AG has been an official Drupal Commerce Delivery Partner. The first aim of this partnership is to better establish Drupal Commerce in the German market, which requires some adaptations for a better and smoother integration. The current strategy is either to extend the existing Drupal Commerce distribution or to create a new German installation profile which fits the German market out of the box.

Currently, Cocomore and the Commerce Guys are drafting a concept for integrating additional payment systems and the tracking systems of logistics providers, to better fit German customers. The German system of taxation and the integration of several bonus programs, for example Payback (payback.de), are also being considered. As soon as this concept phase is finished, implementation can start; then nothing will stand in the way of Drupal Commerce becoming better known in Germany. And with the first big lighthouse projects implemented, there will be good examples that establish Drupal Commerce as the first choice for other customers as well. The future belongs to cross-linked systems, and especially for large customers the interaction and integration of e-commerce, CRM (customer relationship management) and personalization is a must-have today.

Nowadays, an online shop has to offer a special shopping experience for the end user to stand out from competitors, because the next online shop is just a click away.

Cocomore wants to help its customers master this challenge with Drupal and Drupal Commerce, whether in e-commerce or in another sector. Online communication is becoming more and more important, and it is impossible to succeed without it these days. Spanning both communication and IT, Cocomore AG offers everything in one place: from idea and concept, through art and creation, to implementation and content management.

Dec 15 2014
Dec 15

In the spirit of the computer video game Doom and its skill levels, we’ll review a few ways you can improve your Drupal site’s performance and optimize for better results and server response time. These tips may at times be specific to Drupal 6, although you can always learn the best practices from these examples and apply them to your own code base.


Doom skill levels: (easiest first)

1. I’m too young to die

2. Hey, not too rough

3. Hurt me plenty

4. Ultra-violence

5. Nightmare!

  This post is rated “I’m too young to die” difficulty level.

Drupal 6 shipped with all of its tables using MyISAM; Drupal 7 changed that and ships with all of its tables using the InnoDB database engine. Each engine has its own strengths and weaknesses, but it is quite clear that InnoDB will usually perform better for your Drupal site (though it has quite a bit of fine-tuning configuration to be tweaked in my.cnf).

Some modules, whether on Drupal 6 or upgraded to Drupal 7 without a thorough code review, might ship with queries like SELECT COUNT(*), which will hurt database performance if you have migrated your tables to InnoDB (or are simply using Drupal 7). That’s mainly because InnoDB and MyISAM work differently: whereas such a query responds quickly on a MyISAM table, which stores the total row count and can answer it directly, on InnoDB the same query results in a full table scan. Obviously, running such queries against large InnoDB tables will result in very poor performance.
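
As a hedged illustration (Drupal 6 API; the table name is arbitrary): when the calling code only needs to know whether any rows exist, fetching a single row avoids the full scan, and when an exact count really is needed, it can often be cached (see the hook_init() tip in this series) rather than recomputed on every request.

// Instead of SELECT COUNT(*) just to test whether rows exist,
// fetch a single row: this avoids the full table scan on InnoDB.
$has_rows = (bool) db_result(db_query_range('SELECT 1 FROM {watchdog}', 0, 1));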


Something to ponder: what about the Views module, which uses similar COUNT() queries to build the pagination for its views?

Dec 03 2014
Dec 03

December 03, 2014

What is content? What is configuration? At first glance, the question seems simple, almost quaint, the kind one finds oneself patiently answering for the benefit of Drupal novices: content is usually information like nodes and taxonomy terms, while content types, views and taxonomy vocabularies are usually configuration.

Content lives in the database of each environment, we say, while configuration is exportable via Features or other mechanisms and should live in the Git repo (this has been called code-driven development).

Still, a definition of content and configuration is naggingly elusive: why “usually”? Why are there so many edge cases? We’re engineers, we need precision! I often feel like I’m trying to define what a bird is: every child knows what a bird is, but it’s hard to define it. Ostriches can’t fly; platypuses lay eggs but aren’t birds.

Why the distinction?

I recently saw an interesting comment titled “A heretic speaks” on a blog post about code-driven development. It sums up some of the uneasiness about the place of configuration in Drupal: “Drupal was built primarily with site builders in mind, and this is one reason [configuration] is in the database”.

In effect, the primary distinction in Drupal is between code (Drupal core and contrib), and the database, which contains content types, nodes, and everything else.

As more complex sites were being built, a new distinction had to be made between two types of information in the database: configuration and content. This was required to allow development in a dev-stage-production workflow where features being developed outside of a production site could be deployed to production without squashing the database (and existing comments, nodes, and the like). We needed to move those features into code and we called them “configuration”.

Thus the features module was born, allowing views, content types, and vocabularies (but not nodes and taxonomy terms) to be developed outside of the database, and then deployed into production.

Drupal 8’s config management system takes that one step further by providing a mature, central API to deal with this.

The devil is in the details

This is all fine and good, but edge cases soon begin to arise:

  • What about an “About us” page? It’s a menu item (deployable) linking to a node (content). Is it config? Is it content?
  • What about a “Social media” menu and its menu items? We want a Facebook link to be deployable, but we don’t want to hard-code the actual link to our client’s Facebook page (which feels like content) – we probably don’t even know what that link is during development.
  • What about a block whose placement is known, but whose content is not? Is this content? Is it configuration?
  • What about a view which references a taxonomy term ID in a hard-coded filter? We can export the view, but the taxonomy term has an incremental ID and is not guaranteed to work on all environments.

The wrong answer to any of these questions can lead to a misguided development approach which will come back to haunt you afterward. You might wind up using incremental IDs in your code or deploying something as configuration which is, in fact, content.

Defining our terms

At the risk of irking you, dear reader, I will suggest doing away with the terms “content” and “configuration” for our purposes: they are just too vague. Because we want a formal definition with no edge cases, I propose that we use these terms instead (we’ll look at each in detail a bit further on):

  • Code: this is what our deliverable is for a given project. It should be testable, versioned, and deployable to any number of environments.
  • Data: this is whatever is potentially different on each environment to which our code is deployed. One example is comments: On a dev environment, we might generate thousands of dummy comments for theming purposes, but on prod there might be a few dozen only.
  • Placeholder content: this is any data which should be created as part of the installation process, meant to be changed later on.

Code

This is what our deliverable is for a given project. This is important. There is no single answer. Let’s take the following examples:

  • If I am a contributor to the Views contrib project, my deliverable is a system which allows users to create views in the database. In this case I will not export many particular views.

  • For another project, my deliverable may be a website which contains a set number of lists (views). In this case I may use features (D7) or config management (D8) to export all the views my client asked for. Furthermore, I may enable views_ui (the Views User interface) only on my development box, and disable it on production.

  • For a third project, my deliverable may be a website with a number of set views, plus the ability for the client to add new ones. In this case only certain views will be in code, and I will enable the views UI as a dependency of my site deployment module. The views my client creates on production will be data.

Data

A few years ago, I took a step back from my day-to-day Drupal work and thought about what my main pain points were and how to do away with them. After consulting with colleagues, looking at bugs which took longest to fix, and looking at major sources of regressions, I realized that the one thing all major pain points had in common were our deployment techniques.

It struck me that cloning the database from production to development was wrong. Relying on production data to do development is sloppy and will cause problems. It is better to invest in realistic dummy content and a good site deployment module, allowing the standardized deployment of an environment in a few minutes from any commit.

Once we remove data from the development equation in this way, it is easier to define what data is: anything which can differ from one environment to the next without overriding a feature.

Furthermore, I like to think of production as just another environment: there is nothing special about it.

A new view or content type created on production outside of our development cycle resides on the database, is never used during the course of development, and is therefore data.

Nodes and taxonomy terms are data.

What about a view which is deployed through features and later changed on another environment? That’s a tough one; I’ll get to it (see Overridden features, below).

Placeholder content

Let’s get back to our “About us” page. Three components are involved here:

  • The menu which contains the “About us” menu item. These types of menus are generally deployable, so let’s call them code.
  • The “About us” node itself which has an incremental nid which can be different on each environment. On some environments it might not even exist.
  • The “About us” menu item, which should link to the node.

Remember: we are not cloning the production database, so the “About us” node does not exist anywhere yet. For situations such as this, I will suggest the use of placeholder content.

For sake of argument, let’s define our deliverable for this sample project as follows:

"Define an _About us_ page which is modifiable".

We might be tempted to figure out a way to assign a unique ID to our “About us” node to make it deployable, and devise all kinds of techniques to make sure it cannot be deleted or overridden.

I have an approach which I consider more logical for these situations:

First, in my site deployment module’s hook_update_N(), create the node and the menu item, bypassing features entirely. Something like:

function mysite_deploy_update_7023() {
  $node = new stdClass();
  $node->title = 'About us';
  $node->body[LANGUAGE_NONE][0]['format'] = 'filtered_html';
  $node->body[LANGUAGE_NONE][0]['value'] = 'Lorem ipsum...';
  $node->type = 'page';
  node_object_prepare($node);
  $node->uid = 1;
  $node->status = 1;
  $node->promote = 0;
  node_save($node);

  $menu_item = array(
    'link_path' => 'node/' . $node->nid,
    'link_title' => 'About us',
    'menu_name' => 'my-existing-menu-exported-via-features',
  );

  menu_link_save($menu_item);
}

If you wish, you can also implement hook_requirements() in your custom module, to check that the About us page has not been accidentally deleted, that the menu item exists and points to a valid path.
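
A minimal sketch of such a check (the module name and the lookup criteria are illustrative; hook_requirements() and EntityFieldQuery are core Drupal 7 APIs, and this would live in mysite_deploy.install):

/**
 * Implements hook_requirements().
 */
function mysite_deploy_requirements($phase) {
  $requirements = array();
  if ($phase == 'runtime') {
    // Hypothetical check: look for at least one published page titled "About us".
    $query = new EntityFieldQuery();
    $count = $query
      ->entityCondition('entity_type', 'node')
      ->entityCondition('bundle', 'page')
      ->propertyCondition('title', 'About us')
      ->propertyCondition('status', 1)
      ->count()
      ->execute();
    $requirements['mysite_about_us'] = array(
      'title' => t('About us page'),
      'value' => $count ? t('Present') : t('Missing'),
      'severity' => $count ? REQUIREMENT_OK : REQUIREMENT_ERROR,
    );
  }
  return $requirements;
}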

What are the advantages of placeholder content?

  • It is deployable in a standard manner: any environment can simply run drush updb -y and the placeholder content will be deployed.
  • It can be changed without rendering your features (D7) or configuration (D8) overridden. This is a good thing: if our incremental deployment script calls features_revert() or drush fra -y (D7) or drush cim -y (D8), all changes to features are reverted. We do not want changes made to our placeholder content to be lost.
  • It can be easily tested. All we need to do is make sure our site deployment module’s hook_install() calls all of its hook_update_N()s (see the sketch after this list); then we can enable our site deployment module within our Simpletest, and run any tests we want against a known good starting point.
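
A minimal sketch of that wiring (the update number range is illustrative and should match your real updates):

/**
 * Implements hook_install().
 *
 * Runs all of this module's hook_update_N() implementations so that a fresh
 * install ends up in the same state as an incrementally updated environment.
 */
function mysite_deploy_install() {
  // Hypothetical range; adjust to match your actual update numbers.
  for ($i = 7001; $i <= 7023; $i++) {
    $function = 'mysite_deploy_update_' . $i;
    if (function_exists($function)) {
      $function();
    }
  }
}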

Overridden features

Although it is easy to override features on production, I would not recommend it. It is important to define with your client and your team what is code and what is data. Again, this depends on the project.

When a feature gets overridden, it is a symptom that someone does not understand the process. Here are a few ways to mitigate this:

  • Make sure your features are reverted (D7) or your configuration is imported (D8) as part of your deployment process, and automate that process with a continuous integration server. That way, if anyone overrides a feature on production, it won’t stay overridden for long.
  • Limit administrator permissions so that only user 1 can override features (this can be more trouble than it’s worth, though).
  • Implement hook_requirements() to check for overridden features, warning you on the environment’s dashboard if a feature has been overridden (see the sketch below).
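
As a sketch only, assuming the Features module’s features_get_storage() function and its FEATURES_OVERRIDDEN constant, and a hypothetical list of feature modules (this could also be folded into an existing hook_requirements() implementation):

/**
 * Implements hook_requirements().
 */
function mysite_monitor_requirements($phase) {
  $requirements = array();
  if ($phase == 'runtime' && module_exists('features')) {
    $overridden = array();
    // Hypothetical list of this site's feature modules.
    foreach (array('mysite_views', 'mysite_content_types') as $feature) {
      if (features_get_storage($feature) == FEATURES_OVERRIDDEN) {
        $overridden[] = $feature;
      }
    }
    $requirements['mysite_overridden_features'] = array(
      'title' => t('Overridden features'),
      'value' => $overridden ? implode(', ', $overridden) : t('None'),
      'severity' => $overridden ? REQUIREMENT_WARNING : REQUIREMENT_OK,
    );
  }
  return $requirements;
}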

Some edge cases

Now, with our more rigorous approach, how do our edge cases fare?

Social media menu and items: Our deliverable here is the existence of a social media menu with two items (Twitter and Facebook), whose links can be changed at any time on production without triggering an overridden feature. For this I would use placeholder content. Still, we need to theme each button separately, and our CSS does not know the incremental IDs of the menu items we are creating. I have successfully used the Menu attributes module to associate classes with menu items, allowing easy theming. Here is an example, assuming menu_attributes is enabled and menu-social has been exported as a feature.

/**
 * Add facebook and twitter menu items
 */
function mysite_deploy_update_7117() {
  $item = array(
    'link_path' => 'http://twitter.com',
    'link_title' => 'Twitter',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'twitter',
      )
    )
  );
  menu_link_save($item);
  $item = array(
    'link_path' => 'http://facebook.com',
    'link_title' => 'Facebook',
    'menu_name' => 'menu-social',
    'options' => array(
      'attributes' => array(
        'class' => 'facebook',
      )
    )
  );
  menu_link_save($item);
}

The above code creates the menu items linking to Facebook and Twitter home pages, so that content editors can put in the correct links directly on production when they have them.

Placeholder content is just like regular data but it’s created as part of the deployment process, as a service to the webmaster.

A block whose placement is known, but whose content is not: it may be tempting to use the box module, which makes blocks exportable with Features. But in this case the block is more like placeholder content, so it should be deployed outside of Features. And if you create your block programmatically, its ID is incremental and it cannot be deployed with Context; it should instead be placed in a region directly, again programmatically, in a hook_update_N().

Another approach here is to create a content type and a view with a block display, fetching the last published node of that content type and displaying it at the right place. If you go that route (which seems a bit overengineered to me), you can then place your block with the context module and export it via features.

A view which references a taxonomy term ID in its filter: if a view requires access to a specific taxonomy term ID (tid), then perhaps taxonomy is the wrong tool here. Taxonomy terms are data: they can be deleted and their names can be changed, so it is not a good idea for a view to reference a specific taxonomy term. (Your view can use taxonomy terms for contextual filters without a problem, but we don’t want to hard-code a specific term in a non-contextual filter – see this issue for an example of how I learned this the hard way; I’ll get around to fixing that soon…).

For this problem I would suggest rethinking our use of a taxonomy term. Rather, I would define a select field with a set number of options (with defined keys and values). These are deployable and guaranteed not to change without triggering a features override, so our views can safely use them. If you are implementing this change on an existing site, you will need to update all nodes from the old to the new technique in a hook_update_N() – and probably add an automated test to make sure you’re updating the data correctly. This is one more reason to think things through properly at the outset of your project, not midway through.
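
A minimal sketch of such a field, created in a deployment update hook (the field name and values are illustrative; field_create_field() and field_create_instance() are Drupal 7 core APIs):

/**
 * Create a deployable select field to replace a hard-coded taxonomy term.
 */
function mysite_deploy_update_7124() {
  // Hypothetical field: the keys are stable and safe to reference in views.
  field_create_field(array(
    'field_name' => 'field_article_section',
    'type' => 'list_text',
    'cardinality' => 1,
    'settings' => array(
      'allowed_values' => array(
        'news' => 'News',
        'events' => 'Events',
      ),
    ),
  ));
  field_create_instance(array(
    'field_name' => 'field_article_section',
    'entity_type' => 'node',
    'bundle' => 'article',
    'label' => 'Section',
    'widget' => array('type' => 'options_select'),
  ));
}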

In conclusion

Content and configuration are hard to define; I prefer the following definitions:

  • Code: deployable, deliverable, versioned, tested piece of software.
  • Data: anything which can differ from one environment to the next.
  • Placeholder content: any data which should be created as part of the deployment process.

In my experience, what fits into each category depends on the project. Defining these with your team as part of your sprint planning will allow you to create a system with fewer edge cases.


Nov 29 2014
Nov 29

In the spirit of the computer video game Doom and its skill levels, we’ll review a few ways you can improve your Drupal site’s performance and optimize for better results and server response time. These tips may at times be specific to Drupal 6, although you can always learn the best practices from these examples and apply them to your own code base.


Doom skill levels: (easiest first)

1. I’m too young to die

2. Hey, not too rough

3. Hurt me plenty

4. Ultra-violence

5. Nightmare!

  This post is rated “I’m too young to die” difficulty level.

When we start out building Drupal websites, we gradually build up functionality, and a common pattern is to create a view, then some blocks closely related to that view, so you create a block display using the Views module. Maybe you then combine it with Panels or Context; it doesn’t really matter, but essentially you’ve been using UI tools built for ease of use, and that convenience comes with quite a bit of abstraction overhead which may later cost you in performance. Replacing the quick-links and help-and-support blocks in our theme’s sidebar, from view-based blocks to simple programmatically created blocks, reduced the server time spent on the same operation from a sizzling ~200ms to ~2ms. That accounted for roughly 200ms of page load time reduction on each page load, as these blocks appeared consistently on many pages in our theme.
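
As a hedged sketch (Drupal 6 block API; the module, delta and links are hypothetical), a hand-rolled block with static markup avoids the Views rendering overhead for content that rarely changes:

/**
 * Implements hook_block() (Drupal 6).
 */
function mymodule_block($op = 'list', $delta = 0, $edit = array()) {
  if ($op == 'list') {
    // Declare the block so it can be placed in a region.
    return array(
      'quicklinks' => array('info' => t('Quick links')),
    );
  }
  if ($op == 'view' && $delta == 'quicklinks') {
    // Render a simple static list instead of going through Views.
    return array(
      'subject' => t('Quick links'),
      'content' => theme('item_list', array(
        l(t('Help'), 'help'),
        l(t('Support'), 'support'),
      )),
    );
  }
}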



Nov 12 2014
Nov 12

In the spirit of the computer video game Doom and its skill levels, we’ll review a few ways you can improve your Drupal site’s performance and optimize for better results and server response time. These tips may at times be specific to Drupal 6, although you can always learn the best practices from these examples and apply them to your own code base.


Doom skill levels: (easiest first)

  1. I’m too young to die

  2. Hey, not too rough

  3. Hurt me plenty

  4. Ultra-violence

  5. Nightmare!

  This post is rated “I’m too young to die” difficulty level.

Drupal distributions are great for kick-starting a project with many features built in, but if you’re using one you should still review the modules managed through the installation profile, as they might prove unnecessary as time goes by and your product evolves and matures. Even if you’re not using a distribution, you might have added a module for some functionality which you no longer use, and which you disabled through CSS, through the menus or through the theme, but you forgot all about removing the actual module. These unused modules add to the memory footprint, since they are still loaded by PHP, and they may also implement Drupal hooks, which is even worse for performance.

Remember to periodically review the modules installed on your Drupal site and remove any unused functionality; the drush commands sketched below can help.
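
For example (hedged; the module name is hypothetical, while drush dis and drush pm-uninstall are standard drush commands):

# Disable, then fully uninstall, a module you no longer use so it stops
# loading and its hooks stop firing.
drush dis mymodule -y
drush pm-uninstall mymodule -y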


Nov 05 2014
Nov 05

In the spirit of the computer video game Doom and its skill levels, we’ll review a few ways you can improve your Drupal site’s performance and optimize for better results and server response time. These tips may at times be specific to Drupal 6, although you can always learn the best practices from these examples and apply them to your own code base.


Using indexes and properly written SQL queries can boost performance by a huge factor, especially if the affected tables are very big (millions of rows). Take a look at the diff below, showing a fix to an improper and ill-advised way of querying the database:

[Screenshot: diff of the reworked query]

The badly performing query took anywhere between 6 and 60 seconds to run, depending on the data, the database load, and the database’s current cache state. The newer query takes milliseconds.
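
On the indexing side, a hedged sketch (Drupal 6 schema API; the table, index and column names are hypothetical) of adding an index on a column that WHERE clauses frequently filter on, done in an update hook:

/**
 * Add an index on a frequently filtered column.
 */
function mymodule_update_6002() {
  $ret = array();
  db_add_index($ret, 'mymodule_log', 'mymodule_log_uid', array('uid'));
  return $ret;
}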

