Mar 06 2019

Platform.sh, like any good PaaS, exposes a lot of useful information to applications via environment variables. The obvious parts, of course, are database credentials, but there's far more that we make available to allow an application to introspect its environment.

Sometimes those environment variables aren't as obvious to use as we'd like. Environment variables have their limits, such as only being able to store strings. For that reason, many of the most important environment variables are offered as JSON values, which are then base64-encoded so they fit nicely into environment variables. Those are not always the easiest to read.

That's why we're happy to announce all new, completely revamped client libraries for PHP, Python, and Node.js to make inspecting the environment as dead-simple as possible.

Installation

All of the libraries are available through their respective language package managers:

PHP:

composer require platformsh/config-reader

Python:

pip install platformshconfig

Node.js:

npm install platformsh-config --save

That's it, you're done.

Usage

All three libraries work the same way, but are flavored for their own language. All of them start by instantiating a "config" object. That object then offers methods to introspect the environment in intelligent ways.

For instance, it's easy to tell whether a project is running on Platform.sh at all, whether it's in the build hook or at runtime, or whether it's on a Platform.sh Enterprise environment. In PHP:

$config = new \Platformsh\ConfigReader\Config();

$config->isValidPlatform(); // True if env vars are available at all.
$config->inBuild();
$config->inRuntime();
$config->onEnterprise();
$config->onProduction();

// Individual Platform.sh environment variables are available as their own properties, too.
$config->applicationName;
$config->port;
$config->project;
// ...

The onProduction() method already takes care of the differences between Platform.sh Professional and Platform.sh Enterprise and will return true in either case.

What about the common case of accessing relationships to get credentials for connecting to a database? Currently, that requires deserializing and introspecting the environment blob yourself. But with the new libraries, it's reduced to a single method call. In Python:

config = platformshconfig.Config()

creds = config.credentials('database')

This will return the access credentials to connect to the database relationship. Any relationship listed in .platform.app.yaml is valid there.
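
Back in PHP, you could feed those credentials straight into PDO. A sketch, assuming a MySQL relationship named database; host, port, username, password, and path (the database name) are all standard parts of the relationship format:

$config = new \Platformsh\ConfigReader\Config();
$creds = $config->credentials('database');

// Build a DSN from the relationship's host, port, and database name.
$db = new \PDO(
    sprintf('mysql:host=%s;port=%d;dbname=%s', $creds['host'], $creds['port'], $creds['path']),
    $creds['username'],
    $creds['password']
);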

What if you need the credentials formatted a particular way for a third-party library? Fortunately, the new clients are extensible. They support "credential formatters", which are simply functions (or callables, or lambdas, or whatever the language of your choice calls them) that take a relationship definition and format it for a particular service library.

For example, one of the most popular Node.js libraries for connecting to Apache Solr, solr-node, wants the name of a collection as its own string. The Platform.sh relationship provides a path, since there are other libraries that use a path to connect. Rather than reformat that string inline, the Node.js library includes a formatter specifically for solr-node:

const solr = require('solr-node');
const config = require("platformsh-config").config();

let client = new solr(config.formattedCredentials('solr-relationship-name', 'solr-node'));

Et voilà. client is now a solr-node client, ready to be used. It's entirely possible to register your own formatters, too, and third-party libraries can include them as well:

config.registerFormatter('my-client-library', (creds) => {
  // Do something here to return a string, struct, dictionary, array, or whatever.
});

We've included a few formatters in each library to cover some popular third-party libraries. We'll be adding more as time goes by, and, of course, PRs to add more are always extremely welcome!

But what about my language?

We wanted to get these three client libraries out the door and into your hands as soon as possible. But don't worry; Go and Ruby versions are already in the works and will be released soon.

We'll continue to evolve these new libraries, keeping the API roughly in sync between all languages, but allowing each to feel as natural as possible for each language.

Aug 23 2018

PHP continues its steady march forward, and today marks the release of the latest version, PHP 7.3.

It also marks its release on Platform.sh, and our first holiday gift to you this season.

So what's new?

PHP 7.3 brings continued incremental performance improvements to the language. It's not as big as the jump to 7.0 was (few things can be), but the same code running under PHP 7.3 should be a bit faster than on 7.2.

While there are no earth-shattering additions in this version, there are a few nice pluses, like a handful of new utility functions such as is_countable(), array_key_first(), and array_key_last() (all of which are fairly self-explanatory). What's most exciting for language nerds who follow PHP's development (like yours truly)?

  • Trailing commas in function calls are now legal, just as they have been in array definitions for years.
  • Heredoc and Nowdoc syntax are now more forgiving, allowing for more nicely formatted, multiline strings.
  • The JSON utilities can now be set to throw exceptions, like most newer functionality, making error handling more consistent throughout the application.
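
A quick sketch of those additions together (requires PHP 7.3; the deliberately broken JSON string is just for demonstration):

$data = ['first' => 1, 'last' => 99];

var_dump(
    is_countable($data),    // true
    array_key_first($data), // "first"
    array_key_last($data),  // "last" (and the trailing comma in this call is now legal)
);

// JSON functions can now throw a JsonException instead of failing silently.
try {
    json_decode('{"truncated', true, 512, JSON_THROW_ON_ERROR);
} catch (\JsonException $e) {
    echo 'Invalid JSON: ', $e->getMessage(), PHP_EOL;
}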

OK, it's not going to change the world, but it's still nice.

There are also a number of deprecations around edge cases to flag behaviors that are expected to go away in PHP 8.0. (Yes, we did just say that number.) See the Changelog for the full list of changes and fixes.

Cool, so how do I do it?

As always at Platform.sh, it's just a YAML tweak away. In your .platform.app.yaml file (on a branch other than master, so you can test first), change your application type to php:7.3, like so:

type: php:7.3

That's it. The next time you push that branch, you'll get the PHP 7.3 container. Test it out and make sure everything is working (it should be fine), then merge back to master when you're ready.

Enjoy the latest and greatest PHP has to offer—any day of the week!

Aug 09 2018

The cicada is a flying insect found worldwide. It's loud but not particularly threatening. Its most famous attribute, though, is that many species of cicada (particularly in North America) are periodic, only emerging every 13 or 17 years depending on the species. When it does emerge, a huge brood reaches maturity all at once, mates, lays eggs, and then dies. The eggs hatch and the offspring spend the next 13 or 17 years living deep underground and burrowing before repeating the cycle again.

But why 13 and 17 years? That's a rather odd set of numbers... And that's actually the point. Those lifespans are both prime numbers, that is, they are divisible only by themselves and one. Many cicada predators also have multi-year life cycles rather than emerging every year. So what are the odds of a large number of cicada predators emerging in the same year as a large number of cicadas?

Very low, in fact. Because a prime number is divisible only by 1 and itself, a shorter cycle will coincide with it only at the product of the two numbers. That is, a predator on a 4-year cycle and a 13-year cicada will only emerge at the same time every 4 * 13 = 52 years. If the cicada emerged every 12 years, however, the 4-year predator would have a veritable buffet every third predator generation, and the cicadas would have a bad time every single time they emerged.

Over time, evolutionary pressure weeded out the many-common-divisor periodic species of cicada, leaving only those that have overlapping generations every year and those that have a huge all-at-once generation on a prime-number schedule.

What can we learn from the little cicada? If you have two repeating events, and you want them to happen at the same time as rarely as possible, have them repeat on prime numbers.

But what does that have to do with web development?

A website frequently has background tasks that it needs to run from time to time; sometimes every few minutes, sometimes every few hours, sometimes every few days. Most often these are run using a cron task.

Generally speaking it's a bad idea to run more than one cron job at once. Even if they don't interfere with each other they may use a lot of CPU, and you don't want them to slam the system all at once. In fact, on Platform.sh we don't allow that to happen: If a cron task tries to start but there's another already running, we force the new one to pause and wait for the first to complete.

That can sometimes cause issues if, say, a nightly backup process wants to start while a routine every-few-minutes cron task is running. The snapshot will start but then block, waiting for the other cron task to finish; if that other task is long-running, the result can be a brief period of site outage while the snapshot waits its turn.

Avoiding predatory cron jobs

So how do we make sure one cron job runs at the same time as another as little as possible? The same way cicadas avoid predators: Prime numbers!

More specifically, say we have a cron task that runs normal system maintenance every 20 minutes. Then we have an import process that reads data from an external system every 10 minutes, and another task that runs every 5 minutes to send out pending emails.

The result will be that every 10 minutes we have two cron tasks competing to run at the same time, and every 20 minutes we have three cron tasks competing. That's no good at all!

Instead, let's set the system maintenance to run every 23 minutes, the import to run every 11 minutes, and the email runner every 7 minutes. It's almost the same schedule, but because the numbers are prime they will only very rarely overlap. (Every 77 minutes in the shortest case.) That spreads the load out far better and avoids any process blocking on another.

Now if we want to add a nightly backup, we can have it run at, say, 17 minutes past 4:00 am. It will be extremely rare for the other cron tasks to hit at the 17 minute mark exactly, so our snapshot will almost never need to block on another cron task and our site won't freeze while it waits.
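
In Platform.sh terms, that schedule might look something like the following in .platform.app.yaml (a sketch with hypothetical script names; note that the */N cron syntax resets at the top of each hour, so it only approximates a strict N-minute interval):

crons:
    maintenance:
        spec: '*/23 * * * *'
        cmd: 'php scripts/maintenance.php'
    import:
        spec: '*/11 * * * *'
        cmd: 'php scripts/import.php'
    emails:
        spec: '*/7 * * * *'
        cmd: 'php scripts/send_emails.php'
    backup:
        # 17 minutes past 4:00 am, per the reasoning above.
        spec: '17 4 * * *'
        cmd: 'sh scripts/backup.sh'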

Isn't it nice when bugs end up helping your software run faster?

Aug 07 2018

Microservices have been all the rage for the past several years. They're the new way to make applications scalable, robust, and break down the old silos that kept different layers of an application at odds with each other.

But let's not pretend they don't have costs of their own. They do. And, in fact, they are frequently, perhaps most of the time, not the right choice. There are, however, other options besides one monolith to rule them all and microservice-all-the-things.

What is a microservice?

As usual, let's start with the canonical source of human knowledge, Wikipedia:

"There is no industry consensus yet regarding the properties of microservices, and an official definition is missing as well."

Well that was helpful.

Still, there are common attributes that tend to typify a microservice design:

  • Single-purpose components
  • Linked together over a non-shared medium (usually a network with HTTP or similar, but technically inter-process communication would qualify)
  • Maintained by separate teams
  • And released (or replaced) on their own, independent schedule

The separate teams part is often overlooked, but shouldn't be. The advantages of the microservice approach make it clear why:

  • Allows the use of different languages and tools for different services (PHP/MongoDB for one and Node/MySQL for another, for instance)
  • Allows small, interdisciplinary teams to manage targeted components (that is, the team has one coder, one UI person, and one DB monkey rather than having a team of coders, a team of UI people, and a team of DB monkeys)
  • Allows different components to evolve and scale independently
  • Encourages strong separation of concerns

Most of those benefits tie closely to Conway's Law:

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.

A microservice approach works best when you have discrete teams that can view each other as customers or vendors, despite being within the same organization. And if you're in an organization where that's the case then microservices are definitely an approach to consider.

However, as with any architecture there are tradeoffs. Microservices have costs:

  • Adding network services to your system introduces the network as a point of failure.
  • Points of failure should always be plural: a network, even a virtual and containerized one, has many, many points of failure.
  • The network will always be 10x slower than calling a function, even a virtual network. If you're using a shared-nothing framework like PHP you have to factor in the process startup cost of every microservice.
  • If you need to move some logic from one microservice to another it's 10x harder than from one library to another within an application.
  • You need to staff multiple interdisciplinary teams.
  • Teams need to coordinate carefully to avoid breaking any informal APIs.
  • APIs tend to be coarse-grained, since chatty, fine-grained calls are too expensive to make over a network.
  • Needing new information from another team involves a much longer turnaround time than just accessing a database.

Or, more simply: Microservices add complexity. A lot of complexity. That means a lot more places where things can go wrong. A common refrain from microservice skeptics (with whom I agree) is:

"If one of your microservices going down means the others don't work, you don't have a microservice; you have a distributed monolith."

To be sure, that doesn't mean you shouldn't use microservices. Sometimes that is the right approach to a problem. However, the scale at which that's the case is considerably higher than most people realize.

What's the alternative?

Fortunately, there are other options besides the extremes of a single monolith and a sprawling fleet of separate applications that happen to talk to each other. There's no formal term for these yet, but I will refer to them as "clustered applications".

A clustered application:

  • Is maintained by a single interdisciplinary team
  • Is split into discrete components that run as their own processes, possibly in separate containers
  • Deploys as a single unit
  • May be in multiple languages but usually uses a single language
  • May share its datastore(s) between processes

This "in between" model has been with us for a very long time. The simplest example is also the oldest: cron tasks. Especially in the PHP world, many applications have had a separate cron process from their web request/response process for literally decades. The web process exists as, essentially, a monolith, but any tasks that can be pushed off to "later" get saved for later. The cron process, which could share, some, all, or none of the same code, takes care of the "later". That could include sending emails, maintenance tasks, refreshing 3rd party data, and anything else that doesn't have to happen immediately upon a user request for the response to be generated.

Moving up a level from cron are queue workers. Again, the idea is to split off any tasks that do not absolutely need to be completed before a response can be generated and push them to "later". In the case of a queue worker "later" is generally sooner than with a cron job but that's not guaranteed. The workers could be part and parcel of the application, or they could be a stand-alone application in the same language, or they could be in an entirely different language. A PHP application with a Node.js worker is one common pattern, but it could really be any combination.

Another variant is to make an "Admin" area of a site a separate application from the front-end. It would still be working on the same database, but it's possible then to have two entirely separate user pools, two different sets of access control, two different caching configurations, etc. Often the admin could be built as just an API with a single-page-app frontend (since all users will be authenticated with a known set of browser characteristics and no need for SEO) while the public-facing application produces straight HTML for better performance, scalability, cacheability, accessibility, and SEO.

Similarly, one could make a website in Django but build a partner REST API in a separate application, possibly in Go to squeeze the last drop of performance out of your system.

There's an important commonality to all of these examples: Any given web request runs through exactly one of them at a time. That helps to avoid the main pitfall of microservices, which is adding network requests to every web request. The fewer internal IO calls you have the better; just ask anyone who's complained about an application making too many SQL queries per request. The boundaries where it's reasonable to "cut" an application into multiple clustered services are anywhere there is, or can be, an asynchronous boundary.

There is still additional complexity overhead beyond a traditional monolith: while an individual request only needs one working service and there's only one team to coordinate, there are still multiple services to manage. The communication paths between them are still points of failure, even if they're much more performance-tolerant. There could also be an unpredictable delay between actions; an hourly cron could run 1 minute or 59 minutes after the web request that gave it an email to send. A queue could fill up with lots of traffic. Queues are not always perfectly reliable.

Still, that cost is lower than the overhead of full separate-team microservices while offering many (but not all) of the benefits in terms of separation of concerns and allowing different parts of the system to scale and evolve mostly independently. (You can always throw more worker processes at the queue even if you don't need more resources for web requests.) It's a model well worth considering before diving into microservices.

How do I do either of these on Platform.sh?

I'm so glad you asked! Platform.sh is quite capable of supporting both models. While our CPO might yell at me for this, I would say that if you want to do "microservices" you need multiple Platform.sh projects.

Each microservice is supposed to have its own team, its own datastore, its own release cycle, etc. Doing that in a single project, with a single Git repository, is rather counter to that design. If your system is to be built with 4 microservices, then that's 4 Platform.sh projects; however, bear in mind that's a logical separation. Since they're all on Platform.sh and presumably in the same region, they're still physically located in the same data center. The latency between them shouldn't be noticeably different than if they were in the same project.

Clustered applications, though, are where Platform.sh especially shines. A single project/Git repository can contain multiple applications, in the same language or in different languages. They can share the same data store or not.

To use the same codebase for both the web front-end and a background worker (which is very common), we support the ability to spin up the same built application image as a separate worker container. Each container runs the same codebase but can have a different disk configuration, different environment variables, and a different start command. And because they all run the same codebase, there's only a single codebase to maintain, a single set of unit tests to write, etc.
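
A minimal sketch of what that can look like in .platform.app.yaml (the worker name and start command here are hypothetical):

workers:
    queue:
        commands:
            # Runs the same built application image as the web container,
            # but starts a long-running worker process instead of serving requests.
            start: 'php worker.php'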

And of course cron tasks are available on every app container for all the things cron tasks are good for.

Within a clustered application processes will usually communicate either by sharing a database (be it MariaDB, PostgreSQL, or MongoDB) or through a queue server, for which we offer RabbitMQ.

Mixing and matching is also entirely possible. In a past life (in the bad old days before Platform.sh existed) I built a customer site that consisted of an admin curation tool built in Drupal 7 that pulled data in from a 3rd party, allowed users to process it, and then exported pre-formatted JSON to Elasticsearch. That exporting was done via a cron job, however, to avoid blocking the UI. A Silex application then served a read-only API off of the data in Elasticsearch, and far faster than a Drupal request could possibly have done.

Were I building that system today it would make a perfect case for a multi-app Platform.sh project: A Drupal app container, a MySQL service, an Elasticsearch service, and a Silex app container.

Please code responsibly

There are always tradeoffs in different software design decisions. Sometimes the extra management, performance, and complexity overhead of microservices is worth it. Sometimes it's... not, and a tried-and-true monolith is the most effective solution.

Or maybe there's an in-between that will get you a better balance between complexity, performance, and scalability. Sometimes all you need is "just" a clustered application.

Pick the approach that fits your needs best, not the one that fits the marketing zeitgeist best. Don't worry, we can handle all of them.

Jun 25 2018

PHP is a rapidly-evolving language. Recent years have seen the adoption of a very clear and mostly consistent release process: a new version comes out every fall, and previous versions fall to maintenance support, then security-only support, then unsupported, on a regular, predictable schedule.

That's great for those who appreciate the new language features that continually come down the pipeline, and the performance improvements they bring. It does mean it's important to stay up to date, though. PHP 5.6, the last release of the PHP 5 series, got an extra year of security support for people who would have a hard time updating to the new PHP 7, but even that expires at the end of this year.

In fact, as of the end of this year the oldest supported version of PHP will be PHP 7.1. Yeah, really.

Which raises the question... do you know what your PHP version is?

How to check

On Platform.sh it's easy. Just check your .platform.app.yaml file for the type key. If it looks like this:

type: php:7.1

or this:

type: php:7.2

Then you're good! If it says php:7.0 then you should really start planning your update to 7.2. If it says anything older... well, you're missing out.

What happens on 31 December 2018?

Aside from uncorking some champagne, nothing. Platform.sh still offers container images all the way back to PHP 5.4, and we have no immediate plans to drop those images any time soon. However, they are completely unsupported. If there's a bug in them, no one is going to fix them. In some cases they're still built using older versions of Debian, so other related software is out of date as well. We won't be updating those.

If security vulnerabilities are found in PHP versions older than 7.1 no one is going to be fixing them. There are, in fact, known security holes in older versions of PHP that are no longer supported, and thus have never been fixed. That's normal and it's what unsupported means. Over time no doubt other issues will be found in PHP 5.6 and 7.0 that will also not be fixed as they are no longer supported.

If you want to keep your site secure, it's time to upgrade.

Why else should I upgrade?

Glad you asked! Security is a good reason to keep your software up to date, but it's far from the only reason. If you're still running PHP 5.x, then the even bigger reason is speed.

PHP 7.anything blows PHP 5 out of the water on performance. Benchmarks from dozens of companies have shown over and over again that twice the requests/second and half the memory usage is a normal improvement; some code bases can see even more. Rasmus Lerdorf (creator of PHP) publishes benchmarks periodically. His most recent, from earlier this year, shows PHP 7 smoking PHP 5 on WordPress performance, specifically:

  • WordPress runs twice as fast on PHP 7 as on PHP 5.
  • WordPress uses a tiny fraction of the memory on PHP 7 that it does on PHP 5.

Other benchmarks show similar (although not quite as dramatic) impact on Drupal, Magento, Symfony, Moodle, and various other systems.

It's rare in tech that any "silver bullet" appears, but upgrading from PHP 5 to PHP 7 is about as close to a performance silver bullet as you're ever going to see.

Of course, there's ample new functionality available for developers, too:

  • PHP 7.0 brought scalar type hints, return types, anonymous classes, and vastly improved error handling.
  • PHP 7.1 brought the void and iterable types, nullable types, and multi-catch exceptions.
  • PHP 7.2 brought the strongest security and encryption library available, libsodium, into the core language, along with even more improvements to the type system.
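
A compact sketch of a few of those features in action (requires PHP 7.1 or later; the class and values are purely illustrative):

interface Repository {
    // PHP 7.0: scalar parameter type; PHP 7.1: nullable return type.
    public function find(int $id): ?string;
}

// PHP 7.0: anonymous class.
$repo = new class implements Repository {
    public function find(int $id): ?string {
        return $id === 42 ? 'user-42' : null;
    }
};

try {
    var_dump($repo->find(42)); // string(7) "user-42"
} catch (\TypeError | \RuntimeException $e) { // PHP 7.1: multi-catch.
    error_log($e->getMessage());
}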

For a full rundown of the best parts of PHP 7, see my presentation from this year's php[tek] conference.

And of course PHP 7.3 is coming out this fall. (We'll support it when it comes out, too.)

OK, so how do I upgrade?

It's Platform.sh, which means upgrading is easy. Just change your type key in .platform.app.yaml to php:7.2, and push. You're done.

Well, you really should test it in a branch first. Push that change to a branch and give it a whirl. Assuming everything is good, click Merge.

There's tooling available to help audit your code for future compatibility, too. For instance, the PHPCompatibility extension for PHP_CodeSniffer can flag most places where you may want to tweak your code to keep it compatible with newer versions. You can run it locally over your code base to ensure you're ready to update, then update.
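
Once PHPCompatibility is installed and registered with PHP_CodeSniffer (see its README for setup details), a scan against a target PHP version looks something like this:

phpcs -p src/ --standard=PHPCompatibility --runtime-set testVersion 7.2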

If you're using Drupal, WordPress, Symfony, Zend, Laravel, or most other major systems, their latest versions are already tested and stable on PHP 7.2. That makes upgrading even easier. In fact, several systems have already made PHP 7.1 a minimum requirement for their latest version, which gives both their developers and you more power in your PHP.

Enjoy your faster, more robust PHP! Don't get left behind on unsupported versions, especially when the benefits of upgrading are so high and the cost is so low. And don't forget to plan for upgrading to PHP 7.3 later this year. It should be just as easy then, too.

Apr 25 2018

The Drupal project today released another security update to Drupal 7 and 8 core, SA-CORE-2018-004. It is largely a refinement of the previous fix released for SA-CORE-2018-002 a few weeks ago, which introduced a Drupal-specific firewall to filter incoming requests. The new patch tightens the firewall further, preventing newly-discovered ways of getting around the filters, as well as correcting some deeper issues in Drupal itself.

We previously added the same logic to our own network-wide WAF to address SA-CORE-2018-002. With the latest release we've updated our WAF rules to match Drupal's updates, and the new code is rolling out to all projects and regions as we speak.

The upshot?

  1. You really need to update Drupal to 7.59 or 8.5.3 as soon as possible. We believe that some of the attack vectors fixed in the latest patch cannot be blocked by a WAF. See our earlier post for quick and easy instructions to update your Drupal 7 or 8 sites on Platform.sh in just a few minutes.

  2. Still, most of the attack vectors fixed in the latest release are covered by the WAF. That should help keep your site safe from most attacks until you can update. But please, update early and often.

Stay safe out there on the Internet!

Apr 04 2018

A key part of Platform.sh's benefit comes from its integrated build and deploy hooks. Deploying a new version of your site or application rarely means simply dumping what's in Git onto a server anymore. Platform.sh was built from the ground up to let you execute whatever commands you need to "build" your application — turning what's in Git into what's actually on the server — and then to "deploy" the application — cleanup tasks like database migrations that should be run before the site is opened to visitors again.

There's a caveat there, however. Some deploy tasks need to block the site from new visitors until they complete; think updating the database schema, for instance. Others may not really need exclusive access to the site, but they still get it. That keeps the site unresponsive for critical seconds until those tasks complete.

So, let's fix that. We've now added a third hook, post_deploy. It works pretty much as you'd expect. You can do all the same things in it that you can do with a deploy hook, but it runs after the site is reopened to the world to accept new requests. Any tasks that don't need exclusive access to the database can be moved there, keeping the site up and responsive as much as possible while allowing for more robust and flexible automated tasks.

For example, the following configuration would run any pending database updates as part of the deploy hook but then import new content in the post_deploy hook. The new content will become available as soon as possible but the site will still be up and running while it's being updated. Once the import is done we'll also clear the cache a second time to ensure the new content is visible to the next request.

hooks:
    deploy: |
        set -e
        update_db.php
        clear_cache.php
    post_deploy: |
        set -e
        migrate_content.php import/
        clear_cache.php

What's "safe" to move to the post_deploy hook? That's up to you. What does or does not need an exclusive database lock will vary by site. Sometimes a cache clear is safe to do post-open, other times not. You get to make that determination for your application.

See the hook documentation for more information, and enjoy faster deploy tasks.

Apr 04 2018

We've offered customers the ability to subscribe to updates and notices about our service for a long time, using https://status.platform.sh/. That's great if you want to know when we have maintenance planned, but as we've grown and added new regions to our network, it's become apparent that not all customers want to know what's happening on every part of it. (Who knew?)

For that reason we've now added support for separate notification channels on our status service. When creating a new subscription or editing your existing one you should see a screen something like this:

A list of available regions to select (Screenshot).

That will let you select just the regions and message types you care about. That way, you can safely ignore maintenance windows in the Netherlands that don't affect your Australian site. (Of course, if you really do care what's happening to servers on the other side of the world, that's fine by us. We don't judge.)

If you aren't already subscribed to get status notifications, this is a good time to do it. And while you're at it, make sure you have health notifications set up for your specific project, too.

Mar 28 2018

Platform.sh customers should visit Safe from DrupalGeddon II aka SA-CORE-2018-02 for the specific steps we took to protect all our Drupal instances.

Earlier today, a critical remote code execution vulnerability in Drupal 6, 7, and 8 was disclosed. This highly-critical issue affects all Drupal 7.x and 8.x sites and most Drupal 6.x sites. You should immediately update any Drupal site you have to version 8.5.1, 8.4.6, or 7.58, as appropriate.

How do I know if I'm affected?

We are currently not aware of exploits of this vulnerability in the wild but this will undoubtedly change in the next few hours. Writing an exploit for this is trivial and you should expect automated internet-wide attacks before the day is out.

You should take immediate steps to protect yourself. This is as bad as or worse than the previous highly-critical vulnerability SA-CORE-2014-005, which wreaked havoc three and a half years ago, affecting more than 12 million websites.

(Like, seriously, if you are reading this and you are not on Platform.sh or another provider that has put a platform-level mitigation in place, go update your sites and then come back and finish reading. Please. Platform.sh customers, see below for how to quickly update your site.)

Where does the vulnerability come from?

The issue is in Drupal's handling of HTTP request parameters that contain certain special characters. These characters have special meaning in various places in Drupal, which if misinterpreted could lead to unexpected code paths being executed. The solution in the latest patch is to filter out such values before passing them off to application code.

Fortunately that same strategy can be implemented at the network layer. We have therefore applied the same logic to our Web Application Firewall to reject requests containing such values and deployed it across all projects in all regions, both Platform.sh Professional and Platform.sh Enterprise. That should protect all Drupal and Backdrop installations running anywhere on Platform.sh until they are upgraded.

What to do?

You must update any and all Drupal instances with 6.x, 7.x and 8.x or Backdrop CMS, or verify that your hosting provider has put in place an automated mitigation strategy for this vulnerability. (All Platform.sh clients are safe; our new WAF now detects and blocks all variants of this attack). Even if your hosting provider has a mitigation strategy in place you should update immediately anyway.

Drupal 6.x is no longer maintained and unlike Drupal 7.x and 8.x it does not support automated updates. Third-party support providers may provide a patch but you should make plans to upgrade from Drupal 6 to Drupal 8 as soon as possible.

Hopefully you are using Composer for your Drupal 7.x and 8.x or Drush make for Drupal 7.x, as is the default with Platform.sh installations.

To upgrade Drupal via Composer

To update your Drupal instance and test that nothing breaks, you can follow this simple procedure:

Verify that your composer.json file does not lock down Drupal core to a specific minor version; it should be something like "drupal/core": "~8.0". Then run:

git checkout -b security_update
composer update

Make sure that Drupal core was updated to 8.5.1 or higher (check composer.lock using git diff). Commit and push your changes:

git commit -am 'fix for SA-CORE-2018-02' && git push

On Platform.sh you can test that everything is fine on your automatically-generated staging environment, then merge to master to put this into production.

If you do not use Platform.sh, you should test this either locally or on your testing server, and then follow your normal procedure to update your live sites.

To upgrade Drupal using Drush Make

If you are using the "Drush Make" style of dependency management, again, make sure you are not locked down to a vulnerable version, such as:

projects[drupal][version] = 7.57

If it is, bump it up to 7.58. Then make a branch and update it:

git checkout -b security_update
drush pm-update

Commit the changes and push the result to Platform.sh for testing. Once you're satisfied nothing is broken merge back to master and deploy.

To upgrade Drupal if you're checking Drupal core into your repository

If you're running a "vanilla" Drupal setup, with all of Drupal checked into Git, the easiest way to upgrade is using drush.

In your local environment, go to your Drupal document root and run:

git checkout -b security_update
drush pm-update drupal

Commit the changes and push the result to Platform.sh for testing. Once you're satisfied nothing is broken merge back to master and deploy. Afterward, look into how to migrate your site to a dependency managed configuration, preferably Composer. It will make maintenance far easier and more robust in the future.

As a reminder, your Platform.sh instances are not vulnerable as they are protected by our WAF. You should still apply the fixes ASAP.

Mar 28 2018

An hour ago, the critical Drupal vulnerability SA-CORE-2018-002 was disclosed. It was announced a week in advance in PSA-2018-001, which allowed us to gather our technical team and make sure we could develop and deploy a mitigation for all our clients immediately as the issue was made known.

If you're not running on Platform.sh, please stop reading this post and go update your Drupal site to version 8.5.1 / 8.4.9 / 8.3.8 / 7.58 right now. We're serious; upgrade first and ask questions later.

If you are running on Platform.sh: You're safe and can continue reading... then upgrade.

The vulnerability (also referred to as CVE-2018-7600) affects the vast majority of Drupal 6.x, 7.x, and 8.x sites and allows arbitrary remote code execution, letting anonymous remote users take full control of any affected Drupal site prior to 8.5.1 / 8.4.9 / 8.3.8 / 7.58.

The same issue is present in Backdrop CMS installations prior to 1.9.3.

If your Drupal site is not hosted on Platform.sh, we encourage you to immediately update all your Drupal sites to 8.5.1 / 7.58 or to take your site offline. This is serious and trivially exploitable. You can expect automated attacks to appear within hours at most. If you are not on Platform.sh or another provider that has implemented a mitigation, your site will be hacked. This is as critical as the notorious "DrupalGeddon" episode from three and a half years ago.

If you are hosting on Platform.sh...

Platform.sh is pleased to announce all Drupal sites hosted on all our regions and all our plans are automatically safe from this attack.

Platform.sh has many security layers that make attacks such as this much harder than on comparable services: read-only hosts, read-only containers, an auditable and reproducible build chain, and a static-analysis-based protective block.

In response to this latest vulnerability, we've taken two important steps:

  1. We've added a new rule to our Web Application Firewall (WAF) on all regions and on all Enterprise clusters that detects and blocks requests trying to exploit this latest attack vector, even if your site hasn't been updated. (But still, please update.)

  2. We are adding a check to our protective block to prevent deployment of affected Drupal versions. If you try to push an insecure Drupal version our system will flag it for you and warn you that you are pushing known-insecure code. Please update your code base as soon as possible.

As a client, if you need any further assistance or want more information about the vulnerability, how it may affect you, and our mitigation strategy, don't hesitate to contact support. We have set our WAF to an especially aggressive stance for now, which may result in some users seeing a "400 Bad Request" message in some edge cases for legitimate traffic. If you experience this, please contact our support immediately; they will be able to help.

Jan 25 2018

We always aim to offer our customers the best experience possible, with the tools they want to use. Usually that means expanding the platforms and languages we support (which now stands at six languages and counting), but occasionally it means dropping tools that are not being used so that we can focus resources on those that are.

For that reason, we will be dropping support for the HHVM runtime on 1 March 2018.

HHVM began life at Facebook as a faster, more robust PHP runtime. Although it never quite reached 100% PHP compatibility it got extremely close, and did see some success and buy-in outside of Facebook itself. Its most notable achievement, however, was providing PHP itself with much-needed competition, which in turn spurred the work that resulted in the massive performance improvements of PHP 7.

Similarly, Facebook's "PHP extended" language, Hack (which ran on HHVM), has seen only limited use outside of Facebook itself but served as a test bed and proving ground for many improvements and features that have since made their way into PHP itself. Like HHVM itself, though, Hack never achieved critical mass in the marketplace outside of Facebook.

Back in September, Facebook announced that they would be continuing development of Hack as its own language, and not aiming for PHP compatibility. Essentially Hack/HHVM will be a "full fork" of the PHP language and go its own way, and no longer try to be a drop-in replacement for PHP.

Platform.sh has offered HHVM support as a PHP alternative for several years, although, as in the broader market, it didn't see much use; with the release of PHP 7 the performance advantage of HHVM basically disappeared, leading people to migrate back to vanilla PHP 7. Looking at our own statistics, in fact, we recently found that HHVM was virtually unused on our system.

"Give the people what they want" also means not giving them what they clearly don't want, and the PHP market clearly doesn't want HHVM at this point. We will therefore be dropping support for it on 1 March. If Hack/HHVM develops its own market in the future and there's demand for it we may look into re-adding it at that time, but we'll wait and see.

Good night, sweet HHVM, and may a herd of ElePHPants sing thee to thy REST!

Dec 28 2017

Platform.sh allows users to create a byte-for-byte snapshot of any running environment, production or otherwise, at any time with a single button click or command line directive.

That's great for one-off use, like preparing to deploy a new version or running some large batch process, but what about routine disaster recovery backups? Can we do those?

Believe it or not, it's possible to automate them yourself! And it's only a 3-step process.

The basic idea is that the Platform.sh CLI can be triggered from any automation tool you'd like... including cron from within the application container. It just needs an authentication token available in the environment.

Step 1: Get a token

Create an authentication token for your user or a dedicated automation user. That's easily done through the UI.

Set that token as a variable on your project, like so:

platform project:variable:set env:PLATFORMSH_CLI_TOKEN your_token_value

Step 2: Install the CLI

The Platform.sh CLI can be installed as part of a build hook within your project. Simply add the following line to your build hook:

curl -sS https://platform.sh/cli/installer | php

Now the CLI will be available in cron hooks, the deploy hook, or when logging in via SSH. It will use the token you provided a moment ago, and will automatically pick up the project and environment name from the existing environment variables.
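
In context, the build hook might look something like this (a sketch; the set -e line is optional but a good habit, since it aborts the build on any failure):

hooks:
    build: |
        set -e
        # Download and install the Platform.sh CLI into the app container.
        curl -sS https://platform.sh/cli/installer | php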

Step 3: Snapshot on cron

You can now add a new cron entry to your .platform.app.yaml file, like so:

crons:
    snapshot:
        spec: '0 5 * * *'
        cmd: |
            if [ "$PLATFORM_BRANCH" = master ]; then
                platform snapshot:create --yes --no-wait
            fi

That will run the cmd once a day at 5 am UTC. (Adjust for whenever low-traffic time is for your site.) Then if and only if it's running on the master environment (production), the platform snapshot:create command will run and trigger a snapshot, just as if you'd run the command yourself. Poof, done.

Of note, though, are the --yes --no-wait flags. The first skips any user interaction, since the command is running from cron. The second is extra important, as it tells the CLI not to block until the snapshot completes. If you forget that, cron will block on the snapshot, which means so will any deploy you happen to trigger. That can result in extra-long deploys and site downtime. You don't want that, we don't want that, so make sure to include --no-wait.

That's it, that's all; you're done! Rejoice in daily automated backups of your production environment.

Dec 20 2017

Time flies. It's been quite a year for Platform.sh as our product continues to improve. One of the great things about a managed services product is that it can continually improve without you even realizing it. The sign of a successful product feature is that it feels like it's always been there. Who can imagine life without it?

Let's take a look at what we've improved just in the last 12 months...

January opened with support for HTTP/2 on all projects. HTTP/2 changes the way browsers and servers communicate, making it faster, more streamlined, and better tailored to modern, asset-heavy web sites. HTTP/2 "just works" automatically as long as you're using a reasonably modern browser and HTTPS.

And as of April, you're using HTTPS. Courtesy of Let's Encrypt, you now get a free, automatic SSL certificate provisioned for every environment. No one should have to think about HTTPS in 2017. It's just a part of the package.

April also saw the launch of our tiered CDN for Platform.sh Enterprise. The Global CDN combines a flexible, high-feature CDN for dynamic pages with a low-cost, high-bandwidth CDN for static assets. That offers the best of both worlds for sites that want the best performance for the least cost.

We've also continued to expand our available services. We kicked off the year with support for persistent Redis as a key/value store rather than just as a cache server. March saw the addition of InfluxDB, a popular time-series data service for recording time-based data. In June, we added support for Memcached in case Redis doesn't do it for you.

We also beefed up the functionality of our existing services, adding support for multiple-core Solr configurations and multi-database MySQL configurations. We even now support regular expressions in the router for more fine-grained cookie control.

And of course we've kept up with the latest releases of your favorite languages, be that Python, Ruby, NodeJS, or perennial favorite PHP 7.2. We even added preliminary support for Go and Java, both of which are in beta now. (Interested in trying them out? Please reach out to us!)

August included support for arbitrary worker processes in their own container. That allows an application to easily spin up a background task to handle queue processing, image generation, or other out-of-band tasks with just a few lines of YAML with no impact on production responsiveness.

As of October, we've added health notification support for all projects. At the moment they only cover disk usage, but in time will expand to other health notices. (If you haven't configured them on your project yet we strongly recommend you do so.)

We're also closing out the year with new support for GitLab, as well as more flexible control over TLS and Client TLS, plus a few other changes that line us up for even bigger things in the year to come.

Last but not least, all of that goodness is available down under as of July with our new Sydney region for Platform.sh Professional.

And that's all been just this year! What do we have coming in 2018 to further redefine "modern hosting"?

You'll just have to join us in 2018 to find out...

Dec 11 2017

PHP 7.2 introduced a neat new feature called "type widening". In short, it allows methods that inherit from a parent class or interface to be more liberal in what they accept (parameters) and more strict in what they return (return values) than their parent. In practice they can only do so by removing a type hint (for parameters) or adding one where one didn't exist before (return values), not for subclasses of a parameter. (The reasons for that are largely implementation details far too nerdy for us to go into here.) Still, it's a nice enhancement and in many ways makes PHP 7.2 more compatible with earlier, less-typed versions of PHP than 7.0 or 7.1 were.

There's a catch, though: Because the PHP engine is paying more attention to parameter types than it used to, it's now rejecting more invalid uses than it used to. That's historically one of the main sources of incompatibilities between PHP versions: code that was technically wrong, but that the engine didn't care about, stops working when the engine starts caring in a new version. Type widening is PHP 7.2's case of that change.

Consider this code:

interface StuffDoer {
  public function doStuff();
}

class A implements StuffDoer {
  public function doStuff(StuffDoer $x = null) {}
}

This is nominally valid, since A allows zero parameters in doStuff(), which is thus compatible with the StuffDoer interface.

Now consider this code:

class A {
  public function doStuff(StuffDoer $x = null) {}
}

class B extends A {
  public function doStuff() {}
}

While it seems at first like it makes sense, it's still invalid. We know that B isn't going to do anything with the optional $x parameter, so why bother defining it? While that intuitively seems logical, the PHP engine disagrees and insists on the parameter being defined in the child class, even though you and I know it will never be used. The reason is that another child of B, say C, could try to re-add an optional parameter of another type; that would technically be compatible with B, but could never be compatible with A. So, yeah, let's not do that.
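
To make that concrete, here's a sketch of the problematic grandchild (SomethingElse is a hypothetical class):

class C extends B {
  // Technically compatible with B::doStuff(), which takes no parameters,
  // but it can never be compatible with A::doStuff(StuffDoer $x = null).
  public function doStuff(SomethingElse $x = null) {}
}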

But what happens if you combine them?

interface StuffDoer {
  public function doStuff();
}

class A implements StuffDoer {
  public function doStuff(StuffDoer $x = null) {}
}

class B extends A {
  public function doStuff() {}
}

There are two possible ways to think about this code.

  1. B::doStuff() implements StuffDoer::doStuff(), which has no parameters, so everything is fine.
  2. B::doStuff() extends A::doStuff(), which has a parameter. You can't leave off a parameter, so that is not cool.

Prior to PHP 7.2, the engine implicitly went with interpretation 1. The code ran fine. As of PHP 7.2.0, the engine now uses interpretation 2. It has to, because it's now being more careful about when you're allowed to drop a type on a parameter in order to support type widening. So this wrong-but-working code now causes a fatal error. Oopsies.

Fortunately, the quick fix is super easy: Just be explicit with the parameter, even if you know you're not going to be using it:

interface StuffDoer {
  public function doStuff();
}

class A implements StuffDoer {
  public function doStuff(StuffDoer $x = null) {}
}

class B extends A {
  public function doStuff(StuffDoer $x = null) {}
}

The more robust fix is conceptually simpler: Don't do that. While adding optional parameters to a method technically doesn't violate the letter of an interface, it does violate the spirit of the interface. The method is now behaving differently, at least sometimes, and so is not a true drop-in implementation of the interface.

If you find your code is doing that sort of stealth interface extension, it's probably time to think about refactoring it. As a stopgap, though, you should be able to just be more explicit about the parameters in child classes to work around the fatal error.

Enjoy your PHP 7.2!

Nov 03 2017

Transport Layer Security (TLS) is the encryption protocol used by all secure websites today. It's the "S" in "HTTPS", which you'll see on virtually all Platform.sh-hosted projects (thank you, Let's Encrypt!), and has replaced SSL for that task. For most sites simply enabling it by default is sufficient to keep a site secure, and that happens automatically in Platform.sh's case. However, in some cases it's helpful to tweak even further.

That's why we're happy to announce that as of today we're rolling out several new TLS-related features for all sites.

TLS version restriction

Like any protocol, TLS is periodically updated with new versions that address security weaknesses in older ones. Almost all browsers today support TLS 1.2, which is the latest version, as well as all earlier versions, including SSL. That means when a browser connects to your site it will use the most up-to-date version that both the server and the browser support. In most cases that's perfectly fine.

If you want to really lock down your site, however, at the cost of banning a few really old web browsers, you can now set a minimum TLS version that a browser must use. That's a requirement of some security compliance programs, too. If the browser tries to use an older, insecure version of TLS it will get blocked. Just add the following snippet to a particular route in your routes.yaml file.

tls:
    min_version: TLSv1.2

And now that domain will reject any HTTPS connection that isn't using at least TLS 1.2.
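
You can verify the restriction from the command line by forcing an older protocol version (this sketch assumes curl 7.54 or later for the --tls-max flag, with example.com standing in for your domain):

curl -v --tlsv1.1 --tls-max 1.1 https://example.com/

The handshake should fail, while a plain curl -v https://example.com/ still succeeds.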

HSTS support

HTTP Strict Transport Security (HSTS) lets you tell browsers that they should use HTTPS for all requests to a site, even if a stray link happens to use HTTP. You can now enable it by simply adding the following block to a route in routes.yaml:

tls:
    strict_transport_security:
        enabled: true

Now, that site will send an HSTS header with all requests, telling browsers to enforce HTTPS usage.

Client-authenticated TLS

Often when a site is being used as an API backend for an IoT device or mobile client application, it's necessary to lock the site down so that only selected clients can access it, authenticated via TLS. This process is called "client-authenticated TLS", and it requires loading custom root TLS certificates on the server that determine whether or not a particular client request is authorized.

Starting today, it's also possible to provide those custom certificates as part of your route. Once again, it's just a few lines in a route definition:

tls:
    client_authentication: "require"
    client_certificate_authorities:
        - !include
            type: string
            path: file1.key
        - !include
            type: string
            path: file2.key

More information on all three options is available in our documentation.

Enjoy your more-secure sites!

Oct 12 2017

Platform.sh aims to be a complete solution for web development and hosting, while at the same time offering the flexibility to slot into your own development tools and methodologies. That can be a tricky balance to strike.

One area where we generally favor flexibility is in your local development environment. You can use whatever local development tools you're most comfortable with: MAMP, WAMP, VirtualBox, Docker, or just a native local install of your needed tools.

For those who fear analysis-paralysis from so many choices, though, we've decided to start reviewing and green-lighting recommended tools that we've found work well. And the first local development tool we can recommend is Lando.

Lando is a Docker-based local development environment that grew out of Kalabox, a VirtualBox-based local dev tool for Drupal. Lando is much more flexible and lighter-weight than a virtual machine-based solution, and has direct support for a variety of systems including Drupal, Laravel, Backdrop, and WordPress. It even goes beyond PHP with support for Node.js, Python, and Ruby as well, just as we do.

Like Platform.sh, Lando is controlled by a YAML configuration file. Being Docker-based, it cannot directly mimic how a Platform.sh project works, but it can approximate one reasonably well.

We've included a recommended Lando configuration file in our documentation. It's fairly straightforward and easy to adapt for your particular application. It's also possible to synchronize data from a Platform.sh environment to your local Lando instance in just a few short commands. Lando's own documentation provides more details on how to trick out your local system with whatever you may need.

We still believe in allowing you to pick your own development workflow, so you don't have to change anything if you already have a workflow that works for you; if you want our advice, though, Lando is a solid option that should get you up and running locally in minutes, while Platform.sh handles all of your staging and production needs.

Sep 08 2017

One of our most requested features is better built-in monitoring and notification support for user projects. We've just made it easy to monitor your projects' health.

Many of the applications we host have a tendency to use up disk space faster than expected with cache and temp data, and when a disk gets full applications tend to misbehave. And really, no one wants misbehaving applications.

We are therefore happy to report that we now offer health notification checks on all Platform.sh Professional projects, at no extra cost.

Health notifications can be sent via email, Slack, or PagerDuty. Any time disk space drops below 20% or 10%, or when it goes back up (because you cleared up space or increased your disk size), a notification will be sent to whatever destinations you have configured. For example, to get email notifications you can simply run the following command using the Platform CLI tool:

platform integration:add --type health.email --from-address [email protected] --recipients [email protected] --recipients [email protected]

Then, any time one of those thresholds is crossed, both [email protected] and [email protected] will be emailed. (As a side note, your email address is still webmaster? Neat! That's so retro...)

Slack and PagerDuty can be configured in the same way. See the documentation for all the details.
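
For instance, assuming you already have a Slack API token and a channel to post to (both values below are placeholders), the Slack integration looks much the same:

platform integration:add --type health.slack --token <your-slack-token> --channel '#site-health'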

For now, disk space is the only notification that gets generated, but we plan to add more health checks in the future. Until then, sleep tight knowing that your disk won't start misbehaving without your knowledge.

Jun 07 2017
Jun 07

One of the requests we've gotten in the past few months is the ability to customize the HTTP headers that get sent with static assets. For requests coming from a PHP or Ruby application it's easy enough to send any headers you want, but for static files there was no way to customize the headers. While that seems like an obscure and nerdy feature, it's actually quite important. Custom headers are necessary for supporting atypical file types, for CORS security measures, or for "Same-Origin" restrictions to prevent click-jacking.

So we said to ourselves, "selves, we try to be a flexible host, we should just add that feature." And ourselves responded "OK, let's do that."

And it's now available on all new projects, too.

On all new projects you can now specify additional headers to send in your .platform.app.yaml file. Those can apply to all files (say for a Same-Origin or CORS header) or selectively by file extension or any other regular expression. For instance, the following lines will add an X-Frame-Options header to every static file.

web:
    locations:
        "/":
            # ...
            headers:
                X-Frame-Options: SAMEORIGIN
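
To scope a header to particular files instead, the same block can use rules keyed by a regular expression, each carrying its own headers. A sketch, assuming rules accept headers this way (the font-file regex and CORS value below are purely illustrative):

web:
    locations:
        "/":
            # ...
            rules:
                '\.(woff2?|ttf|eot)$':
                    headers:
                        Access-Control-Allow-Origin: "*"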

Again, though, that applies only to static files; for responses from your application you can still set whatever headers you need directly in code. See the documentation for more details, and the provided example.

For now this feature is only available for newly created projects. We'll be rolling out updates to existing projects over time. If you want to use it before that just file a support ticket and we'll bump your project to the head of the line.

May 25 2017
May 25

Platform.sh has always prided itself on offering our customers as much flexibility to control their own projects as we can. What language to use, what services to use, how the server should be configured, what applications to run, all of these are under the user's control. We even allow users to set various control variables and environment variables per-environment.

And now there's even another way to set them, via your application configuration file.

Platform.sh's variable support is designed to allow users to set per-environment configuration (such as API keys for 3rd party services) as well as to control aspects of the environment. Some applications, though, have their own environment variables they rely on for various reasons, such as to set a dev/prod toggle or control a build process. Those generally shouldn't vary by environment.

For that reason it's now possible to set variables from .platform.app.yaml. Those values will be tracked in Git just like the rest of your code base, keeping all of the important bits in the same place.

If you're using PHP, you can even use this system to set php.ini values. Need to change your memory limit? Set a prepend file? Control the error reporting level? That can all be done now directly from the .platform.app.yaml file.
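
As a sketch with made-up values, application-level environment variables and php.ini overrides sit side by side under a variables block in .platform.app.yaml:

variables:
    env:
        DEPLOY_MODE: 'prod'
    php:
        memory_limit: '256M'
        display_errors: 'Off'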

For environment variables that should change per-environment or contain sensitive information, nothing changes: the current mechanism of setting variables through the UI or using the CLI tool still applies. Your current workflow is fine.

Apr 18 2017
Apr 18

At Platform.sh, we believe that all websites deserve to be secure, fast, and feature-rich, and that it should be easy to have all three. Secure has always meant that a site is encrypted using SSL, which is why we’ve never charged for an SSL certificate. Fast means using HTTP/2, which we added support for earlier this year, but most browsers only support HTTP/2 over SSL. And feature-rich means allowing the full range of newer web functionality such as geolocation, access to media devices, or notifications, many of which browsers are now only permitting over SSL connections. You know what? The modern web only works properly with SSL so let’s cut out the middleman. Let’s Encrypt everything.

We’re happy to announce automatic support for Let’s Encrypt SSL certificates on every production site on Platform.sh Professional, at no charge.

Starting today for all new projects, on every deploy we will automatically provision and install an SSL certificate for you using the free Let’s Encrypt service. You don’t have to do anything. It will just be there.

For existing projects, we're bringing that functionality online in batches to avoid overwhelming the Let's Encrypt servers. We expect to finish getting through them all within the next week or two. If you're about to bring a site live and want to make sure you get Let's Encrypt functionality before that, just file a support ticket and we'll bump you to the front of the line.

Wait, what does this mean for my site?

If you currently just have HTTP routes defined in your routes.yaml file, then as of your next deploy HTTPS requests will be served as HTTPS requests rather than being redirected to HTTP. Both will “just work”.

If you want to serve your entire site over HTTPS all the time (and yes, you do), simply change all http:// routes in your routing file to be https://. That will automatically redirect HTTP requests to HTTPS going forward.
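
For a typical single-app project, the whole change is one letter per route (the upstream name here is illustrative):

# Before: HTTP is canonical.
"http://{default}/":
    type: upstream
    upstream: "app:http"

# After: HTTPS is canonical, and HTTP redirects to it.
"https://{default}/":
    type: upstream
    upstream: "app:http"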

See the Routes section of the documentation for more details, but really, there aren't many details beyond that. It just works.

What about Platform.sh Enterprise?

Most Platform.sh Enterprise sites are served through a Content Delivery Network already, in which case the SSL certificate is handled by the CDN. This change has no impact on Platform.sh Enterprise customers.

Neat! So what should I do?

You don’t have to do anything. HTTPS just works now. As above, you can configure your site to use HTTPS exclusively by adding the letter "s" to your routes.yaml file in a few places. (We told you it was easy.)

Of course, now that you know your site will use SSL, you also know it will be using HTTP/2. All SSL-protected sites on Platform.sh use HTTP/2. HTTP/2 is supported by nearly 80% of web browsers in the world. That makes it safe, and a good investment, to start optimizing your site for HTTP/2, layering in HTTP/2-specific capabilities like server push, and so forth.

Secure, fast, feature-rich, and easy. Welcome to Platform.sh!

Dec 16 2014
Dec 16

The world of PHP development is changing. The past few years have seen the rise of a new wave of frameworks, the birth of component-based architectures, the advent of Composer and Packagist.org, and even our own standards body, the PHP-Framework Interoperability Group. A new "Proudly Invented Elsewhere" culture (because everyone loves PIE, right?) has grown exponentially on the reusable community engine of PHP. The future of PHP is now unquestionably in assembling and using decoupled components, from a variety of different sources.

Glasgow Bridge by Moyan Brenn

Specializing and cooperating

For PHP developers this evolution means building with tools from a variety of different sources, not just from Drupal.

When I first started using Drupal nearly a decade ago, PHP was a very different place. Every framework or application was its own universe, and the differences between projects made it normal to buy into one universe or another and hard to get to know many of them.

Drupal, my universe of choice, prided itself on being not really an application nor just a framework. This allowed it to be a "jack of all trades", even if master of none. Getting really, really good at Drupal could serve a developer very well because it let them find creative solutions to many different challenges. At Palantir.net, we had a saying for a while that "Drupal may not always be the best solution, but it is almost always a solution." We could take on almost any project, CMS-ish or framework-ish, by knowing that one really good tool really well.

That's not the case anymore. The PHP world – Drupal included – has come together dramatically in recent years and the market has shifted. Drupal itself has evolved, too, to be less a framework and more a first-class Content Management Platform. As it's done so, it has ceded some "pure framework" use cases while pure frameworks have had a renaissance in the form of Symfony2, Zend Framework 2, Aura, Laravel, and other younger projects. The days when knowing just Drupal (or Joomla, or WordPress, or Symfony, or whatever) was all one needed to know are fading. Instead, the future is purpose-built tools born out of common components.

Calgary Peace Bridge by Dave Bloggs

I can work with that!

So where does that leave Drupal 7 developers? It turns out it leaves them with Drupal 8, a child of this new era of PHP. Drupal 8's architectural redesign was driven in large part, very deliberately, by a desire to bring it more in line with The New PHP, a robust library of tools ready for developers to remix. The conceptual difference between Drupal, as embodied in Drupal 8, and other major PHP projects is vastly smaller than ever before, providing PHP coders two key advantages: it makes it easier for non-Drupal developers to learn Drupal and it makes it easier for Drupal developers to learn non-Drupal systems.

That's good for everyone for many reasons. For one, it can serve as a solution to Drupal's talent shortage: the easier it is to train experienced, accomplished developers on Drupal, the easier it is for organizations with Drupal sites to find and hire people to maintain them. The most common feedback I get from other PHP developers upon seeing Drupal 8 is "Huh, cool, I can work with that!" I've even gotten that feedback from non-PHP developers, who are another untapped (but now tappable) resource to grow the pool of Drupal service providers the market has been asking for.

On the flipside, this all opens up new opportunities for Drupal developers outside of Drupal. Heretical as it may be to say, Drupal is not the answer to all problems. As software developers, our job is to solve problems, not to solve problems only with Drupal. The more tools we have in our kit, the more problems we can solve effectively. When Drupal isn't the right tool, we need to be able to pivot quickly to the one that fits the job. Pivoting from Drupal 8 and back again will be far easier than ever before.

Navajo Bridge by Glyn Lowe

The future is now

At Palantir, we've already started that process. Last year, we built a REST-centric decoupled CMS platform for a media client using Drupal 7 and the Silex micro-framework, which is based on the same Symfony components as Drupal 8. Earlier this year we built a low-touch site for a client using Sculpin, a PHP-based static site generator that uses some Symfony components but more importantly Twig, the same templating engine used by Drupal 8 (and Symfony, and Oro Platform, Sylius, Foxycart, eZ Publish, phpBB, Piwik, and others). As I write this, my main client project is a two-install all-Symfony project.

Could Drupal have been used for those cases? Quite possibly, but it would not have been a good fit. And that's OK. Look at the commonalities of those projects: Drupal 8, Symfony, Silex, and Sculpin. All four can use Twig templates. All four are built on similar service architectures. All four use, or can use, any of the 30,000 PHP packages available on Packagist. Three of them use the same request/response/routing pipeline. That means any of our engineers can move with relative ease between those systems with only limited retraining. We can use the right tool for the job without having to learn several completely different tools. To do all this, developers only have to learn one main tool: Modern PHP.

The future of PHP is much more heterogeneous than it has been in the past. Of course, such potential is only useful if we’re prepared for it. Every platform has its strengths and weaknesses, and as professional developers we have a responsibility – to ourselves, our clients, and our employers – to know what those are so we can recognize and use the right tool for the job. Modern PHP ties the whole population of solutions together.

Sunset bridge by F Mira

Be the bridges connecting the islands

Two years ago I called on developers to Get Off the Island and attend conferences outside of their comfort space: to meet new faces, learn new things, and open their minds. I've been pleased to see many doing so, with Drupal developers showing up and presenting at general PHP conferences and general PHP developers attending and presenting at DrupalCons and Drupal Camps.

In 2015, let's take that a step further: Don't just learn; Build.

Make it your goal in 2015 to build and launch something meaningful with a new PHP tool. If you're traditionally a Drupal developer, build an all-Symfony project. If you normally use Symfony, build a Zend Framework-based project. If Zend Framework is your home turf, build and ship a project using Aura. If Aura is your happy place, see what Laravel has to offer. If Laravel is your go-to tool, launch a site with Drupal 8. Or even combine pieces of all of them – you can do that now – each playing to its strengths. Get out of your comfort zone … discover a bigger comfort zone!

But don't just build; Teach. Document what you learn by going through the new process. Blog your experiences with your new tools. Share your new insights with your colleagues at your company, at your user group, at conferences.

And don’t just teach; Help. Be a part of The New PHP by helping to build it and the community around it. Be the bridges that are bringing the islands of PHP together, and become a better developer in the process.

Let's learn, let’s teach, let’s help and let’s all build something good together.

Guest Dossier

Image credits

bridge_navajo.jpg - Image by Glyn Lowe - https://www.flickr.com/photos/glynlowe/7183121469 - License https://creativecommons.org/licenses/by/2.0/

bridge_peace_calgary.jpg - Image by Dave Bloggs - https://www.flickr.com/photos/davebloggs007/15267799517 - License https://creativecommons.org/licenses/by/2.0/

bridge_glasgow.jpg - Image by Moyan Brenn - https://www.flickr.com/photos/aigle_dore/4019285756 - License https://creativecommons.org/licenses/by-nd/2.0/

bridge_sunset.jpg - Image by F Mira - https://www.flickr.com/photos/fhmira/5019343521 - License https://creativecommons.org/licenses/by-sa/2.0/

Mar 05 2013
Mar 05

I've posted a status update for the WSCCI Initiative, and where we stand as of Drupal 8 Feature Freeze. We've gotten a ton done, but we still have a ton to do before we can really close the deal on Drupal 8 being an earth-shattering success. If you want Drupal 8 to be an earth-shattering success, see that post for the various places that we could use help right now, and how you can get involved.

Mar 04 2013
Mar 04

Drupal 8 Feature Freeze has come and gone. 2 years of hard work by dozens of people later, it's a good time to pause and take stock of where the Web Services and Context Core Initiative is, and what's left to do.

Short version: Holy crap we did a lot! Holy crap there's a lot left to do!

Drupal 8's first year of development didn't see much in the way of new features or radical changes. A lot of bugs were being fixed, a lot of groundwork being laid, but the really heavy lifting didn't really come in until February 2012. That's when the decision was made to fully adopt Symfony2 Components as the basis for Drupal 8's core pipeline. In hindsight, it was one of the biggest and, I believe, best decisions we've made in Drupal in years.

The work to date

WSCCI basically forked into SCOTCH and WSCCI at that point, and WSCCI rebooted. That means nearly everything that has been accomplished has, realistically, been done just in the past year. Let's see what that is:

  1. Adopted the Symfony2 HttpKernel component and model for handling web requests.
  2. Rebuilt the routing system to be mostly-decoupled from menus.
  3. The new routing system supports routing by path, HTTP method, HTTP vs HTTPS, and basic mime-based routing, as well as pluggable additional filters. That's vastly more flexible than before (see the sketch after this list).
  4. The new routing system can very easily be swapped out for a non-SQL backend.
  5. A new, more flexible access control system that supports multiple access checks on a single route.
  6. A new serialization system to render any entity into a standard format, such as Atom, JSON, or HAL, and all without having to write support for every format, entity, and field combination in existence.
  7. A new rest.module, which offers any new-style Entity in the system up as a RESTful resource using any available serialization format. (See Klausi's writeup for more detail on all of the fun things that entails.)
  8. REST/serialization support for Views, to easily expose any View as a serialized resource collection.
  9. Introduced the Guzzle library to core, a full-featured HTTP 1.1 client that is far more powerful and flexible than drupal_http_request().
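
As a sketch of what that new routing system looked like around feature freeze (key names were still in flux at the time, so treat these as approximate), a module could declare a route in YAML along these lines:

example.page:
  pattern: '/example/{id}'
  defaults:
    _controller: '\Drupal\example\Controller\ExampleController::page'
  requirements:
    _permission: 'access content'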

Although not strictly part of WSCCI per se, we've also had a knock-on effect throughout Drupal of paying down gobs of code debt, refactoring many core systems to be cleanly injected services, introducing a Dependency Injection Container to help make code more unit testable, and lots of general API improvements. As a side effect, core is far more modular and it's now far easier to swap out entire swaths of core without hacking core. Score!

Some of those feature requests have been open in some form or another for years. (Consider rendering into multiple formats, which was first opened in 2007.) The impact of these improvements is huge, and far-reaching. Drupal 8 will be a viable Hypermedia server, something it never has been before. Combined with all of the other work that's gone into Drupal 8, and Drupal's long history as a CMS, we are looking at a platform that can take the web by storm.

Wait, who did that?

A lot of people have been involved in WSCCI to date. I couldn't list out everyone's name, because that would take a long time, but I especially want to highlight a couple of people:

And of course the ever-amazing Kat Bailey, who had the courage to dive into the depths of Drupal's bootstrap pipeline from which few return to tell the tale.

I also want to give a shout-out to Acquia, who has provided funding for both Klaus and Lin to work on the serialization and REST modules for Drupal 8 for the past several months. Without that, we would likely have ended up with a powerful new routing system and nothing to actually use it for, you know, web services. There's still work to do there, however, so if you would find it easier to help make Drupal 8 the best REST server on the market by sponsoring a developer than cracking open an IDE yourself, Klaus and Lin are still looking for additional funding to keep them on the job.

Awesome! So we're done, right? Right?

Eh, not hardly. Moving Drupal to a new foundation isn't a one-year job. We've built a new basement, poured the concrete, added plumbing, built the house framework, and installed windows. That's huge, but we still need to put down the new floors, plaster the new interior walls, paint, and buy furniture.

In short, it's time to port Drupal to Drupal 8.

The WSCCI team is still meeting weekly in IRC, every Tuesday at noon US Eastern Time in #Drupal-WSCCI. If you want to get involved, that's a great place to start. Our project manager, Ashleigh Thevenet, is posting regular updates on the WSCCI group with our current hit-list of high-priority items. A high-impact issue list is also available on the Core Initiatives page.

In short, you know what you need to do. :-)

In broad strokes, here's the major work that still needs to be done:

  1. Finish separating Routes from Menus. We have an initial way forward that needs people to help with it, but will need to figure out how to go the rest of the way.
  2. Fully implement HAL as our preferred format for Hypermedia resources.
  3. Properly handle Entity References when importing an entity via REST, which would enable full round-tripping and therefore REST-based Deploy.module-like functionality. (All together now, "Oooo...")
  4. Move our session handling system over to Symfony's, so we can eliminate tons of redundant code.
  5. Finish the migration to a UrlGenerator, which unifies all of our outgoing-path logic into a single flexible, cachable, pluggable pipeline.
  6. Finish porting our outgoing requests from drupal_http_request() to Guzzle.
  7. The big one: Merge page caching and block caching to use HTTP caching and ESI, based on the new Symfony fragment pipeline that was developed in a large part in response to Drupal's needs.

And finally... move all of our page callbacks and routes over to the new routing system as controller classes. This is the most easily crowdsourced part, but we need to complete the first point above before we can really pounce on it.

So no, we're not done. Not by a long shot.

OK, so what's this got to do with me?

Everything! Drupal 8 is poised to kick ass. But it's not going to kick ass unless we close the deal. We need your help to help close the deal.

Come by the WSCCI channel any time to get up to speed. We'll also be working on documentation soon for converting page callbacks to controllers (stay tuned). And don't forget that there's a massive world-wide sprint coming up next weekend! Is there one in your area? Are you going to it? You should be. This is a great opportunity to get involved in core, and in WSCCI specifically. Several WSCCI regulars will be available from their home cities if you need help.

If you're coming to DrupalCon Portland (you are, aren't you?), there are also going to be pre- and post-conference sprints in which WSCCI is participating. Want to get your Drupal on for even longer? Sign up for Portland sprinting and let's crank out as many patches as we can!

Drupal 8 ain't finished until it's finished. Let's get stuff done.

Feb 26 2013
Feb 26

The REST team believes it's in Drupal's best interest to back off from supporting PUT requests directly in the rest.module. While it is still supportable by the core routing system, the semantics of PUT and the complexity of Drupal entities are just too incompatible to be viable at this time.

Feedback on this decision is welcome in the announcement thread until 4 March, but at this time we don't believe it is possible to support well.

Feb 01 2013
Feb 01

We're attempting to integrate Assetic into Drupal, and also eliminate the global drupal_add_css/js() functions. This would pave the way for smarter asset bundling and aggregation than we've ever had before, as well as let you use CoffeeScript, SASS, LESS, or any other format Assetic supports natively in your modules and themes. It also lends itself toward supporting partial page rendering, which will be critical for the SCOTCH initiative and ESI support.
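
For context, Assetic is a standalone PHP asset-management library. A minimal sketch of the style of API it provides on its own (file paths made up; this is not the proposed Drupal integration itself):

<?php
use Assetic\Asset\AssetCollection;
use Assetic\Asset\FileAsset;
use Assetic\Asset\GlobAsset;

// Collect several source files into one logical asset. Filters for
// CoffeeScript, SASS, LESS, etc. could be attached to transform each
// asset before it is dumped.
$js = new AssetCollection(array(
    new GlobAsset('/path/to/js/*.js'),
    new FileAsset('/path/to/extra.js'),
));

// Outputs the concatenated, filtered result.
echo $js->dump();
?>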

Without help, this patch is very unlikely to make it in. The issue summary contains an itemized list of things that need help, including fun sorting algorithms, filesystem interactions, and overall DX questions. Something for everybody!

If you're interested in helping, contact Sam Boyer (sdboyer on Twitter and IRC) directly or post in the issue.

Jan 15 2013
Jan 15

The Drupal 7 menu system is responsible for a huge number of tasks: Routing, links, actions, local tasks, access control, etc. Menu links are being split off to entities for Drupal 8. Routing is now a separate Symfony-based system. But what about the rest of it?

There have been a number of issues discussing this tricky question. The most recent is probably the best place to start. This is a question we need to nail down soon, as there are a lot of moving parts, and we need to know where they're landing when the dust settles.

Jan 14 2013
Jan 14

Round 3 of the revised Drupal 8 routing system is ready for final testing. Please weigh in now. This version is an update from the previous "Nested Matcher" approach, which has been merged into the Symfony CMF Routing component that we are now using. It offers a couple more features, fewer moving parts, and less Drupal-specific code.

Check the issue summary and the latest patch, and let's get it landed so we can move on to the next big task.

Oct 30 2012
Oct 30

The new routing system doesn't allow for arbitrary implicit dynamic tail path elements. (Basically, $args = func_get_args(); becomes useless in a controller/page callback.) There are currently no plans to fix that, as it is not particularly useful and is a slight security risk. A discussion to confirm that decision is up until Friday, in case there's some compelling reason we must spend time reimplementing that.

Oct 29 2012
Oct 29

I've posted a proposed REST API for entities over in the WSCCI group. It addresses the question of PUT idempotence, revisions, forward revisions, hooks, and so forth, based on discussion with a number of people and even a guest appearance by the co-editor of the HTTP 2.0 draft specification.

Feedback is welcome, but timeboxed to this coming Friday.

Oct 12 2012
Oct 12

The Web Services Initiative (WSCCI) is holding two (yes, 2) sprints at upcoming camps. If you are coming to either camp, please sign up and come help.

Kat Bailey will be hosting a sprint at PNW Drupal Summit in Seattle, October 20-21st.

Larry Garfield will be hosting a sprint at BADCamp in Berkeley, CA, November 2-4th.

A list of what we have left to tackle before freeze is available. It's not short, so we need dedicated help. Join us!

Oct 03 2012
Oct 03

A new Symfony-based routing system has been added to core. The intent is to replace hook_menu() with it as page callbacks are ported over, and as the SCOTCH initiative comes up to full steam. See the change notice for more details.

There are a number of important follow-up issues that need attention, however, to fully flesh out the system, as well as improve the DX of registering routes. Your attention and input is requested:

Aug 29 2012
Aug 29

I've posted a writeup of the battle plans for WSCCI coming out of DrupalCon Munich. It includes a number of tasks, as well as the people who have volunteered to work on those tasks. If you're one of those people, please make time to work on what you volunteered to work on. :-) If you are not one of those people, this is an excellent time to become one.

Aug 15 2012
Aug 15

The Web Services and Context Initiative has a number of outstanding patches that are in urgent need of review, as they are turning into blockers for other work, have been sitting for a while, or both. :-) If you have any spare time, thoughtful review of the following would be greatly appreciated:

http://drupal.org/node/1606794 - The new routing system. Architecture is in place, some functionality is in place, but not all of it. Feedback now or forever hold your peace. :-)
http://drupal.org/node/335411 - We have this great new Symfony session system in core, complete with flash messages that were improved specifically for us. We should be using it. There's a patch awaiting review. It will need some refactoring given recent developments, but reviews now are still valuable.
http://drupal.org/node/1728666 - Although modules can now expose "bundles" (Symfony equivalent of modules), they're not able to override core services as much as we want them to. This attempts to do so. (It's also a good small introduction to Symfony if anyone is looking for that...)

Jul 04 2012
Jul 04

The new routing system being developed by the WSCCI initiative is ready for its first round of reviews. It's still a work in progress but needs feedback on the architecture. See the issue for more details. This is a very high-priority blocker for the WSCCI and SCOTCH initiatives.

Jun 14 2012
Jun 14

A small team met in Paris at the office of Commerce Guys from June 3-5th to discuss Drupal's web services serialization and syndication format needs. In short, "OK, we are going to have all of this fun new routing capability, now what do we do with it?" More specifically, how do we go about serializing Drupal data for consumption by remote programs (either other Drupal sites as in the case of content staging, other non-Drupal sites, client-side applications, or mobile apps), and what protocols and APIs do we make available to manipulate that data?

In attendance were:

The raw notes from the sprint are available as a Google Doc, but as usual are rather disjoint and likely won't make sense if you weren't there. A more complete report is below.

Executive summary

There are no clear and obvious winners in this space. All of the available options have different serious limitations, whether in their format compatibility with Drupal, their access protocols (or lack thereof), or the maturity of their available toolchains.

Our recommendation at this time is to make JSON-LD our primary supported web services data format. It is quite flexible, and supports the self-discovery capabilities that we want. What it appears to lack is the set of tools and standards provided by the Atom and AtomPub specifications, which provide everything we want except for an actual data payload format. For use cases where the capabilities of Atom (such as Pubsubhubbub support) are necessary, wrapping JSON-LD strings in an Atom wrapper is ugly but technically possible. Alternatively, the JCR/PHPCR XML serialization format can serve as a forward-looking XML-based serialization when Atom functionality and true hypermedia are required.

This will require changes to the Entity system, most of which are already in progress. However, this provides new impetus to complete these changes in a timely manner. In short:

  • "Fields" get renamed to "Properties", and become the one and only form of data on an Entity. Any non-Property data on an Entity will not be supported in any way (except for IDs).
  • Properties become classed objects and include what is currently fields plus what is currently raw entity data (e.g., {node}.uid).
  • Entity Reference (or similar) gets moved into core.
  • All entity relationships are considered intrinsic on one side (the side with a reference field) and extrinsic on the other (the side referenced). That is, all relationships are mono-directional.
  • Every relationship (may) have a virtual Property assigned to the entity that is linked to, which stores no data but provides a mechanism to look up "all entities that reference to me". That is, back-references.
  • Content metadata (e.g., the sticky bit on nodes, Organic Groups memberships, etc.) is implemented as a foreign entity with reference.
  • The responsibility for entity storage will be moved from the Field/Property level to the Entity level. That is, we eliminate per-field storage backends.

Background

There are two broad categories of web services to consider: s2s (Server to Server, Drupal or otherwise) and s2c (Server to Client, where client could be a mobile app, web app, client-side editor like Aloha or CreateJS, etc.). There is of course plenty of overlap. Both markets have different existing conventions, which frequently are not entirely compatible as they have different histories and priorities.

Entity API Revisions

In order to support generic handling of Entity->serialized translations, we need to standardize and normalize how entities are structured. Currently in Drupal 7 entities are largely free-form naked data structures. While Fielded data has a semi-regular form, its API is inadequate and much data is present on an entity via some other means. In order to handle serialization of entities, we need to either:

  1. Allow modules to implement per-property, per-serialization format bridge code. That would result in n*m elements that would need to get written by someone (whether hooks or objects or plugins or whatever).
  2. Provide a single standard interface by which all relevant data on an entity can be accessed, so that a generic implementation may be written to handle all Property types.

Given the extremely high burden the first option would place on module developers, we felt strongly that the second option would be preferable and result in better DX.

Ongoing work on the "Entity Property Metadata in core" effort has already begun this process. What we describe here is not a radical change, but more a tweaking of ongoing work.

The renaming of "Fields" to "Properties" is largely for DX. The word "field" means three different things in Drupal right now: A data fragment on an entity, a column in an SQL table, and a part of a record in Views. With Views likely moving into core for Drupal 8, eliminating one use of the term will help avoid confusion.

We therefore have a data model that looks as follows:

Entity [ Property [ PropertyItem [ primitive data values ] ] ]

Where "Property" was called in Drupal 7 "Field" and PropertyItem was called in Drupal 7 an "Item". This is largely just a rename.

That is, an Entity object is a glorified array of Property objects. A Property object is a glorified array of PropertyItem objects. A PropertyItem object contains some number of primitive values (strings and numbers), but no nested complex data structures. (An array or stdClass object may be PHP-serialized to a string as now, but the serialization system will treat that as an opaque string and not support any additional sub-value structure.)

Additionally, each Entity class and Property class will be responsible for identifying its metadata on demand via a method. That is, much of the information currently captured in hook_entity_info() will move into a metadata() method of the Entity class; the information currently captured in hook_field_schema() and some of that captured in hook_field_info() will move into a metadata() method of the Property class. That allows the necessary information to be available where it is needed, without having to pre-define giant lookup arrays. It also allows for that information to vary per-instance, as field schema already does now.

Entity and Property classes will implement PHP magic methods for easier traversal. A preliminary, partial, demonstration-only implementation is as follows:

<?php
class Entity implements IteratorAggregate {
  // Keyed array of Property objects.
  protected $properties;

  public function getProperties() {
    return $this->properties;
  }

  public function getIterator() {
    return new ArrayIterator($this->properties);
  }

  public function __get($name) {
    // Returns the Property named $name.
    return $this->properties[$name];
  }

  public function __set($name, $value) {
    // Sets the Property named $name.
  }
}

interface PropertyInterface {
  // Returns the stuff that was in hook_field_schema().
  public function metadata();
}

interface PropertyItemInterface { }

interface ReferencePropertyItemInterface extends PropertyItemInterface {
  public function entity();
}

class Property implements PropertyInterface, ArrayAccess, IteratorAggregate {
  // Indexed array of PropertyItem objects.
  protected $items;

  public function metadata() {
    // Returns the stuff that was in hook_field_schema().
  }

  public function offsetGet($offset) {
    // Returns a PropertyItem object.
    return $this->items[$offset];
  }

  public function offsetExists($offset) { return isset($this->items[$offset]); }
  public function offsetSet($offset, $value) { $this->items[$offset] = $value; }
  public function offsetUnset($offset) { unset($this->items[$offset]); }

  // On Properties, as a convenience, the [0] is optional. If you just access
  // a value name, you get the 0th item. That is useful for properties that
  // you know for sure are single-value. However, because the [] version is
  // always there this will never fatal out the way it would if the data
  // structure itself actually changed.
  public function __get($name) {
    return $this->items[0]->$name;
  }

  public function getIterator() {
    return new ArrayIterator($this->items);
  }
}

class Node extends Entity {
  // Convenience method.
  public function author() {
    // Could also micro-optimize and call offsetGet(0).
    return $this->author[0]->entity();
  }
}

class PropertyItem implements PropertyItemInterface {
  // The internal primitive values.
  protected $primitives;

  public function __get($name) {
    return $this->primitives[$name];
  }

  public function processed($name) {
    // This is pseudo-code only; the real implementation here will not call
    // any functions directly but use something injected as appropriate.
    // We have not figured out that level of detail yet.
    return filter_format($this->primitives[$name]);
  }
}

class ReferencePropertyItem extends PropertyItem implements ReferencePropertyItemInterface {
  public function entity() {
    // Look up the ID of the entity we're referencing to, load it, and return it.
  }
}

// Individual properties can totally add their own useful methods as
// appropriate. This is encouraged.
class DateProperty extends PropertyItem {
  public function value() {
    return new DateTime($this->primitives['date_string'], new DateTimeZone($this->primitives['timezone']));
  }
}

interface ReferencedPropertyItemInterface extends PropertyItemInterface {
  public function getReferences();
}

class ReferencedPropertyItem extends PropertyItem implements ReferencedPropertyItemInterface {
  public function getReferences() {
    // Returns a list of all entities that referenced TO this entity via this property.
  }
}

// For values that do not store anything, but calculate values on the fly.
interface CalculatedPropertyInterface { /* ... */ }

$entity = new Entity();
foreach ($entity as $property) {
  // $property is an instance of Property, always.
  foreach ($property as $item) {
    // $item is an instance of PropertyItemInterface, always.
    if ($item instanceof ReferencePropertyItemInterface) {
      $o = $item->entity();
      // Do something with $o.
    }
    // Do something with $item.
  }
}

// Usage examples. Assume $node is a loaded Node entity.
$node
  // __get() returns a Property.
  ->updated
    // ArrayAccess returns a PropertyItem.
    [0]
      // __get() returns the internal primitive string called timezone.
      ->timezone;

$node
  // __get() returns a Property.
  ->updated
    // ArrayAccess returns a PropertyItem.
    [0]
      // __set() assigns the value of the internal timezone primitive.
      ->timezone = 'America/Chicago';

$node
  // __get() returns a Property.
  ->author
    // If you leave out [], it defaults to 0.
    // The entity() method returns the referenced user object.
    ->entity()
      // If you leave out the [], it defaults to 0.
      ->name
        // The actual string name value.
        ->value;

// In practice, often use the utility methods.
$node->author()->label();
?>

By default, when you load an entity you will specify the language to use. That value will propagate down to all Properties and Items, so by default module developers will not need to think about language in each call. If a module developer does care about specific languages further down, additional non-magic equivalent methods will be provided that allow for specific languages to be specified. The details here will have to be worked out with Gabor and the rest of the i18n team.
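
For illustration only (the method names below are hypothetical, since this part of the API had not been designed at the time), a language-explicit accessor might look something like this:

<?php
// Default: the language specified at load time propagates down,
// so most code never mentions language at all.
$body = $node->body[0]->value;

// Hypothetical non-magic equivalent that pins a specific language.
$body_de = $node->getProperty('body', 'de')->offsetGet(0)->value;
?>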

When defining a new Entity Type, certain Properties may be defined as mandatory for the structure; the Title and Updated properties for nodes, for instance. These properties will be hard-coded into the definition of the Entity Type, and may be stored differently by the entity storage engine. However, to consuming code that is processing an entity there is no difference between a built-in Property and a user-added Property. An Entity Type is also free to define itself as not allowing user-added Properties (effectively mirroring non-fieldable entities today).

While objects are not as expensive in PHP as they once were back in the PHP 4 days, the number of specialty method calls above MAY lead to performance concerns. We do not anticipate it being a large issue in practice. If so, more direct, less magical methods may be used in high-cost critical path areas (such as calling offsetGet() directly rather than using []) to minimize the overhead.

Extrinsic information

Currently there are a number of values on some entities that, in this model, do not "belong to" that entity. The best examples here are the sticky and promote flags on nodes. This data is properly extrinsic to the node, but for legacy reasons is still there. That is information that often should not be syndicated. Organic Group membership is another example of extrinsic data.

We therefore discussed the need to represent extrinsic data separately from Properties. However, developing yet another API seemed like a dead end. Instead, we decided that the way to resolve intrinsic vs. extrinsic data was as follows:

  • All Properties are intrinsic to the Entity the Property is on.
  • A ReferencedProperty (backlink) is not a part of the Entity itself. That is, the Entity knows about the existence of such linked data, but the data in question is extrinsic to it.
  • Extrinsic data on an Entity should be implemented as a separate Entity type, which references to the Entity it describes.
  • If data links two entities but is extrinsic to both, then an intermediary entity may have a reference to both entities.

For example, core will introduce a BinaryAttribute entity type (or something like that). It will contain only two values: its own ID, and a single-value ReferenceProperty to the entity it describes. There will be two bundles provided by core: Sticky (references to nodes) and Promoted (references to nodes). To mark a node as Sticky, create a BinaryAttribute entity, bundle Sticky. To mark it unsticky, delete that entity. Same for Promoted. (Note: Additional metadata fields, such as the date the sticky was created or the user that marked it sticky, may also be desired. Unmarking an entity may also be implemented not by deleting the flagging entity but by having a boolean field that holds a yes or no. That is an implementation detail that we did not explore in full as it is out of scope for this document.)
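
As a sketch of how that could look in code (the entity type, bundle, and property names here are invented for illustration):

<?php
// Mark a node sticky by creating a flagging entity that references it.
$flag = entity_create('binary_attribute', array(
    'bundle' => 'sticky',
    'target' => $node->id(),
));
$flag->save();

// Un-stick the node by deleting the flagging entity again.
$flag->delete();
?>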

After speaking with Workbench Moderation maintainer Steve Persch, we concluded that some such metadata (such as published) is relevant not to entities but to entity revisions. Fortunately that is easy enough to implement by providing an EntityReference Property and an EntityVersionReference Property, the latter of which references by version ID while the former references by entity ID. Which is appropriate in which case is left as an exercise to the implementer of each case.

Although not the intent, this effectively ports the functionality of Flag module into core, at least for global flags. Only a UI would be missing (which is out of scope for this document). It also suggests how per-user flags could be implemented: A UserBinaryAttribute entity type that references to both an entity and a user (specifically).

These changes would open up a number of interesting possibilities, such as much more robust content workflows, the ability to control access to the Sticky and Promoted values without "administer nodes" or even without node-edit capability, etc. We did not fully explore the implications of this change, other than to decide we liked the possibilities that it opened.

Implications for services

The primary relevant reason for all of this refactoring is to normalize the data model sufficiently that we can automate the process of serializing entities to JSON or XML. As above, we want to avoid forcing n*m (or worse, i*j*k) necessary bridge components for each entity type, property type, and output format. It also neatly separates intrinsic and extrinsic data in a way that allows us to include it or not as the situation dictates. The other DX and data modeling benefits that it implies are very nice gravy, and exploring the implications of those changes and what additional benefits they offer is left as an exercise for the reader (and for later teams).

Syndication formats

With a generically mappable data model, we then turned to the question of what to do with it. We identified a number of needs and use cases that we needed to address:

  • Exposing entities in a machine-readable format
  • Exposing collections of entities in a machine-readable format
  • Exposing entities in both raw form suitable for round-tripping back to a node object and in a "processed" format that is safe for anonymous user consumption. (E.g., with public:// URLs converted to something useful, with text formats applied to textual data, etc.)
  • A way to resolve relationships between entities such that multiple related entities could be syndicated in a single string or a series of related strings (linked by some known mechanism). E.g., a node with its author object embedded or not, or with tags represented as links to tag entities or as inline term objects.
  • Every entity (even those that do not have an HTML URI) needs to have a universally accessible canonical URI.
  • Semantically correct use of HTTP hypermedia information (GET, POST, DELETE, etc.; PUT and PATCH are quirky and of questionable use.)
  • Data primitives we must support: String, int, float, date (not just as a string/int), URI (special case of string), duration.
  • Compound data types (Fields) are limited to being built on those data primitives; includes "string (contains html)".
  • Data structure inspection: Given "node of type X", what are its fields? Given "field of type Y", what are its primitives?
  • While we were not directly concerning ourselves with arbitrary non-entity data, a format that lent itself to other uses (such as Views that did not map directly to a single entity) is a strong benefit.

Given that set of requirements, we evaluated a number of existing specifications. All of them had serious deficiencies vis-à-vis the above list.

CMIS

CMIS is a big and robust specification. However, it consists mainly of optional feature sets, which would allow us to implement only a portion of CMIS and punt on the rest of it. CMIS' data model is very traditional: Documents are very simple creatures, and are organized into Directories to form a hierarchy.

CMIS also includes a number of different bindings for manipulation. The basic web bindings are designed to closely mimic HTML forms, right down to requiring a POST for all manipulation operations. They also required very specific value structures that we felt did not map to how Drupal entities are structured nor to how Drupal forms work, making it of little use.

CMIS also includes bindings for AtomPub, which is a much more hypermedia-friendly high-level API for communication. CMIS has no innate concept of internationalization, so that needs to be emulated in the data with separate data properties.

CMIS is based in XML, although a JSON variant is in draft form at this time.

Atom

Atom is an XML-based envelope format. That is, it does not define the format of a single item. Rather, it defines a mechanism for collecting a set of items together feed-like, for defining links to related content, for paging sets of content, etc. The structure of a single content item is undefined, and may be defined by the user. Atom also includes a number of useful extensions, in particular Pubsubhubbub and Tombstone, which allow for push-notifications and push-deletion. That is extremely useful for many content sharing and content syndication situations.

There are a couple of JSON-variants of Atom, including one from Google, but none seem to have any market traction.

AtomPub

AtomPub is a separate IETF spec from Atom the format, although the two are designed to complement each other. AtomPub defines the HTTP-level usage of Atom, as well as the semantic meaning of various links to embed within an Atom document. (e.g., link rel="edit", which defines the link to use to POST an updated version of the document or collection.)

JSON-LD

JSON-LD is not quite a format as much as it is a meta-format. Rather, it's a way to represent RDF-like semantic information in a JSON document, without firmly specifying the structure of the JSON document itself. That makes it much more flexible than CMIS in terms of supporting an existing data specification (like Drupal's), but also means we need to spend the time to define which semantics we're actually using. That includes determining what vocabularies to use where, and which to custom-define for Drupal.

Our initial thought was to try to map entities as above to CMIS, so that we could leverage the AtomPub bindings that were already defined. We figured that would result in the least amount of "we have to invent our own stuff". However, we determined that would be infeasible. Documents in CMIS are too limited to represent a Drupal entity, even in the more rigid form described above. We would have to map individual Properties to CMIS Documents, and Entities and Language would have to be represented as Directories. However, that would make representing an Entity in a single XML string quite difficult, and/or require custom extensions to the CMIS format. At that point, there's little advantage to using CMIS in the first place.

While CMIS may work very well for low-complexity highly-organized data such as a Document Repository like Alfresco, it is less well suited to highly-complex but low-organization data such as Drupal.

Atom/AtomPub, while really nice and offering almost everything we want, are missing the one most important piece of the puzzle: They are by design mum on the question of the actual data format itself.

We then turned to JSON-LD. It took a while to wrap our heads around it, but once we understood what it was trying to do we determined that it was possible to implement a Drupal entity data model in JSON-LD. While not the most pristine, it is not too bad. We developed a few prototypes before speaking with Lin Clark and ending up with the following prototype implementation:

{
  "@context": {
    "@language": "de",
    "ex": "http://example.org/schema/",
    "title": "ex:node/title",
    "body": "ex:node/body",
    "tags": "ex:node/tags"
  },
  "title": [
    {
      "@value": "Das Kapital"
    }
  ],
  "body": [
    {
      "@value": "Ich habe Durst."
    }
  ],
  "tags": [
    {
      "@id": "http://example.com/taxonomy/term/1",
      "@type": "ex:TaxonomyTerm/Tags",
      "title": "Wasser"
    }
  ]
}

This is still preliminary and will certainly evolve but should get the basic idea across.

Of specific note, JSON-LD has native support for language variation. It's imperfect, but should be adequate to represent Drupal's multi-lingual entities.
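For example (an illustrative fragment in the same style as the prototype above, not part of it), a multi-lingual title can carry per-value @language tags:

{
  "title": [
    {
      "@value": "Das Kapital",
      "@language": "de"
    },
    {
      "@value": "Capital",
      "@language": "en"
    }
  ]
}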

Defining what the semantic vocabularies in use will be is another question. Our conclusion there is that the schema information provided by a Property implementation should also include the vocabulary and particular semantics that field should use.

That is not actually as large a burden as it sounds. In most cases it will be reasonably obvious, once standards are developed. For instance, date fields should use iCal. In cases where multiple possible vocabularies exist, a Property can make it variable in the same fashion as the field schema itself is currently variable, but only on Property creation (just as it is now). If no vocabulary is specified, it falls back to generic default "text" and "number" semantics.
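As a rough sketch of what that could look like (the 'vocabulary' and 'semantic' keys here are hypothetical, not an existing API):

<?php
// Hypothetical sketch: a date Property's schema declares the
// vocabulary and semantics its values should use when serialized.
// Neither key exists in Drupal today; this is purely illustrative.
$property_schema = array(
  'type' => 'date',
  'vocabulary' => 'http://www.w3.org/2002/12/cal/ical#',
  'semantic' => 'ical:dtstart',
);
?>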

As a nice side-effect, this bakes RDF-esque semantics into our data model at a basic level, which should keep all of the semantic-web fans happy. It also will ease integration with CreateJS, VIE, and similar rich client-side editors that can integrate with Aloha, which is already under consideration for Spark and potentially Drupal 8.

This does not, of course, provide the REST/hypermedia semantics we need. As far as we were aware there is no JSON-based hypermedia standard. There are a couple of proposed standards, but none that is actually an accepted standard.

Symfony Live Addendum

Following the Sprint, Larry attended Symfony Live Paris, the main developer conference for the Symfony project at which he was a speaker. There, Larry was able to do some additional research with domain experts in this area.

One of the keynote speakers was David Zuelke of Agavi, and the topic was (surprise!) REST and Hypermedia APIs. The session video is not yet online, but the slides were 90% the same as this presentation. It is recommended viewing for everyone in this thread. In particular, note the hypermedia section that starts at slide 95. One of the key take-aways from the session (echoed in other articles that we've checked as a follow-up) is that we're not the only ones with trouble mapping JSON to Hypermedia. It just doesn't do it well. XML is simply a better underlying format for true-REST/HATEOAS functionality, and the speaker encouraged the audience to respond to knee-jerk JSON preference with "tough, it's the wrong tool."

After the session, David acknowledged that the situation is rather suboptimal right now (XML is better for Document representation, JSON for Object representation; and we need to do both).

Larry also spoke with Henri Bergius, Midgard developer, author of CreateJS, and future DrupalCon Munich speaker. Henri pointed out that the JCR/PHPCR standard (Java Content Repository and its PHP-based port) does have its own XML serialization format independent of CMIS. After a brief look, that format appears much more viable than CMIS, although additional research is needed. It is defined in the JCR specification, section 6.4.

Assuming JCR/PHPCR's XML serialization can stand up to further scrutiny, particularly around multi-lingual needs, it would be a much more viable option for true-HATEOAS behavior as we could easily wrap it in Atom/AtomPub for standardized linking, flow control, subscription, and all of the other things HATEOAS and Atom offer. While Atom would allow JSON-LD to be wrapped as the payload as well, wrapping JSON-LD in Atom would require both producers and consumers to implement both an Atom and a JSON-LD parser, regardless of their language. That would be possible, but sub-optimal.

At this time we are not firm on this conclusion, but given the varied needs of different use cases we are leaning toward recommending the use of PHPCR-in-Atom and JSON-LD as twin implementations. Attempts at implementing both options will likely highlight any flaws in either approach that cannot be determined at this time. Whether one or both ends up in core vs. contrib should be simply a matter of timing and resource availability, as being physically in core should not provide any magical architectural benefit. (If it does, then we did it wrong.) That said, the Entity API improvements discussed above are the same regardless of format, and offer a variety of additional benefits as well.

Acknowledgements

Thank you to everyone who attended the sprint, including those who just popped in briefly during the Tuesday biweekly WSCCI meeting. Thank you also to Steve Persch and Lin Clark for their impromptu help. Thanks to Acquia for sponsoring travel expenses for some attendees. And of course thank you to Commerce Guys for being such wonderful hosts, and to Sensio Labs for bringing Larry in to speak for Symfony Live as that's how we were able to have this sprint in the first place.

Onwards!

Jun 04 2012
Jun 04

As a content management system, Drupal has long recognized that security is a key component of managing content. A CMS in which the wrong people can edit a page is not very useful. Drupal has a number of systems to manage access.

First is Drupal's permissions system. Any code can easily check to determine if a given user has a given permission in the system and alter its behavior, or reject the user entirely if not.

Second -- and more importantly -- is the node access system.

Drupal's node access system actually has two parts: Runtime access control and the grants system. Both have existed for years, but saw major improvements in Drupal 7 as a result of a meeting at DrupalCon Szeged. Let's have a look at how these systems work to keep content secure.

First, we need to understand what sort of access we are controlling. There are five operations that a user can perform on a node: Create, Read (view), Update, Delete, and List. The first four are obvious, but the final, List, is a special case of "Read". It's a special case because, when listing nodes, we have tighter constraints on how we can determine access. We'll see why in a moment.

In code, anywhere we want to check a user's access to a node we can simply call node_access($op, $node), where $op is one of "view", "update", "delete", or "create" and $node is a fully-loaded node object or, in the case of create, the node type to create. To check a user other than the current user, pass a user object as an optional third parameter. node_access() will return either boolean true or boolean false.
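For example, a minimal sketch (the node object and user ID are hypothetical):

<?php
// Check the current user's access to update an existing node.
if (node_access('update', $node)) {
  // Show an edit link, perform the operation, etc.
}

// Check a different user by passing an account as the third parameter.
$account = user_load(42); // 42 is a hypothetical user ID.
if (node_access('create', 'article', $account)) {
  // ...
}
?>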

node_access() runs a number of checks. If the user has "bypass node access" permission, it returns true unconditionally. In earlier versions of Drupal this permission was part of the overly-broad "administer nodes" permission. If the user does not have "access content" permission, it will return false unconditionally. That is useful for limiting a site to authenticated users only.

Now we get to the meat of the node access system, where we can take control. First, node_access() invokes hook_node_access() with the node, operation, and user in question. In Drupal 6 and earlier, hook_access() was a pseudo-hook, and only the module that defined the node type could implement it. For Drupal 7, that was replaced with hook_node_access() to allow all modules to influence the access of any node. Implementations of hook_node_access() may return one of three constant values: NODE_ACCESS_ALLOW, NODE_ACCESS_DENY, or NODE_ACCESS_IGNORE. Returning nothing is equivalent to returning NODE_ACCESS_IGNORE.

A given user will have access to perform an operation, say "update," if and only if at least one module returns NODE_ACCESS_ALLOW and no module returns NODE_ACCESS_DENY. Because Drupal will deny access to a node by default, it is rare for access control modules to explicitly deny access, as that prevents other modules from granting access.

Familiar node access permissions such as "edit page content" or "edit own article content" are provided by the node module's implementation of its own hook and, as of Drupal 7, are provided for all node types regardless of the module that created them. They can now also be disabled by setting the node_permissions_$type variable to false. That's useful if you are using some other access logic and want to entirely disable the permission-based controls. (Just remember to re-enable them in your uninstall hook.)
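For example (using a hypothetical "event" node type):

<?php
// Disable the standard permission-based controls for the "event"
// node type; access decisions are then left to other modules.
variable_set('node_permissions_event', FALSE);

// In hook_uninstall(), restore the default (enabled):
variable_del('node_permissions_event');
?>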

As an example, let's say we want to allow a user to edit (but not delete) their own nodes, but only for an hour after it's posted. That allows the user to correct typos they find right after hitting submit (which always happens) or to delete an inappropriate comment, but not go back and change it days later. The simple implementation is shown in the following code snippet. As we can see, if one of our conditions fails we do not deny access; we simply do not grant access and let other modules decide what to do. (Making the affected node types and time period configurable is left as an exercise for the reader.)

<?php
function example_node_access($node, $op, $account) {
  // Allow users to update their own nodes for one hour after posting.
  // Note: the operation for editing is 'update', per the list above.
  if ($op == 'update'
      && $node->uid == $account->uid
      && $node->created > (REQUEST_TIME - 3600)) {
    return NODE_ACCESS_ALLOW;
  }
  // Otherwise stay neutral and let other modules decide.
  return NODE_ACCESS_IGNORE;
}
?>

This is incredibly powerful, and yet was completely impossible in Drupal 6 unless our module defined the node type in the first place. In Drupal 7, it's one if() statement.

There are two other checks made:

  1. If no module decided to either grant or deny access to a node, we check to see if the node is unpublished and the user has the "view own unpublished content" permission. If so, and it's the user's own node, permission is granted.
  2. The check of last resort is the node access grants system. This is Drupal's most fine-grained, but least understood, access system. It is also the only one that can handle List operations. Consider the case of listing the 10 most recent forum posts on a site where not all users have access to all forums. If we simply queried for the 10 most recent nodes of type forum, we'd get a number of nodes that the user shouldn't be seeing since he doesn't have access to view them. If we wanted to filter those out, we would have to load all those nodes and then run node_access() on each of them in turn, after which we have fewer than 10 nodes left! We could then query again for more nodes and repeat the check, but could easily find ourselves back in the same situation.

List operations are tricky because the filtering must be done in the search operation itself, such as a database query. To that end, Drupal includes a node_access table in the database that acts as a giant materialized access lookup table. Any module may inject rules into it, keyed by group (usually, but not always, user ID) and node ID.

The node_access() function will check that table directly for a record, but in practice the grants system is more useful when running listing queries. Listing queries for nodes must always use the db_select() query builder and be tagged with the "node_access" tag. That in turn fires hook_query_node_access_alter(), which allows the node module to add an extra join to the query itself to filter out nodes that the user doesn't have access to according to the node_access table. An even better approach is to always run node listing queries using the EntityFieldQuery query builder. Although it does not offer as much fine-grained control as SQL, it will translate the listing query into any storage engine in use for nodes; SQL, MongoDB, Cassandra, etc. It will also apply the node_access filter appropriate to that backend.
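For instance, a minimal sketch of the forum listing described above, first with db_select() and then with EntityFieldQuery:

<?php
// Ten most recent published forum nodes, filtered by node access.
// The "node_access" tag triggers hook_query_node_access_alter().
$nids = db_select('node', 'n')
  ->fields('n', array('nid'))
  ->condition('n.type', 'forum')
  ->condition('n.status', 1)
  ->orderBy('n.created', 'DESC')
  ->range(0, 10)
  ->addTag('node_access')
  ->execute()
  ->fetchCol();

// The same listing via EntityFieldQuery, which applies the access
// filter appropriate to whatever storage backend nodes use.
$query = new EntityFieldQuery();
$result = $query
  ->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'forum')
  ->propertyCondition('status', 1)
  ->propertyOrderBy('created', 'DESC')
  ->range(0, 10)
  ->execute();
?>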

The grants system has several hooks of its own, which we don't have space to cover here. For now, understand that in practice it is only useful for controlling view listings. Update and Delete usually both require a full node anyway, which makes it easier, and more flexible, to just use hook_node_access().
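For orientation only, a minimal sketch of the two central grants hooks (the realm name, grant ID, permission string, and $node->premium flag are all hypothetical):

<?php
// Grant users with a (hypothetical) permission access to view nodes
// in the "example_premium" realm.
function example_node_grants($account, $op) {
  $grants = array();
  if ($op == 'view' && user_access('view premium content', $account)) {
    $grants['example_premium'] = array(1);
  }
  return $grants;
}

// Record which nodes belong to that realm; node_access_rebuild()
// calls this for every node to populate the node_access table.
function example_node_access_records($node) {
  $records = array();
  if (!empty($node->premium)) {
    $records[] = array(
      'realm' => 'example_premium',
      'gid' => 1,
      'grant_view' => 1,
      'grant_update' => 0,
      'grant_delete' => 0,
      'priority' => 0,
    );
  }
  return $records;
}
?>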

Drupal 7's access control system has improved dramatically. The introduction of a single hook in the right place has made possible functionality that didn't exist before, and modules such as Organic Groups and Workbench are already taking advantage of it to build more efficient and powerful functionality. It will be exciting to see what other new capabilities module developers come up with in contrib.

Jun 02 2012
Jun 02

The "Kernel patch", also known as step one for the Web Services and Context Core Initiative, has been committed. This has significant implications for Drupal's request-handling process. See the announcment for more details and follow-ups.

May 29 2012
May 29

One of the hardest concepts for new users of Drupal to grasp is that Drupal is not a page management system but a content management system. That may seem obvious, but it's actually an important distinction; Drupal doesn't think in terms of pages but in pieces of content. Not every page that will be displayed is a piece of content, and not every piece of content will have a page.

To understand how to build pages in Drupal, it's important to understand the Drupal Stack; that is, the order in which a Drupal architect will approach a build-out. There are actually four separate and distinct concepts at work: Content, Data slices, Layout, and Display. (See Figure 1, the Drupal Stack.)

  • By Content, we mean the data model within Drupal. Generally this means nodes and fields—Drupal's primary content object—but in Drupal 7, it refers to any entity and entity type. This is the structure of the content we are managing and how different pieces of content inter-relate;
  • Data slices are sections of our content that we want to display. In the vast majority of cases, these are defined using the Views module. Each of these slices ends up as a displayable chunk, such as a block or a page callback;
  • Layout refers to how those displayable chunks are placed on the page. Generally there are two ways that is done in Drupal: either with the core blocks system or with the contributed Panels module;
  • Display, finally, refers to the visual look and feel of the site. This is the theme layer, which is composed of HTML, CSS, and sometimes Javascript.

As with most parts of web design, designing in layers allows for a more robust, flexible, and sustainable end product. Let's look at each in turn.

Content

Content, of course, is the lifeblood of any content management system. Drupal goes a step further than simply defining each piece of content as an opaque blob, however, and allows for a robust definition of "content". Content, in Drupal 6, consisted of the node system and the Content Construction Kit (CCK) contributed module. In Drupal 7, most of CCK has been moved into core. Content now encompasses any type of Entity, including nodes, comments, taxonomy terms, and so forth, and all may potentially have Fields.

All entity types support a concept called bundles: A bundle is simply a sub-class of an entity with a given configuration, including a given definition of fields. (Not all entities support fields, but most do.) Each instance of an entity type conforms to one and only one bundle. For the node system, bundles are called node types and individual objects are called nodes. An entity type will usually have under a dozen bundles, but could have hundreds of thousands of objects (such as nodes) that conform to one bundle or another.

Fields are the primary data elements in Drupal. A field represents a rich data element, such as an email address, a photo, or a date, rather than a primitive data type such as "string", "file", or "integer". Fields can be single-value or multi-value. Most entities (although not all) are composed of one or more fields. For instance, a node of type (bundle) Event could be composed of a title, a description (text field), price (number field), date (date field), contact person (email field), and multiple URLs to related events (multi-value link field).

Critically, fields can be shared between different entities. For instance, a node type for "staff" could have fields first name, last name, office, and supervisor. A node type for "faculty" could have fields first name, last name, office, and title (Professor, Assistant Professor, etc.). The first name, last name, and office fields have the same semantic meaning in both node types. Supervisor and Title, while both text fields, do not have the same semantic meaning. Sharing the first name, last name, and office fields between both node types means we can search those fields across different node types. We'll see how that works in a moment when looking at data slices.
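In Drupal 7 Field API terms, sharing looks roughly like this (the field and bundle machine names are illustrative):

<?php
// Define the field once...
field_create_field(array(
  'field_name' => 'field_first_name',
  'type' => 'text',
  'cardinality' => 1,
));

// ...then attach an instance of it to both node types.
foreach (array('staff', 'faculty') as $bundle) {
  field_create_instance(array(
    'field_name' => 'field_first_name',
    'entity_type' => 'node',
    'bundle' => $bundle,
    'label' => 'First name',
  ));
}
?>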

Examining the data we want to manage and mapping it into the appropriate entities and bundles—including the right fields—is the first step to a successful Drupal site. That process could be a simple one hour of button pushing or be a significant, but important, part of project strategy depending on how large or complex the site is.

Data slices

Now that we have the structure of our data defined, we need some way to pull out the parts of it we want. In many cases we want to display a single piece of content on a page, with some formatting. In those cases nodes work well, as nodes automatically get a page created for them at node/[nid]. Very often, however, we want to show some subset of multiple objects; a summary view, a list of recent posts, a list of related objects, images in a gallery, etc. In all of these cases we want "a list of stuff", even if that list has only one item. And if we want a list, the Drupal go-to answer is the Views module.

Views can seem daunting at first, as it has many moving parts. However, Views has its own conceptual stack that makes it much more approachable: what, how, where.

It helps to visualize the content in Drupal as a giant spreadsheet. That is not how it is stored, of course, but conceptually imagine all nodes in one giant table, with each column corresponding to a field. Some nodes will have data only in some columns/fields, depending on their type. (See Figure 2.)

Showing the entire set of data in the site at once is rarely useful, of course, so Views supports three kinds of filter to whittle down the list to only what we want.

  • Filters to restrict the data set with hard coded values configured in the View;
  • Exposed filters to allow a site viewer to configure the filter, say with a drop down box, to change what gets selected;
  • Contextual filters (formerly known as arguments) to filter the View based on some information available when the View is run, such as the URL path or a node the View relates to.

Using filters, Views lets us control what gets displayed. For instance, we can filter our data set to show only nodes of type "event" (Figure 3), happening on certain days (Figure 4), and with a price less than $10 (Figure 5). We then tell Views which fields we want to display, in this case just the title, date, location, and price (Figure 6). We exclude the body field for now, as we just want to show a listing of events. We have now "sliced" our content down to just the pieces we want at the moment. We can also specify a sort order for our data; say, by event date, soonest first.

Now that we have that raw data, we get to decide how it gets displayed. In Views, the how is controlled by style plug-ins. A style plug-in determines if that abstract spreadsheet of data is shown as a table, as a bulleted list, as a grid, or as a fancy Javascript-based slideshow. Views ships with a number of basic style plug-ins but dozens more are available and they are one of the most powerful parts of Views for site builders. They allow things that do not visually look the slightest bit like a list to still be generated using the same robust query tools as simple lists. (See Figure 7.)

Finally, with a rendered representation of our data, we need to decide where to put it. In Views, that is handled via display plug-ins. Views comes with a few display plug-ins, to expose a View on its own page, as a block, as an RSS feed, etc. Others are provided by many other modules. (See Figure 8.)

The important point here is that each of those questions—what, how, where—is answered independently. We can usually change one without changing another. The same underlying content-finding mechanism can be used to build nearly any form of visual display. A blog roll, a photo gallery, a random page image, a calendar grid: all of these can be built with the same Swiss Army knife and customized to a site's specific needs. The ability to fully leverage Views is one of the most important reasons to spend the necessary time getting the right data model in place.

Layout

Once we have all our display components built, where do we put them? That is a question of page layout. Drupal offers one layout mechanism out of the box. By default, there is a single page layout consisting of a primary content area, controlled by the URL path, and several secondary content regions. (See Figure 9.) Into each of these regions can be placed Blocks. Blocks are, in Drupal-speak, secondary display components. In general they do not relate to the primary content area, nor to the context in which they are shown, such as what "section" of a site they are in. Frequently, that is enough, and is exactly what we want. Some blocks, especially those produced by Views, have their own mechanisms to determine contextual information such as the page URL, the node being displayed in the main area, and so forth. There is no consistent system for that, but for a great many cases, it is sufficient.

The layout can vary somewhat between pages by controlling when given blocks will or will not be displayed. Drupal core provides a number of ways to control when a given block will display, such as only on certain node types, only for certain users, or only on pages matching certain paths. For more complex controls, the Context module provides an alternate interface for placing and controlling the visibility of blocks. Between core and Context, an impressive number of visual designs can be squeezed into a single layout template.

Some layouts are too complex for core blocks to handle, however. In some cases there may be no primary content region at all. On a home page, for instance, there may be no content, just roll-ups of upcoming events, latest news items, a slideshow, and so on. All of those can be built easily with Views, but how do we get them on the page? For high-end layouts there is the Panels module.

Panels takes the page rendering process and turns it around. Rather than building a main content area and then adding secondary content around it, Panels starts with a layout and then places into it chunks called Panes. Panes are essentially super-blocks. Like blocks they are displayable chunks, but unlike blocks they can be placed multiple times in a layout, can respond to context in the page such as the URL or information derived from it, and so on. (See Figure 10.) They even support much more robust caching options than blocks do, which can greatly help with performance.

In Panels, there is no primary content region. A page can be laid out to include one large pane for a node display, or a node can be broken up by field into different panes. Or there could be no primary content on the page at all, just a series of panes for different Views, showing different slices of our content. (See how it all stacks up?)

Display

Finally we come to the top layer of the stack: the display. This is the theme layer of Drupal, about which much is written elsewhere. Although the line between layout and display is sometimes fuzzy, where possible, it is good to try and treat them as separate.

One of Drupal's greatest strengths is how readily it can be reconfigured by an administrator with a few button clicks. That includes moving around blocks, placing new blocks, creating new layouts with Panels, adding fields to an existing node or other entity, and so on. A good theme addresses how a given page element should look, but not necessarily where it appears. That's because administrators can change where it appears at any moment, and they don't want their site's visual design to break down and cry as soon as they do so.

A good theme follows the principle of Sustainable Theming. That is, it doesn't specify precisely how every page should look. Rather, it defines the rules by which something should appear if it is in a given region. A good theme will display a menu block, for instance, in a sane fashion regardless of whether it is in the left sidebar, right sidebar, or a small panel pane on the home page.

"Sane fashion" may vary, depending on the region, but it should always work without breaking the page layout. Similarly, the theming treatment given to a node should withstand the administrator adding a new field to it at any time. It may not look completely awesome, but the new field should still "fit" visually in the page. Generally the best way to achieve that goal is to rely heavily on CSS rather than HTML markup for laying out a page, and not hand-crafting every template for its specific use case; as soon as that specific use case changes, the template no longer makes sense.

Bringing it together

That may seem like a lot to take in. Four layers, one of which has three more, plus multiple ways of laying out a page? I thought Drupal was supposed to be simple!

For most sites, it is. But even on a modest site, getting the most out of Drupal means understanding how to use the available tools to their best effect. By far, the most important step in building a Drupal site is figuring out the data model. A poorly thought out data model can result in a brittle, inextensible site that needs to be thrown out to add new features. A well-designed data model, on the other hand, can lend itself to building a surprising amount of functionality simply by pushing buttons in Views and placing a few blocks.

But what if your site starts with a design mockup, rather than a data model? That's fine. Go over the designs and identify: "How can I build this with Views?" "How can I place this with blocks?" "Will I need Panels for this page, or for all pages?" Working backwards from the design in this fashion can elicit a surprising amount of detail about a site's data model.

A good "measure twice, cut once" approach that considers the full Drupal stack and leverages each part to its full potential is they key to a successful project. Fortunately, it doesn't have to be difficult. With a good knowledge of the site needs, a data model can be built in a long afternoon. And with a good data model, most of the site can be built out using standard tools in a week, without ever touching a line of PHP.

May 24 2012
May 24

Drupal needs a new and better HTTP client for fetching data from remote services. mikeytown2 has put together a solid start to research on various options we could adopt. If you have experience with any of the mentioned libraries, your input would be greatly appreciated!
