Apr 19 2018

Last week I wrote a Drupal module that uses face recognition to automatically tag images with the people in them. You can find it on Github, of course. With this module, you can add an image to a node, and automatically populate an entity_reference field with the names of the people in the image. This isn’t such a big deal for individual nodes of course; it’s really interesting for bulk use cases, like Digital Asset Management systems.

Automatic tags, now in a Gif.

I had a great time at Drupalcon Nashville, reconnecting with friends, mentors, and colleagues as always. But this time I had some fresh perspective. After 3 months working with Microsoft’s (badass) CSE unit - building cutting edge proofs-of-concept for some of their biggest customers - the contrast was powerful. The Drupal core development team are famously obsessive about code quality and about optimizing the experience for developers and users. The velocity in the platform is truly amazing. But we’re missing out on a lot of the recent stuff that large organizations are building in their more custom applications. You may have noticed the same: all the cool kids are posting about Machine Learning, sentiment analysis, and computer vision. We don’t see any of that at Drupalcon.

There’s no reason to miss out on this stuff, though. Services like Azure are making it extremely easy to do all of these things, layering simple HTTP-based APIs on top of the complexity. As far as I can tell, the biggest obstacle is that there aren’t well defined standards for how to interact with these kinds of services, so it’s hard to make a generic module for them. This isn’t like the Lucene/Solr/ElasticSearch world, where one set of syntax - indeed, one model of how to think of content and communicate with a search-specialized service - has come to dominate. Great modules like search_api depend on these conceptual similarities between backends, and they just don’t exist yet for cognitive services.
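To give a sense of how thin that HTTP layer is, a single face-detection call to Azure's Face API is just a POST request. The sketch below is illustrative only: the region, subscription key, and image URL are placeholders, and the exact query parameters may differ from the current API docs.

curl -X POST \
  "https://westeurope.api.cognitive.microsoft.com/face/v1.0/detect?returnFaceId=true" \
  -H "Ocp-Apim-Subscription-Key: YOUR_AZURE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/photos/team-photo.jpg"}'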

So I set out to try and explore those problems in a Drupal module.

Image Auto Tag is my first experiment. It works, and I encourage you to play around with it, but please don’t even think of using it in production yet. It’s a starting point for how we might build an analog to the great search_api framework, for cognitive services rather than search.

I built it on Azure’s Cognitive Services Face API to start. Since the service is free for up to 5000 requests per month, this seemed like a place that most Drupalists would feel comfortable playing. Next up I’ll abstract the Azure portion of it into a plugin system, and try to define a common interface that makes sense whether it’s referring to Azure cognitive services, or a self-hosted, open source system like OpenFace. That’s the actual “hard work”.

In the meantime, I’ll continue to make this more robust with more tests, an easier UI, asynchronous operations, and so on. At a minimum it’ll become a solid “Azure Face Detection” module for Drupal, but I would love to make it more generally useful than that.

Comments, Issues, and helpful PRs are welcome.

Jan 10 2018

I’m excited to announce that I’ve signed with Microsoft as a Principal Software Engineering Manager. I’m joining Microsoft because they are doing enterprise Open Source the Right Way, and I want to be a part of it. This is a sentence that I never believed I would write or say, so I want to explain.

First I have to acknowledge the history. I co-founded my first tech company just as the Halloween documents were leaked. That’s where the world learned that Microsoft considered Open Source (and Linux in particular) a threat, and was intentionally spreading FUD as a strategic counter. It was also the origin of their famous Embrace, Extend, and Extinguish strategy. The Microsoft approach to Open Source only got more aggressive from there, funneling money to SCO’s lawsuits against Linux and its users, calling OSS licensing a “cancer”, and accusing Linux of violating MS intellectual property.

I don’t need to get exhaustive about this to make my point: for the first decade of my career (or more), Microsoft was rightly perceived as a villain in the OSS world. They did real damage and disservice to the open source movement, and ultimately to their own customers. Five years ago I wouldn’t have even entertained the thought of working for “the evil empire.”

Yes, Microsoft has made nice movements towards open source since the new CEO (Satya Nadella) took over in 2014. They open sourced .NET and Visual Studio, they released Typescript, they joined the Linux Foundation and went platinum with the Open Source Initiative, but come on. I’m an open source warrior, an evangelist, and developer. I could see through the bullshit. Even when Microsoft announced the Linux subsystem on Windows, I was certain that this was just another round of Embrace, Extend, Extinguish.

Then I met Josh Holmes at the Dutch PHP Conference.

First of all, I was shocked to meet a Microsoft representative at an open source conference. He didn’t even have bodyguards. I remember my first question for him was “What are you doing here?”.

Josh told me a story about visiting startup conferences in Silicon Valley on behalf of Microsoft in 2007, and reporting back to Ballmer’s office:

“The good news is, no one is making jokes about Microsoft anymore. The bad news is, they aren’t even making jokes about Microsoft anymore.”

For Josh, this was a big “aha” moment. The booming tech startup space was focused on Open Source, so if Microsoft wanted to survive there, they had to come to the table.

That revelation led to the creation of the Microsoft Partner Catalyst Team. Here’s Josh’s explanation of the team and its job, from an interview at the time I met him:

“We work with a lot of startups, at the very top edge of the enterprise mix. We look at their toughest problems, and we go solve those problems with open source. We’ve got 70 engineers and architects, and we go work with the startups hand in hand. We’ll sit down for a little pair programming with them, sometimes it will be a large enough problem that will take it off on our own and we’ll work on it for a while, and we’ll come back and give them the code. Everything that we do ends up in Github under typically an MIT or Apache license if it’s original work that we’re doing on our own, or a lot of times we’re actually working within other open source projects.”

Meeting with Josh was a turning point for my understanding of Microsoft. This wasn’t just something that I could begrudgingly call “OK for open source”. This wasn’t just lip service. This was a whole department of people that were doing exactly what I believe in. Not only did I like the sound of this; I found that I actually wanted to work with this group.

Still, when I considered interviewing with Microsoft, I knew that my first question had to be about “Embrace, Extend, and Extinguish”. Josh is a nice guy, and very smart, but I wasn’t going to let the wool be pulled over my eyes.

Over the next months, I would speak with five different people doing exactly this kind of work at Microsoft. I did my research, and I plumbed all my back-channel resources for dirt. And everything I came back with said I was wrong.

Microsoft really is undergoing a fundamental shift towards Open Source.

CEO Satya Nadella is frank that closed-source licensing as a profit model is a dead-end. Since 2014, Microsoft has been transitioning their core business from licensed software to platform services. After all, why sell a license once, when you can rent it out monthly? So they move all the licensed products they can online, and rent, instead of selling them. Then they rent out the infrastructure itself, too - hence Azure. Suddenly flexibility is at a premium. As one CTO put it, for Azure to be Windows-only would be a liability.

This shift is old news for most of the world. As much as the Hacker News crowd still bitches about it as FUD, this strategic direction has been in and out of the financial pages for years now. Microsoft has pivoted to platform services. Look at their profits by product over the last 8 years:

Microsoft profits by product, by year.

The trend is obvious: server and platform services are the place to invest. Office only remains at the top of the heap because it transitioned to SaaS. Even Windows license profits are declining. This means focusing on interoperability. Make sure everything can run on your platform, because anything else is to handicap the source of your biggest short- and medium-term profit. In fact, remaining adversarial to Open Source would kill the golden goose. Microsoft has to change its values in order to make this shift.

So much for financial and strategic direction; but this is a hundred-thousand-person company. That ship doesn’t turn on a dime, no matter what the press releases tell you. So my second interview question became “How is the transition going?” This sort of question makes people uncomfortable: the answer is either transparently unrealistic, or critical of your environment and colleagues. Over and over again, I heard the right answer: It’s freakin' hard.

MS has more than 40 years of proprietary development experience and institutional momentum. All of their culture and systems - from hiring, to code reviews, to legal authorizations - have been organized around that model. That’s very hard to change! I heard horror stories about the beginning of the transition, having to pass every line of contribution past the Legal department. I heard about managers feeling lost, or losing a sense of authority over their own team. I heard about development teams struggling to understand that their place in an OSS project was on par with some Rando Calrissian contributor from Kansas. And I heard about how the company was helping people with the transition, changing systems and structures to make this cultural shift happen.

The stories I heard were important evidence, which contradicted the old narrative I had in my head. Embrace, extend, extinguish does not involve leadership challenges, or breaking down of hierarchies. It does not involve personal struggle and departmental reorganization. The stories I heard evidenced an organization trying a real paradigm shift, for tens of thousands of people around the world. It is not perfect, and it is not finished, but I believe that the transition is real.

When you accept that Microsoft is trying to reorient its own culture to Open Source, suddenly all those “transparent” PR moves you dismissed get re-framed. They are accomplishments. It’s incredibly difficult to change the culture of one of the biggest companies in the world… but today, almost half of Azure users run Linux. Microsoft’s virtualization work made them the fifth largest contributor to the 3.x Linux kernel. Microsoft maintains the biggest project on Github (by contributor count). They maintain a BSD distribution and a Linux distribution. And a huge part of LXD (the container-based virtualization system for Linux) comes from Microsoft’s work with Canonical.

That’s impressive for any company. But Microsoft? It boggles the mind. This level of contribution is not lip-service. You don’t maintain a 15 thousand person community just for PR. Microsoft is contributing as much or more to open source than many other major players, who have had this in their culture from the start (Google, Facebook, Twitter, LinkedIn…). It’s an accomplishment, and it’s impressive!

In the group I’m entering, a strong commitment to Open Source is built into the project structure, the team responsibilities, and the budgeting practice. Every project has time specifically set aside for contribution; developers' connections to their communities are respected and encouraged. After a decade of working with companies who try to engage with open source responsibly, I can say that this is the strongest institutional commitment to “giving back” that I have ever seen. It’s a stronger support for contribution than I’ve ever been able to offer in any of my roles, from sole proprietor to CTO.

This does mean a lot more work outside of the Drupal world, though. I will still attend Drupalcons. I will still give technical talks, participate, and help make great open source communities for Drupal and other OSS projects. If anything, I will do those things more. And I will do them wearing a Microsoft shirt.

Microsoft is making a genuine, and enormous, push to being open source community members and leaders. From everything I’ve seen, they are doing it extremely well. From the outside at least, this is what it looks like to do enterprise Open Source The Right Way.

Jul 28 2017

I’ve learned something incredible as the PHP Track Chair for Drupalcon Vienna. The Drupal Association has no way to invite PHP speakers to Drupalcon.

This blew me away when I first learned about it. After all the work to bring mainstream PHP to Drupal core, after all the outreach to PHP-FIG, after all the talks Drupalists have given at major PHP conferences, how is this possible?

You see, basically every other PHP conference covers their speakers' travel and accommodation costs. Drupalcon doesn’t, and never has. Historically it has to do with Drupalcon’s identity as a community conference, rather than a professional one. But it means the best PHP speakers never get to Drupalcon.

On one hand that’s great for our project: our speakers are all passionate volunteers! They’re specialists who care deeply about the project. On the other hand, it contributes to isolated, “stay on the island” thinking. If the only speakers we hear are Drupalists, where do we get new insights? If the only people at the BoF or code sprint table are Drupalists, how do we leverage the strengths of the broader PHP community? How do we contribute back? How do we grow?

Every year, the lack of financial support holds back major PHP contributors from speaking at Drupalcon. The maintainers of Composer, PHPUnit, and Guzzle want to come to Drupalcon, but we don’t make it possible. These people built and maintain the cornerstones of Drupal. Why do we hold them at arm’s length?

This year, as Drupalcon PHP Track Chair, I’m in a position to make some changes. So I invited two notable PHP speakers to come and join us at the con: Sebastian Bergmann, author of PHPUnit, and Michelle Sanver, president of @phpwomen. Today I’m announcing a very special GoFundMe campaign to pay the travel and accommodation for these two exceptional contributors.

I believe in the benefits of closer cooperation with the PHP community.

I believe there’s a lot we can learn from these people, and a lot we can teach them too.

And I believe that I’m not the only one.

We’ve estimated costs conservatively; this is not a lot of money. Anything we collect above and beyond their needs will go to the Drupal Association, but let’s be honest with ourselves: this campaign isn’t just about bringing Sebastian and Michelle to Drupalcon. Your donation shows the Drupal Association that you want to welcome contributors from other communities. You prove to them that their constituents want to bring in this kind of speaker. When you donate, you stand up for the kind of community you believe in.

Please donate, share, and tweet the campaign today.

Jun 15 2017

One of the best parts of Drupal 8 is our shift to enterprise PHP coding structures. With tools like composer and Symfony’s structures like Events and Dependency Injection, Drupalists are learning to be great PHP developers, and vice-versa. Today, the fastest route to becoming a rock star Drupalist is through PHP.

I’m one of the PHP track chairs for Drupalcon Vienna, and this year our focus is better PHP === better Drupalists. How can better PHP make your life as a Drupal developer easier?

Do you like PHP 7? We want to hear about the technicalities of types, throwing all the things, and your favorite operators (mine is null coalesce, but full respect for you spaceship operator fans).

Have you seen the light of functional programming? Tell us why we should love higher orders with lambda functions and closures. Let’s hear the finer points of first class functions.

Do your tests bring all the bugs to the yard? We want to talk about it. Every method is a promise, and your tests make sure you keep your promises. We want sessions about test-driven development in a Drupal context, choosing the right test framework and scope, and how your real-world tests are saving you real-world time.

Have you written a composer library wrapper module yet? Submit a session about how composer is saving you lines of code.

Is your development environment fine-tuned for Drupal excellence? Tell us how, and why.

We have only two weeks left until session submissions close! Get your session in now and help us make Drupal code something to be proud of.

Jun 07 2017

How do you import an RSS feed into entities with Drupal 8? In Drupal 6 and 7, you probably used the Feeds module. Feeds 7 made it easy (-ish) to click together a configuration that matches an RSS (or any XML, or CSV, or OPML, etc) source to a Drupal entity type, maps source data into Drupal fields, and runs an import with the site Cron. Where has that functionality gone in D8? I recently had to build a podcast mirror for a client that needed this functionality, and I was surprised at what I found.

Feeds module doesn’t have a stable release candidate, and it doesn’t look like one is coming any time soon. They’re still surveying people about what feeds module should even DO in D8. As the module page explains:

“It’s not ready yet, but we are brainstorming about what would be the best way forward. Want to help us? Fill in our survey. If you decide to use it, don’t be mad if we break it later.”

This does not inspire confidence.

The next great candidate is Aggregator module (in core). Unfortunately, Aggregator gives you no control over the kind of entity to create, let alone any kind of field mapping. It imports content into its own Aggregated Content entity, with everything in one field, and linking offsite. I suppose you could extend it to choose your own entity type, map fields, and so on, but that seems like a lot of work for such a simple feature.

Frustrating, right?

What if I told you that Drupal 8 can do everything Feeds 7 can?

What if I told you that it’s even better: instead of clicking through endless menus and configuration links, waiting for things to load, missing problems, and banging your head against the mouse, you can set this up with one simple piece of text. You can copy and paste it directly from this blog post into Drupal’s admin interface.

What? How?

Drupal 8 can do all the Feedsy stuff you like with Migrate module. Migrate in D8 core already contains all the elements you need to build a regular importer of ANYTHING into D8. Add a couple of contrib modules to provide specific plugins for XML sources and convenience drush functions, and baby you’ve got a stew goin'!

Here’s the short-version howto:

1) Download and enable migrate_plus and migrate_tools modules. You should be doing this with composer, but I won’t judge. Just get them into your codebase and enable them. Migrate Plus provides plugins for core Migrate, so you can parse remote XML, JSON, CSV, or even arbitrary spreadsheet data. Migrate Tools gives us drush commands for running migrations.
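If you’re going the composer route, the whole step is two commands (the module names are the real project names on drupal.org; adjust the drush invocation to your setup):

composer require drupal/migrate_plus drupal/migrate_tools
drush en -y migrate_plus migrate_tools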

2) Write your Migration configuration in text, and paste it into the configuration import admin page (admin/config/development/configuration/single/import), or import it another way. I’ve included a starter YAML just below, you should be able to copypasta, change a few values, and be done in time for tea.

3) Add a line to your system cron to run drush mi my_rss_importer at whatever interval you like.
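For example, a crontab entry that runs the import once an hour might look like this (the Drupal root and drush paths are assumptions; adjust them for your server):

# Import the RSS feed every hour, logging output for later inspection.
0 * * * * cd /var/www/mysite && ./vendor/bin/drush mi my_rss_importer >> /var/log/my_rss_importer.log 2>&1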

That’s it. One YAML file, most of which is copypasta. One cronjob. All done!

Here’s my RSS importer config for your copy and pasting pleasure. If you’re already comfortable with migration YAMLs and XPaths, just add the names of your RSS fields as selectors in the source section, map them to drupal fields in the process section, and you’re all done!

If you aren’t familiar with this stuff yet, don’t worry! We’ll dissect this together, below.

id: my_rss_importer
label: 'Import my RSS feed'
status: true

source:
  plugin: url
  data_fetcher_plugin: http
  urls: 'https://example.com/feed.rss'
  data_parser_plugin: simple_xml

  item_selector: /rss/channel/item
  fields:
    -
      name: guid
      label: GUID
      selector: guid
    -
      name: title
      label: Title
      selector: title
    -
      name: pub_date
      label: 'Publication date'
      selector: pubDate
    -
      name: link
      label: 'Origin link'
      selector: link
    -
      name: summary
      label: Summary
      selector: 'itunes:summary'
    -
      name: image
      label: Image
      selector: 'itunes:image[''href'']'

  ids:
    guid:
      type: string

destination:
  plugin: 'entity:node'

process:
  title: title
  field_remote_url: link
  body: summary
  created:
    plugin: format_date
    from_format: 'D, d M Y H:i:s O'
    to_format: 'U'
    source: pub_date
  status:
    plugin: default_value
    default_value: 1
  type:
    plugin: default_value
    default_value: podcast_episode

Some of you can just stop here. If you’re familiar with the format and the structures involved, this example is probably all you need to set up your easy RSS importer.

In the interest of good examples for Migrate module though, I’m going to continue. Read on if you want to learn more about how this config works, and how you can use Migrate to do even more amazing things…

Anatomy of a migration YAML

Let’s dive into that YAML a bit. Migrate is one of the most powerful components of Drupal 8 core, and this configuration is your gateway to it.

That YAML looks like a lot, but it’s really just 4 sections. They can appear in any order, but we need all 4: General information, source, destination, and data processing. This isn’t rocket science after all! Let’s look at these sections one at a time.

General information

id: my_rss_importer
label: 'My RSS feed importer'
status: true

This is the basic stuff about the migration configuration. At a minimum it needs a unique machine-readable ID, a human-readable label, and status: true so it’s enabled. There are other keys you can include here for fun extra features, like module dependencies, groupings (so you can run several imports together!), tags, and language. These are the critical ones, though.
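For illustration, the same block with a few of those optional keys added might look like this. The group and tag names here are made up, and migration_group comes from migrate_plus:

id: my_rss_importer
label: 'My RSS feed importer'
status: true
# Groups (from migrate_plus) let you run related migrations together.
migration_group: my_feeds
migration_tags:
  - rss
# Enforce a dependency so this config is removed if migrate_plus is uninstalled.
dependencies:
  enforced:
    module:
      - migrate_plus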

Source

source:
  plugin: url
  data_fetcher_plugin: file
  urls: 'https://example.com/feed.rss'
  data_parser_plugin: simple_xml

  item_selector: /rss/channel/item
  fields:
    -
      name: guid
      label: GUID
      selector: guid
    -
      name: title
      label: Item Title
      selector: title
    -
      name: pub_date
      label: 'Publication date'
      selector: pubDate
    -
      name: link
      label: 'Origin link'
      selector: link
    -
      name: summary
      label: Summary
      selector: 'itunes:summary'

  ids:
    guid:
      type: string

This is the one that intimidates most people: it’s where you describe the RSS source. Migrate module is even more flexible than Feeds was, so there’s a lot to specify here… but it all makes sense if you take it in small pieces.

First: we want to use a remote file, so we’ll use the Url plugin (there are others, but none that we care about right now). All the rest of the settings belong to the Url plugin, even though they aren’t indented or anything.

There are two possibilities for Url’s data_fetcher setting: file and http. file is for anything you could pass to PHP’s file_get_contents, including remote URLs. There are some great performance tricks in there, so it’s a good option for most use cases. We’ll be using file for our example. http is specifically for remote files accessed over HTTP, and lets you use the full power of the HTTP spec to get your file. Think authentication headers, cache rules, etc.
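As an aside, if you do need the http fetcher, I believe migrate_plus also accepts a headers key in the source configuration, which is handy for feeds behind authentication. A hedged sketch (the header values are placeholders):

source:
  plugin: url
  data_fetcher_plugin: http
  # Extra request headers passed along with each fetch.
  headers:
    Accept: 'application/rss+xml'
    Authorization: 'Basic dXNlcjpwYXNzd29yZA=='
  urls: 'https://example.com/private/feed.rss'
  data_parser_plugin: simple_xml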

Next we declare which plugin will read (parse) the data from that remote URL. We can read JSON, SOAP, arbitrary XML… in our use case this is an RSS feed, so we’ll use one of the XML plugins. SimpleXML is just what it sounds like: a simple way to get data out of XML. In extreme use cases you might use XML instead, but I haven’t encountered that yet (ever, anywhere, in any of my projects). TL;DR: SimpleXML is great. Use it.

Third, we have to tell the source where it can find the actual items to import. XML is freeform, so there’s no way for Migrate to know where the future “nodes” are in the document. So you have to give it the XPath to the items. RSS feeds have a standardized path: /rss/channel/item.

Next we have to identify the “fields” in the source. You see, migrate module is built around the idea that you’ll map source fields to destination fields. That’s core to how it thinks about the whole process. Since XML (and by extension RSS) is a freeform format, it doesn’t think of itself as having “fields” at all. So we’ll have to give our source plugin XPaths for the data we want out of the feed, assigning each path to a virtual “field”. These “fake fields” let Migrate treat this source just like any other.

If you haven’t worked with XPaths before, the example YAML in this post gives you most of what you need to know. It’s just a simple text system for specifying a tag within an unstructured XML document. Not too complicated when you get into it. You may want to find a good tutorial to learn some of the tricks.

Let’s look at one of these “fake fields”:

      name: summary
      label: Summary
      selector: 'itunes:summary'

name is how we’ll address this field in the rest of the migration. It’s the source “field name”. label is the human readable name for the field. selector is the XPath inside the item. Most items are flat - certainly in RSS - so it’s basically just the tag that surrounds the data you want. There, was that so hard?

As a side note, you can see that my RSS feeds tend to be for iTunes. Sometimes the world eats an apple, sometimes an apple eats the world. Buy me a beer at Drupalcon and we can argue about standards.

Fifth and finally, we identify which “field” in the source contains a unique identifier. Migrate module keeps track of the association between the source and destination objects, so it can handle updates, rollbacks, and more. The example YAML relies on the very common (but technically optional) guid tag as a unique identifier.

Destination

destination:
  plugin: 'entity:node'

Yep, it’s that simple. This is where you declare what Drupal entity type will receive the data. Actually, you could write any sort of destination plugin for this - if you want Drupal to migrate data into some crazy exotic system, you can do it! But in 99.9% of cases you’re migrating into Drupal entities, so you’ll want entity:something here. Don’t worry about bundles (content types) here; that’s something we take care of in field mapping.

Process

process:
  title: title
  field_remote_url: link
  body: summary
  created:
    plugin: format_date
    from_format: 'D, d M Y H:i:s O'
    to_format: 'U'
    source: pub_date
  status:
    plugin: default_value
    default_value: 1
  type:
    plugin: default_value
    default_value: podcast_episode

This is where the action happens: the process section describes how destination fields should get their data from the source. It’s the “field mapping”, and more. Each key is a destination field, each value describes where the data comes from.

If you don’t want to migrate the whole field exactly as it’s presented in the source, you can put individual fields through Migrate plugins. These plugins apply all sorts of changes to the source content, to get it into the shape Drupal needs for a field value. If you want to take a substring from the source, explode it into an array, extract one array value and make sure it’s a valid Drupal machine name, you can do that here. I won’t do it in my example because that sort of thing isn’t common for RSS feeds, but it’s definitely possible.
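To illustrate, here’s a hedged sketch of such a chain using core process plugins. The field_category destination field and the particular transformations are made up for the example:

  field_category:
    -
      # Take the first 50 characters of the summary.
      plugin: substr
      source: summary
      start: 0
      length: 50
    -
      # Split the string into an array of words.
      plugin: explode
      delimiter: ' '
    -
      # Keep only the first word.
      plugin: extract
      index:
        - 0
    -
      # Normalize it into a valid Drupal machine name.
      plugin: machine_name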

The examples of plugins that you see here are simple ones. status and type show you how to set a fixed field value. There are other ways, but the default_value plugin is the best way to keep your sanity.

The created field is a bit more interesting. The Drupal field is a unix timestamp of the time a node was authored. The source RSS uses a string time format, though. We’ll use the format_date plugin to convert between the two. Neat, eh?

Don’t forget to map values into Drupal’s status and type fields! type is especially important: that’s what determines the content type, and nodes can’t be saved without it!

That’s it?

Yes, that’s it. You now have a migrator that pulls from any kind of remote source, and creates Drupal entities out of the items it finds. Your system cron entry makes sure this runs on a regular schedule, rather than overloading Drupal’s cron.

More importantly, if you’re this comfortable with Migrate module, you’ve just gained a lot of new power. This is a framework for getting data from anywhere, to anywhere, with a lot of convenience functionality in between.

Happy feeding!

Tips and tricks

OK I lied, there is way more to say about Migrate. It’s a wonderful, extensible framework, and that means there are lots of options for you. Here are some of the obstacles and solutions I’ve found helpful.

Importing files

Did you notice that I didn’t map the images into Drupal fields in my example? That’s because it’s a bit confusing. We actually have an image URL that we need to download, then we have to create a file entity based on the downloaded file, and then we add the File ID to the node’s field as a value. That’s more complicated than I wanted to get into in the general example.

To do this, we have to create a pipeline of plugins that will operate in sequence, to create the value we want to stick in our field_image. It looks something like this:

  field_image:
    -
      plugin: download
      source:
        - image
        - constants/destination_uri
      rename: true
    -
      plugin: entity_generate

Looking at that download plugin, image seems clear. That’s the source URL we got out of the RSS feed. But what is constants/destination_uri, I hear you cry? I’m glad you asked. It’s a constant, which I added in the source section and didn’t tell you about. You can add any arbitrary keys to the source section, and they’ll be available like this in processing. It is good practice to lump all your constants together into one key, to keep the namespace clean. This is what it looks like:

source:
  ... usual source stuff here ...
  constants:
    destination_uri: 'public://my_rss_feed/post.jpg'

Before you ask, yes this is exactly the same as using the default_value plugin. Still, default_value is preferred for readability wherever possible. In this case it isn’t really possible.

Also, note that the download plugin lets me set rename: true. This means that in case of a name conflict, a _0, _1, _2, _3 etc will be added to the end of the filename.

You can see the whole structure here, of one plugin passing its result to the next. You can chain unlimited plugins together this way…

Multiple interrelated migrations

One of the coolest tricks that Migrate can do is to manage interdependencies between migrations. Maybe you don’t want those images just as File entities, you actually want them in Paragraphs, which should appear in the imported node. Easy-peasy.

First, you have to create a second migration for the Paragraph. Technically you should have a separate Migration YAML for each destination entity type. (yes, entity_generate is a dirty way to get around it, use it sparingly). So we create our second migration just for the paragraph, like this:

id: my_rss_images_importer
label: 'Import the images from my RSS feed'
status: true

source:
  plugin: url
  data_fetcher_plugin: http
  urls: 'https://example.com/feed.rss'
  data_parser_plugin: simple_xml

  item_selector: /rss/channel/item
  fields:
    -
      name: guid
      label: GUID
      selector: guid
    -
      name: image
      label: Image
      selector: 'itunes:image[''href'']'

  ids:
    guid:
      type: string
  constants:
    destination_uri: 'public://my_rss_feed/post.jpg'

destination:
  plugin: 'entity:paragraph'

process:
  type:
    plugin: default_value
    default_value: podcast_image
  field_image:
    -
      plugin: download
      source:
        - image
        - constants/destination_uri
      rename: true
    -
      plugin: entity_generate

If you look at that closely, you’ll see it’s a simpler version of the node migration we did at first. I did the copy pasting myself! Here are the differences:

  • Different ID and label (duh)
  • We only care about two “fields” on the source: GUID and the image URL.
  • The destination is a paragraph instead of a node.
  • We’re doing the image trick I just mentioned.

Now, in the node migration, we can add our paragraphs field to the “process” section like this:

  field_paragraphs:
    plugin: migration_lookup
    migration: my_rss_images_importer
    source: guid

We’re using the migration_lookup plugin. This plugin takes the value of the field given in source, and looks it up in my_rss_images_importer to see if anything with that source ID was migrated. Remember where we configured the source plugin to know that guid was the unique identifier for each item in this feed? That comes in handy here.

So we pass the guid to migration_lookup, and it looks up which Drupal entity ID corresponds to that source ID, returning the paragraph’s entity ID to use as the field value. You can use this trick to associate content migrated from separate feeds, totally separate data sources, or whatever.

You should also add a dependency on my_rss_images_importer at the bottom of your YAML file, like this:

migration_dependencies:
  required:
    - my_rss_images_importer

This will ensure that my_rss_images_importer will always run before my_rss_importer.

(NB: in Drupal < 8.3, this plugin is called migration)

Formatting dates

Very often you will receive dates in a format other than what Drupal wants to accept as a valid field value. In this case the format_date process plugin comes in very handy, like this:

  field_published_date:
    plugin: format_date
    from_format: 'D, d M Y H:i:s O'
    to_format: 'Y-m-d\TH:i:s'
    source: pub_date

This one is pretty self-explanatory: from format, to format, and source. This is important when migrating from Drupal 6, whose date fields store dates differently from Drupal 8. It’s also sometimes handy for RSS feeds. :)

Drush commands

Very important for testing, and the whole reason we have the migrate_tools module installed! Here are some handy drush commands for interacting with your migration (a typical test run is sketched just after the list):

  • drush ms: Gives you the status of all known migrations. How many items are there to import? How many have been imported? Is the import running?
  • drush migrate-rollback: Rolls back one or more migrations, deleting all the imported content.
  • drush migrate-messages: Get logged messages for a particular migration.
  • drush mi: Runs a migration. use --all to run them all. Don’t worry, Migrate will sort out any dependencies you’ve declared and run them in the right order. Also worth noting: --limit=10 does a limited run of 10 items, and --feedback=10 gives you an in-progress status line every 10 items (otherwise you get nothing until it’s finished!).
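Here’s what that typical test cycle might look like with the example importer from this post (the migration ID matches the YAML above; your output will differ):

drush ms
drush mi my_rss_importer --limit=10 --feedback=10
drush migrate-messages my_rss_importer
drush migrate-rollback my_rss_importer

Run a limited batch first, check the messages, roll back, and only then run the full import.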

Okay, now that’s really it. Happy feeding!

“Feed me, Seymour!”

Mar 30 2017

The Crellpocalypse in the Drupal world last week has shaken the entire community. This event and its handling have called our fundamental values and structures into question. We’ve had fights on social media, calls for Dries to step down, and valuable contributors stepping away from the community. I have friends on every side of the situation, but all I can think is: This seems like the perfect time for a singing, dancing, spandexed pageant about the Drupal community.

Twelve years of code, and singing the Drupalcon song with Dries and Larry is still one of my favorite memories.

Why? For those who don’t know, I’m one of the authors of the DrupalCon Prenote, the “pre-keynote” show that kicks off DrupalCon right before Dries' keynote. The organizer (and my officemate), Jeffrey A. “jam” McGuire and I have been living our own special version of the crisis (Read Jam’s post about taking sides on this here). Our friend Larry Garfield has been an enthusiastic part of the Prenote ever since his first appearance as “Lord Over-Engineering” at Drupalcon Austin. Dries has often played a special guest role, too. With Drupalcon Baltimore looming on the horizon, everything seems to be coming together in one awful moment full of painful reminders - and it’s just when we’re supposed to be cheering for “community.” That awful conjunction is what makes this next Prenote in Baltimore more important than ever.

I have a tremendous respect for how painful this whole situation is for everyone involved. This very public meltdown, which has already done tremendous material damage, is made even more painful by the personal friendships of the key people involved. Klaus, Dries, and Larry have been colleagues for more than a decade. Even if this was only a private falling out, it would have been a painful one. And this is a public explosion. I can’t imagine the emotional strain that each of them is under right now. Internet mob outrage is a terrible experience, made much worse when it comes from your friends and colleagues, directed at your friends and colleagues.

And this is exactly why we need a Prenote right now. Because this is terrible shit that we’re wading through, and the Prenote exists to remind us of why we should keep going. The Drupal community - not the specific leadership, but the agglomeration of people, practices, code, and rules - has a lot that’s worth fighting for. We are the largest open source software community in the world, with a uniquely personal connection to its members. An incredible diversity of contributors from every culture imaginable who, for the most part, manage to work very well together.

The Drupal community is on the leading edge of how a community of this size and diversity can work. No one has ever done this before. Things like our Code of Conduct, Community Working Group, and conflict resolution process, can seem like established and unassailable systems. They aren’t. Go read the version history of those links; we just get a group of people together at a Drupalcon or on video conference to try to figure out how to handle this stuff, and then codify it in writing. We take models from other kinds of communities and try to adapt them, we suggest crazy new ideas and directions. As a community, Drupal actively and aggressively tries to figure out how to make itself more diverse, and less conflict prone. Humanity has never done collaborative communities on this scale before, and the Drupal Community is on the bleeding edge of it all.

The cost of the bleeding edge is that we make mistakes. We set off conflicts, we discover new kinds of obstacles. We muddle through every time, and then in retrospect try to find a better way forward for next time. I don’t mean to diminish the size or importance of any of these conflicts. They can be serious, existential crises.

  • When Acquia first formed and started to hold outsize influence, it was an existential crisis. We had to figure out how to handle a conflict of interest in our leadership, and what to do about a (then) totally asymmetrical services market. Acquia is now just one large player of several in the Drupal marketplace, and Dries found a compromise between his competing interests that has lasted almost a decade.
  • When Nate and Jen forked Drupal into Backdrop CMS, it presented another existential crisis for our community. We had never had such a credible fork from such key community members before. It was the apex of a crisis in the development direction for the whole project. We had to figure out how to address developer experience, how to work with a forked project, and even how to continue working with the forkers themselves. Backdrop is now a normal part of the ecosystem; Jen and Nate remain important and welcomed Drupal community leaders almost four years later.
  • We have had critical tensions, messy relationships, and fallings out with some of our most appreciated developers and community leaders. Whether it’s offense taken at Morten, or outbursts from Chx, these have divided our community and forced us to solve diversity problems that no one else has ever had to deal with.

I could go on. The point is: With each crucible, we the Drupal community must try to learn and build better systems for the next time.

So right now, in the midst of all this anger, this prejudice, and these accusations, I’m here to say: we will learn from this, too. The Drupal community is extraordinary, but we must adapt in order to survive. Losing Larry is a big hit to our community in almost every dimension. This public explosion has been a big hit to us in almost every other dimension. The arguments and animosities we’ve unleashed feel like they will tear us apart. But we must look forward. We must use this event for introspection and carry on as a better, improved community.

Do you think Larry was punished for thoughtcrime? Pitch in and help build a system where the next Larry can’t be treated that way. Do you think Dries and the DA deserve our trust in their decision? Join up and help make sure the next iteration preserves the strength of independent leadership.

The prenote is about why we are here, why we’ve stayed here all these years. Because it’s fun, because it’s supportive, because we love it. Sometimes the best way to start addressing your pain is through humor - and we desperately need to start addressing this.

However you feel about the Crellpocalypse, please don’t leave. Not yet. Stay, and help the community improve. Don’t stay for your job. Don’t stay for Dries, or the DA, or Larry. Stay for the community.

I’ll see you at the Prenote.

The Prenote: The most fun you can have at Drupalcon.

Feb 21 2017

If you believe the docs and the twitters, there is no way to automate letsencrypt certificate updates on platform.sh. You have to create the certificates manually, upload them manually, and maintain them manually.

But as readers of this blog know, the docs are only the start of the story. I’ve really enjoyed working with platform.sh with one of my private clients, and I couldn’t believe that with all the flexibility - all the POWER - letsencrypt was really out of reach. I found a few attempts to script it, and one really great snippet on gitlab. But no one had ever really synthesized this stuff into an easy howto. So here we go.

1) Add some writeable directories where platform.sh CLI and letsencrypt need them.

Normally when Platform deploys your application, it puts it all in a read-only filesystem. We’re going to mount some special directories read-write so all the letsencrypt/platform magic can work.

Edit your application’s .platform.app.yaml file, and find the mounts: section. At the bottom, add these three lines. Make sure to match the indents with everything else under the mounts: section!

    "/web/.well-known": "shared:files/.well-known"
    "/keys": "shared:files/keys"
    "/.platformsh": "shared:files/.platformsh"

Let’s walk through each of these:

  • /web/.well-known: In order to confirm that you actually control example.com, letsencrypt drops a file somewhere on your website, and then tries to fetch it. This directory is where it’s going to do the drop and fetch. My webroot is web, you should change this to match your own environment. You might use public or www or something.
  • /keys: You have to store your keyfiles SOMEWHERE. This is that place.
  • /.platformsh: Your master environment needs a bit of configuration to be able to login to platform and update the certs on your account. This is where that will go.

2) Expose the .well-known directory to the Internet

I mentioned above that letsencrypt tests your control over a domain by creating a file which it tries to fetch over the Internet. We already created the writeable directory where the scripts can drop the file, but platform.sh (wisely) defaults to hiding your directories from the Internet. We’re going to add some configuration to the “web” app section to expose this .well-known directory. Find the web: section of your .platform.app.yaml file, and the locations: section under that. At the bottom of that section, add this:

      '/.well-known':
            # Allow access to all files in the public files directory.
            allow: true
            expires: 5m
            passthru: false
            root: 'web/.well-known'
            # Do not execute PHP scripts.
            scripts: false

Make sure you match the indents of the other location entries! In my (default) .platform.app.yaml file, I have 8 spaces before that '/.well-known': line. Also note that the root: parameter there also uses my webroot directory, so adjust that to fit your environment.

3) Download the binaries you need during the application “build” phase

In order to do this, we’re going to need the platform.sh CLI tool and a Let’s Encrypt CLI tool called lego. We’ll download them during the “build” phase of your application. Still in the .platform.app.yaml file, find the hooks: section, and the build: section under that. Add these steps to the bottom of the build:

      cd ~
      curl -sL https://github.com/xenolf/lego/releases/download/v0.3.1/lego_linux_amd64.tar.xz | tar -C .global/bin -xJ --strip-components=1 lego/lego
      curl -sfSL -o .global/bin/platform.phar https://github.com/platformsh/platformsh-cli/releases/download/v3.12.1/platform.phar

We’re just downloading reasonably recent releases of our two tools. If anyone has a better way to get the latest release of either tool, please let me know. Otherwise we’re stuck keeping this up to date manually.

4) Configure the platform.sh CLI

In order to configure the platform.sh CLI on your server, we have to deploy the changes from steps 1-3. Go ahead and do that now. I’ll wait.

Now connect to your platform environment via SSH (platform ssh -e master for most of us). First we’ll add a config file for platform. Edit the file .platformsh/config.yaml with your editor of choice. You don’t have to use vi, but it will win you some points with me. Here are the contents for that file:

updates:
    check: false
api:
    token_file: token

Pretty straightforward: this tells platform not to bother updating the CLI tool automatically (it can’t - read-only filesystem, remember?). It then tells it to log in using an API token, which it can find in the file .platformsh/token. Let’s create that file next.

Log into the platform.sh web UI (you can launch it with platform web if you’re feeling sassy), and navigate to your account settings > api tokens. That’s at https://accounts.platform.sh/user/12345/api-tokens (with your own user ID of course). Add an API token, and copy its value into .platformsh/token on the environment we’re working on. The token should be the only contents of that file.
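For example, from inside that SSH session (the path assumes the /.platformsh mount from step 1, and the token value is obviously a placeholder):

# Write the API token into the file the CLI expects, and keep it private.
echo 'YOUR-PLATFORM-API-TOKEN' > /app/.platformsh/token
chmod 600 /app/.platformsh/token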

Now let’s test it by running php /app/.global/bin/platform.phar auth:info. If you see your account information, congratulations! You have a working platform.sh CLI installed.

5) Request your first certificate by hand

Still SSH’ed into that environment, let’s see if everything works.

lego --email="[email protected]" --domains="www.example.com" --webroot=/app/public/ --path=/app/keys/ -a run
csplit -f /app/keys/certificates/www.example.com.crt- /app/keys/certificates/www.example.com.crt '/-----BEGIN CERTIFICATE-----/' '{1}' -z -s
php /app/.global/bin/platform.phar domain:update -p $PLATFORM_PROJECT --no-wait --yes --cert /app/keys/certificates/www.example.com.crt-00 --chain /app/keys/certificates/www.example.com.crt-01 --key /app/keys/certificates/www.example.com.key example.com

This is three commands: register the cert with letsencrypt, then split the resulting file into its components, then register those components with platform.sh. If you didn’t get any errors, go ahead and test your site - it’s got a certificate! (yay)

6) Set up automatic renewals on cron

Back to .platform.app.yaml, look for the crons: section. If you’re running Drupal, you probably have a Drupal cronjob in there already. Add this one at the bottom, matching indents as always.

    letsencrypt:
        spec: '0 0 1 * *'
        cmd: '/bin/sh /app/scripts/letsencrypt.sh'

Now let’s create the script. Add the file scripts/letsencrypt.sh to your repo, with this content:

#!/usr/bin/env bash

# Checks and updates the letsencrypt HTTPS cert.

set -e

if [ "$PLATFORM_ENVIRONMENT" = "master-7rqtwti" ]
  then
    # Renew the certificate
    lego --email="[email protected]" --domains="example.org" --webroot=/app/web/ --path=/app/keys/ -a renew
    # Split the certificate from any intermediate chain
    csplit -f /app/keys/certificates/example.org.crt- /app/keys/certificates/example.org.crt '/-----BEGIN CERTIFICATE-----/' '{1}' -z -s
    # Update the certificates on the domain
    php /app/.global/bin/platform.phar domain:update -p $PLATFORM_PROJECT --no-wait --yes --cert /app/keys/certificates/example.org.crt-00 --chain /app/keys/certificates/example.org.crt-01 --key /app/keys/certificates/example.org.key example.org
fi

Obviously you should replace all those example.orgs and email addresses with your own domain. Make the file executable with chmod u+x scripts/letsencrypt.sh, commit it, and push it up to your platform.sh environment.

7) Send a bragging email to Crell

Technically this isn’t supposed to be possible, but YOU DID IT! Make sure to rub it in.

Larry is waiting to hear from you. (Photo credit: Jesus Manuel Olivas)

Good luck!

PS - I’m just gonna link one more time to the guy whose snippet made this all possible: Ariel Barreiro did the hardest part of this. I’m grateful that he made his notes public!

Feb 21 2017
Feb 21

If you believe the docs and the twitters, there is no way to automate letsencrypt certificates updates on platform.sh. You have to create the certificates manually, upload them manually, and maintain them manually.

But as readers of this blog know, the docs are only the start of the story. I’ve really enjoyed working with platform.sh with one of my private clients, and I couldn’t believe that with all the flexibility – all the POWER – letsencrypt was really out of reach. I found a few attempts to script it, and one really great snippet on gitlab. But no one had ever really synthesized this stuff into an easy howto. So here we go.

1) Add some writeable directories where platform.sh CLI and letsencrypt need them.

Normally when Platform deploys your application, it puts it all in a read-only filesystem. We’re going to mount some special directories read-write so all the letsencrypt/platform magic can work.

Edit your application’s .platform.app.yaml file, and find the mounts: section. At the bottom, add these three lines. Make sure to match the indents with everything else under the mounts: section!

1
2
3
"/web/.well-known": "shared:files/.well-known"
"/keys": "shared:files/keys"
"/.platformsh": "shared:files/.platformsh"

Let’s walk through each of these:

  • /web/.well-known: In order to confirm that you actually control example.com, letsencrypt drops a file somewhere on your website, and then tries to fetch it. This directory is where it’s going to do the drop and fetch. My webroot is web, you should change this to match your own environment. You might use public or www or something.
  • /keys: You have to store your keyfiles SOMEWHERE. This is that place.
  • /.platformsh: Your master environment needs a bit of configuration to be able to login to platform and update the certs on your account. This is where that will go.

2) Expose the .well-known directory to the Internet

I mentioned above that letsencrypt test your control over a domain by creating a file which it tries to fetch over the Internet. We already created the writeable directory where the scripts can drop the file, but platform.sh (wisely) defaults to hide your directories from the Internet. We’re going to add some configuration to the “web” app section to expose this .well-known directory. Find the web: section of your .platform.app.yaml file, and the locations: section under that. At the bottom of that section, add this:

1
2
3
4
5
6
7
8
  '/.well-known':
        # Allow access to all files in the public files directory.
        allow: true
        expires: 5m
        passthru: false
        root: 'web/.well-known'
        # Do not execute PHP scripts.
        scripts: false

Make sure you match the indents of the other location entries! In my (default) .platform.app.yaml file, I have 8 spaces before that '/.well-known': line. Also note that the root: parameter there also uses my webroot directory, so adjust that to fit your environment.

3) Download the binaries you need during the application “build” phase

In order to do this, we’re going to need to have the platform.sh CLI tool, and a let’s encrypt CLI tool called lego. We’ll download them during the “build” phase of your application. Still in the platform.app.yaml file, find the hooks: section, and the build: section under that. Add these steps to the bottom of the build:

1
2
3
  cd ~
  curl -sL https://github.com/xenolf/lego/releases/download/v0.3.1/lego_linux_amd64.tar.xz | tar -C .global/bin -xJ --strip-components=1 lego/lego
  curl -sfSL -o .global/bin/platform.phar https://github.com/platformsh/platformsh-cli/releases/download/v3.12.1/platform.phar

We’re just downloading reasonably recent releases of our two tools. If anyone has a better way to get the latest release of either tool, please let me know. Otherwise we’re stuck keeping this up to date manually.

4) Configure the platform.sh CLI

In order to configure the platform.sh CLI on your server, we have to deploy the changes from steps 1-3. Go ahead and do that now. I’ll wait.

Now connect to your platform environment via SSH (platform ssh -e master for most of us). First we’ll add a config file for platform. Edit a file in .platformsh/config.yaml with the editor of choice. You don’t have to use vi, but it will win you some points with me. Here are the contents for that file:

1
2
3
4
updates:
    check: false
api:
    token_file: token

Pretty straightforward: this tells platform not to bother updating the CLI tool automatically (it can’t – read-only filesystem, remember?). It then tells it to login using an API token, which it can find in the file .platformsh/token. Let’s create that file next.

Log into the platform.sh web UI (you can launch it with platform web if you’re feeling sassy), and navigate to your account settings > api tokens. That’s at https://accounts.platform.sh/user/12345/api-tokens (with your own user ID of course). Add an API token, and copy its value into .platformsh/token on the environment we’re working on. The token should be the only contents of that file.

Now let’s test it by running php /app/.global/bin/platform.phar auth:info. If you see your account information, congratulations! You have a working platform.sh CLI installed.

5) Request your first certificate by hand

Still SSH’ed into that environment, let’s see if everything works.

1
2
3
lego --email="[email protected]" --domains="www.example.com" --webroot=/app/public/ --path=/app/keys/ -a run
csplit -f /app/keys/certificates/www.example.com.crt- /app/keys/certificates/www.example.com.crt '/-----BEGIN CERTIFICATE-----/' '{1}' -z -s
php /app/.global/bin/platform.phar domain:update -p $PLATFORM_PROJECT --no-wait --yes --cert /app/keys/certificates/www.example.com.crt-00 --chain /app/keys/certificates/www.example.com.crt-01 --key /app/keys/certificates/www.example.com.key example.com

This is three commands: register the cert with letsencrypt, then split the resulting file into it’s components, then register those components with platform.sh. If you didn’t get any errors, go ahead and test your site – it’s got a certificate! (yay)

6) Set up automatic renewals on cron

Back to .platform.app.yaml, look for the crons: section. If you’re running drupal, you probably have a drupal cronjob in there already. Add this one at the bottom, matching indents as always.

letsencrypt:
    spec: '0 0 1 * *'
    cmd: '/bin/sh /app/scripts/letsencrypt.sh'

Now let’s create the script. Add the file scripts/letsencrypt.sh to your repo, with this content:

#!/usr/bin/env bash

# Checks and updates the letsencrypt HTTPS cert.

set -e

if [ "$PLATFORM_ENVIRONMENT" = "master-7rqtwti" ]
  then
    # Renew the certificate
    lego --email="[email protected]" --domains="example.org" --webroot=/app/web/ --path=/app/keys/ -a renew
    # Split the certificate from any intermediate chain
    csplit -f /app/keys/certificates/example.org.crt- /app/keys/certificates/example.org.crt '/-----BEGIN CERTIFICATE-----/' '{1}' -z -s
    # Update the certificates on the domain
    php /app/.global/bin/platform.phar domain:update -p $PLATFORM_PROJECT --no-wait --yes --cert /app/keys/certificates/example.org.crt-00 --chain /app/keys/certificates/example.org.crt-01 --key /app/keys/certificates/example.org.key example.org
fi

Obviously you should replace all those example.orgs and email addresses with your own domain. Make the file executable with chmod u+x scripts/letsencrypt.sh, commit it, and push it up to your platform.sh environment.

7) Send a bragging email to Crell

Technically this isn’t supposed to be possible, but YOU DID IT! Make sure to rub it in.

Good luck!

PS – I’m just gonna link one more time to the guy whose snippet made this all possible: Ariel Barreiro did the hardest part of this. I’m grateful that he made his notes public!

Feb 11 2017
Feb 11

In any decoupled architecture, people tend to focus on the pieces that will fit together. But what nobody ever tells you is: watch out for the cracks!

The cracks are the integration points between the different components. It’s not GraphQL as a communication layer; it’s that no one thinks to log GraphQL inconsistencies when they occur. It’s not “what’s my development environment”, it’s “how do these three development environments work on my localhost at the same time?”. It’s the thousand little complexities that you don’t think about, basically because they aren’t directly associated with a noun. We’ve discovered “crack” problems like this in technical architecture and devops, communication, and even project management. They add up to a lot of unplanned time, and they have presented some serious project risks.

A bit more about my recent project with Amazee Labs. It’s quite a cool stack: several data sources feed into Drupal 8, which offers an editorial experience and GraphQL endpoints. Four React/Relay sites sit in front, consuming the data and even offering an authenticated user experience (Auth0). I’ve been working with brilliant people: Sebastian Siemssen, Moshe Weitzman, Philipp Melab, and others. It has taken all of us to deal with the crack complexity.

The first crack appeared as we were setting up environments for our development teams. How do you segment repositories? They get deployed to different servers, and run in very different environments. But they are critically connected to each other. We decided to have a separate “back end” repo, and separate repos for each “front end” site. Since Relay needs to compile the entire data schema on startup, this means that every time the back end is redeployed with a data model change, we have to automatically redeploy the front end(s). For local development, we ended up building a mock data backend in MongoDB running in Docker. Add one more technology to support to your list, with normal attendant support and maintenance issues.

DevOps in general is more complicated and expensive in a decoupled environment. It's all easy at first, but at some point you have to start connecting the front- and back-ends on people's local development environments. Cue obvious problems like port conflicts, but also less obvious ones. The React developers don't know anything about Drupal, Drush, or PHP development environments. This means your environment setup needs to be VERY streamlined, even idiot-proof. Your devops team has to support a much wider variety of users than normal. Two of our front-enders had setups that made spinning up the back-end take more than 30 minutes. 30 minutes! We didn't even know that was possible with our stack. The project coordinator has to budget significant time for this kind of support and maintenance.

Some of the cracks just mean you have to code very carefully. At one point we discovered that certain kinds of invalid schema are perfectly tolerable to the GraphQL module. We could query everything just fine - but React couldn’t compile the schema, and gave cryptic errors that were hard to track down. Or what about the issues where there are no error messages to work with? CORS problems were notoriously easy to miss, until everything broke without clear errors. Some of these are impossible to avoid. The best you can do is be thorough about your test coverage, add integration tests which consider all environments, and document all the things.

Not all the cracks are technological; some are purely communication. In order to use a shared data service, we need a shared data model and API. So how do you communicate and coordinate that between 5 teams and 5 applications? We found this bottleneck extremely difficult. At first, it simply took a long time to get API components built. We had to coordinate so many stakeholders, that the back-end data arch and GraphQL endpoints got way behind the front-end sites. At another point, one backender organically became the go-to for everything GraphQL. He was a bottleneck within weeks, and was stuck with all the information silo’ed in his head. This is still an active problem area for us. We’re working on thorough and well-maintained documentation as a reference point, but this costs time as well.

Even project managers and scrum masters found new complexities. We had more than 30 people working on this project, and everyone had to be well coordinated and informed. You certainly can’t do scrum with 30 people together - the sprint review would take days! But split it out into many smaller teams and your information and coordination problems just got much harder. Eventually we found our solution: we have 3 teams, each with their own PO, frontender(s) and backender(s), who take responsibility for whole features at a time. Each team does its own, quite vanilla, scrum process. Layered on top of this, developers are in groups which cut across the scrum teams, which have coordination meetings and maintain documentation and code standards. All the back-enders meet weekly and work with the same standards, but the tightest coordination is internal to a feature. So far this is working well, but ask me again in a few months. :)

Working in a fully decoupled architecture and team structure has been amazing. It really is possible, and it really does provide a lot more flexibility. But it demands a harder focus on standards, communication, coordination, and architecture. Sometimes it’s not about the bricks; it’s about the mortar between them. So the next time you start work on a decoupled architecture, watch out for the cracks!

Nov 05 2016
Nov 05

A year ago I proposed a session for Drupalcon Mumbai and Drupalcon New Orleans, called “The best of both worlds”. It promised to show attendees how to write Drupal 8 code for Drupal 7 sites. I never ended up giving the session, but this week I got an email asking for more information. So in case it ever comes up again, here’s my own collection of resources on the subject.

The big improvement that's hard for D7 developers to get used to is injected services. The service container module makes that possible in D7. The brilliant FabianX wrote it to make his life easier in writing render cache, and his is always a good example to follow! This module creates a service container for D7, which you use just like the container in D8. You can write independent, OO code that is unit testable, with service dependencies declared in a YAML file. Note that you will also need the registry autoload module to get PSR-4 namespaced autoloading!

I just mentioned unit testable code as a benefit of the service container. To be honest this is a little tricksy in Drupal 7. For my own custom work I tend to isolate the test environment from the rest of Drupal, so I don't have to deal with everything else. Again, I followed Fabian's example there by looking at how render cache does its tests. If you do want better integration, there is a good Lullabot post that talks about (more) proper PHPUnit integration: https://www.lullabot.com/articles/write-unit-tests-for-your-drupal-7-code-part-1
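
As a rough illustration of the isolated style (every name below is made up, and this assumes the class under test never calls any Drupal functions, so no bootstrap is needed):

<?php

// tests/SlugifierTest.php - runs under plain PHPUnit, no Drupal bootstrap.
// (Newer PHPUnit versions use \PHPUnit\Framework\TestCase instead.)
class SlugifierTest extends PHPUnit_Framework_TestCase {

  public function testSpacesBecomeDashes() {
    $slugifier = new Slugifier();
    $this->assertEquals('hello-world', $slugifier->slugify('Hello World'));
  }

}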

Next on my list is Composer-managed dependencies. The Acquia developer blog has a great post about using Composer Manager for this in D7. This is a huge win for a lot of custom modules, and very easy.

Last is plugins. The rest of this list is in no particular order, but I left plugins for last because I think this isn’t actually necessary in D7. Personally I use modules' own hooks and just autoload independent classes. You might consider using plugins instead if you’re going to write several plugins for the same module. In any case, Lee Rowlands has the go-to blog post about this.
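
For reference, a minimal sketch of that "own hooks plus autoloaded classes" approach (every name here is hypothetical):

<?php

/**
 * Implements hook_myparent_handler_info().
 * Maps handler names to classes that registry_autoload can find.
 */
function mymodule_myparent_handler_info() {
  return array(
    'fancy' => 'Drupal\mymodule\Handler\FancyHandler',
  );
}

/**
 * The consuming module collects handlers and instantiates the one it needs.
 */
function myparent_get_handler($name) {
  $handlers = module_invoke_all('myparent_handler_info');
  return isset($handlers[$name]) ? new $handlers[$name]() : NULL;
}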

All together, you can combine these approaches to write code for D7 with the biggest DX features of D8: service injection, phpunit testing, composer libraries, and plugins. Note that each of these blog posts assumes different workarounds for all the other functionalities… but they should help you get an understanding of how to use that particular DX improvement in 7.

When I wrote that session proposal, I thought of this as a good way for D7 developers to learn D8 practices gradually, one at a time. I no longer think that’s true. Mostly, there are so few working examples of D7 code using these practices, that it’s quite hard to get your stuff working. This is particularly hard when you’re just learning about the concept in the first place! Personally, I could mess around with this stuff and make my life harder with it in D7. But I couldn’t really get the best advantage out of them until I had better examples. My best learning aids were the examples in D8 core, and the code scaffolding available through Drush and Drupal console.

But now that I’m comfortable with the concepts… I would absolutely use these approaches in D7 work. You know, if I’m FORCED to work in the old system. :)

One last aside here: it is easy to fall into the mindset that Drupal 8 practices are better just because they’re newer. This is simply not true. These practices are not handed down from heaven, after all! When you have the rest of the D8 architecture in place, certain kinds of code tasks are much easier. That’s why we like developing for it so much more. But other (less common, IMO) tasks are harder. And doing any of this in D7 means you have to put the architecture in place, too. That’s a lot of time, and it’s only worthwhile if you’re going to use the particular strengths of these practices.

So if it looks like one of these D8 practices will make your life easier for a particular task in D7, then by all means use these approaches to get there. Composer manager has a particularly low bar - it’s so easy to use, and makes so many tasks easier, it’s a good approach to many tasks. But if I ever catch you implementing service container to get two lines of code into a form_alter, I will come to where you work and slap your hands off the keyboard.

Happy coding!

Oct 08 2015
Oct 08

Last night (my time) I got the good news over twitter:

That’s right, Drupal 8 has its first release. But what does that mean? Is it done? Can I start using it yet? What kind of changes are coming? Will dawehner get to sleep, at last?

Are we there yet?

Despite all the rejoicing on social media, this isn’t the final release for Drupal 8 - it’s only the first Release Candidate. This means that we have (officially!) 0 “critical” bugs left to fix in Drupal 8. That means exactly what it sounds like: there are no critical, world-ending bugs left… that we know of. Just like any software product, we’ll continue to discover critical issues through its entire life cycle. We’re still finding occasional critical issues in Drupal 7 almost five years since its first release candidate; that’s just a part of supporting a piece of software over the long term. The RC phase means that while Drupal 8 is stable enough to use, we’re still discovering critical bugs a little too frequently to recommend it for everyone, in every use case.

“A little too frequently” means that the rate of critical bugs incoming is still too high to be able to promise the fast respond-and-fix turnaround that we want. Every two weeks we’ll create a new Release Candidate version that fixes whatever new criticals have been discovered. Once the core team is confident that they can squash bugs in a timely enough manner, they’ll (finally) release Drupal version 8.0.0.

But when will it REALLY be released?

“When it’s ready” still applies! But we are very, very close now. To give you a point of reference, Drupal 7 went through four Release Candidates before release (two months). That codebase was a lot more fragile than this one, so it’s reasonable to hope that we’ll have a very Drupally Christmas season this year. Personally I’m betting on January.

Can I use it yet?

Yes! Some terms and conditions apply.

Just because there are no criticals left, doesn’t mean that D8 is completely bug-free! We have a big pile of known “major” issues that have been deferred until after 8.0.0, which should impact your decision. You can see at that link that some of them are already ready to be committed. The catch is that during the RC phase, we aren’t allowed to commit these fixes. We’re basically only allowed to work on criticals and documentation. So there are still some serious issues that might be a problem in some use cases.

The biggest issue (that I know of) is a potential incompatibility between Drupal 8’s new “cache tags” header and some hosting providers. The problem is that Drupal includes some important caching information on the “back of the envelope” of its response to a page request, and it’s possible to run out of envelope! If the cache tags header gets too long for the web host to handle, it can behave unpredictably. You might get white screens of death, or it might just shorten the cache tags header, removing important information. There’s a solution in the works to allow a maximum length setting, but it won’t make it in until 8.0.1 (two weeks after 8.0.0). In the meantime you should avoid D8 if you have any very complex pages with many elements. The examples in that ticket are good ones: a news site with very complex layouts, or a single page site with a lot of “stuff” crammed onto the one, front page.

The other “gotcha” to bear in mind is that it will take some time for Drupal’s contributed modules ecosystem to catch up with the new version. According to Bluespark’s status of the top 100 modules for D8 page, so far only 9 of the top 100 D7 modules have a D8 release marked “stable.” 19 of those top 100 modules are included in D8 core however, so our total count is up to 28. This is enough to give a good foundation for relatively simple sites, especially if you have some PHP skills under your belt. But I wouldn’t go building a complex Intranet on it just yet!

Wait, so it’s still busted?

No! Drupal 8 is a solid platform for most use cases - that’s the point of the RC release! It’s time to go ahead and use it for site builds. Just take it easy and use it for simple sites, first. Give the rest of the community a chance to release stable modules, and hold off on that Facebook-buster behemoth website you’ve got planned until a few months after launch.

What happens after 8.0.0?

After 8.0.0 is released, we will make an enormous, fundamental shift in how Drupal is developed. We will start using semantic versioning with a regular release schedule. Every two weeks we’ll release a new “patch level” release: 8.0.1, 8.0.2, and so on. Patch level releases will be bug fixes only, and will be backwards-compatible - that means they won’t break anything on your site. Approximately every 6 months, we’ll release a new “minor level” release: 8.1.0, 8.2.0, etc. Minor releases are allowed to contain new features, but they are still guaranteed to be backwards-compatible. So even these releases won’t break anything on your site. We’re still figuring out the exact process for minor releases, but they will include similar phases to what we’ve seen with D8 core: a beta phase, and release candidates until we’re sure there are no more criticals.

What about API changes, and features that would break existing sites? We won’t even start developing on those until well into the D8 life cycle. Those changes will belong in the 9.x branch, and will be kept completely separate from anything that could touch your site.

The key take-away here is that D8 updates should never break your site. They may add features, but they will not interfere with whatever you’ve already built. We’ll continue a regular pace of improving the product in a predictable, scheduled, and backwards-compatible way.

Where are the best Drupal 8 release parties?

The Drupal Association is coordinating promotion for official Drupal 8 launch parties. If you want to host one, just fill out their form and they’ll help you promote it! So far no one has built a site mapping the parties, but keep an eye out in the #drupal hashtag on twitter!

Who do I congratulate? Who do I thank?

Drupal 8 RC 1 is the combined effort of more than 3200 contributors. That is an incredible number. By comparison, Apache, the world’s most popular open source webserver, has 118 contributors. MySQL, the database platform which runs an enormous portion of the Internet, has 1320 contributors. So you can basically walk up to anyone at a Drupalcon and thank him or her!

Most of the contributors to Drupal 8 leaned on the support, training, and hand-holding of mentors at Drupal events all over the world. I know I needed a mentor for my first core contributions, and I got to turn around and mentor other people myself. The mentors are the support network that made this level of mass contribution possible.

But the level of effort is definitely not evenly distributed. Most contributors have made fewer than 20 registered contributions. But some people have really gone above and beyond what anyone would expect. It’s no exaggeration to say that these people have shaped the future of the Internet.

It is easy to concentrate on the number of contributions as the only metric of involvement in the release of D8. But some of the biggest influences on Drupal 8 have been community leaders, whose effort is not counted in commits under their own names. The initiative leads who architected and directed all this contribution: heyrocker, Senpai, jlambert, Crell, dmitrig01, Gábor Hojtsy, Jose Reyero, mitchell, jenlampton, bleen18, jackalope, ericduran, jhood, jacine, shyamala, rupl, JohnAlbin, twom, and sofiya. Without them, we would have had nothing to commit!

Listing all of those names brings to mind the platform that they all use to contribute and coordinate: drupal.org, maintained by the Drupal Association. It also brings to mind the events, like Drupalcon, Drupalcamps, Dev Days, which everyone attends to collaborate, teach, and learn; also maintained by the Drupal Association. Not to mention the Drupal 8 Accelerate program, which raised $250,000 towards developer grants; also created and maintained by the Drupal Association. The people at the Association have worked tirelessly to support this release.

All of this developer time is extremely valuable, and not all of it came out of the developers' own free time. Huge swaths of Drupal 8 development have been sponsored by the companies that participate in the community. We’ve only been tracking their contributions for a short time, but the information we have is powerful. This release would not have happened without the developer time donated by companies like Acquia, MD Systems, Chapter Three, Tag1, and Druid. A quick glance at Drupal.org’s Drupal Services page shows us that contribution is a normal part of the culture for the biggest Drupal companies. These were the top 5, but almost every major Drupal shop has contributed in some measure. Thank you to these companies for believing in our product and supporting it so generously.

Finally, the people who bear the greatest personal responsibility are definitely the core maintainers. These people don’t just deserve your thanks; they deserve lifetime supplies of free beer sent to their homes. I can’t offer that on a blog; all I can say is THANK YOU.

Alex Bronstein

Dries Buytaert

Angie “webchick” Byron

Nat Catchpole

Jess Myrbo

Alex Pott

To everyone who contributed, but especially the people I’ve listed here: You’ve made a new generation of Internet innovation possible. Thank you.

May 02 2015
May 02

This week I wanted to accomplish a task in Drupal 8 that would be simple in Drupal 7: Import several CSV files, each one related to the others by taxonomy terms. Most importantly, I wanted to do it with Migrate module.

Migrate in Drupal 7 is a fantastic piece of code. It is not designed to be used from the GUI, rather, it provides a framework of “source”, “destination”, and “migration” classes so that even the most convoluted migration is 90% written for you. To create a migration in Drupal 7, you create a custom module, declare your migrations in a hook_info, and then extend the built in “migration” class. You instantiate one of the given classes for the source material (is it a CSV? JSON? Direct connection to a custom DB?), then one of the classes for the destination (is it a content type? Taxonomy term?). Then you add one simple line of code mapping each field from source to destination. If you know what you’re doing, the task I had in mind shouldn’t take more than 15 minutes per source.
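
As a point of reference, here's roughly what that D7 pattern looks like (a sketch with invented names, assuming the Migrate 2.6 API):

<?php

/**
 * A hypothetical D7 migration: CSV rows into taxonomy terms.
 * Declared via hook_migrate_api() in mymodule.migrate.inc.
 */
class ExampleStateMigration extends Migration {

  public function __construct($arguments) {
    parent::__construct($arguments);

    // Source: a CSV file with one header row.
    $columns = array(array('Id2', 'Id2'), array('Geography', 'Geography'));
    $this->source = new MigrateSourceCSV('/path/to/states.csv', $columns, array('header_rows' => 1));

    // Destination: terms in the 'state' vocabulary.
    $this->destination = new MigrateDestinationTerm('state');

    // Map table, keyed on the CSV's unique ID column.
    $this->map = new MigrateSQLMap($this->machineName,
      array('Id2' => array('type' => 'varchar', 'length' => 255, 'not null' => TRUE)),
      MigrateDestinationTerm::getKeySchema()
    );

    // One simple line per field mapping.
    $this->addFieldMapping('name', 'Geography');
  }

}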

It’s not quite so easy in Drupal 8. First of all, with Migrate in core, we had to greatly simplify the goals for the module. The version of Migrate that is really functional and stable is specifically and only the basic framework. There is a separate migrate_drupal module to provide everything you need for migrating from Drupal 6 or 7. This has been a laser-tight focus on just the essentials, which means there’s no UI, very little drush support, and definitely no nice extras like the ability to read non-Drupal sources.

My task this week became to write the first CSV source for Drupal 8 Migrate.

Drupal 8 Migrations, when you’re using classes that already exist, are actually even easier than Migrate 7. All you do is write a single YAML file for each kind of data you’re transferring, and put it in a custom module’s config/install directory. Your YAML file gives your migration a name and a group, tells us what the source is for data, maps source fields to destination fields, and tells us what the destination objects are. Here’s an example Migration definition file from core. See if you can understand what’s being migrated here.

id: d6_system_site
label: Drupal 6 site configuration
migration_groups:
  - Drupal 6
source:
  plugin: variable
  variables:
    - site_name
    - site_mail
    - site_slogan
    - site_frontpage
    - site_403
    - site_404
    - drupal_weight_select_max
    - admin_compact_mode
process:
  name: site_name
  mail: site_mail
  slogan: site_slogan
  'page/front': site_frontpage
  'page/403': site_403
  'page/404': site_404
  weight_select_max: drupal_weight_select_max
  admin_compact_mode: admin_compact_mode
destination:
  plugin: config
  config_name: system.site

You probably figured it out: this migration takes the system settings (variables) from a Drupal 6 site, and puts them into the Drupal 8 configuration. Not terribly hard, right? You can even do data transformations from the source field value to the destination.

Unfortunately, the only sources we have so far are for Drupal 6 and 7 sites, pulling directly from the database. If you want to use Migrate 8 the way we used Migrate 7, as an easy way to pull in data from arbitrary sources, you’ll have to contribute.

Enter Migrate Plus module. This is the place in contrib where we can fill out all the rest of the behavior we want from Migrate that isn't necessarily a core requirement. This is where we’ll be writing our source plugin.

To add a source plugin, just create a .php file in migrate_plus/src/Plugin/migrate/source. Drupal will discover the new plugin automatically the next time you rebuild the cache. The filename has to be the same as the name of the class, so choose carefully! My file is called CSV.php. Here's the top of the file you need for a basic source plugin:

<?php
/**
 * @file
 * Contains \Drupal\migrate_plus\Plugin\migrate\source\csv.
 */

namespace Drupal\migrate_plus\Plugin\migrate\source;

use Drupal\migrate\Plugin\migrate\source\SourcePluginBase;

/**
 * Source for CSV files.
 *
 * @MigrateSource(
 *   id = "csv"
 * )
 */
class CSV extends SourcePluginBase {

I’m calling this out separately because for newbies to Drupal 8, this is the hard part. This is all the information that Drupal needs to be able to find your class when it needs it. The @file comment is important. That and the namespace below have to match the actual location of the .php file.

Then you declare any other classes that you need, with their full namespace. To start with all you need is SourcePluginBase.

Finally you have to annotate the class with that @MigrateSource(id=“csv”). This is how Migrate module knows that this is a MigrateSource, and the name of your Plugin. Don’t miss it!

Inside the class, you must have the following methods. I’ll explain a bit more about each afterwards.

  • initializeIterator() : Should return a valid Iterator object.
  • getIds() : Should return an array that defines the unique identifiers of your data source.
  • __toString() : Should return a simple, string representation of the source.
  • fields() : Should return a definitive list of fields in the source.
  • __construct() : You don’t NEED this method, but you probably will end up using it.

initializeIterator()

An Iterator is a complicated sounding word for an Object that contains everything you need to read from a data source, and go through it one line at a time. Maybe you’re used to fopen(‘path/to/file’, ‘r’) to open a file, and then you write code for every possible operation with that file. An iterator takes care of all that for you. In the case of most file-based sources, you can just use the SplFileObject class that comes with PHP.
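
For instance, completely outside of Migrate, iterating over a CSV with SplFileObject is as simple as this (a standalone sketch, with a made-up path):

$file = new SplFileObject('/tmp/example.csv');
$file->setFlags(SplFileObject::READ_CSV);
foreach ($file as $row) {
  // Each $row is an array of the column values on that line.
  print_r($row);
}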

Any arguments that were passed in the source: section of the YAML file will be available under $this->configuration. So if my YAML looks like this:

source:
  plugin: csv
  path: '/vagrant/import/ACS_13_1YR_B28002_with_ann.csv'

My initializeIterator() method can look like this:


public function initializeIterator() {
  // File handler using our custom header-rows-respecting extension of SPLFileObject.
  $file = new SplFileObject($this->configuration['path']);
  $file->setFlags(SplFileObject::READ_CSV);
  return $file;
}

Not too complicated, right? This method is called right at the beginning of the migration, the first time Migrate wants to get any information out of your source. The iterator will be stored in $this->iterator.

getIds()

This method should return an array of all the unique keys for your source. A unique key is some value that’s unique for that row in the source material. Sometimes there’s more than one, which is why this is an array. Each key field name is also an array, with a child “type” declaration. This is hard to explain in English, but easy to show in code:

public function getIDs() {
  $ids = array();
  foreach ($this->configuration['keys'] as $key) {
    $ids[$key]['type'] = 'string';
  }
  return $ids;
}

We rely on the YAML author to tell us the key fields in the CSV, and we just reformat them accordingly. Type can be ‘string’, ‘float’, ‘integer’, whatever makes sense.

__toString()

This method has to return a simple string explanation of the source query. In the case of a file-based source, it makes sense to print the path to the file, like this:

public function __toString() {
  return (string) $this->configuration['path'];
}

fields()

This method returns an array of available fields on the source. The keys should be the machine names, the values are descriptive, human-readable names. In the case of the CSV source, we look for headers at the top of the CSV file and build the array that way.
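
In the CSV case that ends up being just a few lines. Here's a sketch, assuming the constructor below has filled csvColumns with array($machine_name, $label) pairs from the header row:

public function fields() {
  $fields = array();
  foreach ($this->getIterator()->csvColumns as $column) {
    // Each column is array($machine_name, $label), built in the constructor.
    $fields[$column[0]] = $column[1];
  }
  return $fields;
}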

__construct()

The constructor method is called whenever a class is instantiated. You don’t technically HAVE to have a constructor on your source class, but you’ll find it helpful. For the CSV source, I used the constructor to make sure we have all the configuration that we need. Then I try and set sane values for fields, based on any header in the file.

public function __construct(array $configuration, $plugin_id, $plugin_definition, MigrationInterface $migration) {
  parent::__construct($configuration, $plugin_id, $plugin_definition, $migration);

  // Path is required.
  if (empty($this->configuration['path'])) {
    throw new MigrateException('You must declare the "path" to the source CSV file in your source settings.');
  }

  // Key field(s) are required
  if (empty($this->configuration['keys'])) {
    throw new MigrateException('You must declare the "keys" of the source CSV file in your source settings.');
  }

  // Set header rows from the migrate configuration.
  $this->headerRows = !empty($this->configuration['header_rows']) ? $this->configuration['header_rows'] : 0;

  // Figure out what CSV columns we have.
  // One can either pass in an explicit list of column names to use, or if we have
  // a header row we can use the names from that
  if ($this->headerRows && empty($this->configuration['csvColumns'])) {
    $this->csvColumns = array();

    // Skip all but the last header
    for ($i = 0; $i < $this->headerRows - 1; $i++) {
      $this->getNextLine();
    }

    $row = $this->getNextLine();
    foreach ($row as $key => $header) {
      $header = trim($header);
      $this->getIterator()->csvColumns[] = array($header, $header);
    }
  }
  elseif ($this->configuration['csvColumns']) {
    $this->getIterator()->csvColumns = $this->configuration['csvColumns'];
  }
}

Profit!

That’s it! Four simple methods, and you have a new source type for Drupal 8 Migrate. Of course, you will probably find complications that require a bit more work. For CSV, supporting a header row turned out to be a real pain. Any time Migrate tried to “rewind” the source back to the first line, it would try and migrate the header row! I ended up extending SplFileObject just to handle this issue.
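
The workaround looks roughly like this (a simplified sketch, not the exact class from the patch): override rewind() so the header rows are skipped every time the file pointer resets.

class CSVFileObject extends SplFileObject {

  /**
   * Number of header rows to skip; assumed to be set by the source plugin.
   */
  public $headerRows = 0;

  public function rewind() {
    parent::rewind();
    // Jump past the header rows so they never get treated as data.
    for ($i = 0; $i < $this->headerRows; $i++) {
      $this->next();
    }
  }

}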

Here’s the YAML file I used to test this, importing a list of states from some US Census data.

id: states
label: States
migration_groups:
  - US Census

source:
  plugin: csv
  path: '/vagrant/import/ACS_13_1YR_B28002_with_ann.csv'
  header_rows: 2
  fields:
    - Id2
    - Geography
  keys:
    - Id2

process:
  name: Geography
  vid:
    -
      plugin: default_value
      default_value: state

destination:
  plugin: entity:taxonomy_term

You can see my CSV source and Iterator in the issue queue for migrate_plus. Good luck, and happy migrating!

Thanks

I learned a lot this week. Big thanks to the Migrate Documentation, but especially to chx, mikeryan, and the other good folks in #drupal-migrate who helped set me straight.


Jul 01 2014
Jul 01

This is an important one to note: If you use the popular Automatic Entity Label module on a multilingual site, it will break your paths because of an interaction with Drupal’s built in object cache. I looked at this briefly a few months ago and ran out of time, but my (badass) colleague bburg figured it out this week.

For now, the only solution is a slow one - we clear static entity caches when we generate multilingual titles. That’s not an awesome fix, but it’s hard to think of a better one without any of the D8 cache tagging functionality. Massive kudos to bburg for figuring this out!
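
In Drupal 7 terms the reset looks something like this (a sketch of the approach, not the module's exact patch):

// Throw away the statically cached copy of the node, so the next load
// rebuilds it (and its generated label) in the correct language.
entity_get_controller('node')->resetCache(array($node->nid));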

And for those of you keeping score, this is a good example of how to file a bug report for a really complex issue in a really popular module… and follow up until you resolve it.

Jul 01 2014
Jul 01

A quick note to all the Drupalists in the DC general area - Forum One is trying to put together a D8 core sprint in their DC office space. They’re coordinating with the DC Meetup group to try and spread the word to as many community members as possible!

If you haven’t been to a code sprint before, it’s basically a coding party. Developers get together and help each other contribute better and faster by reviewing code on the spot, mentoring each other, and generally working in small ad-hoc groups. It’s a lot of fun, and gives a big boost to development of the next generation of Drupal.

Forum One will provide the locale in downtown DC complete with pizza, beer, and soda. We also have a few of our core mentors on hand to help you get started if this is your first time contributing to core. Because of the building security, if you want to attend you have to register first! I won’t be able to attend, but my colleagues John Brandenburg and Kalpana Goel will be there mentoring. Go sign up now!

Jun 09 2014
Jun 09

Drupal has a wide variety of highly effective solutions for caching anonymous user content. The typical setup is APC, Memcached or Redis, and Varnish in front, and this can easily serve thousands of concurrent anonymous users. There is excellent documentation out there discussing this kind of simple caching.

But what about authenticated users? You can cache elements of the page using a method like Render cache, Entity Cache, or Views Content Cache. But Drupal still has to assemble each page for your users, a relatively heavy operation! If you want to address hundreds or thousands of authenticated users, you’re simply SOL with these traditional approaches.

Enter the Auth Cache suite of modules. Though this project has been around for quite some time, it had a reputation of being finicky and hard to set up. It got a significant rewrite in the last year thanks to znerol, and is now a powerhouse of a module that brings authenticated user caching much closer to regular users.

I will say that this is still not for the squeamish. You have to really understand the building blocks of your site, and you will have to make a plan for each unique layout on your site. There are some page elements that are quite hard to build this way, but for the most part Authcache makes this easy.

The theory

The idea behind authenticated user caching is simple. We already have a great caching mechanism for pages that stay exactly the same for all users. So we simply identify the parts of the page that will change for each user, and use a placeholder for them instead: a special tag in the HTML that gets filled in later. This way the page caching mechanism can ignore the customized content, and focus on the stuff that IS the same across all requests.

There are three major ways of doing this placeholder: AJAX, ESI, and Cookies.

With AJAX, you just include a little JS that says “fill this DIV with the contents of http://example.com/user/customized/thing". The client’s web browser makes a second call to the server, which is configured to allow /user/customized/thing through the cache all the way to your website. Drupal (or whatever you’re running) fills in the HTML that belongs in that div and returns it to the browser. Congratulations! You just served an authenticated user a page which was 99% cached. You only had to generate the contents of one div.

ESI is short for Edge Side Includes, a small extension to HTML which effectively does the same thing as that Javascript, but on the “Edge server”. The Edge server is whatever service touches the HTTP request last before sending it to the client. Apache, NGINX, Varnish, pound… you want this to happen as far down the stack as you control. An ESI tag in your HTML looks like this:
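
<esi:include src="http://example.com/user/customized/thing" />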

It’s pretty clear, even to a human reader, what this tag means: “replace this tag with the contents of http://example.com/user/customized/thing". ESI actually supports some simple logic as well, but that’s not really relevant to what we’re doing here.

The only difference between ESI and AJAX is where the placeholder is filled. With ESI it happens on the edge service, and with AJAX it happens in the client browser. There is a subtle difference here: a page with ESI will not be delivered until all the ESI calls have returned something, while an AJAX page will return right away, even if the components don’t immediately appear. On the other hand, ESI is much better for degraded browsers. YMMV.

The last method is using Cookies. You can store arbitrary information on cookies, as long as you’re careful about security. That can be a very effective way to get certain limited information through a caching layer. Authcache actually comes with an example module for just such a use case. It passes the user’s name and account URL in a cookie, so you can display it in a block.

This is effective for very small amounts of information, but keep it limited. Cookie headers aren’t designed to hold large quantities of data, and reverse proxies can have a hard time if you put too much information in there. Still, it’s a neat trick that can cover you for that “Hello Username” block.
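
If you were rolling that cookie trick by hand in Drupal 7, it would look something like this (a sketch only; Authcache's example module has its own naming and handles the details for you):

/**
 * Implements hook_user_login().
 * Stores the display name in a cookie that client-side code can read
 * on an otherwise fully cached page.
 */
function mymodule_user_login(&$edit, $account) {
  // Deliberately not HttpOnly so front-end code can read it.
  // Never put anything sensitive in here.
  setcookie('mymodule_username', format_username($account), 0, '/');
}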

Authcache in Drupal

The Authcache suite of modules tries to automatically implement AJAX and/or ESI for you. It actually goes one step further, and implements a caching layer for those “fragments” which are to be returned via ESI/AJAX. The fragments can be stored in any caching system which implements DrupalCacheInterface, ie any caching module you’ve heard of. Memcache, APC, File Cache, Redis, MongoDB. The full page HTML with placeholders can be cached in Drupal’s normal page cache, in Boost, or in Varnish.
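
Wiring a fragment bin to one of those backends happens in settings.php, with the usual D7 cache backend syntax (the bin name here is a placeholder, and the module path assumes a sites/all install; check the module's documentation for the real values):

// Register the Memcache backend, then point the (hypothetical) fragment
// bin at it. Any DrupalCacheInterface implementation works the same way.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_class_cache_authcache_fragments'] = 'MemCacheDrupal';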

Once you have these caching mechanisms defined, it’s just a question of marking sections of your site which need a second callback. Authcache presents a large number of modules to do this. You can define any of the following as requiring a second call:

  • Blocks
  • Views
  • Panels Panes
  • Fields
  • Comments
  • Flags
  • Forms
  • Forums
  • Polls
  • Votes

… and that’s all without writing a single line of custom code! Each one of those elements gets a new “Authcache” setting, where you can define it as needing a second callback, and set the method for the callback as either AJAX or ESI. You can even fall back to another method if the first one fails!

A good example of how this works is the Forms integration. Authcache will modify any and all forms on your site, so that they have an ESI or AJAX placeholder for the form token. This means that the form itself can be stored in your page cache (Varnish, Boost, or whatever), and the form token will be filled in automatically! That’s a phenomenal speed improvement without any configuration beyond enabling the module.

Setting up Authcache is a little complicated, and I’ll cover that in depth in my next post. But once the basic AJAX or ESI support is set up and these modules are enabled, caching authenticated users becomes a question of visiting each unique layout on your site and making a plan for each element that involves user customization. Authcache makes this easy.

Next post: How to configure Authcache on Drupal 7.

May 28 2014
May 28

Regular Drupalcon attendees know that the opening pre-keynote session is one of the highlights of the con. That’s the session where we welcome everyone to the con with stupid jokes, some well known Drupalists, and a lot of fun. This year is going to be especially good - and we need your help!

The evil Lord Over Engineering is threatening to delay the release of the CMS, which we need to save the world! The only way to stop him is to assemble the greatest force of Drupal superheroes ever assembled! Can the heroes save the day? Can we manage to make the final git push? You’ll have to be there to find out!

“If you only get up early once during DrupalCon, this is the morning to do it. And hey, at least you’ll get better seats for my keynote right after.” – Dries

In Prague we had the Drupal Opera, with solos sung by Gabor Hojtsy. In Portland we had the Drupal Game show, including Doug Vann’s amazing beatbox of the Tetris theme. In Munich, we taught the world to yodel and pour good German beer. Don’t miss out this year! The fun is just getting started!

If you want to participate onstage, you can go to Robert Douglass' blog and sign up with our superhero/villain application form. But even if you just want to party from your comfy chair in the audience, costumes are encouraged! So get your best superhero costume together, and I’ll see you at the pre-keynote!

May 26 2014
May 26

Last weekend I got to keynote Drupalcamp Helsinki with my friend and often-collaborator, scaragucc - and what a great camp it was! Organizer Lauri Eskola deserves tremendous credit for taking this camp to the next level. They doubled their attendance from last year, attracted positive attention from some great notables in the global Drupal world, and got their local community energized to engage more. At all the various after parties there were frequent toasts of “one of the best Drupalcamps in the world!”

Lauri and I met at the last Drupal Dev Days event, in Szeged. That was also hailed as an example of a hugely successful Drupal event, and he took the lessons from their in-depth report to heart. To be fair, the local volunteers and sponsors also clearly busted their humps getting people registered, and finding good session speakers to work with.

The result was a really positive Drupal event for all of us. Their attendance shot past the 200 mark for the first time, their code sprint had more involvement than ever before, and their social activities were a huge success. We left Finland full of positive feeling for the local association there, the city of Helsinki, and of course the sauna culture! This was a great example of what a Drupal community event can be. I’m looking forward to next year already.

May 01 2014
May 01

I’m really excited about a new session that I’ve been doing with my friend and colleague, Adam Juran aka scaragucc: the Coder vs Themer Ultimate Grudge Match Smackdown Fight to the Death! The basic premise: we both start with the same wireframe of a front page to build. But I’m only allowed to use the module layer, and Adam is only allowed to use the theme layer. It’s a really fun and entertaining way to play with the blurry lines between “coder” and “themer”. We get the audience pretty pumped up, which is impressive for a session that’s basically about watching other people code!

If you didn’t catch it at Drupal Dev Days in Szeged, or at Drupalcamp Frankfurt, you’re probably going to have to wait for Drupalcon Amsterdam to take part! But I do have a video of the session at Frankfurt, just to whet your appetite. :)

You can consider this a challenge: if any other themers out there want to challenge me to a coder vs themer style battle, I’ll be keynoting at Drupalcamp Helsinki in a few weeks. I’ll meet you there!

Apr 02 2014
Apr 02

A few months ago I posted about how to create a custom Panels pane, a critical reference for anyone who uses Panels layouts. The other part of the toolkit for quick and awesome layouts is the Display Suite module. With DS you can create new “Display modes” for your content, to be reused around the site. For example, on one recent site I had four standard ways to display my nodes: Full, Teaser, Mini-Teaser, and Search Result. DS made this configuration a cinch.

But just as in Panels you sometimes need a pane that isn’t provided out of the box, in Display Suite you sometimes want to add a field that isn’t really a field on your content. In a recent site build, I used this capability to include information from the Organic Groups a user belongs to on his profile as it appears in search results.

DS offers some ability to create this kind of custom field through the UI, but I’m talking about more complicated outcomes where you need/want to use custom code instead. This is actually even easier than custom panels panes.

In our example, we will display the user’s name, but backwards. Obviously you can do much more complex things with this, but it’s nice to have a simple example!

First we have to tell Display Suite about our new custom field. We do this with hook_ds_fields_info().

<?php

//@file: Add a custom suite to display suite for Users.

/**
 * Implements hook_ds_fields_info().
 * Declare my custom field.
 */
function mymodule_ds_fields_info($entity_type) {
  $fields = array();

  if ($entity_type == 'user') { 
    $fields['backwards_username'] = array(
      'title' => t('Backwards Username'),
      'field_type' => DS_FIELD_TYPE_FUNCTION,
      'function' => 'mymodule_backwards_username',
    );
    return array($entity_type => $fields);
  }
  return;
}

Any guesses what happens next? That’s right, we have to write our render function under the name we just declared. You can put anything here, really anything renderable at all.

/**
 * Render function for the Backwards Username field.
 */
function mymodule_backwards_username($field) {
  if (isset($field['entity']->name)) { 
    return check_plain(strrev($field['entity']->name));
  }
}

That’s it. So simple, you’ll wonder why you ever did it any other way!

Mar 29 2014
Mar 29

Today is the last day of Drupal Dev Days in Szeged, Hungary, and I’ve never been more full of the “Drupal spirit!”

One of Drupal's greatest strengths is the closeness of its community, how friendly and accepting they can be. Drupalcons are highlight events for many, not because of the learning as much as because of the social track: the chance to see old friends and make new ones. Even more important is the chance to experience in person this incredibly friendly community. I always loved the cons because you could approach really anybody, say "hi", and ask them about their work with the platform. Seriously, anybody. From a new user to Dries himself.

That’s become harder and harder as Drupal has grown more popular. In a convention of more than 3,000 people, you lose that feeling of being able to approach anybody. Instead, people silo into groups. In a best case it’s a group that shares an interest in a sub-system (Rules junkies, Panels proselytizers, Features fans…), but in most cases it’s because of shared connections outside the community. You end up hanging out with the same people you knew before the con. Of course you can still have fun, but that sense of community is lost.

One of the best parts of Drupal Dev Days Szeged was the way they encouraged people to mix, cross pollinate, and discuss. In a conference of 350 people I felt like I spoke to almost all of them. I could approach even the famous visitors and talk to them like a normal human being. I borrowed VGA adaptors from Gabor Hojtsy and Wim Leers, and neither of them batted an eye at it.

This kind of experience is so great, so positive and validating, that I recommend Drupal Camps for everyone. The ticket price is cheap, the location is always nearby, and the culture is fantastic. The sessions are every bit as good as most DrupalCon sessions (many of us use the Camps as a way to practice before the Con), and you will make great new friends.

Tl;DR: Drupal Dev Days in Szeged was fantastic. If you’ve never been to a Drupal Camp event, get your butt onto drupical.com and find your nearest one today!

Jan 10 2014
Jan 10

I ran into an interesting problem with the drush @self alias today. I wanted to pull a fresh copy of the DB down from a client’s live site to my local development copy. Should be as easy as drush sql-sync @clientsite.live @self, right? I’ve done this a thousand times before.

And I’ve also ignored the warning message every time before, but today I thought I’d check it out:

WARNING: Using temporary files to store and transfer sql-dump. It is recommended that you specify --source-dump and --target-dump options on the command line, or set '%dump' or '%dump-dir' in the path-aliases section of your site alias records. This facilitates fast file transfer via rsync.

There are actually two possible solutions to this warning (that I can think of), and they illustrate some of the useful “power user” features of Drush that any frequent user should be aware of.

The warning is there because drush would prefer to rsync the DB dump from site1 to site2, rather than a one time copy. Rsync has lots of speed improvements, not the least being diff transfer. When transferring an updated copy of a file which already exists at the destination, rsync will only send over the changes rather than the whole file. This is pretty useful if you’re dealing with a large, text based file like an SQL dump - especially one that you’ll be transferring often. In order to use this efficient processing though, Drush needs to know a safe path where it can store the DB dump in each location.

First we’ll add the %dump-dir attribute to our alias for clientsite:

<?php
// Site clientsite, environment live 
$aliases['live'] = array(
  'parent' => '@parent',
  'site' => 'clientsite',
  'env' => 'live',
  'root' => '/var/www/example.com/public_html',
  'remote-host' => 'example.com',
  'remote-user' => 'cvertesi',
  'path-aliases' => array(
    '%dump-dir' => '/home/cvertesi/.drush/db_dumps',
  ),
);

Notice that %dump-dir actually goes in a special sub-array for path-aliases. This is very likely the only time you’ll need to use that section, since most everything else in there is auto-detected. This is the directory on the remote side where drush will store the dump.

Our options come in with the @self alias. In a local dev environment, the most common way to handle this is in your drushrc.php file:

$options['dump-dir'] = '~/.drush/db_dumps';

But this won’t work for all cases. You can also take advantage of Drush’s alias handling by creating a site alias with the settings you want, and letting Drush merge those settings into @self. When Drush builds its cache of path aliases, it uses the site path as the cache key (for local sites only). That means that if you have a local alias with the same path as whatever @self happens to resolve to, your alias options will make it into the definition for @self. So here’s the alternate solution:

$aliases['localdev'] = array(
  'root' => '/Users/cvertesi/Sites/clientsite',
  'uri' => 'default',
  'path-aliases' => array(
    '%dump-dir' => '/home/cvertesi/.drush/db_dumps',
  ),
);

There’s just one, obscure caveat with the latter method: somewhere in the alias merging process, BASH aliases are lost. That means that ‘~’ stops resolving to your home directory, and you have to write it out (as I did above).

Have fun!

Jan 07 2014
Jan 07

Install profiles are a great way to throw together a functional Drupal site really quickly. In Drupal 6, an install profile was just a blueprint for setting up a site. What you did after the site was installed was your own business! But in Drupal 7 profiles are much more integrated with core. The assumption is that when you use an install profile, you want to rely on the profile’s maintainer for all your updates. This is not always the case.

Very often your site will diverge from the install profile as it takes on a life of its own, and it will be useful to convert it to “vanilla” Drupal. Today I’ll use a relatively simple example of a musician site which is moving away from the Pushtape distribution. Later I’ll return to this subject with the much more in-depth example of moving a community site away from Drupal Commons.

Move things around

Install profiles have all their files stored in the site root’s profiles/ directory. The first step is going to be moving everything out of there. In the case of pushtape, we have libraries, modules, and a theme stored in there. We’re going to move them to a more normal location.

# mkdir sites/all/libraries
# mv profiles/pushtape/libraries/* sites/all/libraries

# mkdir sites/all/modules/custom
# mv profiles/pushtape/modules/pushtape_* sites/all/modules/custom
# mv profiles/pushtape/modules/* sites/all/modules

# mkdir sites/all/themes
# mv profiles/pushtape/themes/* sites/all/themes

Next we need to see if there are any variables set in the install profile which really depend on the profile directory. If there are, we’ll have to set them again with the new path.

# cd profiles/pushtape
# grep 'profiles/pushtape' * -R
pushtape.install:  variable_set('sm2_path', 'profiles/pushtape/libraries/soundmanager2');

In this case, we see one variable_set which tells the system where to find the soundmanager2 library. We can update that easily enough with drush:

# drush vset sm2_path 'sites/all/libraries/soundmanager2'

Now we have to update Drupal’s setting for which install profile was used to create the site.

# drush vset install_profile standard

In some cases this will be enough to work. Personally I like to keep my modules folder more organized, so I go the extra mile:

# cd sites/all/modules
# mkdir contrib
# mv !(custom|contrib) contrib

I also separated out the custom code from the features. You can figure out which custom modules implement features with find . |grep features, and move them into a separate directory manually.

Clearing caches

Once you’re done moving things around, CLEAR CACHES. Drupal keeps an index of module, library, and theme directories, and you just broke it.

drush cc all

The only problem is, in many cases you will have moved a module that is required for drupal bootstrap. So you’ll have to get the handy drush tool Registry Rebuild, and run that before your cache clear:

# drush dl registry_rebuild
# drush rr
# drush cc all

As commenter @ericaitala notes, you may need some followup cleanup to really get all traces out. The easiest way to do this is from the SQL command line, which you can access via drush:

drush sqlq "DELETE FROM `system` WHERE filename LIKE 'profiles/profilename/profilename.profile'"
drush sqlq "UPDATE `system` SET status=1 WHERE filename LIKE 'profiles/standard/standard.profile'"

Technically these should both be covered by the registry_rebuild operation, but we’re doing it by hand because it seems to be missed in some operations. The first command removes the entry for the profile from Drupal’s system table - it removes any knowledge Drupal has that there was an install profile there. The second tells Drupal that the “standard” install profile is active, and should be checked for updates.

That’s it - your site is now officially a vanilla Drupal install. Test by removing the profiles/pushtape directory, clearing caches, and browsing around your site.

NOTE: With a more complex install profile I expect to encounter more difficulty. Stay tuned for the post on extricating yourself from Commons later this year!

Jan 03 2014
Jan 03

Lots of sites are now built with the “Panels everywhere” method, using Panels and Panelizer to configure modular layouts in the Drupal GUI. These modules come with lots of great default Panes, and create even more defaults based on your existing Blocks and Views. But there’s always a case for a custom Pane.

As usual, I’ll assume that you have an empty custom module called mymodule, with only a .info and a .module file to its name.

  1. Tell CTools that you have custom code here

Ctools, like Views, needs a hook to declare the fact that you have custom code. To do this we’ll use hook_ctools_plugin_directory. This hook is invoked for all Ctools plugin types, and includes the module name as a variable. This way you can avoid eating up memory for anything except the targeted module. You also have to declare where your custom code will live. So here’s the complete content of mymodule.module:

<?php

/**
 * Implements hook_ctools_plugin_directory().
 */
function mymodule_ctools_plugin_directory($owner, $plugin_type) {
  if ($owner == 'ctools' && $plugin_type == 'content_types') {
    return 'plugins/content_types';
  }
}

Note: Do not confuse Ctools “Content Types” with the “Content Type” entity used elsewhere in Drupal. This is just confusing naming, but they’re totally different things. Actually the most common usage for a Ctools Content Type is a pane, just like what we’re doing now. There are other plugin types, but none that interest us in this post.

  2. Add your custom pane

Oh, did you think this would be more difficult? Now that we’ve told Ctools to look for Content Type plugins in our module’s plugins/content_types subdirectory, we just add a .inc file for each “Content Type” (aka Pane) that we want to add. Let’s do a simple one, which returns the root term of a given taxonomy term. All the following code will go in plugins/content_types/taxonomy_root_term.inc (a name I chose arbitrarily).

Right at the top of the file, we provide a $plugin array which defines the basic information about our Pane Ctools Content Type. This doesn’t go into a function or anything, it just sits at the top of the .inc file:

<?php

$plugin = array(
  'single' => TRUE,
  'title' => t('Taxonomy root term'),
  'description' => t('a Display of data from the root term of the given TID'),
  'category' => t('Custom Panes'),
  'edit form' => 'mymodule_taxonomy_root_term',
  'render callback' => 'mymodule_taxonomy_root_term_render',
  'admin info' => 'mymodule_taxonomy_root_term_info',
  'defaults' => array(),
  'all contexts' => TRUE,
);

As you can see, this array defines a category, title, and description for the Panels admin interface. It also declares the names of the callbacks which provide the pane’s edit form, rendered form, and admin info. “Single” means that this type has no sub-types. This is the case in every single custom pane I’ve ever seen, so it’s probably the case for yours as well.

Now we write the callbacks we named in that array. We’ll start with the edit form.


/**
 * Edit form.
 */
function mymodule_taxonomy_root_term($form, &$form_state) {
 $conf = $form_state['conf']; 

 $form['term'] = array(
   '#type' => 'textfield',
   '#title' => t('Term ID'),
   '#description' => t('The term, from which the root term will be displayed'),
   '#default_value' => $conf['term'],
 );

  $entity_info = entity_get_info('taxonomy_term');

  $options = array();
  if (!empty($entity_info['view modes'])) {
    foreach ($entity_info['view modes'] as $mode => $settings) {
      $options[$mode] = $settings['label'];
    }
  }

 $form['view_mode'] = array(
   '#type' => 'select',
   '#options' => $options,
   '#title' => t('View mode'),
   '#default_value' => $conf['view_mode'],
 );

 return $form;
}

This is a fairly standard Drupal form. It also goes through typical form validation and submission functions, so you can provide a pretty complete experience for the administrator. In our case, we just want to get the term ID of the term whose root parent should be displayed. We let the administrator enter the term ID, and the view mode which should be used to display it. We won’t worry about form validation in our example. Let’s move on to the Submit function:

/**
 * Edit form submit function.
 */
function mymodule_taxonomy_root_term_submit($form, &$form_state) {
  $form_state['conf']['term'] = $form_state['values']['term'];
  $form_state['conf']['view_mode'] = $form_state['values']['view_mode'];
}

Again, pretty simple stuff. We just make sure that the $form_state[‘conf’] has the values entered. Now, the next callback we defined in $plugin is for rendering the pane:

/**
 * Render the panel.
 */
function mymodule_taxonomy_root_term_render($subtype, $conf, $args, $contexts) {
  if (empty($contexts)) {
    return;
  } 
  // Get full term object for the root term.
  $term = ctools_context_keyword_substitute($conf['term'], array(), $contexts);
  $parent_array = taxonomy_get_parents_all($term);
  $root = end($parent_array);

  // Render as a block.
  $block = new stdClass();
  $block->module = 'entity';
  $block->delta = 'taxonomy_term-' . str_replace('-', '_', $conf['view_mode']);

  $entity = entity_load_single('taxonomy_term', $root->tid);
  $block->content = entity_view('taxonomy_term', array($entity), $conf['view_mode']);
  return $block;

}

First we make sure there is information - ie the taxonomy term ID we need - in the pane’s context. Then we get the root term object and render it in the requested display mode. The only requirement for the return here is that it be a Drupal render array. So depending on your use case, you can return an image, a field… whatever you like. In most cases a block is a convenient wrapper for whatever you have to return, which is what I did here.

This is as far as you have to go. The admin info callback isn't actually required; just leave it out of the $plugin array and you'll be fine. But if you want to make your life easier as a site admin, it's definitely a nice-to-have.

/**
 * Admin info.
 */
function mymodule_taxonomy_root_term_info($subtype, $conf, $contexts) {
  if (!empty($conf)) {
    $content = '<p>Term ID: ' . $conf['term'] . '</p>';
    $content .= '<p>View mode: ' . $conf['view_mode'] . '</p>';
    $block = new stdClass();
    $block->title = $conf['override_title'] ? $conf['override_title_text'] : '';
    $block->content = $content;
    return $block;
  }
}

This just provides the administrative summary which you can see in the Panels UI. Again, Panels will be happy with any render array return you throw at it, so I use a block.
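
One piece we skipped earlier was validation of the edit form. The same naming convention as the submit callback applies: a function named after the edit form callback with a _validate suffix gets picked up. Here's a minimal sketch - the actual check is only an illustration, since our Term ID field accepts either a number or a context keyword:

/**
 * Edit form validation callback (optional).
 */
function mymodule_taxonomy_root_term_validate($form, &$form_state) {
  $term = trim($form_state['values']['term']);
  // Accept a numeric term ID or a context keyword like %node:field_category.
  if (!is_numeric($term) && strpos($term, '%') === FALSE) {
    form_set_error('term', t('Enter a numeric Term ID or a context keyword.'));
  }
}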

This is why we have nice things

Anyone who’s worked with me knows that I’m not a huge fan of the panels everywhere approach. But I use it often, simply because it makes custom layouts and totally custom page pieces so easy to do. Duplicating even this very simple functionality in a block is actually harder than this. You’re still using about 3 functions, but you’d have to determine in advance where that TID will come from. It would certainly come out less flexible, and actually probably harder to maintain. With Ctools all your related code sits in one place, and your module structure actually helps you see what’s going on where.

If you learn how to do elements like this, you’ll find Panels creeping into more and more of your builds. And rightfully so.

Dec 26 2013
Dec 26

You’re building a View, but you can’t get that field to display the way you want it to. Or filter, or sort. Or maybe you have some data in a custom table that you want to include in the View. So you look for a contributed module, and Views PHP looks like the answer to your problem! Until you read the module page, where it says:

“…it is highly advisable to use regular handlers and plugins when available (or even to create one yourself). Take note that filtering and sorting a view using PHP always has a considerable performance impact.”

As of this writing, 44,497 site maintainers have read that warning and chosen to ignore it. They’ve chosen to put their PHP into a non-revisioned, difficult-to-access place, and to enable PHP input in a module that was never designed for security. They’ve left their site at risk of a very difficult to diagnose and even harder to fix WSOD.

I’m going to go out on a limb here, and suggest that in many of these cases, the decision was made because someone had the impression that writing a Views handler or Plugin was difficult. I’m here to tell you that’s not so: it’s actually quite easy.

What we’re doing

We’re going to tell Views about the structure of the data we want to display, filter, or sort - even if there’s not actually a new data source involved, that’s how you do it - and then we’ll write the function that actually does the filter/sort/etc by improving an existing field display/filter/sort that Views already includes.

This process will work for:

  • Defining a new data source for Views, ie something your module keeps in the DB.
  • Creating multiple field displays/filters/sorts for an existing field.
  • Creating a completely computed field display/filter/sort, with nothing in the DB.

I know that in 99% of use cases for Views PHP, you don’t need to define a new data source, table, and fields. Trust me that this is the easiest way to learn it, though. I promise we’ll get to your use case before the end of the post.

I’ll assume you have a custom module built, with a .info and .module file, but nothing in there yet. We’ll call our module “mymodule” for the example.

  1. Tell Views about your module

We implement hook_views_api to let Views know that our module provides some code for Views, and what version of the Views API we’re using.

<?php

/**
 * Implements hook_views_api().
 */
function mymodule_views_api() {
  return array(
    'api' => 3,
    'path' => drupal_get_path('module', 'mymodule') . '/views',
  );
}

Couldn’t be simpler. We declare that we’re using Views API 3, and that our Views code will all live in the /views subdirectory of our module.

  2. Tell Views about your custom code

Now that Views knows to look in our /views directory, we should populate it. Views will look for a file called modulename.views.inc in that directory, so this is where we will put our Views hooks. There are lots of Views interventions you can do in this file, but we’re only interested in one: hook_views_data.

This hook lets you define new data sources to Views, and for each one show how to render a field, how to Filter results, and how to Sort results based on your new data source. I promised you three use cases up there though, and here’s the trick: you don’t have to have an actual data source. You can define a filter for a database field that’s already described elsewhere.

First let’s look at a real field definiton though, because it’s simpler. Here’s how we would define a real DB table as a data source. The table looks like this:

So here’s our implementation of hook_views_data:

/**
 * Implements hook_views_data().
 */
function mymodule_views_data() {
  // Build an array named after each DB table you're describing. In our case,
  // just mymodule_table.
  $data['mymodule_table'] = array(
    // First give some general information about the table as a data source.
    'table' => array(
      // The grouping for this field/filter/sort in the Views UI.
      'group' => t('Example Views Stuff'),
      'base' => array(
        'field' => 'naid', // This is the identifier field for the view.
        'title' => t('Example Views API Data'),
        'help' => t('Names provided by the Mymodule module.'),
      ),
    ),
    // Now we describe each field that Views needs to know about, starting 
    // with the identifier field.
    'naid' => array(
      'title' => t('Name ID'),
      'help' => t("The unique Name ID."),
      'field' => array(
        'handler' => 'views_handler_field_numeric',
        'click sortable' => TRUE,
      ),
      'sort' => array(
        'handler' => 'views_handler_sort',
      ),
      'filter' => array(
        'handler' => 'views_handler_filter_numeric',
      ),
    ),
    // Now the name field.
    'name' => array(
      'title' => t('Name'),
      'help' => t("The Name."),
      'field' => array(
        'handler' => 'views_handler_field',
        'click sortable' => TRUE,
      ),
      'sort' => array(
        'handler' => 'views_handler_sort',
      ),
      'filter' => array(
        'handler' => 'views_handler_filter_string',
      ),
    ),
  );
  return $data;
}

This is a pretty simple example, and I think the array structure speaks for itself. First you provide some general information about the table, then you create a sub-array for each field on the table. Each field’s array should be named after the field, and provide at least a title. Of course it wouldn’t be useful if you didn’t describe the handlers for any field/sort/filter operations you want to expose. For each one of these you just provide the name of the handler. In this example I used all built-in filters that come with Views, but it’s easy enough to provide a custom handler.

Many added behaviors in Views start with hook_views_data; this only covers the basics. You can also open fields up as arguments or relationships, or even add built-in relationships. For example, if our table also contained an NID field, we could define a relationship so that node fields are always available when listing names, and vice versa. This stuff is all surprisingly easy to do, it’s just not the focus of this post.
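
To give a flavor of what that looks like, here's a sketch of a relationship declaration. It assumes a hypothetical nid column on mymodule_table and uses the stock views_handler_relationship; it would sit alongside the field definitions above, inside the same hook_views_data() array:

...
    // Hypothetical: if mymodule_table also had an nid column, this would make
    // node fields available to any View listing names.
    'nid' => array(
      'title' => t('Node'),
      'help' => t('The node this name belongs to.'),
      'relationship' => array(
        'base' => 'node',        // The table to join against.
        'base field' => 'nid',   // The field on the node table.
        'handler' => 'views_handler_relationship',
        'label' => t('Name owner node'),
      ),
    ),
...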

  3. Write your custom handler

Let’s say we want to provide our own field handler for the name field. Maybe we want it to automatically separate first names. This is easy, too! You simply decide on a name for your new handler - by convention it should begin with *modulename_handler_type_*, so we’ll use *mymodule_handler_field_firstname*. Here’s the relevant part of that *$data* array from before:

...
    // Now the name field.
    'name' => array(
      'title' => t('Name'),
      'help' => t("The Name."),
      'field' => array(
        'handler' => 'mymodule_handler_field_firstname',
        'click sortable' => TRUE,
      ),
...

Not exactly rocket science, is it?

Now we create a file named after the handler, also in the /views subdirectory. Though you could write your own handler class from scratch, you’ll almost never have to. It’s much easier to just extend an existing class.

<?php

/**
 * @file
 * Definition of mymodule_handler_field_firstname.
 */

/**
 * Provide the first name only from the name field.
 *
 * @ingroup views_field_handlers
 */
class mymodule_handler_field_firstname extends views_handler_field {
  /**
  * Render the name field.
  */
  public function render($values) {
    $value = $this->get_value($values);
    $return = explode(' ', $value);
    return 'First name: ' . $return['0'];
  }
}

You see the pattern we’re following: just name a handler, then extend an existing Views handler class to do what you want. You can override options forms, the admin summary… really any aspect of the way Views handles this data. And the pattern is the same for fields, sorts, filters, and arguments.

Once you’ve created your handler’s .inc file, you have to make sure your module loads it. So edit your module’s .info file thusly:

name = My Module
description = Demo module from ohthehugemanatee.org
core = 7.x

files[] = views/mymodule_handler_field_firstname.inc

  4. Multiple filters for one field

We all understand how this works for data that you’re declaring for the first time in Views. But what if you want to provide multiple handlers for a single field? Maybe there are several different ways to filter or sort it. For most use cases, you should just follow the pattern above, and simply override the Views options form in your handler class. But occasionally you really do need multiple handlers.
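
Before we get to the multiple-handler case, here's roughly what that options-form route might look like. The class name and the 'name_part' option are mine, invented for the example; the option_definition()/options_form() pattern is the standard Views 3 one:

class mymodule_handler_field_name_part extends views_handler_field {
  function option_definition() {
    $options = parent::option_definition();
    $options['name_part'] = array('default' => 'first');
    return $options;
  }

  function options_form(&$form, &$form_state) {
    parent::options_form($form, $form_state);
    $form['name_part'] = array(
      '#type' => 'select',
      '#title' => t('Which part of the name?'),
      '#options' => array('first' => t('First name'), 'last' => t('Last name')),
      '#default_value' => $this->options['name_part'],
    );
  }

  function render($values) {
    $parts = explode(' ', $this->get_value($values));
    return $this->options['name_part'] == 'first' ? reset($parts) : end($parts);
  }
}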

So let’s add a second and third field handler for our name field:

...
    // Now the name field. This is the first, and 'real' definition for this field.
    'name' => array(
      'title' => t('Name'),
      'help' => t("The Name."),
      'field' => array(
        'handler' => 'mymodule_handler_field_firstname',
        'click sortable' => TRUE,
      ),
    ),
    'name_last' => array(
      'title' => t('Last name'),
      'help' => t('The Last name, extracted from the Name field'),
      'real field' => 'name',
      'field' => array(
        'handler' => 'mymodule_handler_field_lastname',
        'click sortable' => TRUE,
      ),
    ),
    'name_backwards' => array(
      'title' => t('Evil Genius Name'),
      'help' => t('The name, reversed so it sounds like the name of an evil genius.'),
      'real field' => 'name',
      'field' => array(
        'handler' => 'mymodule_handler_field_evil',
        'click sortable' => TRUE,
        ),
      ),
...

Can you spot the difference? All you have to do is add a 'real field' key, which tells Views the field name to use for the source value, and that’s it. Everything else is totally identical to a normal field. By convention we prefix the “virtual” field’s name with the name of the real field, but that’s as complicated as it gets.
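
The extra handler classes follow exactly the same pattern as the firstname handler. A sketch of the "evil genius" one - my guess at its render logic - might look like this:

/**
 * Render the name reversed, evil-genius style ("Doe John").
 */
class mymodule_handler_field_evil extends views_handler_field {
  public function render($values) {
    $value = $this->get_value($values);
    return implode(' ', array_reverse(explode(' ', $value)));
  }
}

Remember to add each handler's .inc file to the files[] array in your .info file, just like the first one.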

If there’s one thing I want you to take away from this blog post, it’s that the Views API is actually really easy. And if you can’t find something online, take a moment to actually look at the API documentation included with the module. It’s very thorough, and easy to read. If you feel like you understand how this works, but the doco doesn’t quite cover what you’re trying to do, look for examples in the Views module itself! There are 169 handlers for every concievable kind of case, just within Views. Find something reasonable and build off of that!

With this in mind, it’s only 24 lines of simple code to provide your own handler for an existing field. After that 24 lines, you’re doing the same things you were planning to do in views_php… but now you’re doing them in a real coding environment, with a revisioning system, and where it’s easy to track down and fix errors that could otherwise crash your site. 24 lines of array definition can save you a world of hurt. I hope to see those views_php installation numbers dropping soon.

Dec 02 2013
Dec 02

One of the big advantages to using the Context module is how totally extensible it is. Not only can you use and re-use the built in conditions, you can write your own. This brings all the power of the custom PHP evaluation method of block placement, but in a structure that makes your code re-usable, contributable, versioned, and standards-based. Writing a custom Context Condition is also a great template for how to integrate custom behaviors in many of the more complex Drupal modules such as Views and Search_API. We’ll see this pattern again and again, and this is about the most basic one to demonstrate with.

My task was to determine if the displayed node was entity-referenced as being the “special” node from its parent organic group. It’s a weird requirement (which is exactly why a custom Condition makes sense here), so let me explain that again. On a site with Organic Groups, the Group node has an entityreference field, which marks one of the Group member nodes as special. When the user is viewing this special node, our Context condition should evaluate to positive.

The first prerequisite is to make absolutely certain that you can’t do this using any of the built in Conditions, and something this unique definitely qualifies there. So let’s get to the implementation in our custom module. The module will be called CCC for Custom Context Condition.

{% include_code ccc.info lang:ini modules/ccc/ccc.info %}

That’s a totally normal .info file, with logical dependencies on OG, EntityReference, and Context modules. Let’s have a look at the .module file. This is probably a lot simpler than you expected.

{% codeblock lang:php ccc.module %}
/**
 * Implements hook_context_plugins().
 */
function ccc_context_plugins() {
  $plugins = array(
    'ccc_condition_og_special_node' => array(
      'handler' => array(
        'path' => drupal_get_path('module', 'ccc') . '/plugins/context',
        'file' => 'ccc_condition_og_special_node.inc',
        'class' => 'ccc_condition_og_special_node',
        'parent' => 'context_condition',
      ),
    ),
  );
  return $plugins;
}
{% endcodeblock %}

First we implement hook_context_plugins(), to declare our new condition plugin to Context. This function should return an array of plugins, keyed by plugin name (in our case, ccc_condition_og_special_node). For each plugin, you have to explain to Context some basic information about the handler you’re going to write.

  • path The path to the plugin file. By convention you should put it in your module’s directory, under /plugins/context.
  • file The filename to look for. Keep yourself sane, and name it after the plugin you’re writing.
  • class The name of the Class you’re about to write. If you’ve never written a PHP class before, this is good practice for D8 and object oriented code in general. Think of it like a function name, and again: name it after the plugin you’re writing.
  • parent The Class you are extending to create your condition. If you don’t know what to put here, just enter ‘context_condition’.

Now that Context knows about your plugin, you have to declare it to the UI in order to use it! For this we implement hook_context_registry. This function returns an array keyed by plugin type–in this case, “conditions”. For each condition (keyed by condition name), we need title, description, and plugin.

{% codeblock lang:php ccc.module %}
/**
 * Implements hook_context_registry().
 */
function ccc_context_registry() {
  $registry = array(
    'conditions' => array(
      'ccc_condition_og_special_node' => array(
        'title' => t('OG Special Node'),
        'description' => t('Set this context based on whether or not the node is the "Special Node" entityreferenced in the parent OG.'),
        'plugin' => 'ccc_condition_og_special_node',
      ),
    ),
  );
  return $registry;
}
{% endcodeblock %}

Now that Context knows everything it needs to know about your plugin and condition, we have to tell Drupal when to evaluate your condition. You can implement whatever hook makes sense for you here; the important part is that you execute your plugin. Since our condition only makes sense after everything else has fired (ie when the OG context is well and firmly set), we’ll implement hook_context_page_reaction().

{% codeblock lang:php ccc.module %}
/**
 * Implements hook_context_page_reaction().
 *
 * Executes our OG Special Node Context Condition.
 * Gotta run on context_page_reaction, so Views and OG have a chance to
 * set/modify Group context.
 */
function ccc_context_page_reaction() {
  $group = og_context();
  // Only execute the group node context condition if there is a group node
  // in context.
  if ($group) {
    $plugin = context_get_plugin('condition', 'ccc_condition_og_special_node');
    if ($plugin) {
      $plugin->execute($group);
    }
  }
}
{% endcodeblock %}

That’s it for your module file. Just declare the plugin to Context and its UI, and find a place to actually execute the plugin. Now we’ll write the actual handler class.

Create your plugin file in the place you promised Context to find it in your hook_context_plugins() implementation. In our case, this is plugins/context/ccc_condition_og_special_node.inc . We’re going to extend Context’s basic Condition Class to provide our own functionality. Here are the contents of my ccc_condition_og_special_node.inc file:

{% include_code lang:php modules/ccc/plugins/context/ccc_condition_og_special_node.inc %}

The trickiest part of this is in the Condition settings form and values. Context assumes that your settings form will be a series of checkboxes, and does a lot of internal processing based on that assumption. We don’t want to mess any of that up, so there’s a bit of dancing around the requirement here.

First we provide the function condition_values. Context needs to know in advance what the possible values are for the Condition’s settings form, and this is where you return them. Based on this return, Context will build a settings form of checkboxes for you.

Then we override the settings form with condition_form(). I change the type of the form element to radio boxes, and set a default value.

Then I add my own submit handler, which merely takes the result of the radio box and puts it into an array, just like it would be if this was a checkbox.

Finally, we get to the good part: the execute function. If you recall, this is what we called in ccc_context_page_reaction(). Here we load the Group node, and use entity_metadata_wrapper to extract the value of the field_special_node entityreference field on that node. Then we test the current NID from the URL. Note that you never have to explicitly return FALSE; Context is only watching for TRUE returns.
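
Putting those pieces together, a minimal sketch of the class could look like the following. The 'og_special_node' value, the radios default handling, and the assumption that $group resolves to the group node are mine; check them against the included file and your OG version.

{% codeblock lang:php %}
/**
 * Expose a "node is the group's Special Node" condition.
 */
class ccc_condition_og_special_node extends context_condition {

  function condition_values() {
    // Context builds the settings form from this list of possible values.
    return array('og_special_node' => t('Node is the OG Special Node'));
  }

  function condition_form($context) {
    // Start from the stock checkboxes form, then switch it to radios.
    $form = parent::condition_form($context);
    $form['#type'] = 'radios';
    $form['#default_value'] = isset($form['#default_value']) ? current((array) $form['#default_value']) : NULL;
    return $form;
  }

  function condition_form_submit($values) {
    // Radios return a single value; wrap it back into the array structure
    // Context expects from checkboxes.
    return array($values => $values);
  }

  function execute($group) {
    // Assumption: $group is (or has been resolved to) the group node handed
    // over from og_context() in ccc_context_page_reaction().
    $wrapper = entity_metadata_wrapper('node', $group);
    $special = $wrapper->field_special_node->value();
    $node = menu_get_object();
    if ($special && $node && $node->nid == $special->nid) {
      foreach ($this->get_contexts() as $context) {
        $this->condition_met($context);
      }
    }
    // No explicit FALSE return needed; Context only reacts to condition_met().
  }
}
{% endcodeblock %}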

When I learned how to do this, I found it surprisingly easy. The hardest part is wrestling with the Condition class to get exactly the behavior you like. Everyone ends up doing some dancing around here, so don’t feel bad about it. Context’s own Conditions are great examples. Have a look at the classes provided in context/plugins/context_condition_*.inc to get ideas for how to do this.


Jun 13 2012
Jun 13
I recently took over an aging Drupal module, Resource Conflict. It had a D6 version with a lot of outstanding bugs in the queue... I started by simply rewriting it for D7 and my specific use case, but ended up in a discussion with the module's old maintainer about a total rewrite for Entities and Rules. So that's what I did, and I just released the 7.x-3.x-dev version.

Resource Conflict USED to be a module that detected booking conflicts in resources, and threw a form error. Resources were node reference field entries, and it supported date and event modules for the date/time. Resource Conflict version 3 is built on Entities and Rules, so the behavior is now totally customizable. This lets it support a wide variety of use cases.

How to use it
Out of the box, the module lets you enable Content Types for resource conflict checking, and you select which date field to use for each enabled content type. You do all this on the Content Type's edit page.

The module comes with a sample Rule for you, which throws a form error any time you try to save a Resource Conflict enabled node with a date that overlaps another Resource Conflict enabled node.


But this is not a very realistic use case. More likely you will want to edit the rule to add some more conditions - for example, maybe a resource conflict on your site means overlapping taxonomy terms. Or entity references. Or titles. Or anything at all!

To give an example of how complex this can go - on a site I recently launched, I have two kinds of resource conflict events. "Hard" conflicts are when two bookings at the same time reference the same user or taxonomy term.  Hard conflicts throw a form error and prevent saving the node. This is just like the above case, and it's a really simple modification on the default Rule.  "Soft" conflicts are when you try to book a taxonomy term that is a parent or child of a term in an existing booking, or a user who is tagged with a term or child of a term that is already booked. For example, maybe I try to book the taxonomy term "Maintenance", but the sub-term "Plumbers" is already booked at the same time. Or maybe I try to book John Doe, who is tagged as a "Plumber", but Maintenance is already booked at that time.  Soft conflicts can be booked, but they create a node to record the conflict.

Other people have asked about using the module to integrate with Organic Groups, or to control bookings of finite resources - you could easily use Rules to build those functionalities. Best of all, it's all exposed and easy for you to maintain.

Here's a run down of the Rules components included with Resource Conflict 7:

  • EVENT: A Resource Conflict Node Form is Validated: This rule fires during node form validation on Resource Conflict enabled content types. You should use this event if you want to set form errors, or if you want to interact with Rules_forms module. It provides both a node object of the node being created/edited and a form object for use with rules_forms. This is the event trigger for the default Rule.
  • CONDITION: Contains a Resource Conflict: Evaluate a node object for conflicts. Returns TRUE if there are conflicts for the node.
  • ACTION: Load a List of Conflicting Nodes: Creates a list of nodes that conflict with the given node.
  • ACTION: Set a Form Validation Error: Stores a form validation error to be fired the next time a validation hook is called on a conflict-enabled node. This is intended for use with the "A Resource Conflict Node Form is Validated" Event.

Future plans

I'm pretty happy with this initial dev release; it solves a lot of problems quite elegantly. But there are a few remaining. Here are the items I'm planning on knocking out first.

  • Drupal 6 backport: I'll release a 6.x-3.x branch which uses Rules for reactions. The biggest issue here is that rules_forms module doesn't exist for D6, which makes it impossible to throw a form validation error from within Rules. I may have to write a custom Rules action to do this.
  • Full Entities integration: I'd like to generalize the module out to support ALL fieldable entities. 
  • Better Integration with rules_forms: rules_forms is pretty cool, and even cooler is the new version they're working on, that doesn't rely on the "form" Rules object. I'm looking forward to this, because it means we'll be able to do cooler things with resource_conflict.

Jun 08 2012
Jun 08
Drupal is big and complex, and no matter how good you are, you will need to debug. Before you get started coding for a Drupal site, you should download and enable the very useful devel module on your dev site. Go ahead, I'll wait.

This primer covers all the tools that I'll use in later tutorials on this blog. In the real world I end up using 5-6 of these guys regularly. Learn them. They help.

drupal_set_message()

Printing a message is the basic tool of any debugging, and you need to know this core function. This lets you print a message to the page, Drupal-style. Usage: drupal_set_message($message, $severity).

drupal_set_message(t('Hello %name!', array('%name' => 'John Doe')), 'warning'); 

Note that I use the t() function to process text in drupal_set_message. This is partly because the function itself doesn't do any text sanitizing, so it's good practice to insert a layer there. It's also because you can't  insert variables directly into the message, and t() is a convenient way of doing that.

Note that drupal_set_message() does not do well with large messages - it is not recommended to print big arrays directly to the message area. For that, we'll use another function:

dpm()

This sets a variable of your choice to Drupal's messaging system, in a nice compact output handled by krumo. 

Most of Drupal is understandable and learnable on the fly with just this simple tool. Usage: dpm($variable, $name = NULL).

dpm($node, 'This is the node variable');


kpr()

Just like dpm, but it prints to the page header rather than the Drupal message area. Helpful when your theme doesn't print messages. You can also have it return a string instead of printing it automatically, which comes in handy in some fairly bizarre circumstances. Usage: kpr($variable, $return = FALSE, $name = NULL). 

kpr($node, FALSE, 'This is the node variable');

dvm()

OK, you don't like krumo. Maybe you're a masochist, or maybe you have some esoteric requirements to deal with. Either way, you want dvm(). It prints variables in a more traditional format into the message area. I have used this to get variables into pastebin, very helpful when asking for help on Drupal IRC channels. :) Usage: dvm($variable, $name = NULL).

dvm($node, 'This is the node variable');

dpr()

If your theme doesn't display messages, dpr() prints variables to the page header without krumo. Same usage case as above. Just like kpr(), you can return a string with this function. Usage: dpr($variable, $return = FALSE, $name = NULL).

dpr($node, FALSE, 'This is the node variable');

dargs()

This guy shows you the arguments passed into the current function, using krumo for readability. This is my go-to function when using Drupal hooks. The API documentation is great, but there's nothing like simply seeing the variables you have to work with. Usage: dargs()

function swearing_custom_form_alter(&$form, &$form_state, $form_id) {
  dargs();
}

dd()

For those awful times when a hook doesn't produce direct output, there's always dd(). This guy prints a given variable to a file called "drupal_debug.txt" in your site's temporary files directory. This comes in very handy. Usage: dd($variable, $name = NULL)

dd($node, 'This is the node variable');

ddebug_backtrace()

This prints a backtrace for the current function in krumo, in the head of your current page. Devel module comes with the ability to enable this automatically for fatal errors, but occasionally you just want to see how things work. Usage: ddebug_backtrace().

db_queryd()

This tool is handy for testing queries. Devel also has the ability to display a query log at the bottom of every page, showing every query with a time-to-execute. That can be pretty overwhelming. db_queryd() lets you put in a single query and see if your database spits out any errors. Usage: db_queryd($query, $arguments = array())

Apr 23 2012
Apr 23
Ah, it's nice to be done with the basic structure stuff. At this point, I'm going to assume that you know the basics of how to administer a Drupal site. You understand the file structure, you know where to go to download themes and modules, and you know how to make sites out of the big pieces you can find that way. Awesome. This is where it gets fun. I'm going to assume some basic knowledge of PHP, but there's nothing here that you can't pick up with some quick glances over at w3schools, and maybe some experimentation on your own.

I mentioned in the file structure post that the whole point of Drupal is that Everything Is Override-able. If you want to customize the site beyond what you get out of the box with some contributed modules, you're going to want to do it in your own module. That's right - it's incredibly easy to write your own Drupal module to insert whatever code you want into your particular Drupal site. In fact, it's the recommended way to customize your site, because it means that your code won't get mixed up with anything in Drupal's "core". If you update Drupal, you don't have to worry about refactoring your patches or anything like that, you've got one place to look for your custom behavior. More importantly, if someone else comes along to take care of the site after you're off the project, there's only one place for them to look for customizations.

A module at its most basic consists of a directory under sites/whatever/modules , a self-named .info file giving basic information about the module, and a self-named .module file with your code in it. In this quick learning guide, we'll build a "Hello World" module that prints some text on every page load. This is a quick guide - I'll get more into how to leverage Drupal in your code in later posts.

So let's create your custom module. Consider your module's name - Drupal has a single namespace for all modules and features, and it's easy to get confused with the theme namespace as well, so it's best to pick a name that is definitely not going to be used by any modules, features, or themes. Most big shops will have a custom module for most any site they build, and they just name it "Sitename Custom" or something similar. We're going to make a module called "Swearing Custom", to follow that convention.

First create the module directory. I like to keep my custom modules all under sites/all/modules/custom . Drupal will search subdirectories, so take advantage of that fact to keep your codebase tidy! We'll create the directory sites/all/modules/swearing_custom . Note the underscore instead of a space.

Then create the module's .info file. This file informs Drupal about your module's name, description, requirements, and other fundamentals. I like to think of it as anything that will go on the site's Modules listing. This file has to be named after the module, just like the directory was. Here's my swearing_custom.info file:

name = Swearing at Computers Custom Module
description = Custom tweaks and functionality for Swearing at Computers
project = "swearing_custom"
version = "7.x-1.x-dev"
core = "7.x"
package = "custom"
dependencies[] = token

Most of those variables can be omitted, to be honest. All Drupal REALLY needs is name, description, project, and core. Notice how I defined a dependency, just for fun. I also like to put my custom code into a module group called "custom", so my modules list stays nice and tidy. The norm is to add to the dependencies array with one dependency per line, for readability. You can similarly have a files[] array to add your own files to the module. By default this contains just the .info and .module files, which is all we're using so I'm not adding a files[] line at all.

Now we'll create the module file itself. This is just a php file, with an opening <?php tag. The trouble is, how do you get your code to run? You can write all the functions you want, but how should Drupal know to call your code?

Aha, now we get to the interesting bit. Drupal modules are built around a system of "hooks". Each module defines a set of hooks, places for you to insert your custom code. And this isn't an edge case - it's totally core to how Drupal works. Big modules will have tens of hooks at your disposal... but we'll cover those a bit later. For now, we're just going to use one simple hook to get our "Hello world" code displayed: hook_init().

All hooks are named hook_something. It's how you know it's a hook. All the core hooks, and many of the contributed hooks, are documented on Drupal's excellent API resource site. When you're writing your custom module, you just declare a function named after the hook, replacing the word "hook" with your module's name. Drupal will find that function and run it for you. Simple as that.

Hook_init() is called early in the page build process, on every page of your site. This is a very powerful hook, usually used to define variables that you'll use later on. But we're just going to use it to print a message to the screen. To start with, my swearing_custom.module file looks like this:

/**
 * Implementation of hook_init().
 */
function swearing_custom_init() {
  print "Hello world!";
}

Pretty simple, right? That comment at the top is the standard format for Drupal. It's so that you can easily find where you used particular hooks (or where other contrib modules used them!), and it's used for a lot of Drupal's automated help and API resources. It's also nice and tidy, so it's a good habit to get into.

Then you see that I just declared a function named after the hook, added my code, and got out of there. At this point, if I enable this custom module I'll see "Hello world" in unformatted text at the top of every page. Simple, but ugly.

Enter another great feature of Drupal: a million and one helper functions. These are also well documented at api.drupal.org, and they're the functions that everyone - even Drupal's core itself - uses to get things done in a standardized way. For example, I hate the way that text appears. So let's update the module to use Drupal's own messaging system. This means using drupal_set_message(). Have a look at the API page to see the arguments for this function, but 90% of your use cases will be following this example. Here's a new version of swearing_custom.module:

/**
 * Implementation of hook_init().
 */
function swearing_custom_init() {
  print "Hello world!";
  drupal_set_message('Hello world!', 'status');
}

That second argument tells Drupal the severity of the message: 'status', 'warning', or 'error'. Status is actually the default, so I'll cut it out of the next iteration of this code. The module now prints your message in a fancy, Drupal formatted way. It looks just like all the other Drupal modules, because that's how all of Drupal displays messages.

I'm going to add one more layer of complexity here, with the t() function. You'll see t() everywhere: it's the way you declare text strings to Drupal as user-facing text. It makes the text available for translation, and makes it easy to sanitize any variables or user-entered text. An example t() string would be:

t('Welcome to %sitename', array('%sitename' => 'Swearing at Computers'));

You just include replacement patterns in the text itself, and then declare them in an array. The replacement strings get sanitized, so that's the right place to include user-entered text, URLs, or just variable output in general. So here is a final version of our custom module, using drupal_set_message() the way it was intended.

/**
 * Implementation of hook_init().
 */
function swearing_custom_init() {
  print "Hello world!";
  drupal_set_message(t('Hello world! This is %name!', array('%name' => 'Swearing at Computers')));
}

That's it for this quick guide. The basics of writing a module. The hardest part here is figuring out which hooks to use. Google is your friend: try searching for things like "drupal views hooks" to see how to tweak Views at various stages, or "drupal form hooks" to see how forms work in Drupal. Or don't - I'll be covering some of those hooks in the next posts.
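
To give you a taste of that next step, here's one more hook that follows the same naming pattern. hook_form_alter() lets you modify any form on the site before it renders; the form ID and the button tweak here are just an illustration.

/**
 * Implementation of hook_form_alter().
 */
function swearing_custom_form_alter(&$form, &$form_state, $form_id) {
  // Rename the login button, just to prove we can.
  if ($form_id == 'user_login') {
    $form['actions']['submit']['#value'] = t('Swear me in');
  }
}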

Apr 16 2012
Apr 16
I've been lax about posting lately, and to make this easier on myself I think I'll post a bunch of quick guides for basic coding procedures with Drupal. Simple things and concepts, explored in just enough detail for you to understand how it works, resource links so you can actually go and put it into practice.

The first thing you need to understand about Drupal is the structure. Drupal is a modular Content Management System, built on PHP and whatever-database-you-want-which-is-probably-mysql-lets-face-it. Drupal <=6.x actually requires MySQL or PostGres, and they're still the standard on 99% of installs you'll face. The thing is, Drupal is incredibly flexible and capable. And the trick to coding with Drupal is to code WITH Drupal, not against it. There are tons of helper functions built in that make your life easier, if only you'll work the way Drupal expects you to work. So this first post is about how Drupal expects you to work.

Drupal itself is a complicated guy that lives in several php files and directories. You are expected to NEVER touch any of these files or directories yourself. Seriously. 16,000 coders can't be wrong, just leave it the fuck alone. You have access to OVERRIDE anything and everything that Drupal does though, and you can do that from a nice contained subdirectory where you can keep an eye on your code. This is the /sites subdirectory.

See, one Drupal codebase can actually host many websites. Each one gets its own subdirectory under /sites. For example, I might have:

/sites/swearingatcomputers.blogspot.com
/sites/subdomain.swearingatcomputers.blogspot.com
/sites/swearingatcomputers.blogspot.com.subdirectory

The directory name tells Drupal which incoming requests should be forwarded to which site. You can see how subdomains and subdirectories are handled in the naming convention, too. Pretty easy stuff. There's also a special subdirectory for "default", and one for "all". Not too complicated to figure out: "all" is for code that applies to all your sites, and "default" is the sites subdirectory for when other names don't match.

I'll tell you a pro secret: hardly any of the big shops ever actually use the multi-site capability. In practice it's a pain in the ass to remember which sites use which version of which module, and Drupal itself doesn't take up any significant space. So most of us just set up a separate Drupal codebase for each site, and use sites/default and sites/all for our code.

Inside each site directory you'll find:

  • settings.php : This contains the database connection information, and any special settings that Drupal needs for similar low-level tasks. Set once, never modified. Doesn't exist in sites/all.
  • files : Any user-uploaded files. This one doesn't exist in the sites/all directory.
  • modules : Any modules that are available to Drupal. This is where you'll keep the contributed modules which are the majority of Drupal's capabilities.
  • themes : This contains the themes - layouts - that are available to Drupal.

I mentioned that most professional Drupal shops just use a separate codebase for each site they set up. They also don't mix and match between using sites/default and sites/all . This is because it's very easy to end up with a copy of the same module in two places, and it's hard to know which version Drupal is actually using. So everyone standardizes their own practice here. Personally, I keep modules and themes in sites/all . sites/default only contains settings.php and the files directory.

Inside the modules directory you'll find all your contributed modules that you've downloaded. But this is also where you have to keep Features (exportable collections of settings, very useful!) and any custom code that you write for your site. This presents a namespace problem. What if you create a feature called 'Views', to hold all the Drupal Views you have? You don't want to include it in the directory for the Views module... So underneath sites/all/modules , I create subdirectories custom_features and custom_modules. It's still not good practice to have duplicate module or feature names though, so it's wise to prefix your custom work with custom_ or the name of the project.

That's all you need to know about the Drupal file structure. Once again, NEVER EVER HACK CORE. Don't touch anything outside of the 'sites' directory, and you'll be OK. There is always a way to override any behavior you like from inside the 'sites' directory, and that will make your life MUCH easier down the road, trust me.

Mar 25 2012
Mar 25
I'm a big fan of the new Entities model for Drupal. I've been working a lot with it lately, and it makes a lot of sense... but my favorite part is how it keeps blowing away my old assumptions, inherited from 7 years of Drupal experience.

Here's a big one that got me recently: Organic Groups is useless, because it's basically Taxonomy with a couple of neat add-ons. Or Organic Groups is incredibly useful, however you choose to think about it.

Let's think about Taxonomy from the perspective of what it actually accomplishes. It's a fieldable (thanks D7) way to group content. Each Taxonomy Term is an object on its own, which is referenced by other content to provide groupings. There's nothing there that you can't do with a generic entity, entityreference module, and a block view on taxonomy landing pages.

In D4-D6, Taxonomy was the best way to categorize content, especially if you want that categorization to be hierarchical in nature. Period. In D7... meh. You can build an entity that replaces taxonomy incredibly easily. In this sense, Taxonomy is useless. It could be removed from Drupal core, and treated as a great example of a well fleshed out Entity that fills a common use case. I look forward to seeing taxonomy reference replaced by entityreference, which will make the transformation complete.

ON THE OTHER HAND, having this pre-built entity is incredibly useful. Because Taxonomy IS a good implementation of the Entity model, it is flexible enough to obsolete a lot of more complex modules in a lot of use cases... and saves you a fair amount of coding and configuration in the meanwhile.

Now, I am a huge fan of Organic Groups, and in D6 it was the number one way to give flexible content groupings with any kind of membership. It always bugged me how much configuration you had to do for the simplest of Group structures, though. And in D7... now you can do a lot of what Organic Groups does with the Taxonomy Entity. I didn't think this was so until I really thought it through, and had an example to work with.

So I had a site where we wanted to provide schedules for every user on the site, and allow Events to be scheduled for arbitrary groups as well. Each group should have a landing page with its own calendar for the group and all sub-groups. For example, the group calendar for "Company Executive" should include events for "Vice Presidents", "Chiefs", and "Chairpeople". Group membership is all centrally managed. And if you're a member of a group, you should receive update emails for every event that gets added to your group.

In D6 this was a perfect use case for Organic Groups. In D7, I decided to do this with taxonomy instead. Why? The Ux for hierarchy creation and management is better in taxonomy because of the vast contrib ecosystem for it. What I would have built with Views, EntityReference, and custom code for nice hierarchical Ux in OG, is 80% built for me from day 1 with Taxonomy. Just apply the "Groups" taxonomy to the user object and the content types you want, and bam - you've just put Users and Content into semantic groups together. That's what OG, at its core, provided for us in D6... and with some pre-built Views, is all it provided for a lot of projects.

This led me down an interesting path of thought. What is the point of OG, if Taxonomy can duplicate its functionality so handily?

And it dawned on me: all Organic Groups really offers is a fieldable Entity that interacts with Permissions in a way that is common for user-created groups. We could do this with Taxonomy and some custom code: write a module that offers permission limitations based on taxonomy term association with the user object, and then include a nice UI for adding terms to your user object by clicking a link on the taxonomy term page... you've got a solid competitor for OG. And at that point, I would start saying things like "why did you bother using Taxonomy, when OG would have done it out of the box?"
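
Just to make that thought experiment concrete, a rough sketch of the permissions piece might look like the code below. The field_groups field name is hypothetical, and this is nowhere near what OG actually does - it only illustrates that the grouping data is all there for custom code to act on.

/**
 * Implements hook_node_access().
 *
 * Deny viewing a grouped node unless the user shares at least one "Groups"
 * term with it. field_groups is a hypothetical term reference field that
 * lives on both users and group-enabled content types.
 */
function mymodule_node_access($node, $op, $account) {
  if ($op == 'view' && is_object($node) && isset($node->field_groups)) {
    $node_terms = array();
    if ($items = field_get_items('node', $node, 'field_groups')) {
      foreach ($items as $item) {
        $node_terms[] = $item['tid'];
      }
    }
    $user_terms = array();
    $full_account = user_load($account->uid);
    if ($items = field_get_items('user', $full_account, 'field_groups')) {
      foreach ($items as $item) {
        $user_terms[] = $item['tid'];
      }
    }
    // No shared group term means no access for this user.
    if ($node_terms && !array_intersect($node_terms, $user_terms)) {
      return NODE_ACCESS_DENY;
    }
  }
  return NODE_ACCESS_IGNORE;
}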

And I think that's when I really got the power of entities. We've been used to seeing modules as a way to provide extra capabilities to Drupal. They can still do that, of course. But many of the best modules will just offer pre-configured Entities for popular use cases. Like an Entity that groups users and content, offers a way for users to join groups conveniently, and gives Permissions ramifications for group membership. Or an entity that lets you group other entities semantically into a hierarchy. The complex Organic Groups and the simple Taxonomy are only a couple of steps removed from each other... and both great examples of Entities in action.

May 22 2011
May 22
Varnish is a fantastic caching proxy, commonly used for CMSes. It's not uncommon to see benchmarks boasting 300-500 page loads per second - I've seen benches up to 5000 hits per second. That's faster than serving flat HTML from Apache; we're talking about a serious benefit to your server load here.

Part of Varnish's tremendous speed comes from how lean it is. At only 58,000 lines of code, it's very lightweight. Unfortunately, this necessitates a no-frills approach. And SSL is a frill.

I think it's very well put by Poul-Henning Kamp (lead developer on the Varnish project) in this mailing list post:

I have two main reservations about SSL in Varnish:

1. OpenSSL is almost 350.000 lines of code, Varnish is only 58.000,
Adding such a massive amount of code to Varnish footprint, should
result in a very tangible benefit.

Compared to running a SSL proxy in front of Varnish, I can see
very, very little benefit from integration. Yeah, one process
less and only one set of config parameters.

But that all sounds like "second systems syndrome" thinking to me,
it does not really sound lige a genuine "The world would become
a better place" feature request.

But I do see some some serious drawbacks: The necessary changes
to Varnish internal logic will almost certainly hurt varnish
performance for the plain HTTP case. We need to add an inordinate
about of overhead code, to configure and deal with the key/cert
bits.

2. I have looked at the OpenSSL source code, I think it is a catastrophe
waiting to happen. In fact, the only thing that prevents attackers
from exploiting problems more actively, is that the source code is
fundamentally unreadable and impenetrable.

Unless those two issues can be addressed, I don't see SSL in Varnish
any time soon.


Ouch. But that doesn't help those of us who want Varnish's speed with SSL's security. Really the only solution is to set up an SSL proxy in front of Varnish. There are lots of ways to do this. I will show you what I think is the easiest option: Pound and Varnish.

1) Set up Varnish


I assume that you've already got a running Apache installation going. So now we have to put Varnish in front of it. The first step is to get Apache off of port 80 - that's where Varnish is going to live. In order to do this, we have to find the "Listen" line in Apache's configuration. On a standard install, it reads something like:

Listen 0.0.0.0:80

You want to change that to another port. 8080 is a popular one, but it can really be anything above 1024. In Debian systems you can find this line in /etc/apache2/ports.conf . In CentOS it's in /etc/httpd/conf/httpd.conf . If you're not sure where it is, try grepping for the standard help text around it: grep "Change this to Listen on specific IP addresses" /etc/apache2/* -r. You also want to make sure it only serves pages to localhost, so outsiders can't attack your Apache directly. Modify the line to look like this:

Listen 127.0.0.1:8080

Now let's install and configure varnish. On Debian/Ubuntu you can install it from apt repositories: apt-get install varnish. On CentOS, you first have to add the right repository for yum. You can install the "Extra Packages for Enterprise Linux" (EPEL) repo via RPM - get your version-and-architecture-appropriate link from the EPEL site. I used:

sudo rpm -Uvh http://fr2.rpmfind.net/linux/epel/5/x86_64/epel-release-5-4.noarch.rpm
sudo yum install varnish

Varnish is configured in two places. General command line options that are passed directly to the daemon are set in /etc/sysconfig/varnish , and specific behaviors for the proxy are configured in a .vcl file stored in /etc/varnish.

Varnish is extremely configurable and tune-able, but this guide will focus on the basics you need for Drupal 7 (Drupal 6 only works if you use Pressflow rather than vanilla Drupal, but that's well documented elsewhere). First, edit the daemon options at /etc/sysconfig/varnish . The default file gives you four alternative configurations to choose from - we want configuration 2, the first one that uses a .vcl . Uncomment the DAEMON_OPTS lines there, and change the "listen" port to 80, and name your own .vcl file. Here's my mostly default daemon_opts .

DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/swearingatcomputers.com.vcl \
-u varnish -g varnish \
-s file,/var/lib/varnish/varnish_storage.bin,1G"

Save the file. Now we'll set up the .vcl file to configure the proxy itself. This is my .vcl , you can pretty safely just dump it into the .vcl you named in the DAEMON_OPTS above:
backend default {
.host = "127.0.0.1";
.port = "8080";
}

sub vcl_recv {
# // Remove has_js and Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
# // Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
# // Remove empty cookies.
if (req.http.Cookie ~ "^\s*$") {
unset req.http.Cookie;
}

# // fix compression per http://www.varnish-cache.org/trac/wiki/FAQ/Compression
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
# No point in compressing these
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "MSIE") {
set req.http.Accept-Encoding = "deflate";
} else {
# unkown algorithm
remove req.http.Accept-Encoding;
}
}

}

sub vcl_hash {
if (req.http.Cookie) {
set req.hash += req.http.Cookie;
}
}


The bulk of this file is occupied with making sure that cookies aren't cached, and solving a problem with compression. The only part that you should be concerned with editing is the bit at the top, backend default {. This is where you tell Varnish about all the back ends for which it should cache. Varnish is a great load balancer, so if you have 5 systems on the back end which are all serving content, you can list them here. Each one would get its own "Backend" declaration. If you want to load balance, see a different guide. We're just interested in the caching for now. So set the .host and .port variables to match your setup - very likely you want to keep them the same.

Now test the whole thing by restarting apache and starting varnish.

sudo service httpd restart
sudo service varnish restart

If you don't see any errors, you're good to go! If you just get a generic [FAILED] for Varnish, without any error messages, there's probably a syntax problem with your VCL.

2) Set up your SSL certs for Pound


Create your server's private key and certificate request. I get confused easily between the different certs, so I name them in an idiot proof way that you might want to copy:

openssl req -new -newkey rsa:2048 -nodes -keyout swearingatcomputers.com.private.key -out swearingatcomputers.com.certreq.pem
Traditionally, your private key should go in /etc/ssl/private on Debian/Ubuntu , or /etc/pki/tls/private on CentOS. It really doesn't matter, but this gives you a nice central place to store your certs.

Now use that certificate request to get a signed cert. I get mine on the cheap from Godaddy ($50/yr is hard to beat!), but if you just want to test, you can make a locally-signed cert like this:

openssl x509 -req -days 365 -in swearingatcomputers.com.certreq.pem -signkey swearingatcomputers.com.private.key -out swearingatcomputers.com.selfsigned.crt
The signed cert typically goes in /etc/ssl/certs .
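
If you want to double-check what you just generated, openssl will happily print the certificate's subject, issuer and validity dates - swap in whichever .crt you're actually using:

openssl x509 -in /etc/ssl/certs/swearingatcomputers.com.selfsigned.crt -noout -subject -issuer -dates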

For a normal SSL setup, this is all you need. But Pound wants the certificate and the private key together in a single file, so we're going to make a special combined version just for Pound.


openssl x509 -in /etc/ssl/certs/swearingatcomputers.com.crt -out /etc/ssl/private/swearingatcomputers.com.combined.pem
openssl rsa -in /etc/ssl/private/swearingatcomputers.com.private.key >> /etc/ssl/private/swearingatcomputers.com.combined.pem
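
Before moving on, it's worth confirming that the certificate and the private key you just combined actually belong together - if these two hashes don't match, Pound won't be able to load the cert:

openssl x509 -in /etc/ssl/private/swearingatcomputers.com.combined.pem -noout -modulus | openssl md5
openssl rsa -in /etc/ssl/private/swearingatcomputers.com.private.key -noout -modulus | openssl md5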

Now we're ready to set up Pound.

3) Set up the Pound SSL proxy


This part surprised me with how easy it is. Pound is a great system that is very simple to configure! Install it using apt-get or yum: yum install pound, then configure it at /etc/pound.cfg .

First comment out or delete the ListenHTTP section. We don't want Pound to listen on port 80 at all.

Then we'll set up the ListenHTTPS section. Apart from telling it to listen on port 443 on all interfaces and handing it the combined cert, we're going to make sure it sets a special header to notify Drupal that the request has been forwarded from an HTTPS proxy. We're also going to allow PUT and DELETE operations through. Then at the end, we tell it where to find the back end (Varnish, in our case) - port 80. Here's my Pound config:


User "pound"
Group "pound"
Control "/var/lib/pound/pound.cfg"

#ListenHTTP
# Address 0.0.0.0
# Port 80
#End

ListenHTTPS
Address 0.0.0.0
Port 443
Cert "/etc/ssl/private/swearingatcomputers.com.combined.pem"

# set X-Forwarded-Proto so D7 knows we're behind an HTTPS proxy.
HeadRemove "X-Forwarded-Proto"
AddHeader "X-Forwarded-Proto:https"

# Allow PUT and DELETE too
xHTTP 1
End

Service
BackEnd
Address 127.0.0.1
Port 80
End
End


Save the config file, and start pound with service pound start. There you go, you've got an HTTPS forwarder.
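
A quick sanity check: hit the proxy over HTTPS from the box itself (the -k tells curl to accept a self-signed cert):

curl -kI https://localhost/

You should get Drupal's response headers back; an X-Varnish header in there tells you the request really did pass through the cache on its way to Apache.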

4) Make Drupal HTTPS aware


One big problem with the setup so far is that Drupal doesn't know it's serving HTTPS content. Remember, as far as Apache is concerned, it's just HTTP served in the clear to Varnish, and even Varnish doesn't really know about the HTTPS on the front end. We're going to follow that X-Forwarded-Proto: https header down through the stack to make sure every level interprets it properly.

First we deal with Varnish. We want Varnish to keep separate cache entries for HTTP and HTTPS requests, so we'll add the X-Forwarded-Proto header to the cache hash. Find the sub vcl_hash section of your .vcl file, /etc/varnish/swearingatcomputers.com.vcl, and add these lines:


if (req.http.x-forwarded-proto) {
set req.hash += req.http.x-forwarded-proto;
}

If you're using my template above, the whole section will look like this:
sub vcl_hash {
if (req.http.Cookie) {
set req.hash += req.http.Cookie;
}
if (req.http.x-forwarded-proto) {
set req.hash += req.http.x-forwarded-proto;
}
}

You'll have to restart Varnish after making this change.
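
If you're curious whether the hashing actually works, you can fake the header against Varnish directly and compare - this is just a spot check, since in normal operation the header only ever comes from Pound, and it assumes your pages are cacheable at all (see the test in step 5). Run each command twice; on the second run a cache hit shows up as two request IDs in the X-Varnish header instead of one:

curl -sI http://127.0.0.1/ | grep -i x-varnish
curl -sI -H 'X-Forwarded-Proto: https' http://127.0.0.1/ | grep -i x-varnish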

Now let's make sure that Drupal knows to look for this header. D7 has some variables for this in its settings.php, just waiting to be uncommented. You can walk through the explanations in the file itself and uncomment the relevant lines, or just add this at the end:


# Settings for Varnish - tell Drupal that it's behind a reverse proxy

$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');

# Serve cached pages without invoking hook_boot()/hook_exit() on every request
$conf['page_cache_invoke_hooks'] = FALSE;

# Settings for HTTPS cache - tell Drupal that forwarded https is the real thing
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) &&
$_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') {
$_SERVER['HTTPS'] = 'on';
}
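
A rough way to confirm Drupal has picked this up: fetch something that contains absolute URLs through the HTTPS side and check the scheme. The default front page feed works well for this, if you haven't disabled it:

curl -ks https://localhost/rss.xml | grep -o 'https\?://[^<"]*' | head

If the links come back as https://... you're in business; if they're still http://, Drupal isn't seeing the forwarded header.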


5) Test and brag


That's it - you should have your proxy configured! You can do a simple test to make sure it's working by watching varnishlog for cache hits. Simply run varnishlog | grep hit in your terminal, and try refreshing the frontpage of your site. You should see a few lines of hits pop up in the log. (If not, you might want to try grepping for "pass" or "miss" to help work out what's happening.)

Now let's see how this caching holds up under load. After all, that's the whole point, right? I like a simple ab (ApacheBench) test:

ab -c 40 -n 5000 -q http://swearingatcomputers.com/
This sends 5000 requests to the frontpage with a concurrency of 40 (that's -n and -c, respectively). Look for "Requests per second"; that's my favorite statistic here. On my "playing around" Amazon Micro instance, I pull about 650 requests per second. In theory, this smallest of VPS servers could handle over 2 million hits per hour!
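
For comparison, it's fun to see what the stack does when Varnish can't cache. With the VCL above, any cookie we don't strip falls through to Varnish's default behaviour, which is to pass the request straight to Apache - so adding an arbitrary cookie gives you the uncached baseline (use fewer requests, this one is slow):

ab -c 40 -n 500 -q -C 'nocache=1' http://swearingatcomputers.com/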

I love Varnish.
