Jan 07 2019

Note: This article is a re-post from Mateu's personal blog.

I have been very vocal about the JSON:API module. I wrote articles, recorded videos, spoke at conferences, wrote software to extend it, and at some point I proposed adding JSON:API to Drupal core. Then Wim and Gabe joined the JSON:API team as part of their daily job. That meant that while they took care of most of the issues in the JSON:API queue, I could attend to the other API-First projects more effectively. I have not left the JSON:API project by any means; on the contrary, I'm more involved than before. However, I have transitioned my involvement to feature design and feature sign-off, sprinkled with the occasional development. Wim and Gabe have not only been very understanding and supportive of my situation, but they have also taken a lot of ownership of the project. JSON:API is not my baby anymore; instead, we now have joint custody of our JSON:API baby.

As a result of this collaboration, Gabe, Wim, and I have tagged a stable release of the second version of the JSON:API module. It took a humongous amount of work, but we are very pleased with the result. This has been a long journey, and we are finally there. The JSON:API maintainers are very excited about it.

I know that switching to a new major version is always a little bit scary. You update the module and hope for the best. With major version upgrades, there is no guarantee that your use of the module will still work. This is unfortunate for site owners, but including breaking changes is often the best way to keep a module maintainable and to add new features. The JSON:API maintainers are aware of this. I have gone through the process myself and been frustrated by it. This is why we have tried to make the upgrade process as smooth as possible.

What Changed?

If you are a long-time Drupal developer, you have probably wondered, "How do I do this D7 thing in D8?" When that happens, the best solution is to search the Drupal core change records to see if it changed since Drupal 7. Change records are a fantastic tool for tracking what changed in each release. They let you focus on the issues that have user-facing changes, avoiding the noise of internal refactors and bug fixes. In summary, they let users understand how to migrate from one version to another.

Very few contributed modules use change records. This may be because module maintainers are unaware that the feature is available for contrib. It could also be because maintaining a module is a big burden and manually writing change records is yet another time-consuming task. The JSON:API module has comprehensive change records on all the things you need to pay attention to when upgrading to JSON:API 2.0.

Change Records

As I mentioned above, if you want to understand what has changed since JSON:API 8.x-1.24 you only need to visit the change records page for JSON:API. However, I want to highlight some important changes.

Config Entity Mutation is now in JSON:API Extras

Mutating configuration entities is no longer possible using JSON:API alone. This feature was removed because the Entity API does a great job of ensuring that access rules are respected, but the Configuration Entity API does not yet support validation of configuration entities. That means the responsibility for validation falls on the client, which has security and data integrity implications. Given that JSON:API 2.x will be added to Drupal core, we felt we ought to move this feature to JSON:API Extras.

No More Custom Field Type Normalizers

This is by far the most controversial change. Even though custom normalizers for JSON:API have been strongly discouraged for a while, JSON:API 2.x now enforces that restriction. Sites that have been violating the recommendation will need to refactor to one of the supported patterns. This change was driven by the limitations of Symfony's serialization component. In particular, we aim to make it possible to derive a consistent schema per resource type. I explained why this is important in this article.

Supported patterns are:

  • Create a computed field. Note that a true computed field will be calculated on every entity load, which may be a good or a bad thing depending on the use case. You can also create stored fields that are calculated on entity presave. The linked documentation has examples for both methods.
  • Write a normalizer at the Data Type level, instead of the field or entity level (see the sketch after this list). As a benefit, this normalizer will also work in core REST!
  • Create a Field Enhancer plugin like these, using JSON:API Extras. This is the pattern closest to custom normalizers; it requires you to define the schema of the enhancer.
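As an illustration of the second pattern, here is a minimal sketch of a data-type-level normalizer. The module name, class name, and date format are made up for the example; the important parts are that it targets a Typed Data class rather than a specific field, and that it gets registered as a service tagged as a normalizer (with a priority) in my_module.services.yml so the serializer picks it up.

<?php

namespace Drupal\my_module\Normalizer;

use Drupal\Core\TypedData\Plugin\DataType\Timestamp;
use Drupal\serialization\Normalizer\NormalizerBase;

/**
 * Hypothetical normalizer that acts at the data type level.
 *
 * It applies to every Timestamp data type, no matter which field or entity
 * contains it, which is why it works for JSON:API and core REST alike.
 */
class RfcTimestampNormalizer extends NormalizerBase {

  /**
   * The data type class this normalizer supports.
   */
  protected $supportedInterfaceOrClass = Timestamp::class;

  /**
   * {@inheritdoc}
   */
  public function normalize($object, $format = NULL, array $context = []) {
    // Output the timestamp as an RFC 3339 string instead of a raw integer.
    return date(\DateTime::RFC3339, (int) $object->getValue());
  }

}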

File URLs

JSON:API pioneered the idea of having a computed url field on file entities that an external application can use without modifications. Since then, this feature has made it into core with some minor modifications. Now url is no longer a computed field, but a computed property on the uri field.

Special Properties

The official JSON:API specification reserves the type and id keys. These keys cannot exist inside the attributes or relationships sections of a resource object. That's why we now prepend {entity_type}_ to the key name when those are found. In addition, internal fields like the entity ID (nid, tid, etc.) will have drupal_internal__ prepended to them. Finally, we have decided to omit the uuid field, given that it already is the resource ID.
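For example, a node's resource object might end up looking roughly like this (a hypothetical sketch trimmed to the renamed keys; your attributes will depend on your content model):

{
  "type": "node--article",
  "id": "9b0cdf59-64e8-4d40-b5ed-44d94a3bb72b",
  "attributes": {
    "drupal_internal__nid": 42,
    "title": "4 hour lamb stew"
  }
}

Note that there is no uuid attribute (it is already the resource id) and the internal nid is prefixed with drupal_internal__.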

Final Goodbye to _format

JSON:API 1.x dropped the need for the unpopular _format parameter in the URL. Instead, it allowed the more standard Accept: application/vnd.api+json header to be used for format negotiation. JSON:API 2.x continues this pattern. The header is now required, which makes 4XX error responses cacheable, an important performance improvement.

Benefits of Upgrading

As you have seen, these changes are not very disruptive, and even when they are, it is simple to move to the new patterns. That should let you upgrade to the new version with relative ease. Once you've done that, you will notice some immediate benefits:

  • Performance improvements. Performance improved across the board, but especially when using filtering, includes, and sparse fieldsets. Some of these improvements came with the help of early adopters during the RC period!
  • Better compatibility with JSON:API clients. That's because JSON:API 2.x also fixes several spec compliance edge case issues.
  • We pledge that you'll be able to transition cleanly to JSON:API in core. This is especially important for future-proofing your sites today.

Benefits of Starting a New Project with the Old JSON:API 1.x

There are truly none. Version 2.x builds on top of 1.x so it carries all the goodness of 1.x plus all the improvements.

If you are starting a new project, you should use JSON:API 2.x.

JSON:API 2.x is what new installs of Contenta CMS will get, and remember that Contenta CMS ships with the most up-to-date recommendations in decoupled Drupal. Star the project on GitHub and keep an eye on it here, if you want.

What Comes Next?

Our highest priority at this point is the inclusion of JSON:API in Drupal core. That means that most of our efforts will be focused on responding to feedback to the core patch and making sure that it does not get stalled.

In addition to that, we will likely tag JSON:API 2.1 very shortly after JSON:API 2.0. It will include:

  1. Binary file uploads using JSON:API.
  2. Support for version negotiation, allowing the latest or default revision to be retrieved. This supports the Content Moderation module in core and will be instrumental in decoupled preview systems.

Our roadmap includes:

  1. Full support for revisions, including accessing a history of revisions. Mutating revisions is blocked on Drupal core providing a revision access API.
  2. Full support for translations. That means that you will be able to create and update translations using JSON:API. That adds on top of the current ability to GET translated entities.
  3. Improvements in hypermedia support. In particular, we aim to include extension points so Drupal sites can include useful related links like add-to-cart, view-on-web, track-purchase, etc.
  4. Self-sufficient schema generation. Right now we rely on the Schemata module in order to generate schemas for the JSON:API resources. That schema is used by OpenAPI to generate documentation and the Admin UI initiative to auto-generate forms. We aim to have more reliable schemas without external dependencies.
  5. More performance improvements. Because JSON:API only provides an HTTP API, implementation details are free to change. This already enabled major performance improvements, but we believe it can still be significantly improved. An example is caching partial serializations.

How Can You Help?

The JSON:API project page has a list of ways you can help, but here are several specific things you can do if you would like to contribute right away:

  1. Write an experience report. This is a Drupal.org issue in the JSON:API queue that summarizes the things that you've done with JSON:API, what you liked, and what we can improve. You can see examples of those here. We have improved the module greatly thanks to these in the past. Help us help you!
  2. Help us spread the word. Tweet about this article, blog about the module, promote the JSON:API tooling in JavaScript, etc.
  3. Review the core patch.
  4. Jump into the issue queue to write documentation, propose features, author patches, review code, etc.

Photo by Sagar Patil on Unsplash.

Jul 19 2018

Though it seems like yesterday, Contenta CMS got its first stable release more than a year ago. Since then, Contenta CMS has started using Media in core, improved Open API support, provided several fixes for the Schemata module, written and introduced JSON-RPC, and made plans to transition to the Umami content model from Drupal core. A lot has happened behind the scenes. I'm inspired to hear of each new instance where Contenta CMS is being used, both out of the box and as part of a custom decoupled Drupal architecture. Both use cases were primary goals for the project. In many cases Drupal, and hence Contenta CMS, is only part of the back end. Most decoupled projects require a nodejs back-end proxy to sit between the various front-end consumers and Drupal. That is why we started working on a nodejs starter kit for your decoupled Drupal projects. We call it Contenta JS.

Until now, each agency had their own nodejs back-end template that they used and evolved in every project. There has not been much collaboration in this space. Contenta JS is meant to bring consistency and collaboration—a set of common practices so agencies can focus on creating the best software possible with nodejs, just like we do with Drupal. Through this collaboration, we will be able to get features that we need in every project, for free. Today Contenta JS already comes with many of these features:

  • Automatic integration with the API exposed by your Contenta CMS install. Just provide the URL of the site and everything is taken care of for you.
    • JSON API integration.
    • JSON RPC integration.
    • Subrequests integration.
    • Open API integration.
  • Multi-threaded nodejs server that takes advantage of all the cores of the server’s CPU.
  • A Subrequests server for request aggregation. Learn more about subrequests.
  • A Redis integration via the optional @contentacms/redis.
  • Type safe development environment using Flow.
  • Configurable CORS.
Diagram of Contenta JS

Watch the introduction video for Contenta JS (6 minutes).


By combining the community's efforts, we can come up with new modules that do things like React server-side rendering with one command, a Drupal API customizer, aggregation of multiple services in a pluggable way, etc.

Join the #contenta Slack channel if this is something you are passionate about and want to collaborate on. You can also create an issue (or a PR!) in the GitHub project. Together, we can make a holistic decoupled Drupal back end from start to finish.

Originally published at humanbits.es on July 16, 2018.

Jul 12 2018

It may sound surprising to hear about brand consistency from a back-end developer. This is traditionally a topic for UX and marketing experts. Nevertheless, brand consistency is a powerful trend that’s affecting how we architect content APIs.

One of the ways I contribute to the Drupal API-First Initiative, aside from all the decoupled modules, is by providing my point of view from the implementation side. Some would call that real world™ experience with client projects. This means that I need to maintain a pragmatic point of view to make sure that we can do with Drupal what clients need from us. While keeping an eye on the trends affecting our industry, I have discovered a strong tendency for digital projects to aim for brand consistency. How does that impact implementation?

What I mean by brand consistency

When I talk about brand consistency, I only refer to a small part of it. Picture, for a moment, the home screen of Netflix on your TV. Now picture Netflix on your browser and on the app for your phone. They all look the same, don’t they? This is intentional.

The first time I installed Netflix on my wife’s iPad I immediately knew how to use the app. It took me about a second to learn how to use a complex and powerful application on a device that was foreign to me. I am an Android person but I was able to transition from using Netflix on my phone while on the bus to my wife's iPad and from there to the living room TV. I didn’t even realize that I was doing it. Everything was seamless because all the different devices running Netflix had a consistent design and user experience.

If you are interested in the concept of brand consistency and its benefits you can learn more from actual experts on the subject. I will focus on the implications for API design.

It changes the approach to decoupled projects

For the last few years, I have been speaking at events and writing about the pressing need for your back end to be presentation agnostic. Consumers can have radically different data needs. You don't want your back end to favor a particular consumer, because that will lead to re-coupling, which leads to high maintenance costs for the consumers that you turned your back on.

When the UX and designs are consistent across consumers, then the statement ‘the consumers can have radically different data needs’ may no longer apply. If they really are consistent, why would the data they need be radically different? You cannot be consistent and radically different at the same time.

Many constraints, API design tips, and recommendations are based on the assumption of presentation agnosticism. While this holds true for most projects, a significant number of projects have started to require consistency across consumers. So the question is: if we no longer need to be presentation agnostic in our API design, what can we optimize given that we have a single known presentation? We made many compromises. What did we give up, and how do we get it back?

How I approached the problem

The first time that I encountered this need for unified UX across all consumers in a client project my inherent pragmatism was triggered. My brain was flooded with potential optimizations. Together with the rest of the client team, I took a breath and started analyzing this new problem space. On this occasion, the client had suggested the BFF pattern from the start. Instead of having a general-purpose API back end to serve all of your downstream consumers, you have one back end per user experience. Hence the moniker ‘Backend for Frontend’ or BFF. This was a great suggestion that we carefully analyzed and soon embraced.

What is a BFF?

Think of a BFF as a server-side service that takes care of the orchestration and processing of the different interactions with the API (or even multiple APIs or microservices) on behalf of the consumers. In short, it does what each consumer would do against your presentation agnostic API, and consolidates it on the server for presentation. The BFF produces a render-ready JSON object.

In other words, we will build a consumer in the back end, but instead of outputting HTML, CSS, and JavaScript (using the web consumer as an example) we will output a JSON document.

BFF output example
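As a purely hypothetical sketch (every key name below is invented for illustration), a render-ready payload from the BFF for a "home" screen could look something like this:

{
  "screen": "home",
  "components": [
    {
      "type": "hero",
      "title": "Summer favorites",
      "image": "https://example.org/images/hero-summer.jpg"
    },
    {
      "type": "recipeCard",
      "title": "4-hour lamb stew",
      "link": "/recipes/4-hour-lamb-stew"
    }
  ]
}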

You can see in the example above that the shape of the JSON response is heavily influenced by the single design and the components in the front end. This imposes some rigidity on front-end differences, but we agreed that's OK for our case. For a completely different design, the JSON output would look completely different.

How we implemented BFFs

After the requirements were settled, we decided to have a single Backend For Frontend powering all the consumer applications. Instead of having one BFF for each consumer, as Netflix used to do, we will only have one. The reason is that a single BFF ensures brand consistency. Also, as Lee Byron puts it:

The concern of duplicating logic across different BFFs is more than just maintaining two repositories of similar code rather than one. The concern is the endless fight against accidental divergence.

Additionally, although we don't have those requirements, the BFF is also the best place to add global restrictions like authentication, request filters, rate limits, etc.

Our team decided to implement this as a set of rigid endpoints in a Serverless application written in NodeJS. As you can imagine, you can implement this pattern with the tools and the stack you prefer. Since this will be so specific to your project's designs, you will likely need to start from scratch.

How consumers deal with BFFs

We create this consumer in the back end in order to simplify all the possible front ends. We move the complexity of building a consumer into a central service that can be reused by all the consumers. That way, we can call the consumers dumb clients. This is because the consumers no longer need to craft complex queries (JSON API, GraphQL, or whatever else); they don't need to aggregate 3rd-party services; and they don't need to normalize the data from the different APIs. In fact, all the data arrives ready to render.

In our particular case, we have been able to reduce the consumers to renderers. A consumer only needs to:

  1. Process an incoming request and determine which screen to grab from the BFF. Additionally, extract any parameters from the request, like the entity ID. Any global parameters, like the user ID from the device, are added to the parameter bag as well.
  2. With the name of the screen and the extracted parameters the consumer makes a single HTTP request to the BFF.
  3. The BFF responds with all the data needed, in a shape ready for rendering. The consumer takes that and renders all the components.
  4. The consumer finally adds, on top of the rendered output, all the business logic that is exclusive to the front end. This includes ads, analytics, etc.

Pros and cons

The pros of this approach are stated throughout the document, but to summarize they are:

  • Massive simplification of the consumers. Those complex interactions with the API are in a central place, instead of having each consumer team write them, again and again, in their native language.
  • Code reuse across consumers. Bug-fixes, changing requirements, improvements, and documentation efforts apply to all consumers since much of the logic lies in the BFF now.
  • Increased performance. The backend can be optimized in numerous ways since it does not need to enable every possible design. This can mean denormalized documents in Elastic Search with the pre-computed responses, increased cache hit ratios in calls to APIs now that we control how those are made, faster server-to-server communications for 3rd party API aggregation, etc.
  • Frontend flexibility. We can ship new features faster when front ends are dumb clients that just render the BFF output. Unless we need to render new components or change the way something is rendered, there are few reasons to require an app update. Bear in mind that some platforms don't support automatic updates, and when they do, not all users have them turned on. With this re-coupled pattern, we can ship new features to old consumers.

On the other hand, there are some cons:

  • Requires a dedicated back-end team. You cannot just install an API generator, like Contenta CMS, that is configured in the UI and serves a flexible JSON API with zero configuration. Now you need a dedicated backend team to build your BFF. However, chances are that your project already has a dedicated back-end team.
  • Brings back the bikeshedding. At DrupalCon Baltimore, I talked about how the JSON API module stops the bikeshedding. In this new paradigm, we are back to discussing things like the shape of the response, the names in it, how to expose these responses, etc.
  • It requires cross-consumer collaboration. This is because you want to design a BFF that works well for all current consumers and future ones. Collaboration across different teams can be a challenge depending on the organization.

To summarize

An organization that can commit to a consistent design across consumers can simplify its omni-channel strategy. One way to do that is to move the complexity from the several consumers into a single one that lives in the back end.

Some organizations have used the BFF pattern successfully to achieve these goals in the past. Using this pattern, the different consumers can be simplified to dumb clients, leaving the business logic to the BFF. That, in turn, allows for better performance, less code to maintain, and a shorter time to market for new features.

Photo by Andrew Ridley on Unsplash

May 30 2018

Note: This article was originally published on November 29, 2017. Following DrupalCon Nashville, we are republishing (with updates) some of our key articles on decoupled or "headless" Drupal as the community as a whole continues to explore this approach further. Comments from the original will appear unmodified.

As part of the Decoupled Hard Problems series, in this fourth article, I'll discuss some of the challenges surrounding routing, custom paths and URL aliases in decoupled projects. 

Decoupled Routing

It's a Wednesday afternoon, and I'm using the time that Lullabot gives me for professional development to contribute to Contenta CMS. Someone asks me a question about routing for a React application with a decoupled Drupal back-end, so I decide to share it with the rest of the Contenta Slack community, and a lengthy conversation ensues. I realize how many threads unravel when we separate our routes and paths from a more traditional Drupal setup, especially if we need to think about routing across multiple different consumers.

It's tempting to think about decoupled Drupal as a back-end plus a JS front-end application. In other words, a website. That is a common use case, probably the most common. Indeed, if we can restrict our decoupled architecture to a single consumer, we can move as many features as we want to the server side. Fantastic, now the editors who use the CMS have many routing tools at their disposal. They can, for instance, configure the URL alias for a given node. URL aliases allow content editors to specify the route of a web page that displays a piece of content. As Drupal developers, we tend to make no distinction between such pieces of content and the web page that Drupal automatically generates for it. That's because Drupal hides that complexity by making reasonable assumptions:

  •  It assumes that we need a web page for each node. Each of those has a route at node/<nid>, and each can have a custom route (aka URL alias).
  •  It assumes that it is okay to add presentation information to the content model. This makes it easy to tell the Twig template how to display the content (like field_position = 'top-left') in order to render it as the editor intended.

Unfortunately, when we are building a decoupled back-end, we cannot assume that our pieces of content will be displayed on a web page, even if our initial project is a website. That is because when we eventually need a second consumer, we will need to make changes all over the project to undo those assumptions before adding the new consumer.

Understand the hidden costs of decoupling in full. If those costs are acceptable—because we will take advantage of other aspects of decoupling—then a rigorous separation of concerns that assigns all the presentation logic to the front-end will pay off. It takes more time to implement, but it will be worth it when the time comes to add new consumers. While it may save time to use the server side to deal with routing on the assumption that our consumer will be a single website,  as soon as a new consumer gets added those savings turn into losses. And, after all, if there is only a website, we should strongly consider a monolithic Drupal site.

The cost of re-coupling

After working with Drupal or other modern CMSes, it's easy to assume that content editors can just input what they need for SEO purposes and all the front-ends will follow. But let's take a step back to think about routes:

  • Routes are critical only for website clients. Native applications can also benefit from them, but they can function with just the resource IDs on the API.
  • Routes are important for deep linking in web and native applications. When we use a web search engine on our phone and click a link, we expect the native app to open on that particular content if we have it installed. That is done by mapping the web URL to the app link.
  • Links are a great way to share content. We want users to share links, and then let the appropriate app on the recipient's mobile device open if they have it installed.

It seems clear that even non-browser-centric applications care about the routes of our consumers. Luckily, Drupal considers the URL alias to be part of the content, so it's available to the consumers. But our consumers' routing needs may vary significantly.

Routing From a Web Consumer

Let's imagine that a request to http://cms.contentacms.io/recipes/4-hour-lamb-stew hits our React application. The routing component will know that it needs to use the recipes resource and find the node that has a URL alias of /4-hour-lamb-stew. Contenta can handle this request with JSON API and Fieldable Path—both part of the distribution. With the response to that query, the React app builds all the components and displays the results to the user.
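Assuming the Fieldable Path field on recipes is exposed as field_path (the field name is an assumption for this sketch), the query issued by the React router could look along these lines:

GET /api/recipes?filter[field_path][value]=/4-hour-lamb-stew

The JSON API collection response then contains the single matching recipe, from which the app renders the page.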

It is important to note the two implicit assumptions in this scenario. The first is that the inbound URL can be tokenized to extract the resource to query. In our case, the URL tells us that we want to query the /api/recipes resource to find a single item that has a particular URL alias. We know that because the URL in the React side contains /recipes/... What happens if the SEO team decides that the content should be under https://cms.contentacms.io/4-hour-lamb-stew? How will React know that it needs to query the /api/recipes resource and not /api/articles?

The second assumption is that there is a web page that represents a node. When we have a decoupled architecture, we cannot guarantee a one-to-one mapping between nodes and pages. Though it's common to have the content model aligned with the routes, let's explore an example where that's not the case. Suppose we have a seasonal page in our food magazine for the summer season (accessible under /summer). It consists of two recipes, an article, and a manually selected hero image. We can build that easily in our React application by querying and rendering the content. However, everything—except for the data in the nodes and images—lives in the React application. Where does the editor go to change the route for that page?

On top of that, SEO will want it so that when a URL alias changes (either editorially or in the front-end code) a redirect occurs, so people using the old URL can still access the content. Note that a change in the node title could trigger a change in the URL alias via Pathauto. That is a problem even in the "easy" situation. If the alias changes to https://cms.contentacms.io/recipes/four-hour-stewed-lamb, we need our React application to still respond to the old https://cms.contentacms.io/recipes/4-hour-lamb-stew. The old link may have been shared in social networks, linked to from other sites, etc. The problem is that there is no recipe with an alias of /recipes/4-hour-lamb-stew anymore, so the Fieldable Path solution will not cover all cases.

Possible Solutions

In monolithic Drupal, we'd solve the aforementioned SEO issue by using the Redirect module, which keeps track of old path aliases and can respond to them with a redirect to the new one. In decoupled Drupal, we can use that same module along with the new Decoupled Router module (created as part of the research for this article).

The Contenta CMS distribution already includes the Decoupled Router module for routing as we recommend this pattern for decoupled routing.

Pages—or visualizations—that comprise a disconnected selection of entities—our /summer page example—are hard to manage from the back-end. A possible solution could be to use JSON API to query the entities generated by Page Manager. Another possible solution would be to create a content type, with its corresponding resource, specific for that presentation in that particular consumer. Depending on how specific that content type is for the consumer, that will take us to the Back-end For Front-end pattern, which incurs other considerations and maintenance costs.

For the case where multiple consumers claim the same route but have that route resolve to different nodes, we can try the Contextual Aliases module.

The Decoupled Router

Decoupled Router is an endpoint that receives a front-end path and tries to resolve it to an entity. To do so it follows as many redirects and URL aliases as necessary. In the example of /recipes/four-hour-stewed-lamb it would follow the redirect down to /recipes/4-hour-lamb-stew and resolve that URL alias to node:1234. The endpoint provides some interesting information about the route and the underlying entity.
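The response could look roughly like the following sketch. This is illustrative rather than the module's exact output, but it shows the kind of information returned: the resolved entity and the JSON API URL for it (the jsonapi.individual value is what we will use in the next section).

{
  "resolved": "https://cms.contentacms.io/recipes/4-hour-lamb-stew",
  "entity": {
    "type": "node",
    "bundle": "recipe",
    "id": "1234",
    "uuid": "6f9f1a2e-0c4e-4a4a-9b59-9b7e59c8a3a1"
  },
  "jsonapi": {
    "individual": "https://cms.contentacms.io/api/recipes/6f9f1a2e-0c4e-4a4a-9b59-9b7e59c8a3a1"
  }
}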

Using the Decoupled Router: a request for /recipes/four-hour-stewed-lamb resolves to the JSON API resource after following one redirect and one URL alias

In a previous post, we discussed how multiple requests degrade performance significantly. With that in mind, making an extra request to resolve the redirects and aliases seems less attractive. We can solve this problem using the Subrequests module. As we discussed in detail there, we can use response tokens to combine several requests into one.

Imagine that we want to resolve /bread and display the title and image. However, we don’t know if /bread will resolve into an article or a recipe. We could use Subrequests to resolve the path and the JSON API entity in a single request.
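A blueprint for that could look roughly like the sketch below. The exact blueprint keys and the router path are written from memory, so treat them as assumptions and check the Subrequests and Decoupled Router documentation for the precise format; the important idea is that the second subrequest waits for the first and uses a response token to point at the JSON API URL returned by the router.

[
  {
    "requestId": "router",
    "action": "view",
    "uri": "/router/translate-path?path=/bread"
  },
  {
    "requestId": "content",
    "action": "view",
    "waitFor": ["router"],
    "uri": "{{router.body@$.jsonapi.individual}}"
  }
]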

Subrequests using the Decoupled Router: we use the router response to resolve the JSON API URL and then request it in the same blueprint via a response token pointing at $.jsonapi.individual

In the request above, we provide the path we want to resolve. Then we get the following response.

Response to the subrequests blueprint that resolves the path and gets the JSON API data (with sparse fieldsets and includes) in a single request

To summarize, we can use Decoupled Router in combination with Subrequests to resolve multiple levels of redirects and URL aliases and get the JSON API data all in a single request. This solution is generic enough that it serves in almost all cases.

Conclusion

Routing in decoupled applications becomes challenging because of three factors:

  • Instead of one route, we have to think about (at least) two, one for the front-end and one for the back-end. We can mitigate this by keeping them both in sync.
  • Multiple consumers may decide on different routing patterns. This can be mitigated by reaching an agreement among consumers. Another alternative is to use Contextual Aliases along with Consumers. When we want back-end changes that only affect a particular consumer, we can use the Consumers module to make that dependency explicit. See the Consumer Image Styles module—explained in a previous article—for an example of how to do this.
  • Some visualizations in some of the consumers don’t have a one-to-one correspondence with an entity in the data model. This is solved by introducing dedicated content types for those visualizations. That implies that we have access to both back-end and front-end. A custom resource based on Page Manager could work as well.

In general, whenever we need editorial control we'll have to turn to the back-end CMS. Unfortunately, the back-end affects all consumers, not just one. That may or may not be acceptable, depending on each project. We will need to make sure to consider this when thinking through paths and aliases on our next decoupled Drupal project.

Lucky for us, every project has constraints we can leverage. That is true even when working on the most challenging back-end of all—a public API that powers an unknown number of 3rd-party consumers. For the problem of routing, we can leverage these constraints to use the mitigations listed above.

Hopefully, this article will give you some solutions for your Decoupled Drupal Hard Problems.

Photo by William Bout on Unsplash.

May 23 2018

Note: This article was originally published on November 3, 2017. Following DrupalCon Nashville, we are republishing (with updates) some of our key articles on decoupled or "headless" Drupal as the community as a whole continues to explore this approach further. Comments from the original will appear unmodified.

The Schemata module is our best approach so far for providing schemas for our API resources. Unfortunately, this solution is often not good enough. That is because the serialization component in Drupal is so flexible that we can't anticipate the final form our API responses will take, meaning the schema that our consumers depend on might be inaccurate. How can we improve this situation?

This article is part of the Decoupled hard problems series. In past articles, we talked about request aggregation solutions for performance reasons, and how to leverage image styles in decoupled architectures.

TL;DR

  • Schemas are key for an API's self-generated documentation.
  • Schemas are key for the maintainability of the consumer’s data model.
  • Schemas are generated from Typed Data definitions using the Schemata module. They are expressed in the JSON Schema format.
  • Schemas are statically generated but normalizers are determined at runtime.

Why Do We Need Schemas?

A database schema is a description of the data a particular table can hold. Similarly, an API resource schema is a description of the data a particular resource can hold. In other words, a schema describes the shape of a resource and the datatype of each particular property.

Consumers of data need schemas in order to set their expectations. For instance, the schema tells the consumer that the body property is a JSON object that contains a value that is a string. A schema also tells us that the mail property in the user resource is a string in the e-mail format. This knowledge empowers consumers to add client-side form validation for the mail property. In general, a schema will help consumers to have a prior understanding of the data they will be fetching from the API, and what data objects they can write to the API.
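For instance, the user resource schema could express the mail constraint with a JSON Schema fragment like this one (a sketch trimmed to the relevant property):

{
  "type": "object",
  "properties": {
    "mail": {
      "type": "string",
      "format": "email"
    }
  }
}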

We are using the resource schemas in Docson and Open API to generate automatic documentation. When we enable JSON API and Open API, you get a fully functional and accurately documented HTTP API for your data model. Whenever we make changes to a content type, that will be reflected in the HTTP API and the documentation automatically. All thanks to the schemas.

A consumer could fetch the schemas for all the resources it needs at compile time, or fetch them once and cache them for a long time. With that information, the consumer can generate its models automatically without developer intervention. That means that with a single implementation, all of our consumers' models are done forever. There is probably already a library for our consumer's framework that does this.

More interestingly, since our schemas come with type information, they can be type safe. That is important to many languages like Swift, Java, TypeScript, Flow, Elm, etc. Moreover, if the model in the consumer is auto-generated from the schema (one model per resource), then minor updates to the resource are automatically reflected in the model. We can start to use the new model properties in Angular, iOS, Android, etc.

In summary, having schemas for our resources is a huge improvement for the developer experience. This is because they provide auto-generated documentation of the API and auto-generated models for the consumer application.

How Are We Generating Schemas In Drupal?

One of Drupal 8's API improvements was the introduction of the Typed Data API. We use this API to declare the data types for a particular content structure. For instance, there is a data type for a Timestamp that extends an Integer. The Entity and Field APIs combine these into more complex structures, like a Node.

JSON API and REST in core can expose entity types as resources out of the box. When these modules expose an entity type they do it based on typed data and field API. Since the process to expose entities is known, we can anticipate schemas for those resources.

In fact, assuming that resources are a serialization of the Field API and Typed Data is the only assumption we can make. The base for JSON API and REST in core is Symfony's serialization component. This component is broken into normalizers, as explained in my previous series. These normalizers transform Drupal's inner data structures into other, simpler structures. After this transformation, all knowledge of the data type or structure is lost. This happens because the normalizer classes do not declare the new types and new shapes the typed data has been transformed into. This loss of information is where the big problem lies with the current state of schemas.

The Schemata module provides schemas for JSON API and core REST. It does it by serializing the entity and typed data. It is only able to do this because it knows about the implementation details of these two modules. It knows that the nid property is an integer and it has to be nested under data.attributes in JSON API, but not for core REST. If we were to support another format in Schemata we would need to add an ad-hoc implementation for it.

The big problem is that schemas are static information. That means they can't change during the execution of the program. However, the serialization process (which transforms the Drupal entities into JSON objects) is a runtime operation. It is possible to write a normalizer that turns the number four into 4 or "four" depending on whether the minute of execution is even or odd. Even though this example is bizarre, it shows that determining the schema upfront without other considerations can lead to errors. Unfortunately, we can't assume anything about the data after it's serialized.
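To make that bizarre example concrete, a (deliberately contrived) normalize() method like the following, inside a custom normalizer class, is perfectly legal for the serializer, yet impossible to describe with a static schema because its output type changes at runtime:

public function normalize($object, $format = NULL, array $context = []) {
  // On even minutes return the raw integer (4); on odd minutes spell it out
  // as a string ("four"). The output type depends on when the code runs.
  $spellout = new \NumberFormatter('en', \NumberFormatter::SPELLOUT);
  return ((int) date('i')) % 2 === 0
    ? (int) $object->getValue()
    : $spellout->format((int) $object->getValue());
}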

We can either make normalization less flexible—forcing data types to stay true to the pre-generated schemas—or we can allow the schemas to change during runtime. The second option clearly defeats the purpose of setting expectations, because it would allow a resource to potentially differ from the original data type specified by the schema.

The GraphQL community is opinionated on this and drives the web service from their schema. Thus, they ensure that the web service and schema are always in sync.

How Do We Go Forward From Here

Happily, we are already trying to come up with a better way to normalize our data and infer the schema transformations along the way. Nevertheless, whenever a normalizer is injected by a third-party contrib module, or normalizations are improved in a backwards-compatible way, the Schemata module cannot anticipate it. Schemata will potentially provide the wrong schema in those scenarios. If we are to base the consumer models on our schemas, then they need to be reliable. At the moment they are reliable in JSON API, but only at the cost of losing flexibility with third-party normalizers.

One of the attempts to support data transformations, and the impact they have on the schemas, is Field Enhancers in JSON API Extras. They represent simple transformations via plugins. Each plugin defines how the data is transformed and how the schema is affected. This happens in both directions: when the data goes out, and when the consumers write back to the API and the transformation needs to be reversed. Whenever we need a custom transformation for a field, we can write a field enhancer instead of a normalizer. That way, schemas will remain correct even if the data transformation implies a change in the schema.

Field Enhancers in JSON API Extras are aware of schema changes.

We are very close to being able to validate responses in JSON API against schemas when Schemata is present. It will only happen in development environments (where PHP’s asserts are enabled). Site owners will be able to validate that schemas are correct for their site, with all their custom normalizers. That way, when a site owner builds an API or makes changes they'll be able to validate the normalized resource against the purported schema. If there is any misalignment, a log message will be recorded.

Ideally, we want the certainty that schemas are correct all the time. Until the community agrees on the best solution, we have these intermediate measures to give reasonable certainty that your schemas are in sync with your responses.

Join the discussion in the #contenta Slack channel or come to the next API-First Meeting and show your interest there!

Hero photo by Oliver Thomas Klein on Unsplash.

May 16 2018

Note: This article was originally published on October 25, 2017. Following DrupalCon Nashville, we are republishing (with updates) some of our key articles on decoupled or "headless" Drupal as the community as a whole continues to explore this approach further. Comments from the original will appear unmodified.

As part of the API-First Drupal initiative, and the Contenta CMS community effort, we have come up with a solution for using Drupal image styles in a decoupled setup. Here is an overview of the problems we sought to solve:

  • Image styles are tied to the designs of the consumer, therefore belonging to the front-end. However, there are technical limitations in the front-end that make it impossible to handle them there.
  • Our HTTP API serves an unknown number of consumers, but we don't want to expose all image styles to all consumers for all images. Therefore, consumers need to declare their needs when making API requests.
  • The Consumers and Consumer Image Styles modules can solve these issues, but they require some configuration from the consumer development team.

Image Styles Are Great

Drupal developers are used to the concept of image styles (aka image derivatives, image cache, resized images, etc.). We use them all the time because they are a way to optimize performance on our Drupal-rendered web pages. At the theme layer, the render system detects the configured image style and crops or resizes the image appropriately if the design requires it. We can do this because the back-end is informed about how the image will be presented.

In addition to this, Drupal adds a token to the image style URLs. With that token, the Drupal server is saying: I know your design needs this image style, so I approve its use. This is needed to prevent a malicious user from filling up our disk by manually requesting all the combinations of images and image styles. With this protection, only the combinations that are in our designs will be possible, because Drupal gives them its seal of approval. This is transparent to us, so our server is protected without us even realizing this was a risk.

The monolithic architecture allows us to have the back-end informed about the design. We can take advantage of that situation to provide advanced features.

The Problem

In a decoupled application your back-end service and your front-end consumer are separated. Your back-end serves your content, and your front-end consumer displays and modifies it. Back-end and front-end live in different stacks and are independent of each other. In fact, you may be running a back-end that exposes a public API without knowing which consumers are using that content or how they are using it.

In this situation, we can see how our back-end doesn't know anything about the front-end(s) design(s). Therefore we cannot take advantage of the situation like we could in the monolithic solution.

The most intuitive solution would be to output all the available image styles when requesting images via JSON API (or core REST). This will only work if we have a small set of consumers of our API and we know the designs for those. Imagine that our API serves three, and only three, consumers: A, B, and C. If we did that, then when requesting an image from consumer A we would output all the variations for all the image styles for all the consumers. If each consumer has 10-15 image styles, that means 30-45 image style URLs, of which only one will be used.

many images

This situation is not ideal, because a malicious user can still generate 45 images on our disk for each image available in our content. Additionally, if we consider adding more consumers to our digital experience, we risk making this problem worse. Moreover, we don't want the presentation from one consumer leaking into another consumer. Finally, if we can't know the designs for all our consumers, then this solution is not even on the table, because we don't know what image styles we need to add to our back-end.

On top of all these problems regarding the separation of concerns of front-end and back-end, there are several technical limitations to overcome. In the particular case of image styles, if we were to process the raw images in the consumer we would need:

  • An application runtime able to do these operations. The browser is capable of this, but other, more constrained devices won't be.
  • Powerful hardware to compute image manipulations. APIs often serve content to hardware with low resources.
  • A high bandwidth environment. We would need to serve a very high-resolution image every time, even if the consumer will resize it to 100 x 100 pixels.

Given all these, we decided that this task was best suited for a server-side technology.

In order to solve this problem as part of the API-First initiative, we want a generic solution that works even in the worst case scenario. This scenario is an API served by Drupal that serves an unknown number of 3rd party applications over which we don't have any control.

How We Solved It

After some research about how other systems tackle this, we established that we need a way for consumers to declare their presentation dependencies. In particular, we want to provide a way to express the image styles that consumer developers want for their application. The requests issued by an iOS application will carry a token that identifies the consumer where the HTTP request originated. That way the back-end server knows to select the image styles associated with that consumer.

no consumer leaks

For this solution, we developed two different contributed modules: Consumers, and Consumer Image Styles.

The Consumers Project

Imagine for a moment that we are running Facebook's back-end. We defined the data model, we have created a web service to expose the information, and now we are ready to expose that API to the world. The intention is that any developer can join Facebook and register an application. In that application record, the developer does some configuration and tweaks some features so the back-end service can interact optimally with the registered application. As the managers of Facebook's web services, we are not going to take special requests from any of the possible applications. In fact, we don't even know which applications integrate with our service.

The Consumers module aims to replicate this feature. It is a centralized place where other modules can request the information they need about the consumers. The front-end development team of each consumer is responsible for providing that information.

This module adds an entity type called Consumer. Other modules can add fields to this entity type with the information they want to gather about the consumer. For instance:

  • The Consumer Image Styles module adds a field that allows consumer developers to list all the image styles their application needs.
  • Other modules could add fields related to authentication, like OAuth 2.0.
  • Others could gather information for analytics purposes.
  • Some might even store configuration to integrate with other 3rd-party platforms, etc.

The Consumer Image Styles Project

Internally, the Consumers module takes a request containing the consumer ID and returns the consumer entity. That entity contains the list of image styles needed by that consumer. Using that list of image styles Consumer Image Styles integrates with the JSON API module and adds the URLs for the image after applying those styles. These URLs are added to the response, in the meta section of the file resource. The Consumers project page describes how to provide the consumer ID in your request.

{
  "data": {
    "type": "files",
    "id": "3802d937-d4e9-429a-a524-85993a84c3ed",
    "attributes": { … },
    "relationships": { … },
    "links": { … },
    "meta": {
      "derivatives": {
        "200x200": "https://cms.contentacms.io/sites/default/files/styles/200x200/public/boyFYUN8.png?itok=Pbmn7Tyt",
        "800x600": "https://cms.contentacms.io/sites/default/files/styles/800x600/public/boyFYUN8.png?itok=Pbmn7Tyt"
      }
    }
  }
}

To do that, Consumer Image Styles adds an additional normalizer for the image files. This normalizer adds the meta section with the image style URLs.

Conclusion

We recommend having a strict separation between the back-end and the front-end in a decoupled architecture. However, there are some specific problems, like image styles, where the server needs to have some knowledge about the consumer. Even on these few occasions, the server should not implement special logic for any particular consumer. Instead, we should have the consumers add their configuration to the server.

The Consumers project will help you provide a unified way for app developers to include this information on the server. Consumer Image Styles and OAuth 2.0 are good examples where that is necessary, and examples of how to implement it.

Further Your Understanding

If you are interested in alternative ways to deal with image derivatives in a decoupled architecture, there are other options that may incur extra costs but are still worth checking out: Cloudinary, Akamai Image Converter, and Origami.

Hero Image by Sadman Sakib. Also thanks to Daniel Wehner for his time spent on code and article reviews.

May 01 2018

At this point, you may have read several DrupalCon retrospectives. You probably know that the best part of DrupalCon is the community aspect. During his keynote, Steve Francia made sure to highlight how extraordinary the Drupal community is in this regard.

One of the things I, personally, was looking forward to was getting together with the API-First initiative people. I even printed some pink decoupled t-shirts for our joint presentation on the state of the initiative. Wim brought Belgian chocolates!

Mateu, Wim and Gabe in decoupled T-Shirts (with multiple consumers)

I love that at DrupalCon, if you have a topic of interest in an aspect of Drupal, you will find ample opportunity to talk about it with brilliant people. Even if you are coming into DrupalCon without company, you will get a chance to meet others in the sprints, the BoFs, the social events, etc.

During this week, the API-First initiative team discussed an important topic that has been missing from the decoupled Drupal ecosystem: RPC requests. After initial conversations in a BoF, we decided to start a Drupal module to implement the JSON-RPC specification.

The decision log after finishing the BoF at DrupalCon

Wikipedia defines RPC as follows:

In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space (commonly on another computer on a shared network), which is coded as if it were a normal (local) procedure call, without the programmer explicitly coding the details for the remote interaction.

The JSON API module in Drupal is designed to only work with entities because it relies heavily on the Entity Query API and the Entity subsystem. For instance, it would be nearly impossible to keep nested filters that traverse non-entity resources. On the other hand, core's REST collections, based on Views, do not provide pagination, documentation, or discoverability. Additionally, in many instances, Views will not have support for what you need to do.

We need RPC in Drupal for decoupled interactions that are not solely predicated on entities. We're missing a way to execute actions on the Drupal server and to expose non-entity data for reading and writing. For example, we may want to allow an authenticated remote agent to clear caches on a site. I will admit that some interactions would be better represented in a RESTful paradigm, with stateless CRUD actions on resources that represent Drupal's internals. However, because of Drupal's idiosyncrasies, sometimes we need to use JSON-RPC. At the end of the day, we need to be pragmatic and allow other developers to resolve their needs in a decoupled project. For instance, the JS initiative needs a list of permissions to render the admin UI, and those are stored in code with a special implementation.

Why the current ecosystem was not enough

After the initial debate, we came to the realization that you can do everything you need with the current ecosystem, but it is error-prone. Furthermore, the developer experience leaves much to be desired.

Custom controllers

One of the recommended solutions has been to just create a route and execute a controller that does whatever you need. This solution has the tendency to lead to a collection of unrelated controllers that are completely undocumented and impossible to discover from the front-end consumer perspective. Additionally, there is no validation of the inputs and outputs for this controller, unless you implement said validation from scratch in every controller.

Custom REST resources

Custom REST resources have also been used to expose this missing non-entity data and execute arbitrary actions in Drupal. Custom REST resources don’t get automatic documentation. They are also not discoverable by consumers. On top of that, collection support is rather limited given that you need to build a custom Views integration if it’s not based on an entity. Moreover, the REST module assumes that you are exposing REST resources. Our RPC endpoints may not fit well into REST resources.

Custom GraphQL queries and mutators

GraphQL solves the problem of documentation and discovery, given it covers schemas as a cornerstone of the implementation. Nevertheless, the complexity to do this both in Drupal and on the client side is non-trivial. Most important, bringing in all the might of GraphQL for this simple task seems excessive. This is a good option if you are already using GraphQL to expose your entities.

The JSON-RPC module

Key contributor Gabe Sullice (Acquia OCTO) and I discussed this problem at length and in the open in the #contenta Slack channel. We decided that the best way to approach this problem was to introduce a dedicated and lightweight tool.

The JSON-RPC module will allow you to create type-safe RPC endpoints that are discoverable and automatically documented. All you need to do is create a JsonRpcMethod plugin.

Each plugin will need to declare:

  • A method name. This will be the plugin ID. For instance: plugins.list to list all the plugins of a given type.
  • The input parameters that the endpoint takes. This is done via annotations in the plugin definition. You need to declare the schema of your parameters, both for validation and documentation.
  • The schema of the response of the endpoint.
  • The PHP code to execute.
  • The access required to execute this call.

This may seem a little verbose, but the benefits clearly surpass the annoyances. What you will get for free by providing this information is:

  • Your API will follow a widely-used standard. That means that your front-end consumers will be able to use JSON-RPC libraries.
  • Your methods are discoverable by consumers.
  • Your input and outputs are clearly documented, and the documentation is kept up to date.
  • The validation ensures that all the input parameters are valid according to your schema. It also ensures that your code responds with the output your documentation promised.
  • The module takes care of several intricate implementation details, among them error handling, bubbling the cacheability metadata, and specification compliance.

As you can see, we designed this module to help Drupal sites implement secure, maintainable, understandable and reliable remote procedure calls. This is essential because custom endpoints are often the most insecure and fragile bits of code in a Drupal installation. This module aims to help mitigate that problem.

Usage

The JSON-RPC module ships with a sub-module called JSON-RPC Core. This sub-module exposes some necessary data for the JS modernization initiative. It also executes other common tasks that Drupal core handles. It is the best place to start learning more about how to implement your plugin.

Let's take a look at the plugins.list endpoint.

/**
 * Lists the plugin definitions of a given type.
 *
 * @JsonRpcMethod(
 *   id = "plugins.list",
 *   usage = @Translation("List defined plugins for a given plugin type."),
 *   access = {"administer site configuration"},
 *   params = {
 *     "page" = @JsonRpcParameterDefinition(factory = "\Drupal\jsonrpc\ParameterFactory\PaginationParameterFactory"),
 *     "service" = @JsonRpcParameterDefinition(schema={"type"="string"}),
 *   }
 * )
 */
class Plugins extends JsonRpcMethodBase {

In the code, you will notice the @JsonRpcMethod annotation. That contains important metadata such as the method's name, a list of permissions and the description. The annotation also contains other annotations for the input parameters. Just like you use @Translation, you can use other custom annotations. In this case, each parameter is a @JsonRpcParameterDefinition annotation that takes either a schema or a factory key.

If a parameter uses the schema key, it means that the input is passed as-is to the method. The JSON schema will ensure validation. If a parameter uses the factory key, that class will take control of it. One reason to use a factory over a schema is when you need to prepare a parameter. Passing an entity UUID and upcasting it to the fully-loaded entity would be an example. The other reason to choose a factory is to provide a parameter definition that can be reused in several RPC plugins. An example of this is the pagination parameter for lists of results. The class contains a method that exposes the JSON schema, again, for input validation. Additionally, it should have a ::doTransform() method that can process the input into a prepared parameter output.
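To make that idea more concrete, here is a hypothetical factory sketch. The class name, the schema() helper and the 'node' entity type are illustrative assumptions, not the module's actual base class or method signatures:

<?php

namespace Drupal\my_module\ParameterFactory;

/**
 * Hypothetical factory that upcasts an entity UUID into a loaded entity.
 */
class EntityUuidParameterFactory {

  /**
   * JSON schema used to validate the raw input for this parameter.
   */
  public static function schema() {
    return ['type' => 'string'];
  }

  /**
   * Transforms the validated input (a UUID string) into a loaded entity.
   */
  public function doTransform($uuid) {
    // loadEntityByUuid() is a core API; the 'node' entity type is an
    // assumption made for this example.
    return \Drupal::service('entity.repository')->loadEntityByUuid('node', $uuid);
  }

}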

The rest of the code for the plugin is very simple. There is a method that defines the JSON schema of the output. Note that while the other schemas define the shape of the input data, this one describes the output of the RPC method.

  /**
   * {@inheritdoc}
   */
  public static function outputSchema() {
    // Learn more about JSON Schema at https://json-schema.org.
    return [
      'type' => 'object',
      'patternProperties' => [
        '.{1,}' => [
          'class' => [ 'type' => 'string' ],
          'uri' => [ 'type' => 'string' ],
          'description' => [ 'type' => 'string' ],
          'provider' => [ 'type' => 'string' ],
          'id' => [ 'type' => 'string' ],
        ],
      ],
    ];
  }

Finally, the ::execute() method does the actual work. In this example, it loads the plugins of the type specified in the service parameter.

  /**
   * {@inheritdoc}
   *
   * @throws \Drupal\jsonrpc\Exception\JsonRpcException
   */
  public function execute(ParameterBag $params) {
    // [Code simplified for the sake of the example]
    $paginator = $params->get('page');
    $service = $params->get('service');
    $definitions = $this->container->get($service)->getDefinitions();
    return array_slice($definitions, $paginator['offset'], $paginator['limit']);
  }

Try it!

The following is a hypothetical RPC method for the sake of the example. It triggers a backup process that uploads the backup to a pre-configured FTP server.

Visit JSON-RPC to learn more about the specification and other available options.

To trigger the backup send a POST request to /jsonrpc in your Drupal installation with the following body:

{
    "jsonrpc": "2.0",
    "method": "backup_migrate.backup",
    "params": {
        "subjects": ["database", "files"],
        "destination": "sftp_server_1"
    },
    "id": "trigger-backup"
}

This would return with the following response:

{
    "jsonrpc": "2.0",
    "id": "trigger-backup",
    "result": {
        "status": "success",
        "backedUp": ["database", "files"]
        "uploadedTo": "/…/backups/my-site-20180524.tar.gz"
    }
}

This module is very experimental at the moment. It’s in alpha stage, and some features are still being ironed out. However, it is ready to try. Please report findings in the issue queue; that's a wonderful way to contribute back to the API-First initiative.

Many thanks to Gabe Sullice, co-author of the module, and passionate debater, for tag teaming on this module. I hope that this module will be instrumental in upcoming improvements to the user experience, both in core's admin UI and on actual Drupal sites. This module will soon be part of Contenta CMS.

Header photo by Johnson Wang.

Oct 12 2017
Oct 12

In my previous post, Modern Decoupling is More Performant, we discussed how saving HTTP round-trips has a very positive impact on performance. In particular, we demonstrated how the JSON API module could help your application by returning multiple entities in a single request. Doing so eliminates the need for making an individual request per entity. However, this is only possible when fetching entities, not when writing data and only if those entities are related to the entry point (a particular entity or collection).

Sometimes you can solve this problem by writing a custom resource in the back-end every time, but that can lead to many custom resources, which impacts maintainability and is tiresome. If your API is public and you don’t have prior knowledge of what the consumers are going to do with it, it’s not even possible to write these custom endpoints.

The Subrequests module completes that idea by allowing ANY set of requests to be aggregated together. It can aggregate them even when one of them depends on a previous response. The module works with any request; it's not limited to REST or any other constraint. For simplicity, all the examples here will make requests to JSON API.

Why Do We Need It?

The main concept of the Subrequests module is that instead of sending multiple requests to your Drupal instance we will only send a single request. In this master request, we will provide the information about the requests we need to make in a JSON document. We call this document a blueprint.

A blueprint is a JSON document containing the instructions for Drupal to make all those requests in our name. The blueprint document contains a list of subrequest objects. Each subrequest object contains the information about a single request being aggregated in the blueprint.

Imagine that our consumer application has a decoupled editorial interface. This editorial interface contains a form to create an article. As part of the editorial experience, we want the form to create the article and a set of tags in the Drupal back-end.

Without using Subrequests, the consumer application should execute the following requests when the form is submitted:

  • Query Drupal to find the UUID for the tags vocabulary.
  • Query Drupal to find the UUID of the user, based on the username present in the editorial app.
  • Create the first tag in the form using the vocabulary UUID.
  • Create the second tag in the form using the vocabulary UUID.
  • Create the article in the form using the user UUID and the newly created tags.

We can query for the user and the vocabulary in parallel. Once that is done, and using the information in the vocabulary response, we can create the tag entities. Once those are created, we can finally create the article. In total, we would be making five requests at three sequential levels. And, this is not even a complex example!

Sequential requests introduce a performance penalty.

A JavaScript pseudo-code for the form submission handler could look like:

console.log('Article creation started…');
Promise.all([
  httpRequest('GET', 'https://cms.contentacms.io/api/vocabularies?filter[vid-filter][condition][path]=vid&filter[vid-filter][condition][value]=tags'),
  httpRequest('GET', 'https://cms.contentacms.io/api/users?filter[admin][condition][path]=name&filter[admin][condition][value]=admin'),
])
  .then(res => {
    const [vocab, user] = res;
    return Promise.all([
      Promise.resolve(user),
      httpRequest('POST', 'https://cms.contentacms.io/api/tags', bodyForTag1, headers),
      httpRequest('POST', 'https://cms.contentacms.io/api/tags', bodyForTag2, headers),
    ])
  })
  .then(res => {
    const [user, tag1, tag2] = res;
    const body = buildBodyForArticle(formData, user, tag1, tag2);
    return httpRequest('POST', 'https://cms.contentacms.io/api/articles', body, headers);
  })
  .then(() => {
    console.log('Article creation finished!');
  });

Using Subrequests

Our goal is to have JavaScript pseudo-code that looks like:

console.log('Article creation started…');
const blueprint = buildBlueprint(formData);
httpRequest('POST', 'https://cms.contentacms.io/api/subrequests?_format=json', blueprint, headers)
  .then(() => {
    console.log('Article creation finished!');
  });

We've reduced our application code to a single POST request that contains a blueprint in the request body. We have reduced the problem to the blueprint creation. That is a big improvement in the developer experience of consumer applications.

Sequential requests as processed by Subrequests avoid unnecessary round trips.

Parallel Requests

In our current task we need to perform two initial HTTP requests that can be run in parallel:

  • Query Drupal to find the UUID for the tags vocabulary.
  • Query Drupal to find the UUID of the user based on the username in the editorial app.

That translates to the following blueprint:

[
  {
    "requestId": "vocabulary",
    "action": "view",
    "uri": "/api/vocabularies?filter[vid-filter][condition][path]=vid&filter[vid-filter][condition][value]=tags",
    "headers": ["Accept": "application/vnd.application+json"]
  },
  {
    "requestId": "user",
    "action": "view",
    "uri": "/api/users?filter[admin][condition][path]=name&filter[admin][condition][value]=admin",
    "headers": ["Accept": "application/vnd.application+json"]
  }
]

For each subrequest, we can observe that we are providing four keys:

  • requestId A string used to identify the subrequest. This is an arbitrary value generated by the consumer application.
  • action Identifies the action being performed. A "view" action will generate a GET request. A "create" action will generate a POST request, etc.
  • uri The URL where the subrequest will be sent.
  • headers An object containing the headers specific for this subrequest.

The response to this blueprint (after adjusting the permissions in Drupal to view users and vocabularies) will return the response to both subrequests:

{
    "vocabulary": {
        "headers": {
            "content-id": ["<vocabulary>"],
            "status": [200]
        },
        "body": "{\"data\":[{\"type\":\"vocabularies\",\"id\":\"47ce8895-0df6-44a4-af43-9ef3b2a924dd\",\"attributes\":{\"status\":true,\"dependencies\":{\"module\":[\"recipes_magazin\"]},\"_core\":\"HJlsFfKP4PFHK1ub6QCSNFmzAnGiBG7tnx53eLK1lnE\",\"name\":\"Tags\",\"vid\":\"tags\",\"description\":\"Use tags to group articles on similar topics into categories.\",\"hierarchy\":0,\"weight\":0},\"links\":{\"self\":\"http:\\/\\/localhost\\/api\\/vocabularies\\/47ce8895-0df6-44a4-af43-9ef3b2a924dd\"}}],\"links\":{\"self\":\"http:\\/\\/localhost\\/api\\/vocabularies?filter%5Bvid-filter%5D%5Bcondition%5D%5Bpath%5D=vid\\u0026filter%5Bvid-filter%5D%5Bcondition%5D%5Bvalue%5D=tags\"}}"
    },
    "user": {
        "headers": {
            "content-id": ["<user>"],
            "status": [200]
        },
        "body": "{\"data\":[{\"type\":\"users\",\"id\":\"a0b7af80-e319-4271-899f-f151d3fbfc8e\",\"attributes\":{\"internalId\":1,\"name\":\"admin\",\"mail\":\"[email protected]\",\"timezone\":\"Europe\\/Madrid\",\"isActive\":true,\"createdAt\":\"2017-09-15T15:47:26+0200\",\"updatedAt\":\"2017-09-15T20:06:15+0200\",\"access\":1505565434,\"lastLogin\":\"2017-09-15T20:06:07+0200\"},\"relationships\":{\"roles\":{\"data\":[]}},\"links\":{\"self\":\"http:\\/\\/localhost\\/api\\/users\\/a0b7af80-e319-4271-899f-f151d3fbfc8e\"}}],\"links\":{\"self\":\"http:\\/\\/localhost\\/api\\/users?filter%5Badmin%5D%5Bcondition%5D%5Bpath%5D=name\\u0026filter%5Badmin%5D%5Bcondition%5D%5Bvalue%5D=admin\"}}"
    }
}

In the (simplified) response above we can see that for each subrequest, we have one key in the response object. That key is the same as our requestId in the blueprint. Each one of the subresponses contains the information about the response headers and the response body. Note how the response body is an escaped JSON object.

This blueprint is not sufficient to create an article with two tags, but it's a great start. Let's build on top of that to create the tags and the article.

Dependent Requests

The next task we need to execute is the creation of the two tag entities:

  • Create the first tag in the form using the vocabulary UUID.
  • Create the second tag in the form using the vocabulary UUID.

To do this, we will need to expand the blueprint. However, we don't know the vocabulary UUID at the time we are writing the blueprint. What we do know is that the vocabulary UUID will be in the subresponse to the vocabulary subrequest. In particular, we can find the UUID in data[0].id.

We will use that information to create a blueprint that can create tags. Since we don't know the actual value of the vocabulary UUID, we will use a replacement token. At some point, during the blueprint processing by Drupal, the token will be resolved to the actual UUID value.

Replacement Tokens

We can use replacement tokens anywhere in the body or the URI of our subrequests. For those to be resolved, a token needs to be formatted in the following way:

{{<requestId>.<"body"|"headers">@<json-path-expression>}}

In particular, the replacement token for our vocabulary UUID will be:

{{vocabulary.body@$.data[0].id}}

What this replacement says is:

  1. Use the subresponse for the vocabulary subrequest.
  2. Take the body from that subresponse.
  3. Extract the string under data[0].id, by executing the JSON Path expression $.data[0].id. You can execute any JSON Path expression as long as it returns a string. JSON Path is a very powerful way to extract data from an arbitrary JSON object, in our case the body in subresponse to the vocabulary subrequest.

This is what our blueprint looks like after adding the subrequests to create the tag entities. Note the presence of the replacement tokens:

[
  {
    "requestId": "vocabulary",
    "action": "view",
    "uri": "/api/vocabularies?filter[vid-filter][condition][path]=vid&filter[vid-filter][condition][value]=tags",
    "headers": {"Accept": "application/vnd.api+json"}
  },
  {
    "requestId": "user",
    "action": "view",
    "uri": "/api/users?filter[admin][condition][path]=name&filter[admin][condition][value]=admin",
    "headers": {"Accept": "application/vnd.api+json"}
  },
  {
    "action": "create",
    "requestId": "tags-1",
    "body": "{\"data\":{\"type\":\"tags\",\"attributes\":{\"name\":\"My First Tag\"},\"relationships\":{\"vocabulary\":{\"data\":{\"type\":\"vocabularies\",\"id\":\"{{[email protected]$.data[0].id}}\"}}}}}",
    "uri": "/api/tags",
    "headers": {"Content-Type": "application/vnd.api+json"},
    "waitFor": ["vocabulary"]
  },
  {
    "action": "create",
    "requestId": "tags-2",
    "body": "{\"data\":{\"type\":\"tags\",\"attributes\":{\"name\":\"My Second Tag\",\"description\":null},\"relationships\":{\"vocabulary\":{\"data\":{\"type\":\"vocabularies\",\"id\":\"{{[email protected]$.data[0].id}}\"}}}}}",
    "uri": "/api/tags",
    "headers": {"Content-Type": "application/vnd.api+json"},
    "waitFor": ["vocabulary"]
  }
]

Note that to use a replacement token in a subrequest, we need to add a dependency on the subresponse that contains the information. That's why we added the waitFor key in our tag subrequests.

Finishing the Blueprint

Subrequests process

Using the same principles that we used for the tags we can add the subrequest for:

  • Create the article in the form using the user UUID and the newly created tags.

That will leave our completed blueprint as:

[
  {
    "requestId": "vocabulary",
    "action": "view",
    "uri": "/api/vocabularies?filter[vid-filter][condition][path]=vid&filter[vid-filter][condition][value]=tags",
    "headers": {"Accept": "application/vnd.api+json"}
  },
  {
    "requestId": "user",
    "action": "view",
    "uri": "/api/users?filter[admin][condition][path]=name&filter[admin][condition][value]=admin",
    "headers": {"Accept": "application/vnd.api+json"}
  },
  {
    "action": "create",
    "requestId": "tags-1",
    "body": "{\"data\":{\"type\":\"tags\",\"attributes\":{\"name\":\"My First Tag\"},\"relationships\":{\"vocabulary\":{\"data\":{\"type\":\"vocabularies\",\"id\":\"{{[email protected]$.data[0].id}}\"}}}}}",
    "uri": "/api/tags",
    "headers": {"Content-Type": "application/vnd.api+json"},
    "waitFor": ["vocabulary"]
  },
  {
    "action": "create",
    "requestId": "tags-2",
    "body": "{\"data\":{\"type\":\"tags\",\"attributes\":{\"name\":\"My Second Tag\",\"description\":null},\"relationships\":{\"vocabulary\":{\"data\":{\"type\":\"vocabularies\",\"id\":\"{{[email protected]$.data[0].id}}\"}}}}}",
    "uri": "/api/tags",
    "headers": {"Content-Type": "application/vnd.api+json"},
    "waitFor": ["vocabulary"]
  },
  {
    "action": "create",
    "requestId": "article",
    "headers": {"Content-Type": "application/vnd.api+json"},
    "body": "{\"data\":{\"type\":\"articles\",\"attributes\":{\"body\":\"Custom value\",\"default_langcode\":\"1\",\"langcode\":\"en\",\"promote\":\"1\",\"status\":\"1\",\"sticky\":\"0\",\"title\":\"Article Created via Subrequests!\"},\"relationships\":{\"tags\":{\"data\":[{\"id\":\"{{[email protected]$.data.id}}\",\"type\":\"tags\"},{\"id\":\"{{[email protected]$.data.id}}\",\"type\":\"tags\"}]},\"type\":{\"data\":{\"id\":\"article\",\"type\":\"contentTypes\"}},\"owner\":{\"data\":{\"id\":\"{{[email protected]$.data[0].id}}\",\"type\":\"users\"}}}}}",
    "uri": "/api/articles",
    "waitFor": ["user", "tags-1", "tags-2"]
  }
]

More Powerful Replacements

Imagine that instead of creating an article for a single user, we wanted to create an article for each one of the users on the site. We cannot write a simple blueprint, like the one above, since we don't know how many users there are in the Drupal site. Hence, we cannot write an article creation subrequest for each user.

To solve this problem we can tweak the user subrequest, so instead of returning a single user it returns all the users in the site:

[
  …
  {
    "requestId": "user",
    "action": "view",
    "uri": "/api/users",
    "headers": {"Accept": "application/vnd.api+json"}
  },
  …
]

Then in our replacement tokens, we can write a JSON Path expression that will return a list of user UUIDs, instead of a single string. Subrequests will accept JSON Path expressions that return either strings or an array of strings for the replacement tokens.

In our article creation subrequest we will need to change {{user.body@$.data[0].id}} to {{user.body@$.data[*].id}}. The Subrequests module will create a duplicate of the article subrequest for each replacement item. In our case, this has the effect of creating a copy of the article creation subrequest for each available user in the user subresponse.

The Final Response

The modified blueprint that generates one article per user will have a response like:

Six articles are returned from a single subrequest.

We can see how a single subrequest can generate n subresponses, and we can use each one of those to generate n other subresponses, etc. This highlights how powerful this technique is. In addition, we have seen that we can combine different types of operations. In our example, we mixed GET and POST in a single blueprint (to get the vocabulary and create the new tags).

Conclusion

Subrequests is a great way to fetch or write many resources in a single HTTP request. It allows us to improve performance significantly while maintaining almost the same flexibility that custom code provides.

Further Your Understanding

If you want to know more about the blueprint format you can read the specification. The Subrequests module comes with a JSON schema that you can use to validate your blueprint. You can find the schema here.

The hero image was downloaded from Frankenphotos and used without modifications under a CC BY 3.0 license.

Mar 01 2017
Mar 01

In my previous article about the serializer component, I touched on the basic concepts involved when serializing an object. To summarize, serialization is the combination of encoding and normalization. Normalizers simplify complex objects, like User or ComplexDataDefinition. Denormalizers perform the reverse operation. Using a structured array of data, they generate complex objects like the ones listed above.

In this article, I will focus on the Drupal integration of the Symfony serializer component. For this, I will guide you step-by-step through a module I created as an example. You can find the module at https://github.com/e0ipso/entity_markdown. I have created a different commit for each step of the process, and this article includes a link to the code in each step at the beginning of each section. However, you can use GitHub UI to browse the code at any time and see the diff.

Browse the module by commit.

When this module is finished, you will be able to transform any content entity into a Markdown representation of it. Rendering a content entity with Markdown might be useful if you wanted to send an email summary of a node, for instance, but the real motivation is to show how serialization can be important outside the context of web services.

Add a new normalizer service

These are the changes for this step. You can browse the state of the code at this step in here.

Symfony’s serializer component begins with a list of normalizer classes. Whenever an object needs to be normalized or serialized, the serializer will loop through the available normalizers to find one that declares support for the type of object at hand, in our case a content entity. If you want to add a class to the list of eligible normalizers, you need to create a new tagged service.

A tagged service is just a regular class definition that comes with an entry in the mymodule.services.yml so the service container can find it and instantiate it whenever appropriate. For a service to be a tagged service, you need to add a tags property with a name. You can also add a priority integer to convey precedence with respect to services tagged with the same name. For a normalizer to be recognized by the serialization module, you need to add the tag name normalizer.

When Drupal core compiles the service container, our newly created tagged service will be added to the serializer list in what it’s called a compiler pass. This is the place in Drupal core where that happens. The service container is then cached for performance reasons. That is why you need to clear your caches when you add a new normalizer.

Our normalizer is an empty class at the moment. We will fix that in a moment. First, we need to turn our attention to another collection of services that need to be added to the serializer, the encoders.

Include the encoder for the Markdown format

These are the changes for this step. You can browse the state of the code at this step in here.

Similarly to a normalizer, the encoder is also added to the serialization system via a tagged service. It is crucial that this service implements `EncoderInterface`. Note that at this stage, the encoder does not contain its most important method, encode(). However, you can see that it contains supportsEncoding(). When the serializer component needs to encode a structured array, it will test all the available encoders (those tagged services) by executing supportsEncoding() and passing the format specified by the user. In our case, if the user specifies the 'markdown' format, supportsEncoding() will return TRUE and our encoder will be used to transform the structured array into a string. To do the actual encoding it will use the encode() method. We will write that method later. First, let me describe the normalization process.
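As a minimal sketch, the encoder skeleton could look like this. The class name and the 'markdown' format id are assumptions for illustration, not necessarily the exact code in the example repository:

use Symfony\Component\Serializer\Encoder\EncoderInterface;

class MarkdownEncoder implements EncoderInterface {

  /**
   * The format id this encoder claims. Assumed to be 'markdown'.
   */
  const FORMAT = 'markdown';

  /**
   * {@inheritdoc}
   */
  public function supportsEncoding($format) {
    // Only claim the Markdown format; other encoders handle the rest.
    return $format === static::FORMAT;
  }

  /**
   * {@inheritdoc}
   */
  public function encode($data, $format, array $context = []) {
    // The actual encoding is covered in the "Implement the encoder" section.
    return '';
  }

}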

Normalize content entities

The normalization will differ each time. It depends on the format you want to turn your objects into, and it depends on the type of objects you want to transform. In our example, we want to turn a content entity into a Markdown document.

For that to happen, the serializer will need to be able to:

  1. Know when to use our normalizer class.
  2. Normalize the content entity.
  3. Normalize any field in the content entity.
  4. Normalize all the properties in every field.

Discover our custom normalizer

These are the changes for this step. You can browse the state of the code at this step in here.

For a normalizer to be considered a good fit for a given object it needs to meet two conditions:

  • Implement the `NormalizerInterface`.
  • Return `TRUE` when calling `supportsNormalization()` with the object to normalize and the format to normalize to.

The process is nearly the same as the one we used to determine which encoder to use. The main difference is that we also pass the object to normalize to the supportsNormalization() method. That is a critical part since it is very common to have multiple normalizers for the same format, depending on the type of object that needs to be normalized. A Node object will have different code that turns it into a structured array when compared to an HttpException. We take that into account in our example by checking if the object being normalized is an instance of ContentEntityInterface.
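In code, that check could look roughly like this (a sketch; the 'markdown' format name is an assumption):

  /**
   * {@inheritdoc}
   */
  public function supportsNormalization($data, $format = NULL) {
    // Only normalize content entities, and only for the Markdown format.
    return $format === 'markdown' && $data instanceof ContentEntityInterface;
  }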

Normalize the content entity

These are the changes for this step. You can browse the state of the code at this step in here.

This step contains a first attempt to normalize the content entity that gets passed as an argument to the normalize() method of our normalizer.

Imagine that our requirements are that the resulting markdown document needs to include an introductory section with the entity label, entity type, bundle and language. After that, we need a list with all the field names and the values of their properties. For example, the body field of a node will result in the name field_body and the values for format, summary and value. In addition to that any field can be single or multivalue, so we will take that into consideration.

To fulfill these requirements, I've written a bunch of code that deals with the specific use case of normalizing a content entity into a structured array ready to be encoded into Markdown. I don’t think that the specific code is relevant to explain how normalization works, but I've added code comments to help you follow my logic.

You may have spotted the presence of a helper method called normalizeFieldItemValue() and a comment that says Now transform the field into a string version of it. Those two are big red flags suggesting that our normalizer is doing more than it should, and that it’s implicitly normalizing objects that are not of type ContentEntityInterface but of type FieldItemListInterface and FieldItemInterface. In the next section we will refactor the code in ContentEntityNormalizer to defer that implicit normalization to the serializer.

Recursive normalization

These are the changes for this step. You can browse the state of the code at this step in here.

When the serializer is initialized with the list of normalizers, for each one it checks if they implement SerializerAwareInterface. For the ones that do, the serializer adds a reference to itself into them. That way you can serialize/normalize nested objects during the normalization process. You can see how our ContentEntityNormalizer extends from SerializerAwareNormalizer, which implements the aforementioned interface. The practical impact of that is that we can use $this->serializer->normalize() from within our ContentEntityNormalizer. We will use that to normalize all the field lists in the entity and the field items inside of those.

First turn your focus to the new version of the ContentEntityNormalizer. You can see how the normalizer is divided into parts that are specific to the entity, like the label, the entity type, the bundle, and the language. The normalization for each field item list is now done in a single line: $this->serializer->normalize($field_item_list, $format, $context);.  We have reduced the LOC to almost half, and the cyclomatic complexity of the class even further. This has a great impact on the maintainability of the code.

All this code has now been moved to two different normalizers:

  • FieldItemListNormalizer contains the code that deals with normalizing single and multivalue fields. It uses the serializer to normalize each individual field item (see the sketch after this list).
  • FieldItemNormalizer contains the code that normalizes the individual field items values and their properties/columns.
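A minimal sketch of the first of those normalizers could look like this (illustrative only, not the module's exact code):

use Drupal\Core\Field\FieldItemListInterface;
use Symfony\Component\Serializer\Normalizer\NormalizerInterface;
use Symfony\Component\Serializer\Normalizer\SerializerAwareNormalizer;

class FieldItemListNormalizer extends SerializerAwareNormalizer implements NormalizerInterface {

  /**
   * {@inheritdoc}
   */
  public function supportsNormalization($data, $format = NULL) {
    return $format === 'markdown' && $data instanceof FieldItemListInterface;
  }

  /**
   * {@inheritdoc}
   */
  public function normalize($field_item_list, $format = NULL, array $context = []) {
    $items = [];
    foreach ($field_item_list as $field_item) {
      // Delegate each field item back to the serializer, which will pick
      // the FieldItemNormalizer for it.
      $items[] = $this->serializer->normalize($field_item, $format, $context);
    }
    return $items;
  }

}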

You can see that for the serializer to be able to recognize our new `FieldItemListNormalizer` and `FieldItemNormalizer` objects we need to add them to the service container, just like we did for the ContentEntityInterface normalizer.

A very nice side effect of this refactor, in addition to the maintainability improvement, is that a third party module can build upon our code more easily. Imagine that this third party module wants to make all field labels bold. Before the refactor they would need to introduce a normalizer for content entities—and play with the service priority so it gets selected before ours. That normalizer would contain a big copy and paste of a big blob of code in order to be able to make the desired tweaks. After the refactor, our third party would only need to have a normalizer for the field item list (which outputs the field label) with more priority than ours. That is a great win for extensibility.

Implement the encoder

As we said above, the most important part of the encoder is encapsulated in the `encode()` method. That is the method in charge of turning the structured array from the normalization process into a string. In our particular case, we treat each entry of the normalized array as a line in the output, then we append any suffix or prefix that may apply.
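A minimal sketch of that method, under the assumption that the normalized array is a list of Markdown lines (or nested lists of lines), could be:

  /**
   * {@inheritdoc}
   */
  public function encode($data, $format, array $context = []) {
    // Each entry of the normalized array becomes a line of Markdown output.
    $lines = array_map(function ($item) {
      return is_array($item) ? implode(PHP_EOL, $item) : (string) $item;
    }, (array) $data);
    return implode(PHP_EOL, $lines);
  }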

Further development

At this point the Entity Markdown module is ready to take any entity and turn it into a Markdown document. The only question is how to execute the serializer. If you want to execute it programmatically you only need to do:

\Drupal::service('serializer')->serialize(Node::load(1), 'markdown');

However there are other options. You could declare a REST format like the HAL module so you can make an HTTP request to http://example.org/node/1?_format=markdown and get a Markdown representation of the node in response (after configuring the corresponding REST settings).

Conclusion

The serialization system is a powerful tool that allows you to reform an object to suit your needs. The key concepts that you need to understand when creating a custom serialization are:

  • Tagged services for discovery
  • How a normalizer and an encoder get chosen for the task
  • How recursive serialization can improve maintainability and limit complexity.

Once you are aware and familiar with the serializer component you will start noticing use cases for it. Instead of using hard-coded solutions with poor extensibility and maintainability, start leveraging the serialization system.

Feb 15 2017
Feb 15

As part of the API first initiative I have been working a lot with the serialization module. This module is a key member of the web-service-oriented modules present both in core and contrib.

The main focus of the serialization module is to encapsulate Symfony's serialization component. Note that there is no separate deserialization component. This single component is in charge of serializing and deserializing incoming data.

When I started working with this component the first question that I had was "What does serialize mean? And how is it different from deserializing?". In this article I will try to address this question and give a brief introduction on how to use it in Drupal 8.

Serializers, encoders and normalizers

Serialization is the process of normalizing and then encoding an input object. Similarly, we refer to deserialization as the process of decoding and then denormalizing an input string. Encoding and decoding are the reverse processes of one another, just like normalizing and denormalizing are.

In simple terms, we want to be able to turn an object of class MyClass into a particular string representation, and then be able to turn that string back into the original object.
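With the Symfony component, that round trip can be sketched as follows. The choice of GetSetMethodNormalizer and JsonEncoder here is an assumption made for illustration, not what Drupal registers by default:

use Symfony\Component\Serializer\Encoder\JsonEncoder;
use Symfony\Component\Serializer\Normalizer\GetSetMethodNormalizer;
use Symfony\Component\Serializer\Serializer;

// Serialization = normalize + encode. Deserialization = decode + denormalize.
$serializer = new Serializer([new GetSetMethodNormalizer()], [new JsonEncoder()]);
$json = $serializer->serialize($object, 'json');
$copy = $serializer->deserialize($json, MyClass::class, 'json');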

An encoder is in charge of converting simple data—a set of scalars, arrays and stdClass objects—into a string. The resulting string is a convenient way to store or transport the original object. A decoder performs the opposite function; it will take that encoded string and transform it into an array that’s ready to use. json_encode and json_decode are good examples of a commonly used encoder and decoder. XML is another example of a format to encode to. Note that for an object to be correctly encoded it needs to be normalized first. Consider the following example where we encode and decode an object without any normalization or denormalization.

class MyClass {}
$obj = new MyClass();
var_dump($obj); // Outputs: object(MyClass) (0) {}
var_dump(json_decode(json_encode($obj))); // Outputs: object(stdClass) (0) {}


You can see in the code above that the composition of the two inverse operations is not the same original object of type MyClass. This is because the encoding operation loses information if the input data is not a simple set of scalars, arrays, and stdClass objects. Once that information is lost, the decoder cannot get it back.

Serialization diagram from the serializer component documentation page.

One of the reasons why we need normalizers and denormalizers is to make sure that data is correctly simplified before being turned into a string. It also needs to be upcast to a typed object after being parsed from a string. Another reason is that different (de)normalizers allow us to work with different formats of the data. In the REST subsystem we have different normalizers to transform a Node object into the JSON, HAL or JSON API formats. Those are JSON objects with different shapes, but they contain the same information. We also have different denormalizers that will take a simplified JSON, HAL or JSON API payload and turn it into a Node object.

(De)Normalization in Drupal

The normalization of content entities is a very convenient way to express the content in a particular format and shape. So formatted, the data can be exported to other systems, stored as a text-based document, or served via an HTTP request. The denormalization of content entities is a great way to import content into your Drupal site. Normalization and denormalization can also be combined to transform a document from one format to another. Imagine that we want to transform a HAL document into a JSON API document. To do so, you need to denormalize the HAL input into a Node object, and then normalize it into the desired JSON API document.
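As a rough sketch of that idea (the format names and the assumption that both sets of normalizers are registered with the main serializer service depend on the modules installed on your site):

// Denormalize the HAL payload into a Node object, then normalize that
// object again into the target format.
$serializer = \Drupal::service('serializer');
$node = $serializer->denormalize($hal_document, \Drupal\node\Entity\Node::class, 'hal_json');
$jsonapi_document = $serializer->normalize($node, 'api_json');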

A good example of the normalization process is the Data Model module. In this case instead of normalizing content entities such as nodes, the module normalizes the Typed Data definitions. The typed data definitions are the internal Drupal objects that define the schemas of the data for things like fields and properties. An integer field will contain a property (the value property) of type IntegerData. The Data Model module will take object definitions and simplify (normalize) them. Then they can be converted to a string following the JSON Schema format to be used in external tools such as beautiful documentation generators. Note how a different serialization could turn this typed data into a Markdown document instead of JSON Schema string.

Adding a new (de)normalizer to the system

In order to add a new normalizer to the system you need to create a new tagged service in custom_module.services.yml.

services:
  serializer.custom_module.my_class_normalizer:
    class: Drupal\custom_module\Normalizer\MyClassNormalizer
    tags:
      - { name: normalizer, priority: 25 }

The class for this service should implement the normalization interface in the Symfony component Symfony\Component\Serializer\Normalizer\NormalizerInterface. This normalizer service will be in charge of declaring which types of objects it knows how to normalize and denormalize—that would be MyClass in our previous example. This way the serialization module uses it when an object of type MyClass needs to be (de)normalized. Since multiple modules may provide a service that supports normalizing MyClass objects, the serialization module will use the priority key in the service definition to resolve the normalizer to be used.
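A skeleton for that service class could look like this sketch; MyClass is the hypothetical object from the earlier example and getSomeProperty() is an invented accessor:

namespace Drupal\custom_module\Normalizer;

use Symfony\Component\Serializer\Normalizer\NormalizerInterface;

class MyClassNormalizer implements NormalizerInterface {

  /**
   * {@inheritdoc}
   */
  public function supportsNormalization($data, $format = NULL) {
    // Declare support only for MyClass objects.
    return $data instanceof \MyClass;
  }

  /**
   * {@inheritdoc}
   */
  public function normalize($object, $format = NULL, array $context = []) {
    // Simplify the object into scalars and arrays so an encoder can handle it.
    // getSomeProperty() is a hypothetical accessor used for illustration.
    return ['someProperty' => $object->getSomeProperty()];
  }

}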

As you would expect, in Drupal you can alter and replace existing normalizers and denormalizers so they provide the output you need. This is very useful when you are trying to alter the output of the JSON API, JSON or HAL web services.

In a future article, I will delve deeper into how to create a normalizer and a denormalizer from scratch by creating an example module that (de)normalizes nodes.

Conclusion

The serialization component in Symfony allows you to deal with the shape of the data. It is of the utmost importance when you have to use Drupal data in an external system that requires the data to be expressed in a certain way. With this component, you can also perform the reverse process and create objects in Drupal that come from a text representation.

In a follow-up article, I will give an introduction on how to actually work with (de)normalizers in Drupal.

Sep 21 2016
Sep 21

Little did I know that I was about to embark on a series of projects that would teach me not only about decoupled Drupal, but also the subtleties of designing proper APIs to maximize performance and minimize roundtrips to the server.

Embedding resources

I was lucky enough that the client that I was working with at the time—The Tonight Show with Jimmy Fallon—decided on a decoupled approach. I was involved in the Drupal HTTP API server implementation. The project went on to win an Emmy Award for Outstanding Interactive Program.

The idea of a content repository that could be accessed from anywhere via HTTP, and leverage all the cool technologies 2014 had to offer, was—if not revolutionary—forward-looking. I was amazed by the possibilities the approach opened. The ability to expose Drupal’s data to an external team that could work in parallel using the front-end technologies that they were proficient with meant work could begin immediately. Nevertheless, there were drawbacks to the approach. For instance, we observed a lot of round trips between the consumer of the data—the client—and the server.

As it turns out, The Tonight Show with Jimmy Fallon was only the first of several decoupled projects that I undertook in rapid succession. As a result, I authored version 2.x of the RESTful module to support the JSON API spec in Drupal 7. One of the strong points of this specification is resource embedding. Embedding resources—also called resource composition—is a technique where the response to a particular entity also contains the contents of the entities it is related to. Embedding resources for relationships is one of the most effective ways to reduce the number of round trips when dealing with REST servers. It’s based on the idea that the consumer requests the relationships that it wants embedded in the response. This same idea is used in many other specifications like GraphQL.

In JSON API, the consumer can interrogate the data with a single query, tracing relationships between objects and returning the desired data in one trip. Imagine searching for a great grandparent with a genealogy system that would only let you find the name of a single family member at a time versus a system that could return every ancestor going back three generations with a single request. To do so, the client appends an include parameter in the URL. For example: 

?include=relationship1,relationship2.nestedRelationship1 

The response will include information about four entities:

  • The entity being requested (the one that contains the relationships).
  • The entity that relationship1 points to. This may be an entity reference field inside of the entity being requested.
  • The entity that relationship2 points to.
  • The entity that nestedRelationship1 points to. This may be an entity reference field inside of the entity that relationship2 is pointing to.

A single request from a consumer can return multiple entities. Note that for the same API different consumers may follow different embedding patterns, depending on the designs being implemented.

The landscape for JSON API and Drupal nowadays seems bright. Dries Buytaert, the product lead of the Drupal project, hopes to include the JSON API module in core. Moreover, there seems to be numerous articles about decoupling techniques for Drupal 8.

But does resource embedding offer a performance edge over multiple round-trip requests? Let’s quantitatively compare the two.

Performance comparison

This performance comparison uses a clean Drupal installation with some automatically generated content. Bear in mind, performance analysis is tightly coupled to the content model and merits case-by-case study. Nevertheless, let’s analyze the response times to test our hypothesis: that resource embedding provides a performance improvement over traditional REST approaches.

Our test case will involve the creation of an article detail page that comes with the Standard Drupal profile. I also included the profile image of a commenter to make things a bit more complex.

Figure 1: The test article.

In Figure 1, I’ve visually indicated the “levels” of relationships between the article itself and each accompanying chunk of content necessary to compose the “page.” Using traditional REST, a particular consumer would need to make the following requests:

  • Request the given article (node/2410).
  • Once the article response comes back it will need to request, in parallel:
    • The author of the article.
      • The profile image of the author of the article.
    • The image for the article.
    • The first tag for the article.
    • The second tag for the article.
    • The first comment on the article.
      • The author of the first comment on the article.
        • The profile image of the author of the first comment of the article.
    • The second comment of the article.
      • The author of the second comment of the article.

In contrast, using the JSON API module (or any other module with resource composition) will only require a single request with the include query parameter set to

?include=uid,uid.field_image,field_tags,comments,comments.uid,comments.uid.field_image

When the server gets such a request it will load all the requested entities and return them back in a single swoop. Thus, the front-end framework for your decoupled app gets all of its data requirements in a JSON document in a single request instead of many.

For simplicity I will assume that the overall response time of the REST-based approach will be the one with the longest path (four levels deep). Having four requests running in parallel will not have a big impact on the final response time. In a more realistic performance analysis, we would take into account that having four parallel calls degrades the overall performance. Even in this handicapped scenario the resource embedding should have a better response time.

Once the request reaches the server, if the response to it is ready in the different caching layers, it takes the same effort to retrieve a big JSON document for the JSON API request as to retrieve a small JSON document for one of the REST requests. That indicates that the big effort is in bootstrapping Drupal to a point where it can serve a cached response. That is true for anonymous and authenticated traffic, via the Page Cache and Dynamic Page Cache core modules.

The graphic above shows the response time for each approach. Both approaches are cached in the page cache, so there is a constant response time to bootstrap Drupal and grab the cache. For this example, the response time for every request was ~7 ms.

It is obvious that the more complex the interconnections between your data are, the greater the advantage of using JSON API's resource embedding. I believe that even though this example is extremely simple, we were able to cut response time by 75%.

If we now introduce latency between the consumer and the server, you can observe that the JSON API response still takes 75% less time. However, the total response time is degraded significantly. In the following chart, I have assumed an optimistic, and constant, transport time of 75 ms.

Jun 22 2016
Jun 22

I have already talked about design patterns in general and the decorator pattern in particular, and today I will tell you about the Template Method pattern. These templates have nothing to do with Drupal’s templates in the theme system.

Imagine that we are implementing a social media platform, and we want to support posting messages to different networks. The algorithm has several common parts for posting, but the authentication and sending of actual data are specific to each social network. This is a very good candidate for the template pattern, so we decide to create an abstract base class, Network, and several specialized subclasses, Facebook, Twitter, …

In the Template Method pattern, the abstract class contains the logic for the algorithm. In this case we have several steps that are easily identifiable:

  1. Authentication. Before we can do any operation in the social network we need to identify the user making the post.
  2. Sending the data. After we have a successful authentication with the social network, we need to be able to send the array of values that the social network will turn into a post.
  3. Storing the proof of reception. When the social network responds to the publication request, we store the results in an entity.

The first two steps of the algorithm are very specific to each network. Facebook and Instagram may have a different authentication scheme. At the same time, Twitter and Google+ will probably have different requirements when sending data. Luckily, storing the proof of reception is going to be generic to all networks. In summary, we will have two abstract methods that will authenticate the request and send the data plus a method that will store the result of the request in an entity. More importantly, we will have the posting method that will do all the orchestration and call all these other methods.

One possible implementation of this (simplified for the sake of the example) could be:

<?php

namespace Drupal\template;

use Drupal\Component\Serialization\Json;

/**
 * Base class holding the generic posting algorithm (the template method).
 */
abstract class Network implements NetworkInterface {

  /**
   * The entity type manager.
   */
  protected $entityTypeManager;

  /**
   * Template method: orchestrates the posting algorithm.
   */
  public function post(PostInterface $post) {
    // Authenticate against the social network. This step is specific to each
    // network, so it is declared abstract below.
    $this->authenticate();
    // Send the data to the social network. Also network-specific.
    $receipt = $this->sendData($post->getData());
    // Store the proof of reception. This step is generic to all networks.
    $saved = $this->storeReceipt($receipt);
    return $saved == SAVED_NEW || $saved == SAVED_UPDATED;
  }

  /**
   * Authenticates the request against the social network.
   */
  abstract protected function authenticate();

  /**
   * Sends the array of values that the network will turn into a post.
   */
  abstract protected function sendData(array $values);

  /**
   * Stores the receipt returned by the social network in an entity.
   */
  protected function storeReceipt($receipt) {
    if ($receipt['status'] > 399) {
      // The network could not process the post.
      throw new NetworkException(sprintf(
        '%s could not process the data. Receipt: %s',
        get_called_class(),
        Json::encode($receipt)
      ));
    }
    return $this->entityTypeManager->getStorage('network_receipts')
      ->create($receipt)
      ->save();
  }

}

The post public method shows how you can structure your posting algorithm in a very readable way, while keeping the extensibility needed to accommodate the differences between different classes. The specialized class will implement the steps (abstract methods) that make it different.

<?php

namespace Drupal\template;

/**
 * Specialized class: implements the network-specific steps for Facebook.
 */
class Facebook extends Network {

  /**
   * {@inheritdoc}
   */
  protected function authenticate() {
    // Authenticate against Facebook's API.
  }

  /**
   * {@inheritdoc}
   */
  protected function sendData(array $values) {
    // Send the values to Facebook to create the post.
  }

}

After implementing the abstract methods, you are done. You have successfully implemented the template method pattern! Now you are ready to start posting to all the social networks.


$message = 'I like the new article about design patterns in the Lullabot blog!';
$post = new Post($message);


$network = new \Drupal\template\Facebook();
$network->post($post);
$network = new \Drupal\template\Twitter();
$network->post($post);

As you can see, this is a behavioral pattern very useful to deal with specialization in a subclass for a generic algorithm.

To summarize, this pattern involves a parent class, the abstract class, and a subclass, called the specialized class. The abstract class implements an algorithm by calling both abstract and non-abstract methods.

  • The non-abstract methods are implemented in the abstract class, and the abstract methods are the specialized steps that are subsequently handled by the subclasses. The main reason why they are declared abstract in the parent class is because the subclass handles the specialization, and the generic parent class knows nothing about how. Another reason is because PHP won’t let you instantiate an abstract class (the parent) or a class with abstract methods (the specialized classes before implementing the methods), thus forcing you to provide an implementation for the missing steps in the algorithm.
  • The design pattern doesn’t define the visibility of these methods; you can declare them public or protected. If you declare them public, you can also surface them in an interface that the abstract base class implements.

In one typical variation of the template pattern, one or more of the abstract methods are not declared abstract. Instead they are implemented in the base class to provide a sensible default. This is done when there is a shared implementation among several of the specialized classes. This is called a hook method (note that this has nothing to do with Drupal's hooks).

Coming back to our example, we know that most of the Networks use OAuth 2 as their authentication method. Therefore we can turn our abstract authenticate method into an OAuth 2 implementation. All of the classes that use OAuth 2 will not need to worry about authentication since that will be the default. The authenticate method will only be overridden in the specialized subclasses that differ from the common case. When we provide a default implementation for one of the (previously) abstract methods, we call that a hook method.
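A sketch of that variation, reusing the earlier example (the OAuth 2 details are stubbed out):

abstract class Network implements NetworkInterface {

  /**
   * Hook method: a sensible OAuth 2 default shared by most networks.
   *
   * Specialized classes that authenticate differently override this method.
   */
  protected function authenticate() {
    // Perform a generic OAuth 2 handshake here (details omitted).
  }

  /**
   * Sending the data is still specific to each network.
   */
  abstract protected function sendData(array $values);

}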

At this point you may be thinking that this is just OOP or basic subclassing. This is because the template pattern is very common. Quoting Wikipedia's words:

The Template Method pattern occurs frequently, at least in its simplest case, where a method calls only one abstract method, with object oriented languages. If a software writer uses a polymorphic method at all, this design pattern may be a rather natural consequence. This is because a method calling an abstract or polymorphic function is simply the reason for being of the abstract or polymorphic method.

You will find yourself in many situations when writing Drupal 8 applications and modules where the Template Method pattern will be useful. The classic example would be annotated plugins, where you have a base class, and every plugin contains the bit of logic that is specific for it.

I like the Template Method pattern because it forces you to structure your algorithm in a very clean way. At the same time it allows you to compare the subclasses very easily, since the common algorithm is contained in the parent (and abstract) class. All in all it's a good way to have variability and keep common features clean and organized.

Mar 30 2016
Mar 30

A couple of months ago I made a case in favor of unit tests in a series of articles. Today I have good news for you, your plugins are good candidates for testing! Before you get carried away by overexcitement, it's likely that your plugins depend on other parts of the system, and that complicates things a little bit. In these cases, it is a good idea to inject the services that include the dependencies you need. Dependency injection is an alternative to the static \Drupal::service. If you don't inject your services you will have a hard time writing unit tests for your plugin's code.

There are at least two widespread patterns in Drupal 8 for injecting services into your objects. The first uses a service to inject services. The second uses a factory method that receives Drupal's container. Both of these involve the dependency injection container, although you can still inject services using other manual techniques.

Injection via services

When you declare a new service you can also specify the new service’s dependencies by using the arguments property in the YAML file. You can even extend other services by using the parent property, having the effect of inheriting all the dependencies from that one. There is thorough documentation about writing services on drupal.org. The following example defines a service that receives the entity manager:

services:
  plugin.manager.network.processor:
    class: Drupal\my_module\Plugin\MyPluginManager
    arguments: ['@container.namespaces', '@cache.discovery', '@module_handler', '@entity.manager']

You cannot use this pattern directly to inject services to a plugin, since your plugin cannot be a service. This is because services are one-instance classes, a global object of sorts. Plugins are –by their definition– multiple configurable objects of a given class. However, the plugin manager is a great candidate to be a service. If you declare your plugin manager as a service, and inject other services to it, then you are only one step away from injecting those services into the actual plugin instances. To do so you only need to do something similar to:

class MyPluginManager extends DefaultPluginManager {
  protected $entityManager;
  
  …

  public function createInstance($plugin_id, array $configuration = array()) {
    $instance = parent::createInstance($plugin_id, $configuration);
    $instance->setEntityManager($this->entityManager);
    return $instance;
  }

}

As you can see (aside from the lack of docblocks for brevity), every time your plugin manager creates an instance of a plugin it will set the entity manager on that particular instance. In this scenario you only need to write a setEntityManager() method in your plugin. This strategy is a mix of service injection and setter injection.
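For completeness, the setter in the plugin base class could be as simple as this sketch:

use Drupal\Core\Entity\EntityManagerInterface;
use Drupal\Core\Plugin\PluginBase;

class MyPluginBase extends PluginBase {

  /**
   * The entity manager, injected by MyPluginManager::createInstance().
   */
  protected $entityManager;

  /**
   * Setter injection used by the plugin manager.
   */
  public function setEntityManager(EntityManagerInterface $entity_manager) {
    $this->entityManager = $entity_manager;
  }

}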

Factory injection

The other big injection pattern in Drupal 8 involves adding all your dependencies as arguments in the constructor for your class (also known as the constructor injection pattern). This pattern is very straightforward: you just pass in all the dependencies upon object creation. The problem arises for objects that Drupal instantiates for you, meaning that you don't get to do new MyClass($service1, $service2, ...). How do you pass the services to the constructor then? There is a convention in the Drupal community to use a factory method called create. To implement it you need to write a static create method that will receive the dependency injection container as the first parameter. The create method should use the container to extract the services from it, and then call the constructor. This looks like:

class MyOtherClass {
  …
  
  public function __construct($service1, $service2, ...) {
    // Store the services in class properties.
  }

  public static function create(ContainerInterface $container) {
    // new static() means "Call the constructor of the current class".
    // Check PHP’s late static binding.
    return new static(
      $container->get('service1'),
      $container->get('service2'),
      …
    );
  }

}

There is the remaining question of "who calls the create method with the container as the first parameter?". If Drupal is instantiating that object for you, then the system needs to know about the create method pattern for that particular object.

The default plugin manager (\Drupal\Core\Plugin\DefaultPluginManager) will use the container factory (\Drupal\Core\Plugin\Factory\ContainerFactory) to create your plugin objects. It is very likely that your new plugin manager extends the default plugin manager, whether it was generated by code scaffolding or created manually. That means that, by default, a create method added to your plugin will be ignored. To have your plugin manager use the create pattern, the only thing your plugins need to do is implement ContainerFactoryPluginInterface. In that case your plugin will look like:

class MyPluginBase extends PluginBase implements PluginInspectionInterface, ContainerFactoryPluginInterface {
  protected $entityManager;
  
  …
  public function __construct(array $configuration, $plugin_id, $plugin_definition, EntityManagerInterface $entity_manager) {
    parent::__construct($configuration, $plugin_id, $plugin_definition);

    $this->entityManager = $entity_manager;
  }

  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    return new static(
      $configuration,
      $plugin_id,
      $plugin_definition,
      $container->get('entity.manager')
    );
  }

}

The plugin manager will detect that the object that it's about to create implements that interface and will call the create method instead of the basic constructor.

The plugin system is so extensible that you may run into corners where those techniques are not enough. Just know that there are other ways to achieve this. Some of those, for instance, involve using a different factory in the plugin manager, but that is out of the scope of this article.

The create factory method pattern can be considered a bit more robust, since it does not involve injecting the services into an object that doesn't need them –the plugin manager– only to pass them down to the plugin. Regardless of the approach that you use, you are now in a good position to start writing unit tests for your plugin class.

Mar 16 2016
Mar 16
Design patterns are generic, reusable solutions to recurring problems in software design, as catalogued in the classic book Design Patterns: Elements of Reusable Object-Oriented Software.

These generic solutions are not snippets of code, ready to be dropped in your project, nor a library that can be imported and reused. Instead, they are templated solutions to the common software challenges in your project. Design patterns can also be seen as best practices when encountering an identified problem.

With Drupal 8’s leap into modern PHP, design patterns are going to be more and more relevant to us. The change from (mostly) procedural code to (a vast majority of) object oriented code is going to take the Drupal community at large through a journey of adaptation to the new programming paradigm. Quoting the aforementioned Design Patterns: Elements of Reusable Object-Oriented Software:

[…] Yet experienced object-oriented designers do make good designs. Meanwhile new designers are overwhelmed by the options available and tend to fall back on non-object-oriented techniques they've used before. It takes a long time for novices to learn what good object-oriented design is all about. Experienced designers evidently know something inexperienced ones don't. What is it?

Even if you don’t know what they are, you have probably been using design patterns by appealing to common sense. When you learn what they are you’ll be thinking “Oh! So that thing that I have been doing for a while is called an adapter!”. Having a label and knowing the correct definition will help you communicate better.

The decorator

Although there are many design patterns you can learn, today I want to focus on one of my favorites: the decorator.

The decorator pattern allows you to do unobtrusive behavior inheritance. We can have many of the benefits of inheritance and polymorphism without creating a new branch in the class’ inheritance tree. That sounds like a mouthful, but this concept is especially interesting in the Drupal galaxy.

In Drupal, when you are writing your code, you cannot possibly know what other modules your code will run with. Imagine that you are developing a feature that enhances the entity.manager service, and you decide to swap core’s entity.manager service with your enhanced version. Now when Drupal uses the manager, it will execute your manager, which extends core’s manager and overrides some methods to add your improvements. The problem arises when another module does the same thing. In that situation either you are replacing that module’s spiced entity manager, or that module is replacing your improved alternative. There is no way to get both improvements at the same time.

This is the type of situation where the decorator pattern comes in handy. You cannot have this new module inherit from your manager, because you don’t know whether all site builders want both modules enabled at the same time. Besides, there could be an unknown number of modules –even ones that are not written yet– that may create the same conflict again and again.

Using the decorator

In order to create our decorator we’ll make use of the interface of the object we want to decorate. In this case it is EntityManagerInterface. The other key component is the object that we are decorating, let’s call it the subject. In this case our subject will be the existing object in the entity.manager service.

Take for instance a decorator that does some logging every time the getStorage() method is invoked, for debugging purposes. To create a decorator you need to create a pristine class that implements the interface, and receives the subject.

class DebuggableEntityManager implements EntityManagerInterface {

  protected $subject;

  public function __construct(EntityManagerInterface $subject) {
    $this->subject = $subject;
  }

}

The key concept for a decorator is that we are going to delegate all method calls to the subject, except the ones we want to override.

class DebuggableEntityManager implements EntityManagerInterface {

  protected $subject;

  // ...

  public function getViewModeOptions($entity_type_id) {
    return $this->subject->getViewModeOptions($entity_type_id);
  }

  // …

  public function getStorage($entity_type) {
    // We want to add our custom code here and then call the “parent” method.
    $this->myDebugMethod($entity_type);
    // Now we use the subject to get the actual storage.
    return $this->subject->getStorage($entity_type);
  }

}

As you have probably guessed, the subject can be the default entity manager, that other module’s spiced entity manager, etc. In general you’ll take whatever the entity.manager service holds. You can use any object that implements the EntityManagerInterface.

Another nice feature is that you can decorate an already decorated object. That allows you to have multiple decorators adding different features without changing the inheritance. You can now have a decorator that adds logging to every method the entity manager executes, on top of a decorator that adds extra definitions when calling getFieldDefinitions(), on top of …
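
A short usage sketch of that stacking (LoggingEntityManager is a hypothetical decorator built the same way as the class above):

// Take whatever the container currently holds; it could be core’s entity
// manager or another module’s already-decorated version of it.
$subject = \Drupal::service('entity.manager');

// Wrap it with the debugging decorator from the previous example.
$entity_manager = new DebuggableEntityManager($subject);

// Decorators stack: a hypothetical LoggingEntityManager, built the same way,
// could wrap the debuggable one without touching any inheritance tree.
// $entity_manager = new LoggingEntityManager($entity_manager);

$node_storage = $entity_manager->getStorage('node');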

I like the coffee making example in the decorator pattern entry in Wikipedia, even if it’s written in Java instead of PHP. It’s a simple example of the use of decorators and it reminds me of delicious coffee.

Benefits and downsides

One of the downsides of using the decorator pattern is that you can only act on public methods. Since you are –intentionally– not extending the class of the decorated object, you don’t have access to any private or protected properties and methods. There are a couple of situations, similar to this one, where the decorator pattern may not be the best match:

  • The business logic you want to override is contained in a protected method, and that method is reused in several places. In this case you would end up overriding all the public methods where that protected one is called.
  • You are overriding a public method that is executed inside other public methods. In such a scenario you would not want to delegate the execution to the subject, because in that delegation your overridden public method would be missed.

If you don’t find yourself in one of those situations, you’ll discover that the decorator has other additional benefits:

  • It favors the single responsibility principle.
  • It allows you to do the decoration at run time, whereas subclassing can only be done at compile time.
  • Since the decorator pattern is one of the commonly known design patterns, you will not have to thoroughly describe your implementation approach during the daily scrum. Instead you can be more precise and just say “I’m going to solve the problem using the decorator pattern. Tada!”.

Write better designs

Design patterns are a great way to solve many complex technical problems. They are a heavily tested and discussed topic with lots of examples and documentation in many programming languages. That does not imply that they are your new golden hammer, but a very solid source of inspiration.

In particular, the decorator pattern allows you to add features to an object at run-time while maintaining the object’s interface, thus making it compatible with the rest of the code without a single change.

Sep 28 2015
Sep 28

In this article, I will explain how you can organize these expensive operations to avoid the pitfalls related to them. I created a GitHub repository with the code of every step in this article.

You will see how we can use update hooks, drush commands or Drupal queues to solve this. Depending on the situation you’ll learn to use one or the other.

Scenario

The UX team at the Great Shows TV channel has come up with an idea to improve the user experience on their Drupal website. They are partnering with Nuts For TV, an online database with lots of reviews of TV show episodes, fan art, etc. The idea is that whenever an episode is created –or updated– in the Great Shows website, all the information available will be downloaded from a specific URL stored in Drupal fields. Also, they have gone ahead and manually updated all of the existing episode nodes to add the URL to the new field_third_party_uri field.

Your job as the lead back-end developer at Great Shows is to import the episode information from Nuts For TV. After writing the requisite hook_entity_presave that will call _bg_process_perform_expensive_task, you end up with hundreds of old episode nodes that need to be processed. Your first approach may be to write an update hook to loop through the episode content and run _bg_process_perform_expensive_task.

The example repo focuses on the strategies to deal with massive operations. All the code samples are written for educational purposes, and not for their direct use.

Operations that are expensive in time are often expensive in memory as well. You want to avoid having the update hook fail because the available memory has been exhausted.

Do not run out of memory

With an update hook you can have the code deployed to every environment and run database updates as part of your deploy process. This is the approach taken in the first step in the example repo. You will take the entities that have the field_third_party_uri attached to them and process them with _bg_process_perform_expensive_task.

/**
 * Update from a remote 3rd party web service.
 */
function bg_process_update_7100() {
 // All of the entities that need to be updated contain the field.
 $field_info = field_info_field(FIELD_URI);
 // $field_info['bundles'] contains information about the entities and bundles
 // that have this particular field attached to them.
 $entity_list = array();  

 // Populate $entity_list
 // Something like:
 // $entity_list = array(
 //   array('entity_type' => 'node', 'entity_id' => 123),
 //   array('entity_type' => 'node', 'entity_id' => 45),
 // );

 // Here is where we process all of the items:
 $succeeded = $errored = 0;
 foreach ($entity_list as $entity_item) {
    $success = _bg_process_perform_expensive_task($entity_item['entity_type'], $entity_item['entity_id']);
    $success ? $succeeded++ : $errored++;
 }
 return t('@succeeded entities were processed correctly. @errored entities failed.', array(
   '@succeeded' => $succeeded,
   '@errored' => $errored,
 ));
}

This is when you realize that the update hook never completes due to memory issues. Even if it completes on your local machine, there is no way to guarantee that it will finish in all of the environments where it needs to be deployed. You can solve this using batch update hooks. So that's what we are going to do in Step 2.

Running updates in batches

There is no exact way of telling when you will need to perform your updates in batches, but if you answer any of these questions with a yes, then you should do batches:

  • Did the single update run out of memory on your local environment?
  • Did you wonder whether the update had died while it was running as one single batch?
  • Are you loading/updating more than 20 entities at a time?

While these provide a good rule of thumb, every situation deserves to be evaluated separately.

When using batches, your episodes update hook will transform into:

/**
 * Update from a remote 3rd party web service.
 * 
 * Take all the entities that have FIELD_URI attached to
 * them and perform the expensive operation on them.
 */
function bg_process_update_7100(&$sandbox) {
  // Generate the list of entities to update only once.
  if (empty($sandbox['entity_list'])) {
    // Size of the batch to process.
    $batch_size = 10;
    // All of the entities that need to be updated contain the field.
    $field_info = field_info_field(FIELD_URI);
    // $field_info['bundles'] contains information about the entities and bundles
    // that have this particular field attached to them.
    $entity_list = array();
    foreach ($field_info['bundles'] as $entity_type => $bundles) {
      $query = new \EntityFieldQuery();
      $results = $query
        ->entityCondition('entity_type', $entity_type)
        ->entityCondition('bundle', $bundles, 'IN')
        ->execute();
      if (empty($results[$entity_type])) {
        continue;
      }
      // Add the ids with the entity type to the $entity_list array, that will be
      // processed later.
      $ids = array_keys($results[$entity_type]);
      // Use array_merge() so numeric keys from different entity types do not
      // collide and overwrite each other.
      $entity_list = array_merge($entity_list, array_map(function ($id) use ($entity_type) {
        return array(
          'entity_type' => $entity_type,
          'entity_id' => $id,
        );
      }, $ids));
    }
    $sandbox['total'] = count($entity_list);
    $sandbox['entity_list'] = array_chunk($entity_list, $batch_size);
    $sandbox['succeeded'] = $sandbox['errored'] = $sandbox['processed_chunks'] = 0;
  }
  // At this point we have the $sandbox['entity_list'] array populated:
  // $entity_list = array(
  //   array(
  //     array('entity_type' => 'node', 'entity_id' => 123),
  //     array('entity_type' => 'node', 'entity_id' => 45),
  //   ),
  //   array(
  //     array('entity_type' => 'file', 'entity_id' => 98),
  //     array('entity_type' => 'file', 'entity_id' => 640),
  //     array('entity_type' => 'taxonomy_term', 'entity_id' => 74),
  //   ),
  // );

  // Here is where we process all of the items:
  $current_chunk = $sandbox['entity_list'][$sandbox['processed_chunks']];
  foreach ($current_chunk as $entity_item) {
    $success = _bg_process_perform_expensive_task($entity_item['entity_type'], $entity_item['entity_id']);
    $success ? $sandbox['succeeded']++ : $sandbox['errored']++;
  }
  // Increment the number of processed chunks to see if we finished.
  $sandbox['processed_chunks']++;

  // When we have processed all of the chunks $sandbox['#finished'] will be 1.
  // Then the update runner will consider the job finished.
  $sandbox['#finished'] = $sandbox['processed_chunks'] / count($sandbox['entity_list']);

  return t('@succeeded entities were processed correctly. @errored entities failed.', array(
    '@succeeded' => $sandbox['succeeded'],
    '@errored' => $sandbox['errored'],
  ));
}

Note how the $sandbox array will be shared among all the batch iterations. That is how you can detect that this is the first iteration –by doing empty($sandbox['entity_list'])– and how you signal Batch API that the update is done. The $sandbox is also used to keep track of the chunks that have been processed already.

By running your episode updates in batches your next release will be safer, since you will have decreased the chances of memory issues. At this point, you observe that this next release will take two extra hours because of these operations running as part of the deploy process. You decide to write a drush command that will take care of updating all your episodes, which will decouple the data import from the deploy process.

Writing a custom drush command

With a custom drush command you can run your operations in every environment, and you can do it at any time and as many times as you need. You have decided to create this drush command so Matt (the release manager at Great Shows) can run it as part of the production release. That way he can create a release plan that is not blocked by a two-hour update hook.

Drush runs in your terminal, and that means that it will be running under PHP CLI. This allows you to have different configurations to run your drush commands, without affecting your web server. Thus, you can set a very high memory limit for PHP CLI to run your expensive operations. Check out Karen Stevenson’s article on testing your custom drush commands with different drush versions.

To create a drush command from our original update hook in Step 1 we just need to create the drush file and implement the following methods:

  • hook_drush_command declares the command name and options passed to it.
  • drush_{MODULE}_{COMMANDNAME}. This is the main callback function; the action will happen here.

This results in:

/**
 * Main command callback.
 *
 * @param string $field_name
 *   The name of the field in the entities to process.
 */
function drush_bg_process_background_process($field_name = NULL) {
  if (!$field_name) {
    return;
  }
  // All of the entities that need to be updated contain the field.
  $field_info = field_info_field($field_name);
  $entity_list = array();
  foreach ($field_info['bundles'] as $entity_type => $bundles) {
    // Some of the code has been omitted for brevity’s sake. See the example
    // repo for the complete code.
  }

  // At this point we have the $entity_list array populated.
  // Something like:
  // $entity_list = array(
  //   array('entity_type' => 'node', 'entity_id' => 123),
  //   array('entity_type' => 'file', 'entity_id' => 98),
  // );
  // Here is where we process all of the items:
  $succeeded = $errored = 0;
  foreach ($entity_list as $entity_item) {
    $success = _bg_process_perform_expensive_task($entity_item['entity_type'], $entity_item['entity_id']);
    $success ? $succeeded++ : $errored++;
  }
}

Some of the code above has been omitted for brevity’s sake. Please look at the complete example.

After declaring the drush command there is almost no difference between the update hook in Step 1 and this drush command.
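
For reference, the declaration from the first bullet is just a small hook_drush_command() implementation. Here is a sketch, assuming the command is named background-process as used below (see the example repo for the exact version):

/**
 * Implements hook_drush_command().
 */
function bg_process_drush_command() {
  $items['background-process'] = array(
    'description' => 'Runs the expensive operation on every entity that has the given field.',
    'arguments' => array(
      'field_name' => 'The name of the field in the entities to process.',
    ),
  );
  return $items;
}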

With this code in place, you will have to run drush background-process field_third_party_uri in an environment to be able to QA the updated episodes. Drush also introduces some additional flexibility.

As the dev lead for Great Shows, you know that even though you can configure PHP CLI separately, you still want to run your drush command in batches. That will save some resources and will not rely on the PHP memory_limit setting.

A batch drush command

The transition to a batch drush command is also straightforward. The callback for the command will be responsible for preparing the batches. A new function will be written to deal with every batch, which will be very similar to our old command callback.

Looking at the source code for the batch command you can see how drush_bg_process_background_process is responsible for getting the array of pairs containing entity types and entity IDs for all of the entities that need to be updated. That array is then chunked, so every batch will only process one of the chunks.

The last step is creating the operations array. Every item in the array describes what needs to be done for each batch. With the operations array populated we can set some extra properties on the batch, like a callback that runs after all the batches, and a progress message.

The drush command to add the extra data to the episodes uses two helper functions in order to have more readable code. _drush_bg_callback_get_entity_list is a helper function that will find all of the episodes that need to be updated, and return the entity type and entity ID pairs. _drush_bg_callback_process_entity_batch will update the episodes in the batch.
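
Under those assumptions (the helper names above, a bg_process module, and an assumed _drush_bg_callback_finished callback for the after-all-batches step), the wiring in the batch variant of the command callback looks roughly like this sketch:

function drush_bg_process_background_process($field_name = NULL) {
  // Entity type / entity ID pairs, split into chunks of 10.
  // The helper signature is assumed from its description above.
  $chunks = array_chunk(_drush_bg_callback_get_entity_list($field_name), 10);

  // One batch operation per chunk, each processed by the helper above.
  $operations = array();
  foreach ($chunks as $chunk) {
    $operations[] = array('_drush_bg_callback_process_entity_batch', array($chunk));
  }

  batch_set(array(
    'operations' => $operations,
    // Assumed name for the callback that runs after all the batches.
    'finished' => '_drush_bg_callback_finished',
    'progress_message' => t('Processed @current out of @total chunks.'),
  ));

  // Drush has to kick off the Batch API processing itself.
  drush_backend_batch_process();
}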

It is common to need to run a callback on a list of entities in a batch drush command. Entity Process Callback is a generic drush command that lets you select the entities to be updated and apply a specified callback function to them. With it, you only need to write a function that takes an entity type and an entity object, and then pass the name of that function to the command: drush epc node _my_custom_callback_function. For our example, all the drush code is simplified to:

/**
 * Helper function that performs an expensive operation for EPC.
 */
function _my_custom_callback_function($entity_type, $entity) {
  list($entity_id,,) = entity_extract_ids($entity_type, $entity);
  _bg_process_perform_expensive_task($entity_type, $entity_id);
}

Running drush batch commands is a very powerful and flexible way of executing your expensive back-end operations. However, it will run all of the operations sequentially in a single run. If that becomes a problem you can leverage Drupal’s built-in queue system.

Drupal Queues

Sometimes you don’t care if your operations are executed immediately, you only need to execute the operations at some point in the near future. In those cases, you may use Drupal queues.

Instead of updating the episodes immediately, there will be an operation per episode waiting in the queue to be executed. Each one of those operations will update an episode. All of the episodes will be updated only when all of the queue items –the episode update operations– have been processed.

You will only need an update hook that inserts one item into the queue per episode, with all the information needed to update it later. First, create the new queue that will hold the episode information. Then, insert the entity type and entity ID pairs into the queue.

At this point you have created a queue and inserted a bunch of entity type and ID pairs, but there is nothing that is processing those items. To fix that you need to implement hook_cron_queue_info so queue elements get processed during cron runs. The 'worker callback' key holds the function that is executed for every queue item. Since we have been inserting an array for the queue item, that is what _bg_process_execute_queue_item –your worker callback– will receive as an argument. All that your worker needs to do is to execute the expensive operation.
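
A minimal sketch of those pieces, assuming a queue named bg_process, a hypothetical update number 7101, and the worker name mentioned above:

/**
 * Queue every episode so it can be processed during cron runs.
 */
function bg_process_update_7101() {
  // Create (or get) the queue that will hold the episode information.
  $queue = DrupalQueue::get('bg_process');
  $queue->createQueue();

  $entity_list = array();
  // Populate $entity_list with entity type / entity ID pairs, exactly like in
  // the previous examples.

  foreach ($entity_list as $entity_item) {
    $queue->createItem($entity_item);
  }
}

/**
 * Implements hook_cron_queue_info().
 */
function bg_process_cron_queue_info() {
  $queues['bg_process'] = array(
    // Called once for every queue item that is claimed during cron.
    'worker callback' => '_bg_process_execute_queue_item',
    // Maximum time, in seconds, to spend on this queue per cron run.
    'time' => 120,
  );
  return $queues;
}

/**
 * Worker callback: processes a single queued episode.
 */
function _bg_process_execute_queue_item($item) {
  _bg_process_perform_expensive_task($item['entity_type'], $item['entity_id']);
}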

There are several ways to process your queue items.

  • Drupal core ships with the aforementioned cron processing. This is the basic method, and the one used by Great Shows for their episode updates.
  • Similar to that, drush comes with drush queue-list and drush queue-run {queue name} to trigger your cron queues manually.
  • Fellow Lullabot Dave Reid wrote Concurrent Queue to process your queue operations in parallel and decrease the execution time.
  • The Advanced Queue module will give you extra niceties for your queues.
  • Another alternative is Queue Runner. This daemon will be monitoring your queue to process the items as soon as possible.

There are probably even more ways to deal with the queue items that are not listed here.

Conclusion

In this article, we started with a very naive update hook to execute all of our expensive operations. Resource limitations made us turn that into a batch update hook. If you need to detach these operations from the release process, you can turn your update hooks into a drush command or a batch drush command. A good alternative to that is to use Drupal’s queue system to prepare your operations and execute them asynchronously in the (near) future.

Some tasks will be better suited for one approach than others. Do you use other strategies when dealing with this? Share them in the comments!

Aug 10 2015
Aug 10

In this article I show you how this can be applied to a real life example.

TL;DR

I encourage you to start testing your code. Here are the most important points of the article:

  • Inject your dependencies (through the constructor or a service container) so they can be swapped with test doubles.
  • Use Mockery or PHPUnit's mock builder to create those test doubles.
  • Use Composer autoloading and Drupal Unit Autoload so you don't have to mock classes that you can simply load.
  • Measure your code coverage to gain confidence that new code is not introducing bugs.

Dependency Injection and Service Container

Jeremy Miller defines dependency injection as:

[...] In a nutshell, dependency injection just means that a given class or system is no longer responsible for instantiating their own dependencies.

In our MyClass we avoided instantiating CacheController by passing it through the constructor. This is a basic form of dependency injection. According to Martin Fowler:

There are three main styles of dependency injection. The names I'm using for them are Constructor Injection, Setter Injection, and Interface Injection.

As long as you are injecting your dependencies, you will be able to swap those objects out with  their test doubles in your unit tests.

An effective way to pass objects into other objects is by using dependency injection via a service container. The service container will be in charge of giving the receiving class all the needed objects. Then, the receiving object will only need to get the service container. In our System Under Test (SUT), the service container will yield the actual objects, while in the unit test domain it will deliver mocked objects. Using a service container can be a little bit confusing at first, or even daunting, but it makes your API more stable and robust.

Using the service container, our example is changed to:

class MyClass implements MyClassInterface {
  // ...
  public function __construct(ContainerInterface $service_container) {
    $this->cacheController = $service_container->get('cache_controller');
    $this->anotherService = $service_container->get('my_services.another_one');
  }
  // ...
  public function myMethod() {
    $cache = $this->cacheController->cacheGet('cache_key');
    // Here starts the logic we want to test.
    // ...
  }
  // ...
}

Note that if you need to use a new service called 'my_services.another_one', the constructor signature remains unchanged. The services need to be declared separately in the service providers.

Dependency injection and service encapsulation is not only useful for mocking purposes, but also to help you to encapsulate your components –and services–. Borrowing, again, Jeremy Miller’s words:

Making sure that any new code that depends on undesirable legacy code uses Dependency Injection leaves an easier migration path to eliminate the legacy code later with all new code.

If you encapsulate your legacy dependencies you can ultimately write a new version and swap them out. Just like you do for your tests, but with the new implementation.

Just like with almost everything, there are several modules that will help you with these tasks:

  • Registry autoload will help you to structure your object oriented code by giving you autoloading if you follow the PSR-0 or PSR-4 standards.
  • Service container will provide you with a service container, with the added benefit that is very similar to the one that Drupal 8 will ship with.
  • XAutoload will give you both autoloading and a dependency injection container.

With these strategies, you will write code that can have its dependencies mocked. In the previous article I showed how to use fake classes or dummies for that. Now I want to show you how you can simplify that by using Mockery.

Mock your objects

Mockery is geared towards providing even more flexibility when creating mocks. Mockery is not tied to any test framework, which makes it useful even if you decide to move away from PHPUnit.
In our previous example the test case would be:

// Called from the test case.
$fake_cache_response = (object) array('data' => 1234);
$cache_controller_fake = \Mockery::mock('CacheControllerInterface');
$cache_controller_fake->shouldReceive('cacheGet')->andReturn($fake_cache_response);
$object = new MyClass($cache_controller_fake);
$object->myMethod();

Here, I did not need to write a CacheControllerFake only for our test; I used Mockery instead.
PHPUnit comes with a great mock builder as well. Check its documentation to explore the possibilities. Sometimes you will want to use one or the other depending on how you want to mock your dependency, and on the tools both frameworks offer. See the same example using PHPUnit instead of Mockery:

// Called from the test case.
$fake_cache_response = (object) array('data' => 1234);
$cache_controller_fake = $this
  ->getMockBuilder('CacheControllerInterface')
  ->getMock();
$cache_controller_fake->method('cacheGet')->willReturn($fake_cache_response);
$object = new MyClass($cache_controller_fake);
$object->myMethod();

Mocking your dependencies can be hard –but valuable– work. An alternative is to include the real dependency, if it does not break the test runner. The next section explains how to save some time using Drupal Unit Autoload.

Cutting corners

Sometimes writing tests for your code makes you realize that you need to use a class from another Drupal module. The first instinct would be, «no problem, let’s create a mock and inject it in place of the real object». That is a very good approach. However, it can be tedious to create and maintain all these mocks, especially for classes that don’t depend on a bootstrap. That code could just be required in your test case.

Since your unit test can be considered a standalone PHP script that executes some code –and makes some assertions– you could just use the require_once statement. This would include the code that contains the class definitions that your code needs. However, a better way of achieving this is by using Composer’s autoloader. An example composer.json in your tests directory would be:

{
  "require-dev": {
    "phpunit/phpunit": "4.7.*",
    "mockery/mockery": "0.9.*"
  },
  "autoload": {
    "psr-0": {
      "Drupal\\Component\\": "lib/"
    },
    "psr-4": {
      "Drupal\\typed_entity\\": "../src/"
    }
  }
}

With the previous example, your unit test script would know how to load any class in the Drupal\Component and Drupal\typed_entity namespaces. This will save you from writing test doubles for classes that you don’t have to mock.
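
For the autoloader to kick in, the test run just needs to include it once, for example from a small bootstrap file referenced by your PHPUnit configuration. A sketch, assuming the composer.json above lives in a tests/ directory:

<?php

// tests/bootstrap.php: register Composer’s autoloader so that the namespaces
// declared in composer.json can be resolved during the test run.
require_once __DIR__ . '/vendor/autoload.php';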

At this point, you will be tempted to add classes from your module’s dependencies. The big problem is that every Drupal module can be installed in a different location, so a simple ../../contrib/modulename will not do. That would only work for your installation, but not for others. This is one of the reasons why Christian Lopez (penyaskito) and I wrote Drupal Unit Autoload. By adding Drupal Unit Autoload to your composer.json you can add references to Drupal core and other contributed modules. The following example speaks for itself:

{
  "require-dev": {
    "phpunit/phpunit": "4.7.*",
    "mockery/mockery": "0.9.*",
    "mateu-aguilo-bosch/drupal-unit-autoload": "0.1.*"
  },
  "autoload": {
    "psr-0": {
      "Drupal\\Component\\": "lib/",
      "Symfony\\": ["DRUPAL_CONTRIB<service_container>/lib"]
    },
    "psr-4": {
      "Drupal\\typed_entity\\": "../src/",
      "Drupal\\service_container\\": ["DRUPAL_CONTRIB<service_container>/src"]
    },
    "class-location": {
      "\\DrupalCacheInterface": "DRUPAL_ROOT/includes/cache.inc",
      "\\ServiceContainer": "DRUPAL_CONTRIB<service_container>/lib/ServiceContainer.php"
    }
  }
}

We added mateu-aguilo-bosch/drupal-unit-autoload to the testing setup, so we can include Drupal aware autoloading options to our composer.json.

Striving for the green

One of the most interesting possibilities that PHPUnit offers is code coverage. Code coverage is a measure used to describe the degree to which the methods are tested. Having a high coverage reduces the number of bugs in your code greatly. Moreover, adding new code with test coverage will help you ensure that you are not introducing any bugs along with the new code.

PhpStorm coverage integration.

A test harness with coverage information is a valuable addition to your CI setup. One way to execute all your PHPUnit cases is by adding a phpunit.xml file describing where the tests are located, along with other integrations. Running the phpunit command in that folder will execute all the tests.
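
A minimal phpunit.xml sketch; it assumes the test cases and the bootstrap file from the previous section live in the same directory, and that the code to measure coverage for sits in ../src/ (both paths are assumptions):

<?xml version="1.0" encoding="UTF-8"?>
<phpunit bootstrap="bootstrap.php" colors="true">
  <testsuites>
    <testsuite name="Unit tests">
      <directory>./</directory>
    </testsuite>
  </testsuites>
  <filter>
    <whitelist>
      <directory suffix=".php">../src/</directory>
    </whitelist>
  </filter>
</phpunit>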

Another good third-party service is Coveralls. It will tell your CI tool how the coverage of your code will change with the pull request –or patch– in question; since Coveralls knows about most of the major CI tools, almost no configuration is needed. Coveralls also provides a web UI to see which parts of the code are covered and which are not.

Coveralls.io dashboard.

Write tests until you get 100% test coverage, or a satisfactory number. The higher the number the higher the confidence that the code is bug free.

Read the next section to see all these tools in action in a contributed Drupal 7 module.

A real life example

I applied the tips of this article to the TypedEntity module. TypedEntity is a nice little module that helps you get your code organized around your Drupal entities and bundles, as first class PHP objects. This module will help you to change your mindset.
Make sure to check the contents of the tests/ directory. In there you will see real life examples of a composer.json and test cases. To run the tests follow these steps:

  1. Download the latest version of the module in your Drupal site.
  2. Navigate to the typed_entity module directory.
  3. Execute tests/run-tests.sh in your terminal. You will need to have composer installed to run them.

This module executes both PHPUnit and Simpletest tests on every commit using Travis CI configured through .travis.yml and phpunit.xml.

Another great example, this one with a very high test coverage, is the Amazon S3 module. A new release of this useful module was created recently by fellow Lullabot Andrew Berry, and he added unit tests!

Do you do things differently in your Drupal 7 modules? Leave your thoughts in the comments!

Aug 05 2015
Aug 05

Large software projects lead to many features being developed over time. Building new features on top of an existing system always risks regressions and bugs. One of the best ways to ensure that you catch those before your code hits production is by adding unit tests.

In this article series I will guide you through adding PHPUnit tests to your Drupal 7 modules. This article explores what unit tests are and why they are useful, and then looks at how to structure your Drupal 7 code in a way that is conducive to writing unit tests later on.

TL;DR

I encourage you to start testing your code. Here are the most important points of the article:

  • Start writing object-oriented code for Drupal 7 today. In your hooks you can just create the appropriate object and call the method for that hook.
  • Fake your methods to remove unmet dependencies.
  • Leverage PHPUnit to ease writing tests.

Unit tests to the rescue

In Drupal 7 core, Simpletest was added so everyone could write tests for their modules. While this has been a great step forward, executing those integration tests is very slow. This is a problem when you want to do Test Driven Development, or if you want to run those tests often as part of your workflow.

A part of Simpletest is a way to write unit tests instead of integration tests. You just need to create your test class by inheriting from DrupalUnitTestCase, but its drawback is that most of Drupal isn’t available. Most Drupal code needs a bootstrap, and it’s very difficult to test your code without a full (slow) Drupal installation being available. Since you don’t have a database available, you can’t call common functions like node_load() or variable_get(). In fact, you should think about your unit tests as standalone scripts that can execute chunks of code. You will see in the next section how PHPUnit can help you to create these testing scripts.

PHPUnit has you covered

In the greater PHP community, one of the leading unit test frameworks is PHPUnit, by Sebastian Bergmann and contributors. This framework is widely used in the community, and it attracts many integrations and extra features compared to the aforementioned DrupalUnitTestCase.

Daniel Wehner comments on these integrations saying:

Since http://drupal.org/node/1567500 Drupal 8 started to use PHPUnit as it's unit test framework. One advantage of PHPUnit is that there are tools around which support it already.

Here is a screenshot of PhpStorm where you can see how you can execute your tests from the IDE:

Running PHPUnit tests from PhpStorm.

PHPUnit is the PHP version of the xUnit testing framework family. Therefore, by learning it, you will be able to leverage that knowledge in other languages. Besides, there is a large documentation and support base for xUnit architectures.

The best part of PHPUnit is that it allows you to write easy-to-read test classes efficiently, and it has many best practices and helper tools –like the test harness XML utility–. PHPUnit also comes with some handy tools to mock your objects and other developer experience improvements to help you write your tests in a clearer and more efficient way.

To use PHPUnit with Drupal 7 you need to write object-oriented code. The next section will show you an example of the dependency problem, and one way to solve it using OOP with the fake object strategy.

A change in your mindset

The hardest part of unit testing your Drupal code is changing your mindset. Many developers are getting ready to use object oriented PHP for Drupal 8, but they keep writing procedural code in their current work. The fact that Drupal 7 core is not as object oriented as it might have been does not imply that custom code you write must also be procedural and untestable.

In order to start unit testing your code, you need to start coding using OOP principles. Only loosely coupled code can be easily tested. Usually this starts by having small classes with clearly defined responsibilities. This way, you can create more complex objects that interact with those small pieces. Done correctly, this allows you to write unit tests for the simple classes and have those simple classes mocked to test the complex ones.

Unit testing is all about testing small and isolated parts of the code. You shouldn’t need to interact with the rest of the codebase or any elements in your system such as the database. Instead, all the code dependencies should be resolved through the use of mock objects, fake classes, dummies, stubs or test doubles.

Mock objects avoid dependencies by getting called instead of the real domain objects. See Guzzle’s documentation for an example.

Gerard Meszaros writes about test doubles in these terms:

Sometimes it is just plain hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren't available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT. When we are writing a test in which we cannot (or chose not to) use a real depended-on component (DOC), we can replace it with a Test Double. The Test Double doesn't have to behave exactly like the real DOC; it merely has to provide the same API as the real one so that the SUT thinks it is the real one!

In a typical Drupal 7 module –our System Under Test (SUT)– there are many parts of the code that we want to test that rely on external dependencies –our depended-on components (DOC). Good examples of those dependencies are Drupal core, other contributed modules, or remote web services. The fact that a method calls a Drupal function, such as cache_get(), makes it very difficult for the test runner to execute that code, since that function will not be defined during the test. Even if we manually included includes/cache.inc, the cache_get() function might require other include files or even an active database connection.

Consider the following custom class:

class MyClass implements MyClassInterface {
  // ...
  public function myMethod() {
    $cache = cache_get('cache_key');
    // Here starts the logic we want to test.
    // ...
  }
  // ...
}

When we call myMethod() we will need to have the database ready, because it calls cache_get().

// Called from some hook.
$object = new MyClass();
$object->myMethod();

Therefore, myMethod(), or any code that uses it, is not unit testable. To fix this, we wrap cache_get() in a class. The big advantage of this is that we now have a CacheController object that deals with all of our cache needs by interacting with the Drupal API.

class CacheController implements CacheControllerInterface {
 
  /**
   * Wraps calls to cache_get.
   *
   * @param string $cid
   *   The cache ID of the data to retrieve.
   * @param string $bin
   *   The cache bin to store the data in.
   *
   * @return mixed
   *   The cache object or FALSE on failure.
   */
   public function cacheGet($cid, $bin = 'cache') {
     return cache_get($cid, $bin);
   }

}

And the custom class becomes:

class MyClass implements MyClassInterface {
  // ...
  public function __construct(CacheControllerInterface $cache_controller) {
    $this->cacheController = $cache_controller;
  }
  // ...
  public function myMethod() {
    $cache = $this->cacheController->cacheGet('cache_key');
    // Here starts the logic we want to test.
    // ...
  }
  // ...
}

The calling code barely changes; the only difference is that the hook is now responsible for handing in the real cache controller.
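
A quick sketch of that production code path, using the real CacheController from above:

// Called from some hook, in the production code path.
$object = new MyClass(new CacheController());
$object->myMethod();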

Our test class will execute myMethod() with a fake cache controller that doesn’t need the bootstrap or the database.

// Called from the PHPUnit test case class.
$cache_controller_fake = new CacheControllerFake();
$object = new MyClass($cache_controller_fake);
$object->myMethod();

What our fake cache controller looks like:

class CacheControllerFake implements CacheControllerInterface {

  /**
   * Cache array that doesn't need the database.
   *
   * @var array
   */
   protected $cache = array();

  /**
   * Wraps calls to cache_get.
   *
   * @param string $cid
   *   The cache ID of the data to retrieve.
   * @param string $bin
   *   The cache bin to store the data in.
   *
   * @return mixed
   *   The cache object or FALSE on failure.
   */
   public function cacheGet($cid, $bin = 'cache') {
     // Mirror cache_get(): return FALSE when there is no cache entry.
     return isset($this->cache[$bin][$cid]) ? $this->cache[$bin][$cid] : FALSE;
   }

}

The key is that the test will create a fake object for our CacheController and pass it to the SUT. Remember that you are not testing cache_get() but how the code that depends on it is working.

In this example, we have removed the dependency on includes/cache.inc and the existence of the database to test a method that calls to cache_get(). Using similar techniques you can test all your classes in your module.

The next article of the series will get deeper into the matter by covering:

  • Mocking tools like: PHPUnit mocking objects and the Mockery project.
  • Dependency injection in Drupal 7 to pass your dependencies easily.
  • Drupal Unit Autoload to reduce the number of classes to mock.
  • A real life example that applies all these concepts.

Do you add unit tests to your Drupal 7 modules? Share your experience in the comments!
