Mar 02 2018
Mar 02

Drupal is a very popular open source Web Content Management system. One of its key characteristics is that it owns both the back-end repository where content is stored and the front-end where content is rendered. In CMS parlance this is typically called a “coupled” CMS because the front-end and the back-end are coupled together.

Historically, the coupled nature of Drupal was a benefit most of the time because it facilitated a fast time-to-market. In many cases, customers could just install Drupal, define their content types, install or develop a theme, and they had a web site up-and-running that made it easy for non-technical content editors to manage the content of that web site.

But as architectural styles have shifted to “API-first” and Single Page Applications (SPAs) written in client-side frameworks like Angular and React and with many clients finding themselves distributing content to multiple channels beyond web, having a CMS that wants to own the front-end becomes more of a burden than a benefit, hence the rise of the “headless” or “de-coupled” CMS. Multiple SaaS vendors have sprung up over the last few years, creating a Content-as-a-Service market which I’ve blogged about before.

Drupal has been able to expose its content and other operations via a RESTful API for quite a while. But in those early days it was not quite as simple as it could be. If you have a team, for example, that just wants to model some content types, give their editors a nice interface for managing instances of those types, and then write a front-end that fetches that content via JSON, you still had to know a fair amount about Drupal to get everything working.

Last summer, Acquia, a company that provides enterprise support for Drupal headed up by Drupal founder, Dries Buytaert, released a new distribution of Drupal called Reservoir that implements the “headless CMS” use case. Reservoir is Drupal, but most of the pieces that concern the front-end have been removed. Reservoir also ships with a JSON API module that exposes your content in a standard way.

I was curious to see how well this worked so I grabbed the Reservoir Docker image and fired it up.

The first thing I did was create a few content types. Article is a demo type provided out-of-the-box. I added Job Posting and Team Member, two types you’d find on just about any corporate web site.

My Team Member type is simple. It has a Body field, which is HTML text, and a Headshot field, which is an image. My Job Posting type has a plain text Body field, a Date field for when the job was posted, and a Status field which has a constrained list of values (Open and Closed).

With my types in place I started creating content…

Something that jumped out at me here was that there is no way to search, filter, or sort content. That’s not going to work very well as the number of content items grows. I can hear my Drupal friends saying, “There’s a module for that!”, but that seems like something that should be out-of-the-box.

Next, I jumped over to the API tab and saw that there are RESTful endpoints for each of my content types that allow me to fetch a list of nodes of a given type, specific nodes, and the relationships a node has to other nodes in the repository. POST, PATCH, and DELETE methods are also supported, so this is not just a read-only API.

Reservoir uses OAuth to secure the API, so to actually test it out, I grabbed the “Demo app” client UUID, then went into Postman and did a POST against the /oauth/token endpoint. That returned an access token and a refresh token. I grabbed the access token and stuck it in the authorization header for future requests.

Here’s an example response for a specific “team member” object.

My first observation is that the JSON is pretty verbose for such a simple object. If I were to use this today I’d probably write a Spring Boot app that simplifies the API responses further. As a front-end developer, I’d really prefer for the JSON that comes back to be much more succinct. The front-end may not need to know about the node’s revision history, for example.

Another reason I might want my front-end to call a simplified API layer rather than call Drupal directly is to aggregate multiple calls. For example, in the response above, you’ll notice that the team member’s headshot is returned as part of a relationship. You can’t get the URL to the headshot from the Team Member JSON.

If you follow the field_headshot “related” link, you’ll get the JSON object representing the headshot:

?

The related headshot JSON shown above has the actual URL to the headshot image. It’s not the end of the world to have to make two HTTP calls for every team member, but as a front-end developer, I’d prefer to get a team member object that has exactly what I need in a single response.

One of the things that might help improve this is support for GraphQL. Reservoir says it plans to support GraphQL, but in the version that ships on the Docker image, if you try to enable it, you get a message that it is still under development. There is a GraphQL Drupal module so I’m sure this is coming to Reservoir soon.

Many of my clients are predominantly Java shops–they are often reluctant to adopt technology that would require new additions to their toolchain, like PHP. And they don’t always have an interest in hiring or developing Drupal talent. Containers running highly-specialized Drupal distributions, like Reservoir, could eventually make both of these concerns less of an issue.

In addition to Acquia Reservoir, there is another de-coupled Drupal Distribution called Contenta, so if you like the idea of running headless Drupal, you might take a look at both and see which is a better fit.

Jul 07 2010
Jul 07

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millenium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought for a few minutes while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, API’s, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (See the DrupalCon presentation, “MongoDB – Humongous Drupal“, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMS’s will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web