Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough
Jul 07 2010
Jul 07

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millenium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought for a few minutes while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, API’s, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (See the DrupalCon presentation, “MongoDB – Humongous Drupal“, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMS’s will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

Mar 22 2010
Mar 22

A lot of people have been asking for the files we used to integrate Alfresco CMIS with Drupal Open Atrium (See ecmarchitect.com blog post). I’ve happily mailed those to whomever asked. I’ve had the intention of testing them with the latest version, cleaning them up, and putting somewhere more appropriate like the Open Atrium feature server, or at the very least, Google Code or GitHub. But it hasn’t happened yet so I figured I’d make them available here and appeal to the Community to give them a good home.

The zip includes a readme file with (very) rough install/config directions.

Good luck!

Feb 16 2010
Feb 16

Fellow Optaros colleague, Chris Fuller, and I want to present on the Alfresco-Drupal integration at Drupalcon in San Francisco (April 19-21). If you’re interested in Alfresco, Drupal, and CMIS (any or all of the above), please vote for our session.

Oct 13 2009
Oct 13

UPDATE: Screencast now lives here:

[embedded content]

I recorded a quick screencast of a simple integration we did to show Open Atrium leveraging Alfresco as a formal document repository via CMIS. This leverages the CMIS Alfresco module we developed and released on Drupal.org.

As I point out in the screencast, there’s not much to the integration from a technical standpoint. Open Atrium is Drupal and the CMIS module already has a CMIS repository browser. So, all we had to do was expose the module as a “feature”, which is something Open Atrium uses to bundle modules together that create a given chunk of functionality.

Readers familiar with Alfresco Share will instantly recognize the Open Atrium concepts. Instead of “sites” Atrium uses “groups”. Instead of “pages” or “tools”, Atrium uses “features”. The overall purpose, self-provisioned team-based collaboration, is the same and many of the tools/features are the same (blog, calendar, member directory). I’m not advocating using one over the other–as usual, what works best for you depends on a lot of factors. I just thought Atrium provided a nice way to show yet another example of Drupal and Alfresco together (post).

Apr 28 2009
Apr 28

I’ll be in Chicago tomorrow for the Alfresco Meetup. I’ll be speaking during the Barcamp on Alfresco and Drupal integration with CMIS (module, screencast). I’ll also have the Alfresco-Django integration running on my laptop. I may not have time to show Alfresco-Django during my slot, but I’ll be happy to stick around and do informal demos and talk about either integration if you’re interested because I’d like your feedback on it.

Apr 20 2009
Apr 20

People often need to build a custom user interface on top of the Alfresco repository and I see a lot of people asking general questions about how to do it. There are lots of options to consider. Here are four options for creating a user interface on top of Alfresco, at a high level:

Option 1: Use your favorite programming language and/or framework to talk to Alfresco via REST or Web Services. PHP? Python? Java? Flex? Whatever, it’s up to you. The REST API is nice because if you can’t find a URL that does what you need it to out-of-the-box, you can always roll-your-own with the web script framework. This option offers the most flexibility and creative freedom, but of course you might end up building constructs or components that you may have gotten “for free” from a higher-level framework. Optaros‘ streamlined web client, DoCASU, built on Ext-JS, is one freely-available example of a custom UI on top of Alfresco but there are others.

Option 2: Use Alfresco’s Surf framework. Alfresco’s Surf framework is just that–it’s a framework. Don’t confuse it with Alfresco Share which is a team-centric collaboration client built on top of Surf. And, don’t assume that just because a piece of functionality is in Share it is available to you in the lower-level Surf framework. You may have to do some extra work to get some of the cool stuff in Share to work in your pure Surf app. Also realize that Surf is brand new and still maturing. You’ll be quickly disappointed if you hold it to the same standard as a more widely-used, well-established framework like Seam or Django. Surf is a good option for quick, Alfresco-centric solutions, especially if you think you might want to leverage Alfresco’s browser-based site assembly tool, Web Studio, at some point in the future. (See Do-it-yourself Alfresco Surf Code Camp).

Option 3: Customize the Alfresco “Explorer” web client. There are varying degrees to which you can customize the web client. On one end of the spectrum you’ve got Freemarker “presentation templates” followed closely by XML configuration. On the other end of the spectrum you’ve got more elaborate enhancements you can make using JavaServer Faces (JSF). Customizing the Alfresco Explorer web client should only be considered if you can keep your enhancements to an absolute minimum because:

  1. Alfresco is moving away from JSF in favor of Surf-based clients. The Explorer client will continue to be around, but I wouldn’t expect major efforts to be focused on that client going forward.
  2. JSF-based customizations of the web client can be time-consuming and potentially complex, particularly if you are new to JSF.
  3. For most solutions, you’ll get more customer satisfaction bang out of your coding buck by building a purpose-built, eye-catching, UI designed with your specific use cases in mind than you will by starting with the general-purpose web client and extending from there.

Option 4: Use a portal, community, or WCM platform. This includes PHP-based projects like Drupal (Drupal CMIS Screencast) or Joomla as well as Java-based projects like Liferay and JBoss Portal. This is a good option if you have requirements that match up well with the built-in (or easily added-on) capabilities of those platforms.

It’s worth talking about Java portal servers specifically. I think people are struggling a bit to find The Best Way to integrate Alfresco with a portal. Of course there probably is no single approach that will fit every situation but I think Alfresco (with help from the community) could do more to provide best practices.

Here are the options you have when integrating with a portal:

Portal Option 1: Configure Alfresco to be the replacement JSR-170 repository for the portal. This option seems like more trouble than it is worth. If all you need is what you can get out of JSR-170, you might as well use the already-integrated Jackrabbit repository that most open source portals ship with these days unless you have good reasons not to. I’m open to having my mind changed on this one, but it seems like if you want to use Alfresco and a portal, you’ve got bigger plans that are probably going to require custom portlets anyway.

Portal Option 2: Run Alfresco and the portal in the same JVM (post). This is NOT recommended if you need to scale beyond a small departmental solution and, really, I think with the de-coupling of the web script engine we should consider this one deprecated at this point.

Portal Option 3: Run the Alfresco web script engine and the portal in the same JVM. Like the previous option, this gives you the ability to write web scripts that are wrapped in a portlet but it cuts down on the size of the web app significantly and it frees up your portal to scale independently of the Alfresco repository tier. It’s a fast development cycle once you get it set up. But I haven’t seen great instructions for setting it up yet. Alfresco should document this on their wiki if they are going to support this pattern.

Portal Option 4: Write your own portlets that make services calls. This is the “cleanest” approach because it treats Alfresco like any other back-end you might want to integrate with from the portal. You write custom portlets and have them talk to Alfresco via REST or SOAP. You’ll have to decide how you want to handle authentication with Alfresco.

What about CMIS?

CMIS fits under the “Option 1: Use your favorite programming language” and “Portal Option 4: Write your own portlets” categories. You can make CMIS calls to Alfresco using both REST and SOAP from your own custom code, portlet or otherwise. The nice thing about CMIS is that you can use it to abstract the underlying repository so that (in theory) your front-end code will work with different CMIS-compliant back-ends. Just realize that CMIS isn’t a fully-ratified standard yet and although a CMIS implementation is in the Enterprise version of Alfresco, it isn’t clear to me whether or not you’d be supported if you had a problem. (The last response I saw on this specific question was a Peter Monks tweet saying, “I don’t think so”).

The CMIS standard should be approved by the end-of-the-year and if Alfresco’s past performance is an indicator of the future, they’ll be the first to market with a production-ready, fully-supported CMIS implementation based on the final spec.

Pick your poison

Those are the options as I see them. Each one has trade-offs. Some may become more or less attractive over time as languages, frameworks, and the state of the art evolve. Ultimately, you’re going to have to evaluate which one fits your situation the best. You may have a hard time making a decision, but you have to admit that having to choose from several options is a nice problem to have.

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web