Category: Zope

Powerful, scalable, cheap, and easy to code.

Alfresco, NOSQL, and the Future of ECM

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millennium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)
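To make the “Lucene for full-text and SQL for metadata search” item more concrete, here is a minimal Python sketch of the general idea: metadata predicates go to a relational table, keywords go to a text index, and the two result sets are intersected. This is not Alfresco’s implementation. The sample documents, the toy inverted index standing in for Lucene, and the function names are all assumptions made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical documents: id -> (metadata, body text). Purely illustrative.
DOCS = {
    1: ({"author": "alice", "type": "article"}, "scaling the content repository"),
    2: ({"author": "bob", "type": "article"}, "workflow and content approval"),
    3: ({"author": "alice", "type": "memo"}, "repository backup schedule"),
}

def build_metadata_db(docs):
    """Load metadata into an in-memory SQL table (stand-in for the relational side)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE meta (doc_id INTEGER PRIMARY KEY, author TEXT, type TEXT)")
    conn.executemany(
        "INSERT INTO meta VALUES (?, ?, ?)",
        [(doc_id, m["author"], m["type"]) for doc_id, (m, _) in docs.items()],
    )
    return conn

def build_text_index(docs):
    """Build a toy inverted index (stand-in for the full-text engine)."""
    index = defaultdict(set)
    for doc_id, (_, body) in docs.items():
        for word in body.lower().split():
            index[word].add(doc_id)
    return index

def search(conn, index, author, word):
    """Send the metadata filter to SQL and the keyword to the text index, then intersect."""
    rows = conn.execute("SELECT doc_id FROM meta WHERE author = ?", (author,))
    by_metadata = {row[0] for row in rows}
    by_text = index.get(word.lower(), set())
    return sorted(by_metadata & by_text)

conn = build_metadata_db(DOCS)
index = build_text_index(DOCS)
print(search(conn, index, "alice", "repository"))
```

The point of the split is that each side can be tuned and scaled on its own, which matches the disaggregation theme of the list above.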

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.
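The “schema-less” point is easy to see in code. Below is a small Python sketch that uses a plain list of dictionaries as a stand-in for a document collection; it does not talk to MongoDB or CouchDB, and the field values and the `find` helper are hypothetical names invented for this example.

```python
# A plain list of dicts stands in for a document collection; no real database is involved.
articles = [
    {"_id": 1, "title": "Alfresco and CMIS"},
    {"_id": 2, "title": "Scaling Lucene"},
]

# Add an "author" field to one article only. No migration and no ALTER TABLE.
articles[0]["author"] = "example-author"

def find(collection, **criteria):
    """Return documents whose fields match every criterion; missing fields never match."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]

# The store happily holds documents with and without the new field.
print(find(articles, author="example-author"))
print([doc["title"] for doc in articles if "author" not in doc])
```

The trade-off is that the application, not the repository, is now responsible for knowing what an “article” should look like.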

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other, which made data highly available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, APIs, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (see the DrupalCon presentation, “MongoDB – Humongous Drupal”, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.
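The publish-then-serve flow from the client project above can be sketched roughly as follows. This is a Python illustration, not the Java actions used on the project: two in-memory dictionaries stand in for the MongoDB collection and the S3 bucket, and the function names, the `file_key` field, and the `content/` key prefix are all assumptions invented for the example.

```python
# In-memory stand-ins: one dict for the metadata store and one for the file bucket.
metadata_store = {}   # plays the role of a MongoDB collection
file_bucket = {}      # plays the role of an S3 bucket

def deploy(doc_id, metadata, content, approved):
    """Publish an approved item: bytes go to the bucket, metadata to the document store."""
    if not approved:
        return False
    key = f"content/{doc_id}"
    file_bucket[key] = content
    # Store the bucket key with the metadata so the front end can locate the file.
    metadata_store[doc_id] = {**metadata, "file_key": key}
    return True

def render(doc_id):
    """Front-end read path: look up metadata first, then fetch the file by its key."""
    record = metadata_store.get(doc_id)
    if record is None:
        return None
    return record["title"], file_bucket[record["file_key"]]

deploy("a1", {"title": "Press release", "tags": ["news"]}, b"<p>Hello</p>", approved=True)
deploy("a2", {"title": "Draft"}, b"<p>WIP</p>", approved=False)
print(render("a1"))
print(render("a2"))
```

Note that the delivery tier never calls back into the management tier: once content is approved and deployed, the front end only needs the metadata store and the file store.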

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can, and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMSs will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

InfoWorld reviews five open source CMS offerings

InfoWorld published a review of Alfresco, DotNetNuke, Plone, Drupal, and Joomla. The reviewer, Heck, ranks Alfresco the highest of the five, which is a good data point for people evaluating these products. But most folks should think hard about the scenarios they will use the package for before making a decision, because each package has a “fitness to purpose” that’s more important to consider than “fit” alone.

For example, although the article gives a good high-level description of the pros and cons of each package, there’s a more fundamental characteristic of Alfresco that makes comparison to the others an apples-to-oranges exercise. That characteristic is that unlike the others in the list, Alfresco isn’t focused on community-centric functionality. Can you build a community site that is managed by Alfresco and/or uses Alfresco as the back-end repository? Of course. And the new REST framework makes that even easier than it used to be. But you won’t find consumer-facing wiki, blog, or forum functionality out-of-the-box with Alfresco. In fact, you can take your entire web site, as-is, and manage it with Alfresco without any changes to the front-end code. That’s a fundamentally different model than the other packages evaluated.

So you should read the article. But when people ask you to compare Alfresco to Drupal, back them up a bit and instead, figure out the purpose and goal of the site and the business processes needed to manage it (the “how”) and then talk about the open source CMS options.

Open source document management white paper

Optaros has followed up their fairly recent open source WCM white paper with an overview of the leading open source document management solutions called “Unleashing the Power of Open Source in Document Management”.

The paper puts Alfresco in the “magic quadrant” with the highest enterprise readiness and application capabilities. Plone is ranked a close second but has a much larger and more active community.

People new to document management but not necessarily considering open source might still want to take a look at this. Almost half of the paper is on general document management.

Optaros summarizes 15 open source content management projects

Seth Gottlieb has published an excellent whitepaper summarizing 15 different open source projects. He also includes a summary of how to go about evaluating open source offerings.

Seth offers a short list of offerings for each of the following usage scenarios:

    Brochure Site
    Online Periodical
    Collaborative Workspace
    Wiki as Collaborative Workspace
    Online Community

The structure is similar to the presentation he gave at KM World, but the whitepaper format obviously lets him go into much greater detail.

Alfresco and Plone comparison

Seth Gottlieb has posted a good Alfresco overview with a brief comparison to Plone. Seth concludes,

“I would use Alfresco for a targeted document management solution that would fit into a larger enterprise content management architecture…I would use Plone to build an all-in-one intranet or extranet where I wanted to mix article, page, and file content and opportunistically deploy new features to improve collaboration and retention.”

I agree with Seth’s conclusion and the point he makes in the article that the lack of LDAP integration in the community edition of Alfresco limits it to departmental use. I’m sure they’ll move that feature down at some point–when they do it should boost adoption.

Alfresco, Plone make EContent 100

EContent Magazine has posted the EContent 100, an annual list of “…companies that matter most in the digital content industry”.

Notable newcomers include open source ECM and Portal platforms Alfresco and Plone. Wiki software provider SocialText returns for a second year.

EMC Software (Documentum), a long-time EContent 100 stalwart, having appeared every year since 2001, did not make the 2005 list, but Autonomy, now with a three-peat, did.

“Our goal was to be sure that those who make the list again and again don’t do so out of habit or mindshare, but rather because they continue to innovate and deliver products and services that further the evolution of digital content,” said Michelle Manafy, Editor of EContent magazine.