Tag: Content Management

Using Elasticsearch to more effectively target dynamic content

Photo credit: viZZZual.com
Photo credit: viZZZual.com

One of my clients came to me with a problem: Despite being a much-admired Fortune 500 company that leads its competitors in the travel industry in customer satisfaction and profitability, their web site, through which the vast majority of their revenues flow, was still mostly static. That by itself is not a huge problem, but they felt like they weren’t able to target content based on their customers’ needs and interests as well as they could with a more dynamic content engine.

It just so happened they were about to re-implement their site from mostly server-side to mostly client-side which is a huge undertaking. They figured that would be a pretty good time to add a dynamic content service to the mix, so they called me.

From Static to Dynamic

The diagram below depicts the high-level setup before the introduction of the content service.

Original ArchitectureThis is pretty standard for sites like this. The Marketing Team edits content in a Content Management System (CMS), which in this case is Interwoven. Through various processes, binary files (mostly images), system data (things like lists of destinations and hotels), and content fragments are published out of Interwoven to destinations accessible by the e-commerce application.

A content fragment is literally a piece of content. It might be a promotion of some sort. Or it could be some text that gets used as part of a banner. The challenge using this setup is that content fragments are static files that live on the file system. If you want to show a different fragment based on something you know about the user you have to generate every permutation you might want ahead-of-time, publish them all, then use logic in the application to decide which one to use.

One obvious way to address this is to publish content fragments in a relational database and then code the front-end app to query for the right content. That wasn’t appropriate here for a few reasons:

  1. The front-end is being migrated to a collection of Single Page Applications (SPA’s) written in JavaScript. It’s easier for those pages to call a RESTful API to get JSON back. Yes, you could still do that with a relational database and a service tier, but the client was looking for something a little more JSON-native.
  2. The structure of the content changes over time. We wanted to be able to accept any kind of content fragment the Marketing Team or SPA developers could think of and not have to worry about migrating database schemas.
  3. The anticipated style of queries needed to find appropriate content fragments was more like what you’d expect from a search engine and less like what you might put in a SQL query–we needed to be able to say, “Here is some context, now return the most appropriate set of content fragments for the situation,” and be able to use relevancy scoring to help determine what comes back.

So relational databases were ruled out in favor of document-oriented NoSQL repositories. Ultimately, Elasticsearch was selected because of its ease of clustering, high performance, unified REST API, availability of commercial support, and add-ons such as Shield, Marvel, and Watcher that make it easier to integrate with the rest of the enterprise.

Introduction of a Content Delivery Service

The first thing we did was stand up an Elasticsearch cluster, load some test data, and beat the heck out of it (see “Using JMeter to Test Elasticsearch“). Once we were satisfied it would be able to handle more than the expected load we moved on to the service.

The Content Delivery Service sits between Elasticsearch and the front-end applications. Its purpose is to abstract away Elasticsearch specifics and to protect the cluster by providing a simple, read-only REST API. It also enforces some light business logic such as making sure that only content that is currently effective according to its publication and expiration date is returned.

The diagram below shows the content infrastructure augmented with Elasticsearch and the content delivery service.

Content Delivery ServiceAs seen in the diagram, Interwoven is still the source of record and the primary way Marketing manages their content. But now, content fragments and system data are published to Elasticsearch. The front-end Single Page Apps ask the Content Delivery Service for content based on some set of context. The content is returned as a collection of JSON objects. The SPAs then take those objects and format them as needed.

Content Objects are Pure Content

A key concept worth emphasizing is that a content object is pure content. It contains no markup. It might have some properties that describe how it is expected to be used, but it is completely lacking in implementation. This has several benefits:

  1. Content objects returned by the Content Delivery Service can be used across any and all channels (such as mobile) rather than being specific to a single channel (such as web).
  2. Within a given channel the same object can have many different presentations.
  3. Responsibilities are cleanly separated: The content service provides content. The front-end applications style and present the content for consumption.

This was a bit of a departure from how things used to be done. In the bad old days presentation was always getting mixed up with content which severely limits reuse.

Micro-services Provide Administrative Features

I mentioned earlier that the Content Delivery Service is read-only. And in my previous diagram I showed Interwoven talking directly to the Elasticsearch cluster. In reality, we don’t let anyone talk directly to the Elasticsearch cluster. Instead, all writes have to go through the Content Management Service. This ensures that we know exactly what is going into the cluster and who is putting it there.

The other role the Content Management Service plays is JSON validation. When new types of content objects are developed we use JSON Schema to codify the structure. When a person or system posts a content object to the Content Management Service, the service validates the object against its JSON Schema before storing it in Elasticsearch.

In addition to the Content Management Service we also implemented a Scheduled Job Service. As the name suggests, it is used to perform administrative tasks on a schedule. For instance, maybe content needs to be reindexed from one cluster to another in a lower environment. Or maybe content needs to be fetched from a third-party and written to the cluster. The Job Service is able to talk to either the Content Management Service or Elasticsearch directly, depending on the task it needs to execute.

All of the administrative services are independently deployed web applications that sit behind an API Gateway. The Gateway leverages the Netflix Zuul Proxy. It is responsible for authenticating against LDAP and creating a shared session in redis. It gives the content admin team a single URL to hit and isolates authentication logic in a single place.

The diagram below shows the fully-realized picture.

Administrative ServicesA few key components aren’t on the diagram. We use Shield to protect the Elasticsearch cluster. Shield also makes it easy to configure SSL for node-to-node communication and provides out-of-the-box LDAP integration. With Shield we can map LDAP groups to roles and then grant roles various privileges on our Elasticsearch cluster and its indices.

We use Watcher to monitor cluster health and job failures that may happen in the Scheduled Job Service. The client has their own enterprise alerting and monitoring solution, but Watcher gives the content management team a flexible, powerful tool for keeping track of things at a level that is probably more granular than what the enterprise ops team cares about.

Ready for the Future

With Elasticsearch and a few relatively small services on top of that, this travel giant now has what it needs to provide its customers with a more customized online experience. Content can be targeted to the users it is most appropriate for using any kind of context the Marketing team can come up with. As the front-end commerce app evolves, new types of content objects can be added easily and be served to the front-end with no schema or service changes required. And it’s all built on commercially-supported open source software.

The emerging Content-as-a-Service market

at-your-service-by-andrew-j-cosgriff-cropThere is an interesting new market emerging in the world of content management: Commercially-hosted Content-as-a-Service (CaaS). These are vendors who provide a service your applications can leverage for content management. Different than, “Hey look, we’re running our old school CMS in the cloud!”, CaaS is singular in focus and free from the feature bloat and operational complexity typical of the CMS your parents probably used.

At a minimum, CaaS vendors provide the following:

  • a hosted repository,
  • some mechanism for defining the types of content you need to manage,
  • a RESTful API to get content and static assets into and out of the repository,
  • a web-based user interface for managing content,
  • web hooks for taking action when content changes,
  • CDN integration for efficiently serving up static assets, and
  • an up-time and performance SLA.

You then build your web site or mobile app using any technology that suits your needs and fetch content as JSON using the API.

The best approach is to use the service to manage reusable, presentation-agnostic chunks of content. Metadata associated with the content chunks can then be used to make it easier to fetch the content for a variety of contexts. Because it is free of presentation the content can be more easily shared and reused across properties and channels.

Why not Drupal or WordPress?

CaaS vendors do not directly compete with full-featured platforms like Drupal or WordPress. There are Drupal and WordPress modules that add RESTful APIs on top of those platforms, so you could build a web or mobile site that is completely de-coupled from your Drupal back-end. Conversely, you could build a web site on top of a CaaS vendor’s service that had the same look, feel, and features of a site built with a traditional CMS. But both of those examples miss the point of CaaS which is, in a word, simplicity.

I’m not saying products like Drupal and WordPress are hard to use. On the contrary, you can install those tools and have a great looking site up-and-running in minutes. I’ve run this blog on WordPress for years and I am extremely happy with it. And sites like wordpress.com and Drupal Gardens take the hassle out of setting up your own server.

When I say the key to CaaS is simplicity I mean it strips away everything. It makes no assumptions. A hosted CaaS offering should distill content management down to its very essence, implied by the term itself: to manage content. Do nothing else. Take this chunk of JSON, free of any hint of style or presentation, and store it for me, making it available via a tool-agnostic API to my front-end channels to present as I see fit.

This pragmatic approach to content management can be implemented on-premises or on your own cloud-based servers using freely-available technology. I’ll talk more about that in another post. The nice thing about hosted CaaS is that you don’t have to assemble, test, scale, and maintain the solution yourself. Yes, you are giving up some amount of control, the degree to which varies across vendors, but many are willing and able to make that trade-off.

Business model

As with other Software-as-a-Service (SaaS) offerings, CaaS vendors charge a monthly subscription for their service. Some charge additional fees based on things such as number of content objects managed, number of content authors, and data volume. All of the market leaders I looked into provide a free-to-get-started plan to make it easy on developers in the early days of their projects.

Approach appeals to both startups and enterprises

The primary target market for CaaS vendors is clearly start-ups who are writing mobile and/or web apps that need some form of content management. Cost is usually a major factor for this segment, at least until the venture proves itself successful, but so is simplicity and efficiency. There’s no time for complex server installs, any sort of run-and-maintain burden, or pushing new app versions as content evolves. Hosted CaaS is a natural fit for these folks.

But this approach also make sense for enterprises, many of whom are still wrestling with their legacy content management vendor boat anchors (I’m looking at you, Interwoven). A hosted service that does nothing more than capture and share content chunks is a refreshing contrast to those bloated, over-priced WCM systems that require a huge staff to run and maintain yet still leave end-users frustrated.

Those systems haven’t changed much in nearly two decades and yet they remain firmly embedded in many companies where they are busy managing sites that may have been state of the art in 1999, but in a world where even the concept of a “page” is falling by the wayside, are now woefully outdated.

The content-as-a-service approach (API-first, native JSON, pragmatic, emphasis on reuse) aligns with how mobile apps and modern web sites are built and deployed as well as their content needs. This is true whether those apps are built by scrappy startups or huge enterprises.

Stay tuned for a CaaS round-up

So join me as I take a look at some of the players in the CaaS space. In the coming posts I’ll be looking at Prismic, Contentful, and Cloud CMS. If you have used any of these for your mobile or web project and you want to share your story with other ecmarchitect.com readers, do let me know.

Alfresco, NOSQL, and the Future of ECM

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millenium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought for a few minutes while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, API’s, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (See the DrupalCon presentation, “MongoDB – Humongous Drupal“, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMS’s will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

Introducing the Alfresco Community Committer Program

It’s been a little over two years since I wrote a blog post entitled, “Is Alfresco the ‘near beer’ of open source?“. In that post, I lamented the fact that the Alfresco code line is entirely closed to community developers and that Alfresco seems unwilling to relinquish any amount of control over the development of their open source product. Writing that post had me a bit riled up so during the Q&A session at the community meetup in San Jose later that week, I asked John Newton, Alfresco CTO, and former Alfrescan, Kevin Cochrane when and if it would ever be different. They said they were “working on it” (See Alfresco pledges to open community by 3.0).

I’m glad to say that, although it took a while, there is now a process by which your code can find its way into the Alfresco code base (Community, and even, potentially, Enterprise). It’s called the Alfresco Community Committer Program (ACCP). The ACCP is a motley crew of volunteers from Alfresco customers and partners around the world. Although not a requirement for membership, I think most of us have developed at least one open source add-on for Alfresco. Our goal is to help community-developed code find its way into the product. Does this mean Alfresco is now as open as “true” open source community projects like Apache and Drupal? No, and honestly, I’m not sure it will ever get there. But Alfresco’s support of the ACCP process is a start. Here’s how the process works.

First step: Nomination to the ACCP Incubator

Today, developers in the community create add-ons, utilities, extensions, language packs and all kinds of software built to work with Alfresco. Some of these might make great additions to the Alfresco product. At a high-level, what the ACCP seeks to do is to act as an on-ramp or incubator for that subset of projects. We want you, real world Alfresco developers and end-users, to nominate community-developed extensions that you find useful and that you would eventually like to see as part of the Alfresco product. The ACCP then reviews these nominations and votes for their inclusion into the incubator. The project’s developers can then decide to leave their code where it is (Google Code, Sourceforge, Alfresco forge, etc.) or they may choose to migrate to the Alfresco-hosted ACCP incubator subversion repository.

Projects accepted to the incubator so far include:

As a side note, it’s great that there are so many community-developed add-ons for Alfresco. But the lack of a central index makes it hard to see what’s available. As a related effort, Nancy Garrity is working on something that would provide a central index, support ratings, etc.

Second step: Community code line

Once a project has been in the incubator for a while, the ACCP may recommend its inclusion as part of Alfresco Community making it much easier for Alfresco Community users to leverage these add-ons. The exact nature of how these will be made available is still being worked out. You could imagine a “community-extensions” directory under the Alfresco Community subversion root or something similar. For certain types of contributions, maybe the installer could even provide an optional “install community extensions” step. Again, although we have recently voted some projects into the ACCP incubator, none have yet to reach Community so the details of exactly how those will be incorporated into the Community code base are still being worked out.

Third step: Enterprise code line

The ACCP may then recommend Enterprise adoption. This step is subject to Alfresco Engineering approval, which may be a significant hurdle for some, but if it happens, the entire Alfresco customer base gets the benefit of Alfresco’s ongoing support of the community-developed code. Note that the Enterprise approval step is the only one where Alfresco employees have a say about how an ACCP project is handled–per our charter, Alfresco employees cannot be voting members of the committee.

How you can get involved

First, and foremost, you can nominate an open source Alfresco add-on/extension/customization project. If you want to take an active role on the committee or know someone who would be a good addition, there are spots available. So, another way to help out would be to serve on the committee or nominate someone who should. The committee meets regularly to review and vote on project and committee member nominations. All you have to do is get in touch with me or one of the other regular members of the committee. You’ll find the list on the Alfresco Community Contributor Program wiki page.

We’ll be doing a webinar on July 27th to talk about this more and answer questions. Check out the Alfresco events page to register.

Big News: I’ve left Optaros to start my own firm

After nearly four years at Optaros I’ve decided to start a new chapter in my career. I’ve created a new firm called Metaversant that is focused on providing content-centric solutions and consulting to clients across all verticals and geographies. Based on my deep experience with the platform and my active participation in the community, I expect Metaversant to be heavily-focused on Alfresco. We may broaden into other technologies over time, but the over-arching theme will be to help companies get the most out of their digital assets–whether that’s documents, web content, rich media, or legal records–by leveraging Enterprise-ready, open, rich content repositories.

On one hand, I was sad to leave Optaros–it was a great place to work with lots of smart people and interesting clients/projects. And I had fun building the ECM practice into a significant portion of overall revenue. On the other hand, the timing felt right to make this change and I’m very excited about starting my own company. Optaros and I are on great terms and I’m sure we’ll find ways to do business together going forward.

There are a lot of to-do’s to get Metaversant fully functional as a corporation, but nothing’s more important than the success of your project. If you are looking for help in any of the following areas, we should talk:

  • Customized, on-site Alfresco developer training
  • Short-term tactical technical assistance
  • Architectural reviews/product fit assessments
  • Content Management customization & implementation leadership
  • Custom content-centric application development & integration
  • Domino.Doc, Vignette, Stellent, or other legacy ECM migrations

You can contact me at “jpotts” at either this domain or metaversant.com.

As usual, keep an eye out here for news on Metaversant (like a link to the yet-to-be-built web site) and other content management news and thanks for your continued support!

ECM vendors have their heads in the cloud, can you see through the fog?

The hype around cloud computing has reached a fevered pitch so it is natural that ECM vendors try to take advantage of that as much as they can. Some examples from the open source ECM world:

  • Alfresco always seems to be partnering with one cloud vendor or another. I went to a brief session on Alfresco, GoGrid, and ParaScale earlier this year. (As an aside, those GoGrid cycling socks, which I thought was a strange giveaway at the time, are awesome).
  • At the end of last year eZ Publish announced a partnership with Mamut to provide eZ as SaaS.
  • Just last week Nuxeo announced a cloud edition of its product.

Clearly, ECM vendors are busy figuring out how to take advantage of the cloud. But what does it mean for ECM to be “in the cloud”? When might it work for you?

Cirrus, Stratus, or Cumulonimbus

The first thing you need to realize is that when people say “cloud” they often mean very different things. Generally, there are three types of clouds: Software-as-a-Service (Saas), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS).

Software-as-a-Service (SaaS) is the same model that’s been around for years but has lately taken advantage of the cloud moniker. Google Apps and Salesforce.com are the big SaaS players but there are SaaS offerings for all kinds of business applications, including content management.

The allure of SaaS ECM is the same as that of SaaS in general:

  • Lower up-front costs
  • Someone else gets to worry about running and scaling the infrastructure
  • Depending on the vendor, you may only have to pay for what you use

The challenges of SaaS ECM include things like:

  • The ability to do heavy customization and complex workflows
  • Ease of integration with other systems
  • Client perceptions (and real issues) around data security
  • Data portability/vendor lock-in

Open Source CM vendors Nuxeo and eZ Systems have SaaS offerings as do proprietary vendors such as SpringCM, CrownPeak, Clickability, and PaperThin, to name a few. Beyond just general-purpose document and content management, I think you’ll also see vendors build verticalized SaaS offerings on top of hosted content management technology.

The next type of cloud is Platform-as-a-Service (PaaS). The two best examples of PaaS are Google App Engine (GAE) and Salesforce.com’s force.com platform. With PaaS, you provide the code and the PaaS provider does the rest. Of course this means your code has to follow certain standards and is often subject to limitations, but the beauty is that you get a completely custom solution without worrying about any of the infrastructure.

I like GAE. For certain applications, the benefits of instantaneous, global scale far outweigh the limitations of the platform. But I don’t expect ECM vendors that would do well in SaaS or IaaS clouds to do much with PaaS. You can’t take an Alfresco or a Drupal and run it on a PaaS cloud. I do think we will see PaaS-native content management systems. For example, I’ve seen apps in the Salesforce.com AppExchange that are basically tools for building a web site that’s tightly integrated with Salesforce.com. I think you’ll also see solutions that leverage a PaaS for certain components or sub-systems.

The third type of cloud is Infrastructure-as-a-Service (IaaS). An IaaS cloud is about providing virtual servers on-demand. Examples include things like Amazon’s EC2, Rackspace Cloud, and GoGrid. With these services you can instantly provision as many servers as you need. What you do with them is up to you. When you’re done, you turn them off. Specifics vary but you are essentially billed for CPU time.

The way people leverage IaaS differs. Some people will provision a server and install their ECM software of choice and stop there. Other than dealing with different file storage approaches of various IaaS vendors, this is really no different than running your own virtual servers. So when someone says they are running XYZ CMS “in the cloud” and it turns out to be a single node on a virtual machine, I can barely stifle a yawn. It’s fast and convenient to set up, yes, but technically it’s pretty boring.

The more interesting way to use ECM in an IaaS cloud is to leverage the ability of the infrastructure to scale on-demand. That’s the real value of “the cloud” after all. For example, at Optaros we run an IaaS-hosted solution called OView that syndicates content and content-centric applications to web sites. When a client places that content or app on Yahoo’s home page we get a huge spike in traffic. We run the solution on Amazon EC2 images and we use RightScale to dynamically provision additional nodes when traffic warrants.

The degree to which a specific ECM vendor can operate in a dynamically-scaled infrastructure varies greatly. Simply “running in the cloud” is easy. Scaling your ECM infrastructure automagically is harder.

What do you really need?

If the list of SaaS benefits have a lot of appeal to you and the challenges and potential limitations aren’t much of a bother, SaaS ECM might be worth evaluating. This will most likely be a better fit for clients with limited IT resources and simple to moderate requirements around ECM.

On the IaaS front, if it is just an issue of externally-hosting your ECM infrastructure, make sure the cloud is what you want. The best use case for the cloud is when demand is temporary or unpredictable with huge spikes. I would argue that for your core ECM infrastructure demand is neither temporary nor unpredictable.

If “scale” is your issue, I would challenge you to think about exactly what needs to be scaled. If it is just content delivery of static content, maybe you could get by with a CDN. If your content management system can separate authoring from dynamic delivery of content, maybe only the dynamic content delivery mechanism needs to be able to scale quickly.

You might have certain processes (large-scale video transcoding, for example, or other types of periodic batch processing) that you could leverage the cloud for without cloud-enabling your entire ECM infrastructure. Acquia‘s hosted spam filtering service, Mollum, and their newly-released hosted-search offering are two examples where only specific pieces of your infrastructure are off-loaded to the cloud.

If it turns out that you need to scale the whole ball of wax, fine, it can be done, but have a good reason.

ECM in the cloud is, um, cloudy

The cloud as a style of computing is exciting. The cloud as a “feature” is potentially confusing. ECM vendors are going to do what they can do have it somewhere “on the box”. But it’s not something you can simply check off. The next time you hear an ECM vendor say, “cloud-ready”, ask them what they mean. Then figure out whether or not that has any relevance at all to your real requirements.

Is the cloud on your horizon? Let me know if/how the cloud relates to your ECM strategy.