Tag: ECM

ECM: You Ain’t Gonna Need It

Sometimes I feel like I spend as much time telling people why they don’t need an Enterprise Content Management (ECM) platform as I do helping people implement one. Today’s blog post by my friends over at TSG underscores another use case where a full-blown ECM platform may be overkill: Serving as a repository for huge volumes of files.

Their blog post says they are loading 20,000 documents per second into DynamoDB with a goal of getting up to 11 billion documents. The actual files are stored in S3, and that load rate does not include uploading files to S3 buckets, so this doesn’t exactly mimic a real-world bulk document import scenario, but that’s not what TSG was trying to test.
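To put that load rate in perspective, here is a quick back-of-the-envelope calculation. It uses only the two figures quoted above and assumes the rate stays constant for the whole load, which a real import almost certainly would not.

```python
# Figures quoted in the post: 20,000 documents/second, 11 billion documents.
DOCS_PER_SECOND = 20_000
TARGET_DOCS = 11_000_000_000

SECONDS_PER_DAY = 24 * 60 * 60


def load_time_days(total_docs: int, docs_per_second: int) -> float:
    """Days needed to load total_docs at a constant docs_per_second."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    days = load_time_days(TARGET_DOCS, DOCS_PER_SECOND)
    print(f"{days:.1f} days of continuous loading")
```

At a sustained 20,000 documents per second, the metadata load alone would take a little under a week.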

TSG correctly points out that it is the metadata repository, which legacy vendors often base on relational databases, that struggles in high-volume implementations, so their test focuses on the ability of DynamoDB to store a high volume of data while maintaining performance.

Let me shift slightly from case management, which is what TSG focuses on, to the more general problem firms have when they generate hundreds of millions of files that they need to manage and deliver to their stakeholders.

I have seen multiple clients and prospects who have very demanding requirements in terms of data volume while at the same time requiring very little in terms of what I’d call traditional ECM functionality. Often, they need nothing more than a RESTful API to get data into and out of the repository and some basic searching across a few metadata fields. Insurance companies and financial services companies are two examples of industries with lots of such use cases.

TSG is using native AWS services to manage metadata (DynamoDB) and file storage (S3), but this can also be done on-premises leveraging either commercial or open source solutions to provide a nearly-infinite scale, highly-performant, fully redundant solution.
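The split between a metadata store and a file store can be sketched in a few lines of Python. Everything here is hypothetical and in-memory: the MinimalRepository class, its method names, and the sample documents are made up for illustration, with plain dicts standing in for the object store and the metadata store. It is not TSG's implementation or anyone's product, just the shape of the idea.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MinimalRepository:
    """Hypothetical in-memory stand-in: one dict plays the object store,
    another plays the metadata store."""

    blobs: dict = field(default_factory=dict)      # blob key -> file bytes
    metadata: dict = field(default_factory=dict)   # document id -> attributes

    def put(self, doc_id: str, content: bytes, **attrs) -> None:
        # Store the bytes under a content-derived key, then record the
        # metadata (including that key) separately.
        blob_key = hashlib.sha256(content).hexdigest()
        self.blobs[blob_key] = content
        self.metadata[doc_id] = {**attrs, "blob_key": blob_key}

    def get(self, doc_id: str) -> bytes:
        # Look up the metadata first, then fetch the bytes it points to.
        return self.blobs[self.metadata[doc_id]["blob_key"]]

    def search(self, **criteria) -> list:
        # Exact-match search across a few metadata fields.
        return sorted(
            doc_id
            for doc_id, attrs in self.metadata.items()
            if all(attrs.get(k) == v for k, v in criteria.items())
        )


repo = MinimalRepository()
repo.put("stmt-001", b"%PDF statement", customer="C42", doc_type="statement")
repo.put("ctr-001", b"%PDF contract", customer="C42", doc_type="contract")
print(repo.search(customer="C42", doc_type="statement"))
```

The point of the separation is that searches only ever touch the small metadata records, while the large files sit untouched until someone asks for one by ID.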

The important part here is that you do not need an ECM platform to do this. In fact, many high-profile customers of legacy ECM vendors like Documentum, Alfresco, and FileNet are actively moving away from those platforms for use cases like these.

Why? Companies tell me those platforms are proving too difficult to scale, too complex to implement and maintain, and too expensive in terms of licensing cost.

My company, Metaversant, can build you a minimal content management system in your own data center or in AWS. We call our solution Magritte. It includes:

  • Scalable and distributed object storage built on commodity disk drives
  • Flexible content model
  • REST API for CRUD functions
  • Basic permissions on objects
  • Metadata search (and full-text search if you need it)
  • Ability to manage billions of objects
  • $0 in mandatory licensing costs (commercial support of underlying open source components is available at the customer’s option)

That’s enough functionality for many use cases. In fact it seems like the exact right amount of functionality for managing things like customer statements, contracts, and agreements in large insurance or financial services companies.

What legacy ECM platforms include–to be sure, for a cost, in terms of both real dollars and complexity–are things like:

  • Support for not just object storage but also additional file system types (Glacier, NAS, SAN, EMC Centera)
  • Extensible, formal (schema-based) content model
  • REST API for every aspect of the platform
  • Foundational/native APIs, client libraries, or SDKs
  • Complex, fine-grained access control lists, including ability to support inheritance and “deny”
  • Extensible platform (hooks where developers can add code to alter or enhance the platform functionality)
  • Support for additional file protocols such as FTP, SMB, IMAP, SMTP, WebDAV
  • Support for content repository standards such as JCR and CMIS
  • Transformation engine (generates previews and thumbnails)
  • Workflow engine
  • Rules engine
  • Analytics, reporting, & dashboards
  • Integrations with third-party systems such as Outlook, SAP, Salesforce, Google Docs, Box, Dropbox
  • Web-based user interface
  • Application components/framework for extending the web UI or for building new custom web UIs
  • Forms engine
  • Mobile applications
  • Desktop Sync
  • Business-specific applications (Records Management, Media Management, Reporting)

Can those traditional ECM features be added to a minimal content management solution like Magritte? Of course. And if you add enough of them you might be better off with a legacy ECM vendor (assuming you can get it to scale to meet your needs, which may be a big assumption depending on your operational constraints).

But if you don’t need a lot or any of those additional features, why start out implementing and paying for an entire aircraft carrier if what you really need is a speedboat?

In software there’s a phrase, “You Ain’t Gonna Need It”, which aims to defer development of features until they are actually needed instead of developing them now for some future need, which may never materialize. In ECM, you might take on the complexity of an entire platform because the “E” in “ECM” makes you think you are implementing something the entire company will leverage. That hasn’t panned out–just look at how many companies have multiple so-called “Enterprise” Content Management systems.

Instead of continuing to install these giant platforms, most of which go unused, let’s implement right-sized solutions on top of clusterable, scalable, open source components that talk to each other via APIs. ECM: You Ain’t Gonna Need It.

(Updated 6/3/2020 to fix a minor typo)

CMIS 1.1 is now an approved spec; Here’s a recap of what’s new

With a final, heroic push and get-out-the-vote campaign, the OASIS CMIS Technical Committee (TC), the committee that is responsible for moving the CMIS specification forward, was able to get enough votes on Thursday to ratify the next version of Content Management Interoperability Services (CMIS) as a standard. This is a seriously cool accomplishment for everyone on the TC and the entire Enterprise Content Management (ECM) industry because CMIS establishes an industry standard for working with repositories like Alfresco, Documentum, FileNet, Nuxeo, and SharePoint, and it is important that the spec continues to evolve.

CMIS 1.1 has some exciting new features. Here’s a recap of what’s new…

Browser Binding

A binding is the protocol a client uses to talk to a CMIS server. CMIS 1.0 supported two bindings, Web Services (SOAP) and AtomPub (RESTful XML), the latter being the most performant and the most popular. But if you’ve ever looked at the XML that comes back from a CMIS AtomPub call you know how verbose it can be. The Browser Binding is based on JSON, so the payloads that go between client and server are smaller, making it the fastest of the three bindings. The original purpose of the Browser Binding was to make it easy for those building “single page” web apps or doing other work with CMIS via client-side JavaScript, but I think apps of all types will move to the Browser Binding as quickly as possible simply because it is easier to work with.
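As a rough illustration of why JSON payloads are easier to work with, here is a minimal Python sketch that turns a Browser Binding style object response into a plain dictionary. The payload is hand-written and heavily abbreviated, modeled on the general shape of a Browser Binding object (a "properties" map whose entries carry an "id", a "type", and a "value"); it was not captured from a real server, and the object ID and file name are made up.

```python
import json

# Hand-written, abbreviated sample; a real response carries more fields.
SAMPLE_RESPONSE = """
{
  "properties": {
    "cmis:objectId": {"id": "cmis:objectId", "type": "id", "value": "doc-123"},
    "cmis:name": {"id": "cmis:name", "type": "string", "value": "invoice.pdf"},
    "cmis:contentStreamLength": {"id": "cmis:contentStreamLength",
                                 "type": "integer", "value": 48213}
  }
}
"""


def flatten_properties(payload: str) -> dict:
    """Map each property id to its value, dropping the per-property wrapper."""
    obj = json.loads(payload)
    return {pid: prop["value"] for pid, prop in obj["properties"].items()}


props = flatten_properties(SAMPLE_RESPONSE)
print(props["cmis:name"])
```

Doing the same against an AtomPub response means walking namespaced XML, which is exactly the verbosity the Browser Binding avoids.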

Type Mutability

This allows CMIS developers to create and update the repository’s content model with code. Imagine you’re building an Accounts Payable solution. You’re using CMIS because you want your solution to run on top of any CMIS-compliant ECM repository. It is highly likely you will need content types to store objects with metadata specific to your solution. Before CMIS 1.1, you had to ship repository-specific content models and installation and configuration instructions with your app. With CMIS 1.1, developers can simply write install and config logic as part of the solution that will interrogate the repository to determine if any changes need to be made to the content model to support the solution, and if changes are required, implement them.
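The "interrogate, then change only what's missing" install logic might look something like the sketch below. FakeRepository, ensure_type, and the ap:invoice type with its properties are all hypothetical names invented for this example; a real installer would make the equivalent calls through a CMIS 1.1 client library rather than against an in-memory dict.

```python
class FakeRepository:
    """Hypothetical in-memory stand-in for a repository's type definitions."""

    def __init__(self):
        # type id -> {"parent": parent type id, "properties": set of property ids}
        self.types = {"cmis:document": {"parent": None, "properties": {"cmis:name"}}}

    def has_type(self, type_id):
        return type_id in self.types

    def create_type(self, type_id, parent, properties):
        self.types[type_id] = {"parent": parent, "properties": set(properties)}

    def add_properties(self, type_id, properties):
        self.types[type_id]["properties"] |= set(properties)


def ensure_type(repo, type_id, parent, required_properties):
    """Create the type or add missing properties; return the changes made."""
    changes = []
    required = set(required_properties)
    if not repo.has_type(type_id):
        repo.create_type(type_id, parent, required)
        changes.append(f"created {type_id}")
    else:
        missing = required - repo.types[type_id]["properties"]
        if missing:
            repo.add_properties(type_id, missing)
            changes.append(f"added {sorted(missing)} to {type_id}")
    return changes


repo = FakeRepository()
first = ensure_type(repo, "ap:invoice", "cmis:document", ["ap:invoiceNumber"])
second = ensure_type(repo, "ap:invoice", "cmis:document", ["ap:invoiceNumber"])
third = ensure_type(repo, "ap:invoice", "cmis:document",
                    ["ap:invoiceNumber", "ap:poNumber"])
print(first, second, third)
```

Because the function checks before it changes anything, running the installer twice is harmless, which is what you want from logic that ships inside the solution.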

Secondary Types

Some repositories (like Alfresco) have the concept of free-floating or cross-cutting content types that group together related properties that can be added to object instances in the repository. For example, perhaps you want to define a “client-related” set of properties that can be added to any document in the repository that is related to one of your clients. Not all documents are related to a client, but the ones that are need to be able to refer to a client name or number or something. In Alfresco, these are called “aspects”. CMIS 1.0 didn’t support aspects natively so developers using CMIS to query for or set properties defined in an aspect had to use a workaround. In CMIS 1.1, aspects are supported natively.
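In CMIS 1.1, the secondary types attached to an object are listed in its multi-valued cmis:secondaryObjectTypeIds property. The sketch below models an object's properties as a plain dict to show the idea; the attach_secondary_type helper, the sc:clientRelated type, and its sc:clientName property are hypothetical, and a real client would send the updated properties to the repository instead of just returning them.

```python
def attach_secondary_type(properties: dict, type_id: str, values: dict) -> dict:
    """Return a copy of properties with type_id attached and its values set."""
    updated = dict(properties)
    current = list(updated.get("cmis:secondaryObjectTypeIds", []))
    if type_id not in current:
        current.append(type_id)
    updated["cmis:secondaryObjectTypeIds"] = current
    # Properties defined by the secondary type live alongside the others.
    updated.update(values)
    return updated


doc = {"cmis:name": "contract.pdf", "cmis:objectTypeId": "cmis:document"}
tagged = attach_secondary_type(doc, "sc:clientRelated", {"sc:clientName": "Acme"})
print(tagged["cmis:secondaryObjectTypeIds"], tagged["sc:clientName"])
```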

New “Item” Type

In CMIS 1.0, Document objects are assumed to have a content stream. Some repositories even require it. In Alfresco, this means if you want to work with a type that inherits from something other than cm:content (or cm:folder), you are out-of-luck. CMIS 1.1 adds a new base object type called “Item” that represents objects that don’t have a file associated with them.

Bulk Updates

CMIS 1.1 adds a new feature that makes mass changes more performant. Instead of iterating over a list of objects, changing and saving each one, you can define a set of property changes and make those against an entire collection, which is much more efficient.
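The efficiency gain comes mostly from round trips. Here is a small sketch that counts them, using a hypothetical in-memory CountingRepository in place of a real server. The class and its method names are invented for this example, and the 1,000-object collection is arbitrary.

```python
class CountingRepository:
    """Hypothetical in-memory repository that counts client/server round trips."""

    def __init__(self, objects):
        # Copy the property dicts so each repository instance is independent.
        self.objects = {oid: dict(props) for oid, props in objects.items()}
        self.round_trips = 0

    def update_properties(self, object_id, changes):
        # One call, one object.
        self.round_trips += 1
        self.objects[object_id].update(changes)

    def bulk_update_properties(self, object_ids, changes):
        # One call, many objects.
        self.round_trips += 1
        for oid in object_ids:
            self.objects[oid].update(changes)


objects = {f"doc-{i}": {"status": "draft"} for i in range(1000)}

one_by_one = CountingRepository(objects)
for oid in objects:
    one_by_one.update_properties(oid, {"status": "approved"})

bulk = CountingRepository(objects)
bulk.bulk_update_properties(list(objects), {"status": "approved"})

print(one_by_one.round_trips, bulk.round_trips)
```

Both approaches leave the objects in the same state; the difference is one request instead of a thousand.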

Append to Content Stream

A challenge with any ECM project is how to move large files into the repository. The new append to content stream feature in CMIS 1.1 allows you to send files to the repository in chunks, which could be a key to addressing that challenge.
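Chunked sending can be sketched like this. The upload_in_chunks helper and the fake_append callback are hypothetical; in a real client the callback would wrap the repository's append call, which in CMIS 1.1 takes a flag marking the last chunk. The tiny chunk size is only there to make the example easy to follow.

```python
import io


def upload_in_chunks(source, append, chunk_size=4):
    """Read source in chunk_size pieces and hand each to append().

    append(chunk, is_last_chunk=...) is called once per chunk; the flag is
    True only for the final one. Returns the number of chunks sent.
    """
    sent = 0
    chunk = source.read(chunk_size)
    while chunk:
        # Read ahead so we know whether the current chunk is the last.
        next_chunk = source.read(chunk_size)
        append(chunk, is_last_chunk=not next_chunk)
        sent += 1
        chunk = next_chunk
    return sent


received = []


def fake_append(chunk, is_last_chunk):
    # Stand-in for the call that would send one chunk to the repository.
    received.append((chunk, is_last_chunk))


count = upload_in_chunks(io.BytesIO(b"0123456789"), fake_append, chunk_size=4)
print(count, received)
```

Since only one chunk is held in memory at a time, the same loop works whether the file is ten bytes or ten gigabytes.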

Retentions and Holds

This new feature allows you to set retention periods for a piece of content or place a legal hold on content through the CMIS 1.1 API. This is useful in compliance solutions like Records Management (RM). Honestly, I am not a big fan of this feature. It seems too specific to a particular domain (RM) and I think CMIS should be more general. If you are going to start adding RM features into the spec, why not add Web Content Management (WCM) features as well? And Digital Asset Management (DAM) and so on? I’m sure it is useful, I just don’t think it belongs in the spec.

So that’s what’s new in CMIS 1.1. You can read the authoritative spec for details.

When Will My Favorite Repository Support CMIS 1.1?

That’s up to each vendor. If these features are important to your installation or the solution you are building, you should be making it very clear to your vendor contacts that you want to see these features get priority over other things the engineering team might be working on. As for Alfresco, I can’t make any promises on dates. We’ve had experimental support for the browser binding in place for some time. I think we all want to see the other CMIS 1.1 features in both Alfresco on-premises and in the cloud sooner rather than later, but I don’t know when that will be.

Try It!

You can play with CMIS 1.1 by downloading the OpenCMIS InMemory Server from Apache Chemistry and a client library for your favorite language, or just launch the OpenCMIS Workbench and you’ll see the Browser Binding as an option when you connect. If you need to know more about CMIS and the client libraries, server frameworks, and CMIS development tools available at Apache Chemistry, you should buy the book Jay Brown, Florian Mueller, and I have been working on. It should be in print later this summer but you can get the eBook now. It covers both CMIS 1.0 and CMIS 1.1.

Updated Python CMIS library released

I’ve tagged and released a new version of cmislib, the Python CMIS client library. What’s cool about this release is that it is the first one known to work with more than one CMIS provider. Yay for interoperability! The beauty of CMIS, realized! Okay, it wasn’t that beautiful, it’s still “0.1”, and there are known issues. But I can now say the library works with both Alfresco and IBM FileNet and that’s a Good Thing.

IBM was a big help with this. Al Brown, one of the CMIS spec leads, turned one of his colleagues, Jay Brown, onto cmislib. Jay called me up and asked, “If I give you access to a FileNet P8 server, can you test cmislib against it?” I was on it faster than you could say, “unittest.main()”.

I think the effort was valuable for all sides. Our little “mini plugfest” turned up issues in my client as well as both CMIS providers. Jay worked hard to chase down everything on the FileNet side. Dave Caruana chased a few down on the Alfresco side as well. Thanks to everyone for the team effort.

Anyway, give the new cmislib release a try and give me your feedback. If you want a feel for how easy it can be to work with CMIS repositories using the cmislib API, check out the documentation or dive right in. Installation is as easy as “easy_install cmislib” (easy_install instructions).

Next up is Nuxeo. Can the open source ECM vendor achieve cmislib Unit Test Greatness faster than Big Blue? We shall see!

Forrester says 2010 looks good for ECM

Forrester has released the results from its 2009 Global Enterprise Content Management Online Survey. Here are a few of the things that jumped out at me…

72% of respondents plan on increasing their ECM investments in the coming year. That’s certainly good news. Of those increasing their investment, the big drivers are content sharing, compliance, search, and automation, which are all typical reasons to roll out a content management solution.

When asked to list the vendors that supply them with ECM solutions, 63% of respondents included Microsoft with EMC a distant second at 35%. (I kind of expected that Microsoft number to be higher.) OpenText/Vignette (29%) and IBM (28%) were clustered right around there with a third clump forming around Autonomy/Interwoven (19%), Oracle (17%), and Alfresco (14%). The only other open source ECM players explicitly named were KnowledgeTree and Nuxeo, each with 1%. Almost a third of respondents also listed “Other, please specify” but Forrester doesn’t provide the list of write-ins. I assume it is a bunch of small, niche or homegrown solutions because the usual suspects were listed as explicit choices. Still, this chart and the one that follows, which shows that nearly 3/4 of respondents have 2 or more ECM solutions in-house, confirm what we’ve seen with our Optaros clients: Most people haven’t settled on a single ECM provider.

A little more than 1 in 4 of respondents were unsatisfied with their ECM solution. Of those, 41% blamed the solution itself as failing to “live up to expectations” followed by the usual grab bag of non-technical reasons IT projects fail. I would have liked to see a follow-up that dissected the various ways the solution fell short. Was it not able to do something you thought it was going to be able to do? Was stability an issue? Scale? Bad support experience? Or was it just that the beans you were told were magic turned out to be just plain old beans?

As my college stats teacher was fond of saying, “There are three kinds of lies: Lies, Damn Lies, and Statistics,” so take all of this with a grain of salt.

ECM vendors have their heads in the cloud, can you see through the fog?

The hype around cloud computing has reached a fever pitch so it is natural that ECM vendors try to take advantage of that as much as they can. Some examples from the open source ECM world:

  • Alfresco always seems to be partnering with one cloud vendor or another. I went to a brief session on Alfresco, GoGrid, and ParaScale earlier this year. (As an aside, those GoGrid cycling socks, which I thought were a strange giveaway at the time, are awesome.)
  • At the end of last year eZ Publish announced a partnership with Mamut to provide eZ as SaaS.
  • Just last week Nuxeo announced a cloud edition of its product.

Clearly, ECM vendors are busy figuring out how to take advantage of the cloud. But what does it mean for ECM to be “in the cloud”? When might it work for you?

Cirrus, Stratus, or Cumulonimbus

The first thing you need to realize is that when people say “cloud” they often mean very different things. Generally, there are three types of clouds: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS).

Software-as-a-Service (SaaS) is the same model that’s been around for years but has lately taken advantage of the cloud moniker. Google Apps and Salesforce.com are the big SaaS players but there are SaaS offerings for all kinds of business applications, including content management.

The allure of SaaS ECM is the same as that of SaaS in general:

  • Lower up-front costs
  • Someone else gets to worry about running and scaling the infrastructure
  • Depending on the vendor, you may only have to pay for what you use

The challenges of SaaS ECM include things like:

  • The ability to do heavy customization and complex workflows
  • Ease of integration with other systems
  • Client perceptions (and real issues) around data security
  • Data portability/vendor lock-in

Open Source CM vendors Nuxeo and eZ Systems have SaaS offerings as do proprietary vendors such as SpringCM, CrownPeak, Clickability, and PaperThin, to name a few. Beyond just general-purpose document and content management, I think you’ll also see vendors build verticalized SaaS offerings on top of hosted content management technology.

The next type of cloud is Platform-as-a-Service (PaaS). The two best examples of PaaS are Google App Engine (GAE) and Salesforce.com’s force.com platform. With PaaS, you provide the code and the PaaS provider does the rest. Of course this means your code has to follow certain standards and is often subject to limitations, but the beauty is that you get a completely custom solution without worrying about any of the infrastructure.

I like GAE. For certain applications, the benefits of instantaneous, global scale far outweigh the limitations of the platform. But I don’t expect ECM vendors that would do well in SaaS or IaaS clouds to do much with PaaS. You can’t take an Alfresco or a Drupal and run it on a PaaS cloud. I do think we will see PaaS-native content management systems. For example, I’ve seen apps in the Salesforce.com AppExchange that are basically tools for building a web site that’s tightly integrated with Salesforce.com. I think you’ll also see solutions that leverage a PaaS for certain components or sub-systems.

The third type of cloud is Infrastructure-as-a-Service (IaaS). An IaaS cloud is about providing virtual servers on-demand. Examples include things like Amazon’s EC2, Rackspace Cloud, and GoGrid. With these services you can instantly provision as many servers as you need. What you do with them is up to you. When you’re done, you turn them off. Specifics vary but you are essentially billed for CPU time.

The way people leverage IaaS differs. Some people will provision a server and install their ECM software of choice and stop there. Other than dealing with different file storage approaches of various IaaS vendors, this is really no different than running your own virtual servers. So when someone says they are running XYZ CMS “in the cloud” and it turns out to be a single node on a virtual machine, I can barely stifle a yawn. It’s fast and convenient to set up, yes, but technically it’s pretty boring.

The more interesting way to use ECM in an IaaS cloud is to leverage the ability of the infrastructure to scale on-demand. That’s the real value of “the cloud” after all. For example, at Optaros we run an IaaS-hosted solution called OView that syndicates content and content-centric applications to web sites. When a client places that content or app on Yahoo’s home page we get a huge spike in traffic. We run the solution on Amazon EC2 images and we use RightScale to dynamically provision additional nodes when traffic warrants.
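The decision behind "provision additional nodes when traffic warrants" can be reduced to a small capacity calculation. This is only a sketch of the general idea, not how OView or RightScale actually decide anything: the nodes_needed function, the per-node capacity, and the minimum and maximum node counts are all assumptions made up for the example.

```python
import math


def nodes_needed(requests_per_second, capacity_per_node, min_nodes=2, max_nodes=20):
    """Nodes required for the current load, clamped to a configured range."""
    wanted = math.ceil(requests_per_second / capacity_per_node)
    return max(min_nodes, min(max_nodes, wanted))


# Hypothetical traffic levels: a quiet day, a home-page spike, an extreme spike.
for rps in (150, 1250, 10_000):
    print(rps, "req/s ->", nodes_needed(rps, capacity_per_node=100), "nodes")
```

The floor keeps a baseline running when traffic is quiet, and the ceiling keeps a runaway spike from running up an unbounded bill.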

The degree to which a specific ECM vendor can operate in a dynamically-scaled infrastructure varies greatly. Simply “running in the cloud” is easy. Scaling your ECM infrastructure automagically is harder.

What do you really need?

If the list of SaaS benefits has a lot of appeal to you and the challenges and potential limitations aren’t much of a bother, SaaS ECM might be worth evaluating. This will most likely be a better fit for clients with limited IT resources and simple to moderate requirements around ECM.

On the IaaS front, if it is just an issue of externally-hosting your ECM infrastructure, make sure the cloud is what you want. The best use case for the cloud is when demand is temporary or unpredictable with huge spikes. I would argue that for your core ECM infrastructure demand is neither temporary nor unpredictable.

If “scale” is your issue, I would challenge you to think about exactly what needs to be scaled. If it is just content delivery of static content, maybe you could get by with a CDN. If your content management system can separate authoring from dynamic delivery of content, maybe only the dynamic content delivery mechanism needs to be able to scale quickly.

You might have certain processes (large-scale video transcoding, for example, or other types of periodic batch processing) that you could leverage the cloud for without cloud-enabling your entire ECM infrastructure. Acquia’s hosted spam filtering service, Mollom, and their newly-released hosted-search offering are two examples where only specific pieces of your infrastructure are off-loaded to the cloud.

If it turns out that you need to scale the whole ball of wax, fine, it can be done, but have a good reason.

ECM in the cloud is, um, cloudy

The cloud as a style of computing is exciting. The cloud as a “feature” is potentially confusing. ECM vendors are going to do what they can to have it somewhere “on the box”. But it’s not something you can simply check off. The next time you hear an ECM vendor say, “cloud-ready”, ask them what they mean. Then figure out whether or not that has any relevance at all to your real requirements.

Is the cloud on your horizon? Let me know if/how the cloud relates to your ECM strategy.

Alfresco ECM is 96% cheaper than legacy ECM vendors?

If you are evaluating ECM solutions, particularly if you are interested in cost, you need to take a look at Alfresco’s TCO Whitepaper. In it, Alfresco uses licensing numbers they snagged from the United States government to compare the first-year costs of their solution with EMC/Documentum, OpenText, and SharePoint.

When the whitepaper came to my attention, I expected it to be Marketing hype, full of soft numbers and exaggerated claims. While readers must take the paper with a grain of salt considering the obvious bias of the source, Alfresco does a good job of avoiding Marketing speak for the most part and simply laying out the facts. The whitepaper shows line item detail for licensing and support for the first year. If you want to include supporting infrastructure (OS, application server, database) in your analysis those are provided for you as well.

The paper shows that for document management plus collaboration and integration with SharePoint, you’d have to pay EMC/Documentum $863,937.98 for a 1000 user configuration as opposed to $318,738 for SharePoint and $33,500 for Alfresco for similarly-sized systems with equivalent functionality. Those numbers exclude the supporting infrastructure software.

So what’s the fine print? Here are some considerations…

The numbers Alfresco used are from a government price list. It isn’t clear to me whether those numbers are “list” or are a negotiated, reduced rate, but from my past experience with Documentum, I’d say they are closer to list. I don’t think it is likely that anyone would actually pay $800k for a 1000-user Documentum system. Even if you were to negotiate 50% off of those numbers, though, the difference is still significant.
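The percentages behind the headline are easy to check from the three first-year figures quoted above. The sketch below uses only those figures plus the 50%-off scenario; the percent_cheaper helper is just a name chosen for this example.

```python
# First-year figures quoted from the whitepaper (1000-user configuration).
DOCUMENTUM = 863_937.98
SHAREPOINT = 318_738.00
ALFRESCO = 33_500.00


def percent_cheaper(price, baseline):
    """How much cheaper price is than baseline, as a percentage."""
    return (1 - price / baseline) * 100


vs_documentum = percent_cheaper(ALFRESCO, DOCUMENTUM)
vs_sharepoint = percent_cheaper(ALFRESCO, SHAREPOINT)
# Scenario from the paragraph above: Documentum negotiated down by 50%.
vs_discounted = percent_cheaper(ALFRESCO, DOCUMENTUM * 0.5)

print(f"{vs_documentum:.1f}% {vs_sharepoint:.1f}% {vs_discounted:.1f}%")
```

That works out to about 96% cheaper than the Documentum figure, about 89% cheaper than SharePoint, and still about 92% cheaper than Documentum at half price.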

A portion of the “first year’s cost” is maintenance and that recurs every year. For Alfresco you are only paying for maintenance, so the entire $33.5k will be due every year. Using the numbers from the whitepaper your Documentum maintenance bill would be about $115k every year. I think in all cases, the maintenance is probably understated for what typical clients will pay because most will want “top shelf” SLAs. The numbers used here are for lower levels of service.
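Because the recurring part differs so much between the two, it helps to look past year one. This sketch projects cumulative cost from the figures above, assuming the Documentum first-year number already includes that year's maintenance, that maintenance stays flat at roughly $115k, and that nothing else changes. The five-year horizon is an arbitrary choice for illustration.

```python
def cumulative_cost(first_year, annual_after, years):
    """Total cost over `years` (>= 1): first-year cost plus flat renewals."""
    return first_year + annual_after * (years - 1)


YEARS = 5  # arbitrary horizon for illustration

documentum_total = cumulative_cost(863_937.98, 115_000, YEARS)
alfresco_total = cumulative_cost(33_500, 33_500, YEARS)

print(f"Documentum: ${documentum_total:,.2f}")
print(f"Alfresco:   ${alfresco_total:,.2f}")
```

Under those assumptions the gap narrows in percentage terms over time but keeps growing in absolute dollars.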

The legacy vendors have thousands of product configuration options. The line items Alfresco chose to include for the Documentum configuration look roughly right, but with so many options you can’t say with certainty that what’s listed is what everyone who needs a 1000-user document management system built with Documentum will use. So tweak the table using the quote your vendor gave you and come to your own conclusions.

Alfresco showed a 2-CPU configuration for their 1000-user config priced at $33,500 which included a test server. Then they showed a “high availability” config with a $9,250 up-charge. But they didn’t double the procs. If you’re going to be HA, you’ll need at least two of everything. While they did double the test server procs, they didn’t double the production server procs so the HA version of the 1000-user config should be more like $76,250, in my opinion. Incidentally, it isn’t clear to me what you get for that extra $9,250. I have an open question with the Alfresco folks to clarify both issues.
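One reading that reproduces the $76,250 figure is to double the base configuration and then add the high-availability up-charge. That reading is an assumption on my part about how the number breaks down, not something the whitepaper spells out.

```python
BASE_CONFIG = 33_500   # 2-CPU, 1000-user configuration, including test server
HA_UPCHARGE = 9_250    # quoted "high availability" up-charge

# Assumption: HA means two of everything, so the base is doubled.
ha_as_quoted = BASE_CONFIG + HA_UPCHARGE
ha_doubled = 2 * BASE_CONFIG + HA_UPCHARGE

print(ha_as_quoted, ha_doubled)
```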

What about services? Honestly, it’s usually a wash. There are things you can get done faster because you can see the source code but there are other things you may end up spending more time on. When it comes to services, the primary value of open source is in the ability to spend less on the software and still end up getting something closer to what you actually need through customizations (See “Why Open Source?”).

Obviously, big decisions like this should never be made on cost alone. Documentum, FileNet, SharePoint, and Alfresco aren’t perfectly interchangeable. You still have to figure out which one is a better fit for you along all sorts of dimensions. But the stark analysis Alfresco is providing is likely to get a lot of attention from buyers who are particularly price-sensitive in today’s market.