Month: May 2019

ECM: You Ain’t Gonna Need It

Sometimes I feel like I spend as much time telling people why they don’t need an Enterprise Content Management (ECM) platform as I do helping people implement one. Today’s blog post by my friends over at TSG underscores another use case where a full-blown ECM platform may be overkill: Serving as a repository for huge volumes of files.

Their blog post says they are loading 20,000 documents per second into DynamoDB with a goal of getting up to 11 billion documents. The actual files are stored in S3, and that load rate does not include uploading files to S3 buckets, so this doesn’t exactly mimic a real-world bulk document import scenario, but that’s not what TSG was trying to test.

TSG correctly points out that it is the metadata repository, which legacy vendors often base on relational databases, that struggles in high-volume implementations, so their test focuses on the ability of Dynamo to store a high-volume of data while maintaining performance.

Let me shift slightly from case management, which is what TSG focuses on, to the more general problem firms have when they generate 100’s of millions of files that they need to manage and deliver to their stakeholders.

I have seen multiple clients and prospects who have very demanding requirements in terms of data volume while at the same time requiring very little in terms of what I’d call traditional ECM functionality. Often, they need nothing more than a RESTful API to get data into and out of the repository and some basic searching across a few metadata fields. Insurance companies and financial services companies are two examples of industries with lots of such use cases.

TSG is using native AWS services to manage metadata (DynamoDB) and file storage (S3), but this can also be done on-premises leveraging either commercial or open source solutions to provide a nearly-infinite scale, highly-performant, fully redundant solution.

The important part here is that you do not need an ECM platform to do this. In fact, many high-profile customers of legacy ECM vendors like Documentum, Alfresco, and FileNet, are actively moving away from those platforms for use cases like these.

Why? Companies tell me those platforms are proving too difficult to scale, too complex to implement and maintain, and too expensive in terms of licensing cost.

My company, Metaversant, can build you a minimal content management system in your own data center or in AWS. We call our solution Magritte. It includes:

  • Scalable and distributed object storage built on commodity disk drives
  • Flexible content model
  • REST API for CRUD functions
  • Basic permissions on objects
  • Metadata search (and full-text search if you need it)
  • Ability to manage billions of objects
  • $0 in mandatory licensing costs (commercial support of underlying open source components is available at the customer’s option)

That’s enough functionality for many use cases. In fact it seems like the exact right amount of functionality for managing things like customer statements, contracts, and agreements in large insurance or financial services companies.

What legacy ECM platforms include–to be sure, for a cost, in terms of both real dollars and complexity–are things like:

  • Support not for just object storage but also additional file system types (Glacier, NAS, SAN, EMC Centera)
  • Extensible, formal (schema-based) content model
  • REST API for every aspect of the platform
  • Foundational/native APIs, client libraries, or SDKs
  • Complex, fine-grained access control lists, including ability to support inheritance and “deny”
  • Extensible platform (hooks where developers can add code to alter or enhance the platform functionality)
  • Support for additional file protocols such as FTP, SMB, IMAP, SMTP, WebDAV
  • Support for content repository standards such as JCR and CMIS
  • Transformation engine (generates previews and thumbnails)
  • Workflow engine
  • Rules engine
  • Analytics, reporting, & dashboards
  • Integrations with third-party systems such as Outlook, SAP, Salesforce, Google Docs, Box, Dropbox
  • Web-based user interface
  • Application components/framework for extending the web UI or for building new custom web UIs
  • Forms engine
  • Mobile applications
  • Desktop Sync
  • Business-specific applications (Records Management, Media Management, Reporting)

Can those traditional ECM features be added to a minimal content management solution like Magritte? Of course. And if you add enough of them you might be better off with a legacy ECM vendor (assuming you can get it to scale to meet your needs, which may be a big assumption depending on your operational constraints).

But if you don’t need a lot or any of those additional features, why start out implementing and paying for an entire aircraft carrier if what you really need is a speedboat?

In software there’s a phrase, “You Ain’t Gonna Need It”, which aims to defer development of features until they are actually needed instead of developing them now for some future need, which may never materialize. In ECM, you might take on the complexity of an entire platform because the “E” in “Enterprise” makes you think you are implementing something the entire company will leverage. That hasn’t panned out–just look at how many companies have multiple so-called “Enterprise” Content Management systems.

Instead of continuing to install these giant platforms, most of which go unused, let’s implement right-sized solutions on top of clusterable, scalable, open source components that talk to each other via API’s. ECM: You Ain’t Gonna Need It.

(Updated 6/3/2020 to fix a minor typo)