CMIS: An open API for managing content

Most of the content in a company is completely unstructured. Just think about the documents you collaborate on with the rest of your team throughout the day. They might include things like proposals, architecture diagrams, presentations, invoices, screenshots, videos, books, meeting notes, or pictures from your last company get-together.

How does a company organize all of that content? Often it is scattered across file shares and employee hard drives. It isn’t really organized at all. It’s hard enough to simply find content in that environment, but what about answering questions like:

Is this the latest version and how has it changed over time?
Which customer is this document related to?
Who is allowed to read or make changes to this document?
How long are we legally required to keep this document?
When I’m done making my change to this document, what is the next step in the process?

To address this, companies will often write content-centric applications that try to put some order to the chaos. But most of our content resides in files, and files can be a pain to work with. Databases can store files up to a certain file size, but they aren’t great for working with audio and video. File systems solve that problem but they alone don’t offer rich functionality like the ability to track complex metadata with each file or the ability to easily full-text index and then run searches across all of your content.

That’s where a content repository comes in. You might hear these referred to as Document Management (DM) systems or Enterprise Content Management (ECM) systems. No matter what you call them, they are purpose-built to make it easier for your company to get a handle on its file-based content.

Here’s the problem for developers, though: There is a lot of repository software out there. Most large companies have more than one up and running in their organization, and every one of them has its own API. It’s rare that these systems exist in a vacuum. They often need to feed and consume business processes, and that takes code. So if you are an enterprise developer trying to integrate some of your systems with your ECM repositories, you’ve got multiple APIs you need to learn. Or, if you are a software vendor trying to build a solution that requires a rich content repository as a back-end, you either have to choose a specific back-end to support or you have to write adapters to support a handful of repositories.

The solution to this problem is called Content Management Interoperability Services (CMIS). It’s an industry-wide specification managed by OASIS. It describes a domain language, a query language, and multiple protocols for working with a content repository. With CMIS, developers write against the CMIS API instead of learning each repository’s proprietary API, and their applications will work with any CMIS-compliant repository.

The first version of the specification became official in May of 2010. The most recent version, 1.1, became official in May of 2013.

Several developers have been busy writing client libraries, server-side libraries, and tools related to CMIS. Many of these are collected as part of an umbrella open source project known as Apache Chemistry (http://chemistry.apache.org). The most active Apache Chemistry sub-project is OpenCMIS. It includes a Java client library (including Android), multiple servers for testing purposes, and some developer tools, such as a Java Swing-based repository browser called OpenCMIS Workbench. Apache Chemistry also includes libraries for Python, .NET, PHP, and Objective-C.

The tools and libraries at Apache Chemistry are a great way to get started with CMIS. For example, I’ve got the Apache Chemistry InMemory Repository deployed to a local Tomcat server. I can fire up OpenCMIS Workbench and connect to the server using its service URL, http://localhost:8080/chemistry/browser. Once I do that I can navigate the repository’s folder hierarchy, inspecting or performing actions against objects along the way.

The OpenCMIS Workbench has a built-in Groovy console. One of the examples that ships with the Workbench is “Execute a Query”. Here’s what it looks like without the imports:

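// The CMIS query: three properties of every object of type cmis:document
String cql = "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document"

// "session" is supplied by the Workbench console; false = do not search all versions
ItemIterable<QueryResult> results = session.query(cql, false)

// Print every returned property of every hit, one hit per block
results.each { hit ->
    hit.properties.each { println "${it.queryName}: ${it.firstValue}" }
    println "--------------------------------------"
}

// Summary of the result set
println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"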

The Apache Chemistry OpenCMIS InMemory Repository ships with some sample data so when I execute the Groovy script, I’ll see something like:

cmis:contentStreamLength: 33216
cmis:name: My_Document-0-1
cmis:objectId: 134
--------------------------------------
cmis:contentStreamLength: 33226
cmis:name: My_Document-1-0
cmis:objectId: 130
--------------------------------------
cmis:contentStreamLength: 33718
cmis:name: My_Document-2-0
cmis:objectId: 105
--------------------------------------
cmis:contentStreamLength: 33617
cmis:name: My_Document-2-1
cmis:objectId: 122
--------------------------------------
cmis:contentStreamLength: 33807
cmis:name: My_Document-2-2
cmis:objectId: 129
--------------------------------------
cmis:contentStreamLength: 33364
cmis:name: My_Document-2-1
cmis:objectId: 128
--------------------------------------
cmis:contentStreamLength: 33506
cmis:name: My_Document-2-1
cmis:objectId: 112
--------------------------------------
cmis:contentStreamLength: 33567
cmis:name: My_Document-2-1
cmis:objectId: 106
--------------------------------------
cmis:contentStreamLength: 33230
cmis:name: My_Document-2-2
cmis:objectId: 107
--------------------------------------
cmis:contentStreamLength: 33774
cmis:name: My_Document-1-1
cmis:objectId: 115
--------------------------------------
cmis:contentStreamLength: 33524
cmis:name: My_Document-2-0
cmis:objectId: 121
--------------------------------------
cmis:contentStreamLength: 33593
cmis:name: My_Document-2-0
cmis:objectId: 111
--------------------------------------
cmis:contentStreamLength: 34152
cmis:name: My_Document-2-2
cmis:objectId: 123
--------------------------------------
cmis:contentStreamLength: 33332
cmis:name: My_Document-0-0
cmis:objectId: 133
--------------------------------------
cmis:contentStreamLength: 33478
cmis:name: My_Document-1-2
cmis:objectId: 116
--------------------------------------
cmis:contentStreamLength: 33541
cmis:name: My_Document-1-2
cmis:objectId: 132
--------------------------------------
cmis:contentStreamLength: 33225
cmis:name: My_Document-2-0
cmis:objectId: 127
--------------------------------------
cmis:contentStreamLength: 33333
cmis:name: My_Document-2-2
cmis:objectId: 113
--------------------------------------
cmis:contentStreamLength: 33698
cmis:name: My_Document-1-0
cmis:objectId: 114
--------------------------------------
cmis:contentStreamLength: 33746
cmis:name: My_Document-0-2
cmis:objectId: 135
--------------------------------------
cmis:contentStreamLength: 33455
cmis:name: My_Document-1-1
cmis:objectId: 131
--------------------------------------
--------------------------------------
Total number: 21
Has more: false
--------------------------------------

So that query returned three properties, cmis:objectId, cmis:name, and cmis:contentStreamLength, of every object in the repository that is of type cmis:document. We could have restricted the query further with a where clause that tested specific property values or even the full-text content of the files.
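To make the idea of a restricted query concrete, here is a minimal sketch in plain Java that only builds CMIS query strings; it does not talk to a repository. The class and method names (CmisQueryExamples, largerThan, nameLike, containing) are made up for illustration. The escaping helper follows my reading of the CMIS rule that quotes and backslashes inside a string literal are escaped with a backslash, and it is only meant for simple values, since full-text expressions have additional escaping rules of their own.

```java
final class CmisQueryExamples {

    // Same SELECT and FROM as the Workbench example.
    static final String BASE =
        "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document";

    // Wrap a value in single quotes, escaping backslashes and single quotes
    // with a backslash (CMIS differs from SQL-92's doubled-quote style here).
    public static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Test a property value: only documents whose content exceeds minBytes.
    public static String largerThan(long minBytes) {
        return BASE + " WHERE cmis:contentStreamLength > " + minBytes;
    }

    // Test a name against a LIKE pattern. In LIKE, % matches any run of
    // characters and _ matches any single character.
    public static String nameLike(String pattern) {
        return BASE + " WHERE cmis:name LIKE " + quote(pattern);
    }

    // Test the full-text content of the file (assumes the repository
    // supports full-text search and that term is a simple word or phrase).
    public static String containing(String term) {
        return BASE + " WHERE CONTAINS(" + quote(term) + ")";
    }

    public static void main(String[] args) {
        System.out.println(largerThan(33500));
        System.out.println(nameLike("My_Document-2-%"));
        System.out.println(containing("invoice"));
    }
}
```

Any of these strings could be assigned to cql in the Workbench script and passed to session.query(cql, false) in place of the unrestricted query.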

Now I also happen to be running Alfresco, which is an open source ECM repository. The beauty of CMIS is demonstrated by the fact that I can run that exact same Groovy script against Alfresco. I simply have to reconnect using Alfresco’s service URL, which is http://localhost:8080/alfresco/cmisatom (for the AtomPub binding). My local Alfresco repository has many more objects than my OpenCMIS InMemory repository so I won’t list the output here, but the code runs successfully unchanged.
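Outside the Workbench, the same switch comes down to connection settings. The sketch below, in plain Java with no OpenCMIS dependency, builds the two parameter maps side by side so the difference is visible: only the binding type and the service URL change. The key strings are the property names I understand OpenCMIS to use for its session parameters (in real code you would normally use the SessionParameter constants instead), and the class name, method names, and credentials are placeholders, not anything from my setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CmisConnectionSettings {

    // Property names assumed to match OpenCMIS's SessionParameter constants.
    static final String BINDING_TYPE = "org.apache.chemistry.opencmis.binding.spi.type";
    static final String ATOMPUB_URL = "org.apache.chemistry.opencmis.binding.atompub.url";
    static final String BROWSER_URL = "org.apache.chemistry.opencmis.binding.browser.url";
    static final String USER = "org.apache.chemistry.opencmis.user";
    static final String PASSWORD = "org.apache.chemistry.opencmis.password";

    // Settings for the InMemory repository over the browser binding.
    public static Map<String, String> inMemory(String user, String password) {
        Map<String, String> p = credentials(user, password);
        p.put(BINDING_TYPE, "browser");
        p.put(BROWSER_URL, "http://localhost:8080/chemistry/browser");
        return p;
    }

    // Settings for Alfresco over the AtomPub binding.
    public static Map<String, String> alfresco(String user, String password) {
        Map<String, String> p = credentials(user, password);
        p.put(BINDING_TYPE, "atompub");
        p.put(ATOMPUB_URL, "http://localhost:8080/alfresco/cmisatom");
        return p;
    }

    // Shared part of both configurations.
    private static Map<String, String> credentials(String user, String password) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put(USER, user);
        p.put(PASSWORD, password);
        return p;
    }

    public static void main(String[] args) {
        // Placeholder credentials; print only the settings that differ.
        Map<String, String> a = inMemory("user", "secret");
        Map<String, String> b = alfresco("user", "secret");
        System.out.println(a.get(BINDING_TYPE) + " -> " + a.get(BROWSER_URL));
        System.out.println(b.get(BINDING_TYPE) + " -> " + b.get(ATOMPUB_URL));
    }
}
```

With the OpenCMIS client library on the classpath, either map could be handed to a SessionFactory (for example via getRepositories(parameters) and then createSession() on the chosen repository), and the query code that follows would not need to know which repository it is talking to.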

Readers who spend their days enjoying standardized SQL that works across databases or the benefits of ORM tools that abstract their code from any specific relational database will no doubt be unimpressed by this feat. But I promise you that those of us who have to work with ECM repositories like SharePoint, Documentum, FileNet, and Alfresco, sometimes all on the same project, are rejoicing.

The next time you need to integrate with an ECM repository, CMIS should definitely be on your radar. I’ve created a list of CMIS resources to help you.

6 comments

August 21, 2013 at 5:47 pm

IanTruscott says:

Really nice CMIS primer by @jeffpotts01 “CMIS: An open API for managing content” http://t.co/djUIqjPdNT (#sdltridion also supports #CMIS)
August 22, 2013 at 9:29 am

Victor says:

Hello Jeff,

I’m trying to migrate from a connector based on the JSR-170 spec to a CMIS connector. The problem I have is that CMIS doesn’t map the type “d:content”. That’s a problem for me because there is a lot of data in two properties of that type.
What’s the solution in CMIS? Maybe I have to use renditions from now on, but how can I migrate all the data in those properties if I can’t access them through CMIS?

Thanks!
August 23, 2013 at 3:01 am

_MBauer_ says:

http://t.co/OOzoMwlx4i CMIS: An open #API for managing #content. Pretty good article about #CMIS and #Apache #Chemistry by @jeffpotts01
Pingback: CMIS example: Uploading multiple files to a CMIS repository | ecmarchitect.com
