Happy 5th Birthday, Alfresco Developer Guide!

Birthday Cake by Will ClaytonIt is hard to believe that the Alfresco Developer Guide was published five years ago this month (really, the tail end of October). My goal at the time was to help flatten the learning curve around the Alfresco platform and encourage people still using legacy ECM systems to make the leap. Based on the number of people who come up to me at meetups, conferences, and other events to tell me how much the book helped their projects, their teams, and even their careers, I’d say that goal was met and that makes me very happy.

The product and the company have definitely evolved a lot since 2008. The chapter on WCM is no longer relevant. The section on Single Sign-On is out-of-date. The book was written before Alfresco Share existed. And, at the time, jBPM was the only workflow engine in the product and Solr was not yet on the scene, nor was mobile. Both JCR and the native Web Services APIs have given way to CMIS as the preferred API. And Maven and AMPs are the recommended way to manage dependencies, run builds, and package extensions.

But the fundamentals of content modeling haven’t changed. Rules, actions, and behaviors are still great ways to automate content processing. Web Scripts are vital to any developer’s understanding of Alfresco, including those doing Share customizations. And, though the preferred workflow engine is Activiti rather than jBPM, the basics of how to design and deploy content-centric business processes in Alfresco haven’t changed that much.

So where do we go from here? The book was originally based on a set of tutorials I published here on ecmarchitect. Last year I created second editions of many of the tutorials to catch them up with major advancements in the platform. For example, the custom content types tutorial now includes Share configuration and CMIS. The custom actions tutorial now includes how to write Share UI actions. And the workflow tutorial now assumes Activiti and Alfresco Share rather than jBPM and Alfresco Explorer.

The source code that accompanied the book lives in Google Code, but I recently moved the source code that accompanies the tutorials to GitHub. I’m busy refactoring the tutorial source code to use Maven and AMPs. I’ve also started moving the actual tutorial text to markdown and am checking it in to GitHub so that the community can help me revise it and keep it up-to-date.

I learned a lot writing that first book. One of the lessons is to always work with co-authors. That made a big difference on CMIS and Apache Chemistry in Action. I hope that book helps as many people as the Alfresco Developer Guide did and I look forward to reflecting back on how CMIS has changed on that book’s fifth birthday in 2018.

Alfresco Community Edition 4.2.e Now Available

Alfresco Community LogoThe Alfresco engineering team has just released Alfresco Community Edition 4.2.e (Download, Release Notes). This is the final Community Edition release in the 4.2 line before 4.2 Enterprise is released.

This Community Edition release is mainly fixes for bugs found since 4.2.d and contains no new major features.

In a recent press release, we announced that Alfresco One 4.2 as well as the much-anticipated Records Management 2.1 release will be available on October 29. However, on the 4.2.e file list page on the wiki I notice that there is an RM 2.1 module available for download.

If you need the source, the public SVN revision number for Alfresco Community 4.2.e is 56674 and it has been tagged as COMMUNITYTAGS/V4.2e.

Reminder: Sign Up for Alfresco Summit Hack-a-Thon

Don’t forget, folks. If you DevCon 2012 San Jose Hack-a-Thonare planning on arriving in time for Day 0 of Alfresco Summit in either Barcelona (Nov 4) or Boston (Nov 12) so that you can participate in the hack-a-thon, you need to let us know so we can plan for food. Space is limited, so people who sign-up ahead of time using of these forms will be given priority (Barcelona, Boston).

 

Notes from Alfresco Office Hours

Today we had another Live Alfresco Office Hours session on Google Hangouts on Air. If you missed it, here is the recording. Below that I’ve got rough notes on what was discussed.

What’s new in Alfresco 4.2.d Community Edition

  • New Share look-and-feel, including ribbon
  • My Files
  • Shared Files
  • Document library enhancements (Filmstrip view, Table view)
  • Saved search
  • Folder Templates in Share (Yes, rules come across)
  • Manage permissions
  • Admin console
  • User Trashcan

Did you know you can do a proximity search? Allows you to search for words within a specified proximity of one other. Examples: big *(2) apple finds results where those two words are within two words of each other.

Did you know you can index inside of zips? There is a setting that must be changed in order to enable this.

Did you know you can use the “ALL” keyword to search all metadata instead of the default fields that are normally indexed? Example: ALL:SomeCo finds any document with “SomeCo” in any property. (Thanks, Peter Lofgren, who contributed that tidbit via IRC during the session).

Nightly build now includes PT and Norwegian language packs, both community contributions.

Question from IRC: Alfresco and NoSQL–Any plans to support NoSQL as a back-end for metadata?

  • The Big Data Server is still on the roadmap
  • Lots of people publish content out of Alfresco to a NoSQL target
  • Other than that, no plans

Alfresco Summit – Don’t miss out, register at http://summit.alfresco.com

Alfresco community survey

REMINDER: Alfresco Forge will be turned off completely in December. Add-Ons that point to Forge will be deleted on December 1.

Next Tech Talk Live is October 2. We’ll be talking to Dave Draper about Share customizations in 4.2

Recent meetups:

  • New York, September 24.
  • Chicago, September 25.

Upcoming meetups:

  • Seville, October 8.
  • Madrid, October 18
  • Leuven, Friday December 13.
  • Helsinki, TBD

Thanks to everyone who joined us live or watched the recording. We’ll do it again in two weeks.

 

Alfresco 4.2.d: The Public API & some CMIS changes you should be aware of

Well, Alfresco Community Edition 4.2.d has been out for a couple of weeks now and everyone I’ve talked to seems pretty excited about the new release. The new ribbon header in the Share user interface, the filmstrip view, and the table view for the document library seems to have gotten all of the attention (and it does look sharp) but my favorite new feature is actually behind-the-scenes: The Alfresco Public API has finally made its way from Alfresco in the cloud to on-premise.

What is the Alfresco Public API?

We’ve talked about it a lot over the last year, but it’s always been cloud-only, so you’re forgiven if you need a refresher. Basically, it means that we now have a documented, versioned, and backward compatible API that you can use to do things like:

  • Get a list of sites a user can see
  • Get some information about a site, including its members
  • Add, update, or remove someone from a site
  • Get information about a person, including their favorite sites and preferences
  • Get the tags for a node, including updating a tag on a node
  • Create, update, and delete comments on a node
  • Like/unlike a node
  • Do everything that CMIS can do

The Public API is comprised of CMIS for doing things like performing CRUD functions on documents and folders, querying for content, modifying ACLs, and creating relationships plus non-CMIS calls for things that CMIS doesn’t cover. When you run against Alfresco in the cloud you use OAuth2 for authentication. When you run against Alfresco on-premise, you use basic authentication.

Some Examples

I created some Java API examples for cloud a while ago. This week I spent some time on those examples to get them cleaned up a bit here and there and to have a really clear separation between what’s specific to running against Alfresco on-premise versus Alfresco in the cloud and what’s common to both. The result is a single set of code that runs against either one.

Here’s how the project is set up:

The remaining classes in com.alfresco.api.example are the actual examples. These would probably be better structured as unit tests but for now, they are just runnable Java classes. Currently there are only four:

  • GetSitesExample.java simply fetches the short names of up to 10 sites the current user has access to.
  • CmisRepositoryInfoExample.java is the simplest CMIS example you can write. It retrieves some basic information about the repository.
  • CmisCreateDocumentExample.java creates a folder, creates a document, likes the folder, and comments on the document. This is a mix of CMIS and non-CMIS calls.
  • CmisAspectExample.java imports some images, uses Apache Tika to extract some metadata from the binary image file, then uses the OpenCMIS Extension to store those metadata values in properties defined in aspects. The extension has to be used because CMIS 1.0 doesn’t support aspects.

Take a look at the readme.md file to see what changes you need to make to run these examples in your own environment.

New CMIS URLs in Alfresco Community Edition 4.2.d

One thing to notice is that starting with this release there are two new CMIS URLs. Which one you use depends on which version of CMIS you would like to leverage. If you want to use CMIS 1.0, the URL is:
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.0/atom

If you would rather use CMIS 1.1, the URL is:
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom

New with 4.2.d is the CMIS 1.1 browser binding. Currently Alfresco in the cloud only supports Atom Pub, but if you are only running against on-premise, you might decide that the browser binding, which uses JSON instead XML, is more performant. If you want to use the browser binding, the service URL is:

http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser

CMIS Repository ID and Object ID Changes

When using the new URLs, you’ll notice a couple of things right away. First, the repositoryId is coming back as “-default-” instead of the 32-byte string it used to return.

Second, the format of the object IDs has changed. The CMIS specification dictates that object IDs are supposed to be opaque. You shouldn’t build any logic on the content of a CMIS object ID. However, if you have built a client application that stores object IDs, you may want to pay particular attention to this when you test against the new version. The object IDs for the same object will be different than they were when using the old CMIS URLs.

You can see this in 4.2.d. If I use the old “/alfresco/cmisatom” URL, and I ask for the ID of the root folder, I’ll get:
workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc

But if I use either the new CMIS 1.0 URL or the new CMIS 1.1 URL and ask for the ID of the same root folder, I’ll get:
ae045ec3-5578-4c7b-b5a6-f893810e63dc

In order to minimize the impact of this change, you can issue a getObject call using either the old ID or the new ID and you’ll get the same object back. And if you ask for the object’s ID you’ll get it back in the same format that was used to retrieve it, as shown in this Python example:

>>> repo.getObject('ae045ec3-5578-4c7b-b5a6-f893810e63dc').id
'ae045ec3-5578-4c7b-b5a6-f893810e63dc'
>>> repo.getObject('ae045ec3-5578-4c7b-b5a6-f893810e63dc').name
u'Company Home'
>>> repo.getObject('workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc').id
'workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc'
>>> repo.getObject('workspace://SpacesStore/ae045ec3-5578-4c7b-b5a6-f893810e63dc').name
u'Company Home'

Still Missing Some CMIS 1.1 Goodness

The browser binding is great but 4.2.d is still missing some of the CMIS 1.1 goodness. For example, CMIS 1.1 defines a new concept called “secondary types”. In Alfresco land we call those aspects. Unfortunately, 4.2.d does not yet appear to have full support for aspects. You can see what aspects a node has applied by inspecting the cmis:secondaryObjectTypeIds property, which is useful. But in my test, changing the value of that property did not affect the aspects on the object as I would expect if secondary type support was fully functional.

Summary

The Public API is new. There are still a lot of areas not covered by either CMIS or the rest of the Alfresco API. Hopefully those gaps will close over time. If you have an opinion on what should be the next area to be added to the Alfresco API, please comment here or mention it to your account rep.

An Alfresco Summit letter-to-your-boss template for every type of boss

Office SpaceThese days every conference seems to provide a pretty standard letter-to-your-boss template. And they may be helpful for people who have normal, rational bosses. But we all know there are many different types of bosses out there and some require specialized strategies for getting them to do what you need them to do. That’s why the Alfresco Summit team has put together a list of letter-to-your-boss templates with one for every type of boss.

Don’t see your boss type listed? Let us know. We don’t want you to miss out on the conference!

Help us improve the ecosystem: Take the Alfresco Community 2013 Survey

Surveying by IamNotUniqueHopefully, you find the Alfresco community team and our company in general extremely approachable. My perception is that people aren’t afraid to reach out and give us feedback directly, which is exactly how we like it. But I think it is important for us to stop and ask for specific feedback on areas we are tracking on a regular basis. That’s why we do a survey of the Alfresco community every year.

So if you are involved at all with Alfresco, whether you are an Alfresco One subscriber, a partner, an employee, or a Community Edition user, please take 15 minutes to complete the survey.

We’ll use your feedback to make the Alfresco ecosystem a better place. If you’ve participated in the past, I hope you’ve seen some of your suggestions become a reality as we continuously improve over time. I’m hoping to consolidate the feedback and report on the results during my session at this year’s Alfresco Summit.

As a small way to say thanks for your time, we’re going to randomly draw two prize winners from valid, completed surveys. Each lucky winner will receive a $250 USD gift card from Amazon (see survey for contest rules and details).

CMIS example: Uploading multiple files to a CMIS repository

In my previous post on CMIS, I introduced the Content Management Interoperability Services (CMIS) specification and the Apache Chemistry project. You learned that CMIS gives you a language-neutral, vendor-independent way to perform CRUD functions against any CMIS-compliant server using a standard API. I showed a simple CMIS query being executed from a Groovy script running in the OpenCMIS Workbench.

Now I’d like to get a little more detailed and show a simple use case: I’ll use the OpenCMIS client library for Java to upload some files from my local machine to a CMIS repository. In my case, that repository is Alfresco 4.2.c Community Edition running locally, but this code should work with any CMIS-compliant server from vendors like IBM, EMC, Microsoft, Nuxeo, and so on. I’ll include the relevant snippets, but if you want to follow along, the full source code for this example lives here. I use that example to show the same code working against Alfresco in the cloud and Alfresco on-premise. If you are running against on-premise only or some other CMIS server, it has a few dependencies that won’t be relevant to you.

LoadFiles.java is a runnable Java class. The main method simply calls doExample(). That method grabs a session, gets a handle to the destination folder for the files on the local machine, and then, for each file in the local machine’s directory, it creates a hashmap of metadata values, then uploads each file and its associated metadata to the repository. Let’s look at each of these pieces in turn.

Get a Session

The first thing you need is a session. I have a getCmisSession() method that knows how to get one, and it looks like this:

SessionFactory factory = SessionFactoryImpl.newInstance();
Map parameter = new HashMap();

// connection settings
parameter.put(SessionParameter.ATOMPUB_URL, ATOMPUB_URL);
parameter.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
parameter.put(SessionParameter.AUTH_HTTP_BASIC, "true");
parameter.put(SessionParameter.USER, USER_NAME);
parameter.put(SessionParameter.PASSWORD, PASSWORD);

List repositories = factory.getRepositories(parameter);

return repositories.get(0).createSession();

As you can see, establishing a session is as simple as providing the username, password, binding, and service URL. The binding is the protocol we’re going to use to talk to the server. In CMIS 1.0, usually the best choice is the Atom Pub binding because it is faster than the Web Services binding, the only other alternative. CMIS 1.1 adds a browser binding that is based on HTML forms and JSON but I won’t cover that here.

The other parameter that gets set is the service URL. This is server-specific. For Alfresco 4.x or higher, the CMIS 1.0 Atom Pub URL is http://localhost:8080/alfresco/cmisatom.

The last thing the method does is return a session for a specific repository. CMIS servers can serve more than one repository. In Alfresco’s case, there is only ever one, so it is safe to return the first one in the list.

Get the Target Folder

Okay, we’ve got a session with the CMIS server’s repository. The repository is a hierarchical tree of objects (folders and documents) similar to a local file system. The class is configured with a parent folder path and the name of a new folder that should be created in that parent folder. So the first thing we need to do is get a reference to the parent folder. My example getParentFolder() method just grabs the folder by path, like this:

Folder folder = (Folder) cmisSession.getObjectByPath(FOLDER_PATH);
return folder;

Now, given the parent folder and the name of a new folder, the createFolder() method attempts to create the new folder to hold our files:

Folder subFolder = null;
try {
  subFolder = (Folder) cmisSession.getObjectByPath(parentFolder.getPath() + "/" + folderName);
  System.out.println("Folder already existed!");
} catch (CmisObjectNotFoundException onfe) {
  Map props = new HashMap();
  props.put("cmis:objectTypeId",  "cmis:folder");
  props.put("cmis:name", folderName);
  subFolder = parentFolder.createFolder(props);
  String subFolderId = subFolder.getId();
  System.out.println("Created new folder: " + subFolderId);
}
return subFolder;

The folder is either going to already exist, in which case we’ll just grab it and return it, or it will need to be created. We can test the existence of the folder by trying to fetch it by path, and if it throws a CmisObjectNotFoundException, we’ll create it.

Look at the Map that is getting set up to hold the properties of the folder. The minimum required properties that need to be passed in are the type of folder to be created (“cmis:folder”) and the name of the folder to create. You might choose to extend your server’s content model with your own folder types. In this example, the out-of-the-box “cmis:folder” type is fine.

Set Up the Properties For Each New Document

Just like when the folder was created, every file we upload to the repository will have its own set of metadata. To make it interesting, though, we’ll provide more than just the type of document we want to create and the name of the document. In my example, I’m using a content model we created for the CMIS & Apache Chemistry in Action book. It contains several types. One of which is called “cmisbook:image”. The image type has attributes you’d expect that would be part of an image, like height, width, focal length, camera make, ISO speed, etc. In fact, if you use the OpenCMIS Workbench, you can inspect the type definition for cmisbook:image. Here’s a screenshot (click to enlarge):

OpenCMIS Workbench, Type Inspector

Two of the properties I’m going to work with in this example are the latitude and longitude. Alfresco will automatically extract metadata like this when you add files to the repository. In fact, Alfresco already has a “geographic aspect” out-of-the-box that can be used to extract and store lat and long. But we wanted this content model to work with any CMIS repository and not all repositories support aspects (CMIS 1.1 call these “secondary types”) so the content model used in the book defines lat and long on the cmisbook:image type.

Because not all repositories know how to extract metadata, we’re going to use Apache Tika to do it in our client app.

The getProperties() method does this work. It returns a Map of properties that consists of the type of the object we want to create (“cmisbook:image”), the name of the object (the file name being uploaded), and the latitude and longitude. Here’s what that code looks like:

Map props = new HashMap();

String fileName = file.getName();
System.out.println("File: " + fileName);
InputStream stream = new FileInputStream(file);
try {
  Metadata metadata = new Metadata();
  ContentHandler handler = new DefaultHandler();
  Parser parser = new JpegParser();
  ParseContext context = new ParseContext();

  metadata.set(Metadata.CONTENT_TYPE, FILE_TYPE);

  parser.parse(stream, handler, metadata, context);
  String lat = metadata.get("geo:lat");
  String lon = metadata.get("geo:long");
  stream.close();

  // create a map of properties
  props.put("cmis:objectTypeId",  objectTypeId);
  props.put("cmis:name", fileName);
  if (lat != null && lon != null) {
    System.out.println("LAT:" + lat);
    System.out.println("LON:" + lon);
    props.put("cmisbook:gpsLatitude", BigDecimal.valueOf(Float.parseFloat(lat)));
    props.put("cmisbook:gpsLongitude", BigDecimal.valueOf(Float.parseFloat(lon)));
  }
} catch (TikaException te) {
  System.out.println("Caught tika exception, skipping");
} catch (SAXException se) {
  System.out.println("Caught SAXException, skipping");
} finally {
  if (stream != null) {
    stream.close();
  }
}
return props;

Now we have everything we need to upload the file to the repository: a session, the target folder, and a map of properties for each object being uploaded. All that’s left to do is upload the file.

Upload the File

The first thing the createDocument() method does is to make sure that we have a Map with the minimal set of metadata, which is the object type and the name. It’s conceivable that things didn’t go well in the getProperties() method, and if that is the case, this bit of code makes sure everything is in place:


String fileName = file.getName();

// create a map of properties if one wasn't passed in
if (props == null) {
  props = new HashMap<String, Object>();
}

// Add the object type ID if it wasn't already
if (props.get("cmis:objectTypeId") == null) {
  props.put("cmis:objectTypeId",  "cmis:document");
}

// Add the name if it wasn't already
if (props.get("cmis:name") == null) {
  props.put("cmis:name", fileName);
}

Next we use the file and the object factory on the CMIS session to set up a ContentStream object:

ContentStream contentStream = cmisSession.getObjectFactory().
  createContentStream(
    fileName,
    file.length(),
    fileType,
    new FileInputStream(file)
  );

And finally, the file can be uploaded.

Document document = null;
try {
  document = parentFolder.createDocument(props, contentStream, null);
  System.out.println("Created new document: " + document.getId());
} catch (CmisContentAlreadyExistsException ccaee) {
  document = (Document) cmisSession.getObjectByPath(parentFolder.getPath() + "/" + fileName);
  System.out.println("Document already exists: " + fileName);
}
return document;

Similar to the folder creating logic earlier, it could be that the document already exists, so we use the same find-or-create pattern here.

When I run this locally using a folder that contains five pics I snapped in Berlin, the output looks like this:

Created new folder: workspace://SpacesStore/2f576635-5058-4053-9a61-dad68939fdd2
File: augustiner.jpg
LAT:52.51387
LON:13.39111
Created new document: workspace://SpacesStore/b19755e1-74a2-4c1e-9eb5-a5bfd2c0ebd7;1.0
File: berlin_cathedral.jpg
LAT:52.51897
LON:13.39936
Created new document: workspace://SpacesStore/34aa7b80-9f09-4c07-a040-9aee94debf80;1.0
File: brandenburg.jpg
LAT:52.51622
LON:13.37783
Created new document: workspace://SpacesStore/6c02f8f6-accc-4997-be5c-601bc7131247;1.0
File: gendarmenmarkt.jpg
LAT:52.51361
LON:13.39278
Created new document: workspace://SpacesStore/44ff28e7-782a-46c3-b388-453fd8495472;1.0
File: old_museum.jpg
LAT:52.52039
LON:13.39926
Created new document: workspace://SpacesStore/03a85605-4a66-4f94-b423-82502efbca4a;1.0

Now Run Against Another Vendor’s Repo

What’s kind of cool, and what I think really demonstrates the great thing about CMIS, is that you can run this class against any CMIS repository, virtually unchanged. To demonstrate this, I’ll fire up the Apache Chemistry InMemory Repository we ship with the source code that accompanies the book because it is already configured with a custom content model that includes “cmisbook:image”. As the name suggests, this repository is a reference CMIS server available from Apache Chemistry that runs entirely in-memory.

To run the class against the Apache Chemistry InMemory Repository, we have to change the service URL and the content type ID, like this:

//public static final String CONTENT_TYPE = "D:cmisbook:image";
public static final String CONTENT_TYPE = "cmisbook:image";

//public static final String ATOMPUB_URL = ALFRESCO_API_URL + "alfresco/cmisatom";
public static final String ATOMPUB_URL = ALFRESCO_API_URL + "inmemory/atom";

And when I run the class, my photos get uploaded to a completely different repository implementation.

That’s It!

That’s a simple example, I know, but it illustrates fetching objects, creating new objects, including those of custom types, setting metadata, and handling exceptions all through an industry-standard API. There is a lot more to CMIS and OpenCMIS, in particular. I invite you to learn more by diving in to CMIS & Apache Chemistry in Action!

My presentations from Alfresco Day Sydney

Well over a hundred people showed up to Alfresco Day Sydney today to spend a day hearing from customers, partners, and Alfrescans about the platform. We’ll get all of the talks uploaded somewhere. Mine are on slideshare:

I enjoyed meeting everyone and hearing about the wonderful things you are doing with Alfresco. I look forward to running into more of you online and in-person.

CMIS: An open API for managing content

Most of the content in a company is completely unstructured. Just think about the documents you collaborate on with the rest of your team throughout the day. They might include things like proposals, architecture diagrams, presentations, invoices, screenshots, videos, books, meeting notes, or pictures from your last company get-together.

How does a company organize all of that content? Often it is scattered across file shares and employee hard drives. It isn’t really organized at all. It’s hard enough to simply find content in that environment, but what about answering questions like:

  • Is this the latest version and how has it changed over time?
  • Which customer is this document related to?
  • Who is allowed to read or make changes to this document?
  • How long are we legally required to keep this document?
  • When I’m done making my change to this document, what is the next step in the process?

To address this, companies will often write content-centric applications that try to put some order to the chaos. But most of our content resides in files, and files can be a pain to work with. Databases can store files up to a certain file size, but they aren’t great for working with audio and video. File systems solve that problem but they alone don’t offer rich functionality like the ability to track complex metadata with each file or the ability to easily full-text index and then run searches across all of your content.

That’s where a content repository comes in. You might hear these referred to as a Document Management (DM) system or an Enterprise Content Management (ECM) system. No matter what you call it, they are purpose-built for making it easier for your company to get a handle on its file-based content.

Here’s the problem for developers, though: There is a lot of repository software out there. Most large companies have more than one up-and-running in their organization, and every one of them has their own API. It’s rare that these systems exist in a vacuum. They often need to feed and consume business processes and that takes code. So if you are an enterprise developer, and you are trying to integrate some of your systems with your ECM repositories, you’ve got multiple API’s you need to learn. Or, if you are a software vendor, and you are trying to build a solution that requires a rich content repository as a back-end, you either have to choose a specific back-end to support or you have to write adapters to support a handful of repositories.

The solution to this problem is called Content Management Interoperability Services (CMIS). It’s an industry-wide specification managed by OASIS. It describes a domain language, a query language, and multiple protocols for working with a content repository. With CMIS, developers write against the CMIS API instead of learning each repository’s proprietary API, and their applications will work with any CMIS-compliant repository.

The first version of the specification became official in May of 2010. The most recent version, 1.1, became official this past May.

Several developers have been busy writing client libraries, server-side libraries, and tools related to CMIS. Many of these are collected as part of an umbrella open source project known as Apache Chemistry (http://chemistry.apache.org). The most active Apache Chemistry sub-project is OpenCMIS. It includes a Java client library (including Android), multiple servers for testing purposes, and some developer tools, such as a Java Swing-based repository browser called OpenCMIS Workbench. Apache Chemistry also includes libraries for Python, .NET, PHP, and Objective-C.

The tools and libraries at Apache Chemistry are a great way to get started with CMIS. For example, I’ve got the Apache Chemistry InMemory Repository deployed to a local Tomcat server. I can fire up OpenCMIS Workbench and connect to the server using its service URL, http://localhost:8080/chemistry/browser. Once I do that I can navigate the repository’s folder hiearchy, inspecting or performing actions against objects along they way.

The OpenCMIS Workbench has a built-in Groovy console. One of the examples that ships with the Workbench is “Execute a Query”. Here’s what it looks like without the imports:

String cql = "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document"

ItemIterable<QueryResult> results = session.query(cql, false)

results.each { hit ->
hit.properties.each { println "${it.queryName}: ${it.firstValue}" }
println "--------------------------------------"
}

println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"

The Apache Chemistry OpenCMIS InMemory Repository ships with some sample data so when I execute the Groovy script, I’ll see something like:

cmis:contentStreamLength: 33216
cmis:name: My_Document-0-1
cmis:objectId: 134
--------------------------------------
cmis:contentStreamLength: 33226
cmis:name: My_Document-1-0
cmis:objectId: 130
--------------------------------------
cmis:contentStreamLength: 33718
cmis:name: My_Document-2-0
cmis:objectId: 105
--------------------------------------
cmis:contentStreamLength: 33617
cmis:name: My_Document-2-1
cmis:objectId: 122
--------------------------------------
cmis:contentStreamLength: 33807
cmis:name: My_Document-2-2
cmis:objectId: 129
--------------------------------------
cmis:contentStreamLength: 33364
cmis:name: My_Document-2-1
cmis:objectId: 128
--------------------------------------
cmis:contentStreamLength: 33506
cmis:name: My_Document-2-1
cmis:objectId: 112
--------------------------------------
cmis:contentStreamLength: 33567
cmis:name: My_Document-2-1
cmis:objectId: 106
--------------------------------------
cmis:contentStreamLength: 33230
cmis:name: My_Document-2-2
cmis:objectId: 107
--------------------------------------
cmis:contentStreamLength: 33774
cmis:name: My_Document-1-1
cmis:objectId: 115
--------------------------------------
cmis:contentStreamLength: 33524
cmis:name: My_Document-2-0
cmis:objectId: 121
--------------------------------------
cmis:contentStreamLength: 33593
cmis:name: My_Document-2-0
cmis:objectId: 111
--------------------------------------
cmis:contentStreamLength: 34152
cmis:name: My_Document-2-2
cmis:objectId: 123
--------------------------------------
cmis:contentStreamLength: 33332
cmis:name: My_Document-0-0
cmis:objectId: 133
--------------------------------------
cmis:contentStreamLength: 33478
cmis:name: My_Document-1-2
cmis:objectId: 116
--------------------------------------
cmis:contentStreamLength: 33541
cmis:name: My_Document-1-2
cmis:objectId: 132
--------------------------------------
cmis:contentStreamLength: 33225
cmis:name: My_Document-2-0
cmis:objectId: 127
--------------------------------------
cmis:contentStreamLength: 33333
cmis:name: My_Document-2-2
cmis:objectId: 113
--------------------------------------
cmis:contentStreamLength: 33698
cmis:name: My_Document-1-0
cmis:objectId: 114
--------------------------------------
cmis:contentStreamLength: 33746
cmis:name: My_Document-0-2
cmis:objectId: 135
--------------------------------------
cmis:contentStreamLength: 33455
cmis:name: My_Document-1-1
cmis:objectId: 131
--------------------------------------
--------------------------------------
Total number: 21
Has more: false
--------------------------------------

So that query returned three properties, cmis:objectId, cmis:name, and cmis:contentStreamLength, of every object in the repository that is of type cmis:document. We could have restricted the query further with a where clause that tested specific property values or even the full-text content of the files.

Now I also happen to be running Alfresco, which is an open source ECM repository. The beauty of CMIS is demonstrated by the fact that I can run that exact same Groovy script against Alfresco. I simply have to reconnect using Alfresco’s service URL, which is http://localhost:8080/alfresco/cmisatom (for the Atom Pub binding). My local Alfresco repository has many more objects than my OpenCMIS InMemory repository so I won’t list the output here, but the code runs successfuly unchanged.

Readers who spend their days enjoying standardized SQL that works across databases or the benefits of ORM tools that abstract their code from any specific relational database will no doubt be unimpressed by this feat. But I promise you that those of us who have to work with ECM repositories like SharePoint, Documentum, FileNet, and Alfresco, sometimes all on the same project, are rejoicing.

The next time you need to integrate with an ECM repository, CMIS should definitely be on your radar. I’ve created a list of CMIS resources to help you.