Month: August 2013

CMIS example: Uploading multiple files to a CMIS repository

In my previous post on CMIS, I introduced the Content Management Interoperability Services (CMIS) specification and the Apache Chemistry project. You learned that CMIS gives you a language-neutral, vendor-independent way to perform CRUD functions against any CMIS-compliant server using a standard API. I showed a simple CMIS query being executed from a Groovy script running in the OpenCMIS Workbench.

Now I’d like to get a little more detailed and show a simple use case: I’ll use the OpenCMIS client library for Java to upload some files from my local machine to a CMIS repository. In my case, that repository is Alfresco 4.2.c Community Edition running locally, but this code should work with any CMIS-compliant server from vendors like IBM, EMC, Microsoft, Nuxeo, and so on. I’ll include the relevant snippets, but if you want to follow along, the full source code for this example lives here. I use that example to show the same code working against Alfresco in the cloud and Alfresco on-premise, so if you are running against on-premise only or some other CMIS server, it has a few dependencies that won’t be relevant to you.

The example is a runnable Java class. The main method simply calls doExample(). That method grabs a session, gets a handle to the destination folder for the files, and then, for each file in a directory on the local machine, creates a hashmap of metadata values and uploads the file and its associated metadata to the repository. Let’s look at each of these pieces in turn.

Get a Session

The first thing you need is a session. I have a getCmisSession() method that knows how to get one, and it looks like this:

SessionFactory factory = SessionFactoryImpl.newInstance();
Map<String, String> parameter = new HashMap<String, String>();

// connection settings
parameter.put(SessionParameter.ATOMPUB_URL, ATOMPUB_URL);
parameter.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
parameter.put(SessionParameter.AUTH_HTTP_BASIC, "true");
parameter.put(SessionParameter.USER, USER_NAME);
parameter.put(SessionParameter.PASSWORD, PASSWORD);

// ask the server which repositories it exposes
List<Repository> repositories = factory.getRepositories(parameter);

// use the first (for Alfresco, the only) repository
return repositories.get(0).createSession();

As you can see, establishing a session is as simple as providing the username, password, binding, and service URL. The binding is the protocol we’re going to use to talk to the server. In CMIS 1.0, usually the best choice is the Atom Pub binding because it is faster than the Web Services binding, the only other alternative. CMIS 1.1 adds a browser binding that is based on HTML forms and JSON but I won’t cover that here.

The other parameter that gets set is the service URL. This is server-specific. For Alfresco 4.x or higher, the CMIS 1.0 Atom Pub URL is http://localhost:8080/alfresco/cmisatom.

The last thing the method does is return a session for a specific repository. CMIS servers can serve more than one repository. In Alfresco’s case, there is only ever one, so it is safe to return the first one in the list.

Get the Target Folder

Okay, we’ve got a session with the CMIS server’s repository. The repository is a hierarchical tree of objects (folders and documents) similar to a local file system. The class is configured with a parent folder path and the name of a new folder that should be created in that parent folder. So the first thing we need to do is get a reference to the parent folder. My example getParentFolder() method just grabs the folder by path, like this:

Folder folder = (Folder) cmisSession.getObjectByPath(FOLDER_PATH);
return folder;

Now, given the parent folder and the name of a new folder, the createFolder() method attempts to create the new folder to hold our files:

Folder subFolder = null;
try {
  // fetch by path; this throws if the folder does not exist yet
  subFolder = (Folder) cmisSession.getObjectByPath(parentFolder.getPath() + "/" + folderName);
  System.out.println("Folder already existed!");
} catch (CmisObjectNotFoundException onfe) {
  Map<String, Object> props = new HashMap<String, Object>();
  props.put("cmis:objectTypeId",  "cmis:folder");
  props.put("cmis:name", folderName);
  subFolder = parentFolder.createFolder(props);
  String subFolderId = subFolder.getId();
  System.out.println("Created new folder: " + subFolderId);
}
return subFolder;

The folder is either going to already exist, in which case we’ll just grab it and return it, or it will need to be created. We can test the existence of the folder by trying to fetch it by path, and if it throws a CmisObjectNotFoundException, we’ll create it.
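The fetch-first, create-on-miss shape described here is independent of CMIS, so here is a minimal sketch of it using only the JDK. Everything in it is a stand-in of my own: the class and method names are hypothetical, a HashMap plays the role of the repository, and NoSuchElementException plays the role of CmisObjectNotFoundException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

final class FindOrCreate {
    // Try the lookup first; only when it reports "not found" fall back to creating.
    static <T> T findOrCreate(Supplier<T> finder, Supplier<T> creator) {
        try {
            return finder.get();
        } catch (NoSuchElementException notFound) {
            return creator.get();
        }
    }

    // Stand-in for getObjectByPath(): throws when nothing lives at the path.
    static String lookup(Map<String, String> repo, String path) {
        String id = repo.get(path);
        if (id == null) {
            throw new NoSuchElementException(path);
        }
        return id;
    }

    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>(); // hypothetical repository: path -> id
        String path = "/parent/berlin";             // hypothetical folder path
        for (int attempt = 0; attempt < 2; attempt++) {
            String id = findOrCreate(
                () -> lookup(repo, path),
                () -> { repo.put(path, "folder-1"); return "folder-1"; });
            System.out.println(id); // same id on both passes; created only on the first
        }
    }
}
```

The point of passing both steps in as suppliers is that the creation code never runs unless the lookup actually fails.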

Look at the Map that is getting set up to hold the properties of the folder. The minimum required properties that need to be passed in are the type of folder to be created (“cmis:folder”) and the name of the folder to create. You might choose to extend your server’s content model with your own folder types. In this example, the out-of-the-box “cmis:folder” type is fine.

Set Up the Properties For Each New Document

Just like when the folder was created, every file we upload to the repository will have its own set of metadata. To make it interesting, though, we’ll provide more than just the type of document we want to create and the name of the document. In my example, I’m using a content model we created for the CMIS & Apache Chemistry in Action book. It contains several types, one of which is called “cmisbook:image”. The image type has the attributes you’d expect an image to have, like height, width, focal length, camera make, ISO speed, etc. In fact, if you use the OpenCMIS Workbench, you can inspect the type definition for cmisbook:image. Here’s a screenshot:

OpenCMIS Workbench, Type Inspector

Two of the properties I’m going to work with in this example are the latitude and longitude. Alfresco will automatically extract metadata like this when you add files to the repository. In fact, Alfresco already has a “geographic aspect” out-of-the-box that can be used to extract and store lat and long. But we wanted this content model to work with any CMIS repository, and not all repositories support aspects (CMIS 1.1 calls these “secondary types”), so the content model used in the book defines lat and long on the cmisbook:image type.

Because not all repositories know how to extract metadata, we’re going to use Apache Tika to do it in our client app.

The getProperties() method does this work. It returns a Map of properties that consists of the type of the object we want to create (“cmisbook:image”), the name of the object (the file name being uploaded), and the latitude and longitude. Here’s what that code looks like:

Map<String, Object> props = new HashMap<String, Object>();

String fileName = file.getName();
System.out.println("File: " + fileName);
InputStream stream = new FileInputStream(file);
try {
  Metadata metadata = new Metadata();
  ContentHandler handler = new DefaultHandler();
  Parser parser = new JpegParser();
  ParseContext context = new ParseContext();

  metadata.set(Metadata.CONTENT_TYPE, FILE_TYPE);

  // let Tika pull the EXIF metadata out of the image
  parser.parse(stream, handler, metadata, context);
  String lat = metadata.get("geo:lat");
  String lon = metadata.get("geo:long");

  // create a map of properties
  props.put("cmis:objectTypeId",  objectTypeId);
  props.put("cmis:name", fileName);
  if (lat != null && lon != null) {
    System.out.println("LAT:" + lat);
    System.out.println("LON:" + lon);
    props.put("cmisbook:gpsLatitude", BigDecimal.valueOf(Float.parseFloat(lat)));
    props.put("cmisbook:gpsLongitude", BigDecimal.valueOf(Float.parseFloat(lon)));
  }
} catch (TikaException te) {
  System.out.println("Caught tika exception, skipping");
} catch (SAXException se) {
  System.out.println("Caught SAXException, skipping");
} finally {
  if (stream != null) {
    stream.close();
  }
}
return props;
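One detail in that snippet is worth a second look: the coordinates go through Float.parseFloat() before becoming a BigDecimal, and a float cannot represent most decimal fractions exactly. If you want the stored value to match the digits Tika returned, one option is to build the BigDecimal straight from the string. This is a minimal sketch of the difference; the class and method names are hypothetical, and the sample coordinate is made up.

```java
import java.math.BigDecimal;

final class GeoValues {
    // The route used in the snippet: string -> float -> (widened to double) -> BigDecimal.
    static BigDecimal viaFloat(String value) {
        return BigDecimal.valueOf(Float.parseFloat(value));
    }

    // Alternative: parse the decimal string directly, keeping its digits as written.
    // trim() mirrors parseFloat(), which tolerates surrounding whitespace.
    static BigDecimal exact(String value) {
        return new BigDecimal(value.trim());
    }

    public static void main(String[] args) {
        String lat = "52.51"; // made-up sample latitude
        System.out.println("via float: " + viaFloat(lat));
        System.out.println("exact:     " + exact(lat));
    }
}
```

Both routes throw NumberFormatException on text that is not a number, so the surrounding error handling would need to account for that either way.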

Now we have everything we need to upload the file to the repository: a session, the target folder, and a map of properties for each object being uploaded. All that’s left to do is upload the file.

Upload the File

The first thing the createDocument() method does is to make sure that we have a Map with the minimal set of metadata, which is the object type and the name. It’s conceivable that things didn’t go well in the getProperties() method, and if that is the case, this bit of code makes sure everything is in place:

String fileName = file.getName();

// create a map of properties if one wasn't passed in
if (props == null) {
  props = new HashMap<String, Object>();
}

// Add the object type ID if it wasn't already
if (props.get("cmis:objectTypeId") == null) {
  props.put("cmis:objectTypeId",  "cmis:document");
}

// Add the name if it wasn't already
if (props.get("cmis:name") == null) {
  props.put("cmis:name", fileName);
}
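If you are on Java 8 or later, the same defaulting can be expressed with Map.putIfAbsent(), which treats a key mapped to null the same as a missing key. Here is a small sketch of that idea. The class and method names are hypothetical, and unlike the code above it copies the incoming map rather than modifying it.

```java
import java.util.HashMap;
import java.util.Map;

final class DocumentDefaults {
    // Returns a copy of props that is guaranteed to carry a type ID and a name.
    static Map<String, Object> withDefaults(Map<String, Object> props, String fileName) {
        Map<String, Object> result = (props == null)
            ? new HashMap<>()
            : new HashMap<>(props);
        // putIfAbsent only writes when the key is missing or mapped to null
        result.putIfAbsent("cmis:objectTypeId", "cmis:document");
        result.putIfAbsent("cmis:name", fileName);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> fromTika = new HashMap<>();
        fromTika.put("cmis:objectTypeId", "cmisbook:image");
        System.out.println(withDefaults(fromTika, "augustiner.jpg"));
        System.out.println(withDefaults(null, "augustiner.jpg"));
    }
}
```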

Next we use the file and the object factory on the CMIS session to set up a ContentStream object:

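// arguments: file name, length in bytes, MIME type (FILE_TYPE, as used earlier), stream
ContentStream contentStream = cmisSession.getObjectFactory().
    createContentStream(fileName, file.length(), FILE_TYPE,
    new FileInputStream(file));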

And finally, the file can be uploaded.

Document document = null;
try {
  document = parentFolder.createDocument(props, contentStream, null);
  System.out.println("Created new document: " + document.getId());
} catch (CmisContentAlreadyExistsException ccaee) {
  // a document with this name is already in the folder, so fetch it instead
  document = (Document) cmisSession.getObjectByPath(parentFolder.getPath() + "/" + fileName);
  System.out.println("Document already exists: " + fileName);
}
return document;

Similar to the folder creation logic earlier, it could be that the document already exists, so we use a similar pattern here: try to create it, and if the repository reports that it already exists, fetch it by path instead.

When I run this locally using a folder that contains five pics I snapped in Berlin, the output looks like this:

Created new folder: workspace://SpacesStore/2f576635-5058-4053-9a61-dad68939fdd2
File: augustiner.jpg
Created new document: workspace://SpacesStore/b19755e1-74a2-4c1e-9eb5-a5bfd2c0ebd7;1.0
File: berlin_cathedral.jpg
Created new document: workspace://SpacesStore/34aa7b80-9f09-4c07-a040-9aee94debf80;1.0
File: brandenburg.jpg
Created new document: workspace://SpacesStore/6c02f8f6-accc-4997-be5c-601bc7131247;1.0
File: gendarmenmarkt.jpg
Created new document: workspace://SpacesStore/44ff28e7-782a-46c3-b388-453fd8495472;1.0
File: old_museum.jpg
Created new document: workspace://SpacesStore/03a85605-4a66-4f94-b423-82502efbca4a;1.0
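The loop that walks the local directory isn’t shown in the snippets above, so here is a rough sketch of how the files might be gathered using only the JDK. It is an assumption on my part, not the code from the example project: the class and method names are hypothetical, and it only looks at entries directly inside one directory whose names end in .jpg or .jpeg.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;

final class LocalImages {
    // True for names ending in .jpg or .jpeg, ignoring case.
    static boolean isJpeg(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jpg") || lower.endsWith(".jpeg");
    }

    // Entries directly inside dir whose names look like JPEGs, sorted by path.
    // Returns an empty array when dir does not exist or cannot be listed.
    static File[] jpegsIn(File dir) {
        File[] files = dir.listFiles((parent, name) -> isJpeg(name));
        if (files == null) {
            return new File[0];
        }
        Arrays.sort(files);
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File file : jpegsIn(dir)) {
            System.out.println("File: " + file.getName());
        }
    }
}
```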

Now Run Against Another Vendor’s Repo

What’s kind of cool, and what I think really demonstrates the great thing about CMIS, is that you can run this class against any CMIS repository, virtually unchanged. To demonstrate this, I’ll fire up the Apache Chemistry InMemory Repository we ship with the source code that accompanies the book because it is already configured with a custom content model that includes “cmisbook:image”. As the name suggests, this repository is a reference CMIS server available from Apache Chemistry that runs entirely in-memory.

To run the class against the Apache Chemistry InMemory Repository, we have to change the service URL and the content type ID, like this:

//public static final String CONTENT_TYPE = "D:cmisbook:image";
public static final String CONTENT_TYPE = "cmisbook:image";

//public static final String ATOMPUB_URL = ALFRESCO_API_URL + "alfresco/cmisatom";
public static final String ATOMPUB_URL = ALFRESCO_API_URL + "inmemory/atom";

And when I run the class, my photos get uploaded to a completely different repository implementation.
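Commenting constants in and out works fine for a demo. If you switch targets often, one alternative is to keep the two values that differ together in one place. The sketch below does that with an enum; the enum itself and the base URL passed in are my own illustration, while the paths and type IDs are the ones shown above.

```java
enum CmisTarget {
    ALFRESCO("alfresco/cmisatom", "D:cmisbook:image"),
    IN_MEMORY("inmemory/atom", "cmisbook:image");

    final String atomPubPath;  // path of the Atom Pub service below the server root
    final String contentType;  // type ID to use for uploaded images

    CmisTarget(String atomPubPath, String contentType) {
        this.atomPubPath = atomPubPath;
        this.contentType = contentType;
    }

    // Joins the server root and the service path with exactly one slash between them.
    String atomPubUrl(String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + atomPubPath;
    }

    public static void main(String[] args) {
        String baseUrl = "http://localhost:8080/"; // assumed local server root
        for (CmisTarget target : values()) {
            System.out.println(target + " -> " + target.atomPubUrl(baseUrl)
                + " (" + target.contentType + ")");
        }
    }
}
```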

That’s It!

That’s a simple example, I know, but it illustrates fetching objects, creating new objects (including those of custom types), setting metadata, and handling exceptions, all through an industry-standard API. There is a lot more to CMIS, and to OpenCMIS in particular. I invite you to learn more by diving into CMIS & Apache Chemistry in Action!

My presentations from Alfresco Day Sydney

Well over a hundred people showed up to Alfresco Day Sydney today to spend a day hearing from customers, partners, and Alfrescans about the platform. We’ll get all of the talks uploaded somewhere. Mine are on SlideShare.

I enjoyed meeting everyone and hearing about the wonderful things you are doing with Alfresco. I look forward to running into more of you online and in-person.

CMIS: An open API for managing content

Most of the content in a company is completely unstructured. Just think about the documents you collaborate on with the rest of your team throughout the day. They might include things like proposals, architecture diagrams, presentations, invoices, screenshots, videos, books, meeting notes, or pictures from your last company get-together.

How does a company organize all of that content? Often it is scattered across file shares and employee hard drives. It isn’t really organized at all. It’s hard enough to simply find content in that environment, but what about answering questions like:

  • Is this the latest version and how has it changed over time?
  • Which customer is this document related to?
  • Who is allowed to read or make changes to this document?
  • How long are we legally required to keep this document?
  • When I’m done making my change to this document, what is the next step in the process?

To address this, companies will often write content-centric applications that try to put some order to the chaos. But most of our content resides in files, and files can be a pain to work with. Databases can store files up to a certain file size, but they aren’t great for working with audio and video. File systems solve that problem but they alone don’t offer rich functionality like the ability to track complex metadata with each file or the ability to easily full-text index and then run searches across all of your content.

That’s where a content repository comes in. You might hear these referred to as Document Management (DM) systems or Enterprise Content Management (ECM) systems. No matter what you call them, they are purpose-built to make it easier for your company to get a handle on its file-based content.

Here’s the problem for developers, though: There is a lot of repository software out there. Most large companies have more than one up and running in their organization, and every one of them has its own API. It’s rare that these systems exist in a vacuum. They often need to feed and consume business processes, and that takes code. So if you are an enterprise developer, and you are trying to integrate some of your systems with your ECM repositories, you’ve got multiple APIs you need to learn. Or, if you are a software vendor, and you are trying to build a solution that requires a rich content repository as a back-end, you either have to choose a specific back-end to support or you have to write adapters to support a handful of repositories.

The solution to this problem is called Content Management Interoperability Services (CMIS). It’s an industry-wide specification managed by OASIS. It describes a domain language, a query language, and multiple protocols for working with a content repository. With CMIS, developers write against the CMIS API instead of learning each repository’s proprietary API, and their applications will work with any CMIS-compliant repository.

The first version of the specification became official in May of 2010. The most recent version, 1.1, became official this past May.

Several developers have been busy writing client libraries, server-side libraries, and tools related to CMIS. Many of these are collected as part of an umbrella open source project known as Apache Chemistry. The most active Apache Chemistry sub-project is OpenCMIS. It includes a Java client library (including Android), multiple servers for testing purposes, and some developer tools, such as a Java Swing-based repository browser called OpenCMIS Workbench. Apache Chemistry also includes libraries for Python, .NET, PHP, and Objective-C.

The tools and libraries at Apache Chemistry are a great way to get started with CMIS. For example, I’ve got the Apache Chemistry InMemory Repository deployed to a local Tomcat server. I can fire up OpenCMIS Workbench and connect to the server using its service URL, http://localhost:8080/chemistry/browser. Once I do that I can navigate the repository’s folder hierarchy, inspecting or performing actions against objects along the way.

The OpenCMIS Workbench has a built-in Groovy console. One of the examples that ships with the Workbench is “Execute a Query”. Here’s what it looks like without the imports:

String cql = "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document"

ItemIterable<QueryResult> results = session.query(cql, false)

results.each { hit ->
    hit.properties.each { println "${it.queryName}: ${it.firstValue}" }
    println "--------------------------------------"
}

println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"

The Apache Chemistry OpenCMIS InMemory Repository ships with some sample data so when I execute the Groovy script, I’ll see something like:

cmis:contentStreamLength: 33216
cmis:name: My_Document-0-1
cmis:objectId: 134
cmis:contentStreamLength: 33226
cmis:name: My_Document-1-0
cmis:objectId: 130
cmis:contentStreamLength: 33718
cmis:name: My_Document-2-0
cmis:objectId: 105
cmis:contentStreamLength: 33617
cmis:name: My_Document-2-1
cmis:objectId: 122
cmis:contentStreamLength: 33807
cmis:name: My_Document-2-2
cmis:objectId: 129
cmis:contentStreamLength: 33364
cmis:name: My_Document-2-1
cmis:objectId: 128
cmis:contentStreamLength: 33506
cmis:name: My_Document-2-1
cmis:objectId: 112
cmis:contentStreamLength: 33567
cmis:name: My_Document-2-1
cmis:objectId: 106
cmis:contentStreamLength: 33230
cmis:name: My_Document-2-2
cmis:objectId: 107
cmis:contentStreamLength: 33774
cmis:name: My_Document-1-1
cmis:objectId: 115
cmis:contentStreamLength: 33524
cmis:name: My_Document-2-0
cmis:objectId: 121
cmis:contentStreamLength: 33593
cmis:name: My_Document-2-0
cmis:objectId: 111
cmis:contentStreamLength: 34152
cmis:name: My_Document-2-2
cmis:objectId: 123
cmis:contentStreamLength: 33332
cmis:name: My_Document-0-0
cmis:objectId: 133
cmis:contentStreamLength: 33478
cmis:name: My_Document-1-2
cmis:objectId: 116
cmis:contentStreamLength: 33541
cmis:name: My_Document-1-2
cmis:objectId: 132
cmis:contentStreamLength: 33225
cmis:name: My_Document-2-0
cmis:objectId: 127
cmis:contentStreamLength: 33333
cmis:name: My_Document-2-2
cmis:objectId: 113
cmis:contentStreamLength: 33698
cmis:name: My_Document-1-0
cmis:objectId: 114
cmis:contentStreamLength: 33746
cmis:name: My_Document-0-2
cmis:objectId: 135
cmis:contentStreamLength: 33455
cmis:name: My_Document-1-1
cmis:objectId: 131
Total number: 21
Has more: false

So that query returned three properties, cmis:objectId, cmis:name, and cmis:contentStreamLength, of every object in the repository that is of type cmis:document. We could have restricted the query further with a where clause that tested specific property values or even the full-text content of the files.
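To make the where-clause idea a bit more concrete, here is a hedged sketch in Java that only builds query strings; it never talks to a server. The class and method names are mine, not part of OpenCMIS, and the backslash escaping of quotes reflects my reading of the CMIS query grammar, so treat it as an assumption and check it against your repository before relying on it.

```java
final class CmisQueries {
    private static final String SELECT_DOCS =
        "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document";

    // Wraps a value in single quotes, escaping backslashes and quotes with a backslash.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Documents whose name matches a LIKE pattern (% and _ act as wildcards).
    static String nameLike(String pattern) {
        return SELECT_DOCS + " WHERE cmis:name LIKE " + quote(pattern);
    }

    // Documents whose content stream is larger than the given number of bytes.
    static String largerThan(long bytes) {
        return SELECT_DOCS + " WHERE cmis:contentStreamLength > " + bytes;
    }

    public static void main(String[] args) {
        System.out.println(nameLike("My_Document-2-%"));
        System.out.println(largerThan(33500));
    }
}
```

Either string could be pasted into the Workbench query script in place of the original statement.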

Now I also happen to be running Alfresco, which is an open source ECM repository. The beauty of CMIS is demonstrated by the fact that I can run that exact same Groovy script against Alfresco. I simply have to reconnect using Alfresco’s service URL, which is http://localhost:8080/alfresco/cmisatom (for the Atom Pub binding). My local Alfresco repository has many more objects than my OpenCMIS InMemory repository so I won’t list the output here, but the code runs successfully unchanged.

Readers who spend their days enjoying standardized SQL that works across databases or the benefits of ORM tools that abstract their code from any specific relational database will no doubt be unimpressed by this feat. But I promise you that those of us who have to work with ECM repositories like SharePoint, Documentum, FileNet, and Alfresco, sometimes all on the same project, are rejoicing.

The next time you need to integrate with an ECM repository, CMIS should definitely be on your radar. I’ve created a list of CMIS resources to help you.

Up-coming Alfresco meetups in Sydney, Atlanta, London, New York, Chicago & Seville

The months leading up to Alfresco Summit are typically popping with meetup activity and this year is no exception. I thought I’d give you a quick rundown of the Alfresco meetups I know about that are coming up this month and next month:

  • Alfresco Sydney Day, August 22. This is a day-long meetup featuring talks from customers, partners, and yours truly. If you find yourself down under, it is not too late to sign up.
  • Atlanta, August 27. This meetup will feature a talk about Alfresco in the Insurance industry as well as a technical talk on the new backup and recovery toolkit. Sign up here.
  • London, September 11. Beer and Alfresco. What’s not to like? And this one has a hometown advantage. You never know who might drop by. Sign up here.
  • New York, September 24. We’ll hear from Mitch Brodsky, Digital Archive Manager from the New York Philharmonic. And I’ll be there to share some CMIS tips and tricks. Mitch is going to be organizing this group going forward, so he’ll want to hear your ideas on how to shape the revitalized New York Alfresco community. Sign up and share your ideas.
  • Chicago, September 25. How about a long lunch with the Alfresco Chicago community? The good folks at TSG once again offer up their sweet digs for the local community to swap tips and tricks. I’ll be there to hear about the great work being done with Alfresco in chi-town and maybe share a few tips of my own. Sign-up here.
  • Seville, October 8. Our Spanish community is one of the most passionate and committed on the planet. Sample what’s in store for Barcelona by hanging out with this awesome community in October. Inscribete aqui.

If you’ve never been to an Alfresco meetup, you’re missing out on a wonderful chance to hear first-hand from people just like you who are implementing Alfresco in their companies. These local communities vary greatly. Some meet very regularly, others not so much. Some lean towards the technical end of the spectrum while others are more focused on end-users. Often there is a formal agenda with one or more talks. Other times the goal is to spend time chatting over drinks.

Regardless of the style of the local Alfresco community in your geography, these principles hold true across all of them:

  1. Everyone is welcome. If you are interested in Alfresco, for whatever reason, we want you to participate. It doesn’t matter which product you use, whether or not you are a partner, or what your experience level is. Ours is a friendly, welcoming community online as well as in-person.
  2. You get out of it what you put into it. Most meetups are run by the local community. Organizing the meetings, finding people to speak, and finding a location all takes time and energy. So find a local community in your area, attend, and ask the organizer if you can help with the next one (even if the organizer works for one of your competitors).
  3. These aren’t sales events. Sure, the group might have one or more sponsors who paid for the food or supplied the venue, and they should get a few minutes to say who they are and let people thank them for their much-needed support of the group, but these meetups are for learning, sharing, and socializing. I haven’t heard of any problems in this area–I just want you to know our meetups are intended to be hard-sales-pitch-free zones.

If you are thinking about starting your own meetup and want some tips, take a look at Amy Currans’ Lightning Talk from last year’s DevCon.

If hosting your own Alfresco meetup is too much of a commitment for you at the moment, find an existing one and show up. I think you’ll have fun, you might learn something, and you’ll meet some really cool people. At the very least, you might walk away with some coveted Alfresco footwear. (Seriously, ask around).

I hope to see you at one of these meetups before Alfresco Summit!

New Alfresco community project: Refactor the Alfresco SDK

One of the best things a community can do for its members is to make it easy for newcomers to get started. In the Alfresco community, we’ve made some improvements rather recently such as:

Those all help people get pointed in the right direction. Now it is time to focus on the specific tools people use to write code for their Alfresco projects.

When someone wants to customize or extend Alfresco they often start with the downloadable SDK. The downloadable SDK includes dependencies (Alfresco & third-party), source for Alfresco dependencies, JavaDoc, and sample projects.

There’s nothing necessarily wrong with the downloadable SDK. It has existed in pretty much the same state since it was originally created and has served us well. But there are newer tools available. For example, thanks to the hard work of Gab Columbro and some of his cohorts, there is now a set of officially-supported artifacts for both Community Edition and Enterprise Edition. That means you can use a tool like Maven to resolve dependencies for you. There are also Maven archetypes that make it easy for you to start a new Alfresco project with the appropriate folder structure for the type of customization you need to do, complete with a ready-to-import Eclipse project.

So all of this great work has been done on the Maven-based SDK but the “last mile” is making it easily consumable by brand new developers. The best way to do that, I think, is to refactor and revitalize the downloadable SDK. I think we need to:

  • Remove old sample projects that are no longer relevant
  • Add new sample projects for areas of the platform that may currently be missing
  • Convert all sample projects to builds that leverage the Alfresco Maven SDK
  • Provide a light set of documentation that explains how to use the Alfresco Maven SDK and how to build the sample projects. This should not replace any formal official documentation on customizing Alfresco. Instead, it should be just enough to understand what’s in the SDK, how to build and run the samples, and how to use the Alfresco Maven SDK to start a new project.

Toward this end, I’ve grabbed the Alfresco SDK source out of Alfresco SVN and used it to create an Alfresco SDK project on Github. If the community leaves it up to me, I’ll work on it in fits and starts as I am able and it will get done in a few years. Instead, I’m hoping that a few of you who are excited about this idea will fork the project and start giving me pull requests. We can discuss this effort in #alfresco on freenode IRC. If enough people are interested we could also have a regular Skype call to coordinate efforts.

Thanks ahead of time for any time you are able to put in to this project. I’m hoping that if we work together we can get this looking great by Alfresco Summit, but that depends on you!

Notes from today’s Alfresco Office Hours

We had another live broadcast of Alfresco Office Hours today on Google Hangouts on Air. If you missed the broadcast you can watch the recorded session.

Here are my rough notes from today’s session:

CMIS & Apache Chemistry book is now in print

Lightning talks deadline is this weekend!

Why don’t more people use the source code?

Question raised on #alfresco freenode IRC about Share moving away from YUI:

Forum fix update:

  • Have you noticed that tags and alfresco version are being shown in forum posts now?
  • When creating posts in the forum, please try to remember to set your Alfresco version.

Alfresco Developer Series stuff moved to github

  • Code lives here
  • Thinking about converting the actual tutorials themselves to a plain-text based format and checking that in as well. WDYT?
  • Need to move that code and all of my other code to the Maven SDK

Speaking of the SDK, it is time for the community to step up and rescue that project

  • Engineering is on board with us doing that
  • I’ll move the code to github and will then start taking pull requests
  • I’d like to get the SDK converted to use Maven
  • We should refactor old code if it needs it
  • We should add new examples where none exist, like for CMIS, the Public API, and simple Share customizations.

Other projects we need your help on…

Pages marked as needing work on the wiki

Jira bug triage

  • I think we can increase the attention community-reported issues get if we can focus engineering on the quality bug reports
  • Maybe the community could help triage these
  • If you are interested, let me know

Projects on the help wanted page

Organizing local meetups

As you can see, we covered a lot of ground and had some great discussion. We were even joined for a little while by a fellow community member, Alfresco partner, and Alfresco Summit speaker, Boris Mejias. It could be you next time. See you at Alfresco Office Hours on August 30.