Tag: CMIS

Seven Options for Scripting Against Alfresco

It is often necessary to perform bulk operations against the data in your Alfresco repository. Frequently, writing a quick script, rather than a full application, is the best approach. Sometimes those scripts need to be scheduled, like with a cron job, or given to power users to run ad hoc.

The good news is that there are many options available depending on what you need to do and personal preference.

All of these options can be used to develop full-blown applications, but the focus in this post is on how well each option works specifically for the scripting use case.

JavaScript Console
License: Apache 2.0
Link: https://github.com/share-extras/js-console
Install: Repo and Share AMPs, requires restart

The JavaScript Console provides a user interface within Share to interactively run server-side scripts that leverage the Alfresco JavaScript API. The UI features syntax highlighting and code completion.

This is a must-have tool for both developers and administrators. It is the fastest way to perform bulk tasks or to test out JavaScript logic before using it in something harder to debug or change, such as in the controller of a web script. I install this on every server I touch. I suppose that, eventually, as Share falls out of favor, this will be less of a go-to option, but, for now, it is essential.

The JavaScript Console is only available to administrators, so this is not an option if you are developing something that will be invoked by non-admins.

Apache Chemistry CMIS Workbench
License: Apache 2.0
Link: https://chemistry.apache.org/java/developing/tools/dev-tools-workbench.html
Install: On your own desktop, no server config/restart required

The Apache Chemistry CMIS Workbench is a desktop application that uses pure CMIS calls to communicate with the repository. It contains a lot of useful features such as a CMIS Query Language console, a node browser, a data dictionary browser, and a Groovy console. The Groovy console is similar to the JavaScript console in that it can be used to quickly script repetitive tasks, but instead of JavaScript it uses Apache Groovy. If you aren’t familiar with Groovy but you know Java, you can use Java in the Groovy console–Groovy is a JVM language and is happy to interpret your Java syntax.

This tool is particularly useful if you are writing code that leverages CMIS. If you can do it in the Workbench you can be confident you can do it from your CMIS-based application. However, it is also useful when you just need to execute some repetitive tasks, especially if the server in question does not have the JavaScript Console installed.

Unlike the JavaScript Console, however, you are limited by what is supported in the CMIS specification. For example, you cannot manage users, groups, or workflows, just to name a few. It is pretty much limited to documents, folders, and items (content-less objects). Custom content models, however, are fully-supported.

The Swing-based UI of the workbench has a face only a mother could love so this is definitely not something you’d put in the hands of typical end-users.

Apache Chemistry cmislib (Python client)
License: Apache 2.0
Link: https://chemistry.apache.org/python/cmislib.html
Install: Client library, no server config/restart required

If what you need to do is covered by the CMIS specification but you prefer Python, then Apache Chemistry cmislib might be a good choice. It is a client library that you can use from your own Python scripts and applications. For example, if I need to generate a bunch of folders or documents, I’ll often just write a quick Python script to do it.
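
To give a feel for what such a quick script might look like, here is a minimal sketch. It assumes the parent folder object behaves like a cmislib Folder, meaning it has a `createFolder(name)` method. The helper names `create_folders` and `numbered_names` are made up for this illustration and are not part of cmislib.

```python
def numbered_names(prefix, count):
    """Return names such as 'Project-001', 'Project-002', ..."""
    return ["%s-%03d" % (prefix, i) for i in range(1, count + 1)]


def create_folders(parent, names):
    """Create one sub-folder per name under `parent`.

    `parent` is assumed to behave like a cmislib Folder, i.e. it has a
    createFolder(name) method that returns the new folder object.
    """
    created = []
    for name in names:
        # Skip blank names rather than sending an invalid request.
        name = name.strip()
        if not name:
            continue
        created.append(parent.createFolder(name))
    return created
```

With cmislib, `parent` would typically be a folder fetched from the repository, for example via `client.defaultRepository.getObjectByPath(...)`. Because the helper only relies on `createFolder`, it can be tried out against a stand-in object before pointing it at a real server.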

The library has the same limitation as the Workbench in that it only makes CMIS calls, but if that’s all you need it should work well. The current release requires Python 2.7 but a new release that supports Python 3.x is being tested now.

If you are giving the script to power users who might want to make tweaks this can be a good choice if they already have Python installed.

Apache Chemistry OpenCMIS (Java client)
License: Apache 2.0
Link: https://chemistry.apache.org/java/opencmis.html
Install: Client library, no server config/restart required

Rounding out my CMIS-related recommendations is Apache Chemistry OpenCMIS. Like cmislib, it is a client library, but this one is written in Java and is much more mature and more widely-used. For quick scripts you can use Groovy on the command-line to make CMIS calls against Alfresco via OpenCMIS. Often, I already have my Java IDE open anyway, so it can be just as quick to write a little runnable Java class that uses OpenCMIS to carry out some tasks.

I use Groovy scripts and OpenCMIS when I want to automate tasks for power users who will run them from the command line and may want to tweak them, but who do not have Python installed.

When a single script needs to perform operations that are a mix of CMIS and non-CMIS, I use Groovy’s HTTP client to call the Alfresco REST API (more on that, below).

Alfresco Web Script Framework
License: LGPL v3
Link: https://docs.alfresco.com/5.2/concepts/ws-framework.html
Install: For quick scripts, copy to Data Dictionary

Strictly speaking, web scripts weren’t built to be a “quick scripting” option, but they’ll work in a pinch. Basically, you just write a web script that does what you need it to do, but instead of packaging the web script into an AMP like you would for a production extension, you deploy it to the running repository by copying its files into the Data Dictionary’s Web Scripts folder. The next step is to refresh the web scripts via the web script console. After that, the web script can be invoked from a browser or with a tool such as cURL or Postman.

Just like the JavaScript Console, you are limited to what the Alfresco JavaScript API can do, but that is definitely a broader set of capabilities than a CMIS-based solution.

Files in the Data Dictionary can only be edited by administrators, so if the folks running this script need to make changes and aren’t administrators, this won’t work. You can always build that flexibility into the web script and let them control various options by what they pass to the web script.

For more information on Web Scripts, see the documentation link above or check out my tutorial.

Alfresco Client-side JavaScript API
License: Apache 2.0
Link: https://github.com/Alfresco/alfresco-js-api
Install: Node Package Manager

A relatively new kid on the block is the Alfresco Client-side JavaScript API, also known as alfresco-js, which is part of the Alfresco Application Development Framework (ADF). Of course you can use it to develop custom user interfaces in your favorite JavaScript framework, but you can also use it for quick scripting needs. For example, you could write a Node.js script and run it from the command-line.

The alfresco-js library relies on the Alfresco REST API that was created in 5.2, so if you need to work with an older Alfresco version, this isn’t an option. The alfresco-js library and the REST API are both under active development, so there could be gaps and some instability, but it is still worth taking a look, especially if you are already invested in the Node.js and NPM ecosystem.

Alfresco Content Services REST API (And others)
License: LGPL v3
Link: https://api-explorer.alfresco.com
Install: Your favorite REST client

Last, but not least, is the Alfresco Content Services REST API. If none of the other options in this list appeal to you, grab your favorite REST client or import your preferred scripting language’s HTTP client library, head over to the Alfresco API Explorer, and get busy.

As mentioned above in the alfresco-js description, this option relies on Alfresco 5.2 or higher (5.2.d Community Edition, distributed as 201612).

If the Alfresco Content Services REST API doesn’t have what you need or you aren’t yet running 5.2 or higher, you could always hit the CMIS browser (JSON) binding, the CMIS atompub (XML) binding, out-of-the-box web scripts, or custom web scripts.

Depending on what you need to do, using a lower-level REST API may not be the fastest option in terms of development time because instead of working with higher-level wrapper classes it is up to you to issue the REST calls and parse the responses.
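
To make that trade-off concrete, here is a minimal sketch of what "issue the REST calls and parse the responses" can look like using only the Python standard library. The host, port, and credentials are assumptions for a default local install, and the helper names `children_request` and `entry_names` are invented for this example. The request is built but not sent, so nothing here touches a server.

```python
import base64
import json
import urllib.request

# Base path of the public REST API on a default local install (assumed host/port).
API_BASE = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def children_request(node_id, user, password, base=API_BASE):
    """Build (but do not send) a GET request for a folder's children."""
    url = "%s/nodes/%s/children" % (base, node_id)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("Accept", "application/json")
    return request


def entry_names(body):
    """Pull the node names out of a 'list of entries' JSON response."""
    payload = json.loads(body)
    return [item["entry"]["name"] for item in payload["list"]["entries"]]
```

To actually run it, pass the request to `urllib.request.urlopen()` and hand the response body to `entry_names()`. Paging, error handling, and retries are all left out, which is exactly the kind of plumbing the higher-level client libraries take care of for you.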

Summary

Hopefully you see some options in the above list that will work for you and your environment. I definitely recommend you get a few of these in place now (especially the JavaScript Console) because if you ever need them in a hurry (oh, if this keyboard could talk!), you’ll want to have a quick scripting option you are comfortable and proficient with.

Photo Credit: Quality Quick Cleaning by GmanViz, by-nc-nd 2.0

Apache Chemistry cmislib 0.6.0 released

It has been far too long since our last Apache Chemistry cmislib release, but we finally managed to get one out. The new release, 0.6.0, features support for the browser binding as well as many fixes contributed by the community.

If you make no changes to your code the library will continue to use the Atom Pub binding, by default. But, the browser binding, which communicates with CMIS 1.1-compliant repositories using HTML forms and JSON, is often preferable because it may be more performant than the XML-based Atom Pub binding.

To use the new browser binding, import it, then pass it to the CmisClient constructor, like this:

from cmislib import CmisClient
from cmislib.browser.binding import BrowserBinding
client = CmisClient('http://localhost:8081/chemistry/browser',
   'admin',
   'admin',
   binding=BrowserBinding())

From there everything works like it always has.

For more information, please see the docs. If you have issues, please file a Jira with as much detail as possible, including the vendor and version of the repository you are working with. And if you have a fix, include that in your Jira. Contributions are welcome!

 

Cool things you can do in Alfresco with cmis:item support

I’ve been taking a look at the newly-added support for cmis:item in Alfresco. As a refresher for those who may not be familiar, cmis:item is a new object type added to the CMIS 1.1 specification. Alfresco added support for CMIS 1.1 in 4.2 but did not immediately add support for cmis:item, which is optional, according to the spec. Now cmis:item support is available in Alfresco in the cloud as well as the nightly builds of 4.3.a Community Edition.

So what is cmis:item?

We’ve all written content-centric applications that have included objects that don’t have a content stream. You might use one to store some configuration information, for example, or maybe you have some other object that doesn’t naturally include a file that needs to be managed as part of it. There are a few approaches to this in Alfresco:

  1. Create your custom type as a child of sys:base (or cm:cmobject if you don’t mind your objects being subject to the file name constraint)
  2. Create your custom type as child of cm:content and simply do not set a content stream
  3. Ignore the content model altogether and use the attribute service

I’m not going to cover the third option in this post. If you want to learn more about the attribute service you should take a look at the Tech Talk Live we did in April.

The second option is fine, but then you’ve got objects with content properties that will never be set. Your objects look like they want to be documents, but they really aren’t because you don’t expect them to ever have a file as part of the object. Not the end of the world, but it’s kind of lazy and potentially confusing to developers that come after you.

The first option is what most people go with, but it has a drawback. Instances of types that do not extend from cm:content or cm:folder are invisible to CMIS 1.0. Not only are those objects invisible to CMIS 1.0 but relationships that point to such objects are also invisible. Depending on how much you use CMIS in your application this could be a fairly severe limitation.

Thankfully, CMIS 1.1 addresses the issue with a new object type called cmis:item. It’s made precisely for this purpose.

What can I do with it out-of-the-box?

Even if your custom content model doesn’t need a “content-less object” you can still benefit from cmis:item support. Let me walk you through my five favorite object types that you can now work with via cmis:item support in CMIS 1.1 that were not previously available: Category, Person, Group, Rating, and Rule. I tested everything I’m showing you here using Alfresco 4.3.a Community Edition, Nightly Build (4/27/2014, (r68092-b4954) schema 7003) and Apache Chemistry OpenCMIS Workbench 0.11.0.

Category

Man, if I had a bitcoin for every person I’ve seen asking how to get the categories of a document via CMIS, I’d have a garage full of Teslas. With cmis:item support, it’s easy. Here’s how you get a list of every category in the system with a CMIS Query Language (CQL) query:

SELECT * FROM cm:category

That returns a list of cm:category objects that represent each category in the system. Note that this is a flat list. In Alfresco, categories are hierarchical. I’m not sure what the plan to address that is, to be honest.

Now suppose you have an object and you want to know the categories that have been assigned. Here is some Groovy code that does that:

(Can’t see the code? Click here)

Categories live in a property called “cm:categories”. It’s a multi-value field that contains the Alfresco node references for each category that has been assigned to a document. Once you get that list you can iterate over them and fetch the cm:category object to figure out the category’s human-readable name. That’s pretty cool and wasn’t possible before CMIS 1.1 support.

How about assigning existing categories to documents? Sure, no problem. Here’s how.

(Can’t see the code? Click here)

This is a little Groovy function that takes a full path to a document and the name of a category. Obviously it depends on your categories being uniquely-named.

First, it finds the category by name using CQL and snags its Alfresco node reference.

Next, it checks to see if the cm:generalclassifiable aspect has already been added to the document and adds it if it has not.

Finally, it gets the current list of categories from the cm:categories property and adds the new category’s node reference to it.
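
The last two steps boil down to a bit of list handling, which can be sketched without a server. This is a rough Python illustration, not the Groovy function described above. It assumes the aspect shows up as the secondary type ID `P:cm:generalclassifiable` in the `cmis:secondaryObjectTypeIds` property, and the helper name `with_category` is made up. Looking up the category by name (the first step) needs a live repository and is left out.

```python
# Aspect expressed as a CMIS secondary type ID (the "P:" prefix is an assumption).
CLASSIFIABLE = "P:cm:generalclassifiable"


def with_category(properties, category_ref):
    """Return the updated aspect and category lists for a document.

    `properties` is a dict of the document's current CMIS property values.
    """
    aspects = list(properties.get("cmis:secondaryObjectTypeIds") or [])
    categories = list(properties.get("cm:categories") or [])

    # Add the classifiable aspect if it is not there yet.
    if CLASSIFIABLE not in aspects:
        aspects.append(CLASSIFIABLE)

    # Append the new category reference, avoiding duplicates.
    if category_ref not in categories:
        categories.append(category_ref)

    return {
        "cmis:secondaryObjectTypeIds": aspects,
        "cm:categories": categories,
    }
```

The returned dict is the set of property values you would then send back to the repository in an update call. Calling the helper twice with the same category reference leaves the lists unchanged.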

I started to look at creating new categories with CMIS, but it isn’t immediately obvious how that would work due to the hierarchical nature of categories. The only parent-child association supported by CMIS is the one implied by folder-document relationship. I’ll ask around and see if there’s a trick.

That’s it for categories. Let’s look at Person objects next.

Person

Here’s another frequently-asked how-to: How do I query users through CMIS? Before cmis:item support you couldn’t do it, but now you can. For example, here is how you can use CQL to find a list of the Person objects in Alfresco:

SELECT * FROM cm:person

You can qualify it further based on properties of cm:person. For example, maybe I want all users whose first name starts with “Te” and who work for an organization that starts with “Green”. That query looks like:

SELECT * FROM cm:person where cm:firstName like 'Te%' and cm:organization like 'Green%'
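
If the search values come from a user rather than being typed into the query by hand, they need to be quoted safely. Here is a small sketch of building the same query in Python. The helper names `cql_string` and `person_query` are invented for this example. It escapes backslashes and single quotes with a backslash, which is how the CMIS query language escapes them inside string literals. It does not escape `%` or `_`, so those still act as wildcards.

```python
def cql_string(value):
    """Quote a Python string as a CMIS QL string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def person_query(first_name_prefix, organization_prefix):
    """Build the prefix-match query shown above from two supplied values."""
    return (
        "SELECT * FROM cm:person"
        " WHERE cm:firstName LIKE " + cql_string(first_name_prefix + "%") +
        " AND cm:organization LIKE " + cql_string(organization_prefix + "%")
    )
```

Calling `person_query("Te", "Green")` produces the statement from the example, with the keywords in upper case.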

Suppose you wanted to find every Alfresco Person object that has a city starting with “Dallas” and you want to set their postal code to “75070”. Ignoring cities named “Dallas” that aren’t in Texas (like Dallas, Georgia) or the fact that there are multiple zip codes in Dallas, Texas, the code would look like this:

(Can’t see the code? Click here)

That’s it for Person objects. Let’s look at Group objects.

Group

Similar to querying for Person objects, cmis:item support gives you the ability to query for groups–all you have to know is that groups are implemented as instances of cm:authorityContainer. Here’s the query:

SELECT * FROM cm:authorityContainer

Unfortunately, it doesn’t seem possible to use CMIS to actually add or remove members to or from the group. Maybe that will change at some point.

Rating

It’s easy to query for ratings:

SELECT * FROM cm:rating

But it’s hard to do anything useful with those objects because, at least in this nightly release, you can’t follow the relationships from or to a rating. As a sidenote, you can get the count and total for a specific node’s ratings by getting the value of the cm:likesRatingSchemeCount and cm:likesRatingSchemeTotal properties respectively. But that’s not related to cmis:item support.

Rule

Rules are a powerful feature in Alfresco that help you automate document handling. Here’s how to use a query to get a list of rules in your repository:

SELECT * FROM rule:rule

Rules are so handy you might end up with a bunch of them. What if you wanted to find all of the rules that matched certain criteria (title, for example) so that you could enable them and tell them to run on sub-folders? Here is a little Groovy that does that:

(Can’t see the code? Click here)

First the code uses a join to find the rule based on a title. The property that holds a rule’s title is defined in an aspect and that requires a join when writing CQL.

Then the code simply iterates over the result set and updates the rule:applyToChildren and rule:disabled properties. You could also set the rule’s type, description, and whether or not it runs asynchronously. It does not appear to be possible to change the actions or the filter criteria for a rule through CMIS at this time.
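
For readers who have not written an aspect join before, here is a rough Python sketch of the two pieces involved: the join statement and the property values to update. It is an illustration only, not the Groovy code described above. It assumes the title lives in the `cm:title` property of the `cm:titled` aspect, and the names `rules_by_title_query` and `RULE_UPDATES` are made up.

```python
def rules_by_title_query(title):
    """Build a CQL statement joining rule:rule to the aspect that holds the title."""
    # Escape backslashes and single quotes so the title is a valid string literal.
    safe_title = title.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT r.cmis:objectId FROM rule:rule AS r"
        " JOIN cm:titled AS t ON r.cmis:objectId = t.cmis:objectId"
        " WHERE t.cm:title = '" + safe_title + "'"
    )


# Property values to send for each matching rule:
# apply it to sub-folders and make sure it is not disabled.
RULE_UPDATES = {
    "rule:applyToChildren": True,
    "rule:disabled": False,
}
```

Each object ID returned by the query would then be fetched and updated with `RULE_UPDATES` using whichever CMIS client you prefer.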

What about custom types?

Custom types? Sure, no problem. Suppose I have a type called sc:client that extends sys:base and has two properties, sc:clientName and sc:clientId. Alfresco automatically makes that type accessible via CMIS. It’s easy to create a new instance and set the value of those two properties. Here’s the groovy code to do it:

(Can’t see the code? Click here)

Did you notice I created the object in a folder? In Alfresco, everything lives in a folder, even things that aren’t documents. In CMIS parlance, you would say that Alfresco does not support “unfiled” objects.

The custom object can be queried as you would expect, including where clauses that use the custom property values, like this:

SELECT * FROM sc:client where sc:clientId = '56789'

In CMIS 1.0 you could not use CMIS to work with associations (“relationships”) between objects that did not inherit from either cm:content (cmis:document) or cm:folder (cmis:folder). In CMIS 1.1 that changed. You can create relationships between documents and folders and items. Unfortunately, in the latest nightly build this does not appear to be implemented. Hopefully that goes in soon!

Summary

The new cmis:item type in CMIS 1.1 is a nice addition to the specification that is useful when you are working with objects that are not documents and do not have a content stream. I showed you some out-of-the-box types you can work with via cmis:item but you can also leverage cmis:item with your custom types as well.

Support for cmis:item requires CMIS 1.1 and either Alfresco in the cloud or a recent nightly build of Alfresco 4.3.

Deal of the Day: 50%-off CMIS & Apache Chemistry in Action

The book I co-authored with Florian Mueller (SAP) and Jay Brown (IBM), CMIS and Apache Chemistry in Action, is 50% off today (January 30). Use code dotd013014au when you check out.

If you are doing anything with CMIS, whether that is with Alfresco or some other CMIS-compliant Enterprise Content Management server, like Nuxeo, SharePoint, FileNet, or Documentum, you should really take a look at this book. It provides Java, Python, PHP, and .NET examples including a working web application.

CMIS example: Uploading multiple files to a CMIS repository

In my previous post on CMIS, I introduced the Content Management Interoperability Services (CMIS) specification and the Apache Chemistry project. You learned that CMIS gives you a language-neutral, vendor-independent way to perform CRUD functions against any CMIS-compliant server using a standard API. I showed a simple CMIS query being executed from a Groovy script running in the OpenCMIS Workbench.

Now I’d like to get a little more detailed and show a simple use case: I’ll use the OpenCMIS client library for Java to upload some files from my local machine to a CMIS repository. In my case, that repository is Alfresco 4.2.c Community Edition running locally, but this code should work with any CMIS-compliant server from vendors like IBM, EMC, Microsoft, Nuxeo, and so on. I’ll include the relevant snippets, but if you want to follow along, the full source code for this example lives here. I use that example to show the same code working against Alfresco in the cloud and Alfresco on-premise. If you are running against on-premise only or some other CMIS server, it has a few dependencies that won’t be relevant to you.

LoadFiles.java is a runnable Java class. The main method simply calls doExample(). That method grabs a session and gets a handle to the destination folder in the repository. Then, for each file in the local machine’s directory, it creates a hashmap of metadata values and uploads the file and its associated metadata to the repository. Let’s look at each of these pieces in turn.

Get a Session

The first thing you need is a session. I have a getCmisSession() method that knows how to get one, and it looks like this:

SessionFactory factory = SessionFactoryImpl.newInstance();
Map<String, String> parameter = new HashMap<String, String>();

// connection settings
parameter.put(SessionParameter.ATOMPUB_URL, ATOMPUB_URL);
parameter.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
parameter.put(SessionParameter.AUTH_HTTP_BASIC, "true");
parameter.put(SessionParameter.USER, USER_NAME);
parameter.put(SessionParameter.PASSWORD, PASSWORD);

// ask the server which repositories it exposes
List<Repository> repositories = factory.getRepositories(parameter);

// use the first repository in the list
return repositories.get(0).createSession();

As you can see, establishing a session is as simple as providing the username, password, binding, and service URL. The binding is the protocol we’re going to use to talk to the server. In CMIS 1.0, usually the best choice is the Atom Pub binding because it is faster than the Web Services binding, the only other alternative. CMIS 1.1 adds a browser binding that is based on HTML forms and JSON but I won’t cover that here.

The other parameter that gets set is the service URL. This is server-specific. For Alfresco 4.x or higher, the CMIS 1.0 Atom Pub URL is http://localhost:8080/alfresco/cmisatom.

The last thing the method does is return a session for a specific repository. CMIS servers can serve more than one repository. In Alfresco’s case, there is only ever one, so it is safe to return the first one in the list.

Get the Target Folder

Okay, we’ve got a session with the CMIS server’s repository. The repository is a hierarchical tree of objects (folders and documents) similar to a local file system. The class is configured with a parent folder path and the name of a new folder that should be created in that parent folder. So the first thing we need to do is get a reference to the parent folder. My example getParentFolder() method just grabs the folder by path, like this:

Folder folder = (Folder) cmisSession.getObjectByPath(FOLDER_PATH);
return folder;

Now, given the parent folder and the name of a new folder, the createFolder() method attempts to create the new folder to hold our files:

Folder subFolder = null;
try {
  subFolder = (Folder) cmisSession.getObjectByPath(parentFolder.getPath() + "/" + folderName);
  System.out.println("Folder already existed!");
} catch (CmisObjectNotFoundException onfe) {
  Map<String, Object> props = new HashMap<String, Object>();
  props.put("cmis:objectTypeId",  "cmis:folder");
  props.put("cmis:name", folderName);
  subFolder = parentFolder.createFolder(props);
  String subFolderId = subFolder.getId();
  System.out.println("Created new folder: " + subFolderId);
}
return subFolder;

The folder is either going to already exist, in which case we’ll just grab it and return it, or it will need to be created. We can test the existence of the folder by trying to fetch it by path, and if it throws a CmisObjectNotFoundException, we’ll create it.
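
The same find-or-create idea is not tied to Java or to OpenCMIS. Here is a minimal, library-free sketch of the pattern in Python. The function name `find_or_create` and its parameters are invented for this illustration. The caller supplies the lookup, the creation step, and the exception type that means "not found".

```python
def find_or_create(find, create, not_found):
    """Return the object from find(); if that raises `not_found`, call create().

    Returns an (object, created) tuple so the caller can log what happened.
    """
    try:
        return find(), False
    except not_found:
        return create(), True
```

With cmislib, for example, `find` could wrap a `getObjectByPath()` call and `not_found` could be cmislib's `ObjectNotFoundException`. Any other exception still propagates, which is usually what you want for things like permission or connection errors.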

Look at the Map that is getting set up to hold the properties of the folder. The minimum required properties that need to be passed in are the type of folder to be created (“cmis:folder”) and the name of the folder to create. You might choose to extend your server’s content model with your own folder types. In this example, the out-of-the-box “cmis:folder” type is fine.

Set Up the Properties For Each New Document

Just like when the folder was created, every file we upload to the repository will have its own set of metadata. To make it interesting, though, we’ll provide more than just the type of document we want to create and the name of the document. In my example, I’m using a content model we created for the CMIS & Apache Chemistry in Action book. It contains several types, one of which is called “cmisbook:image”. The image type has the attributes you’d expect an image to have, like height, width, focal length, camera make, ISO speed, etc. In fact, if you use the OpenCMIS Workbench, you can inspect the type definition for cmisbook:image, as in this screenshot:

[Screenshot: OpenCMIS Workbench, Type Inspector]

Two of the properties I’m going to work with in this example are the latitude and longitude. Alfresco will automatically extract metadata like this when you add files to the repository. In fact, Alfresco already has a “geographic aspect” out-of-the-box that can be used to extract and store lat and long. But we wanted this content model to work with any CMIS repository and not all repositories support aspects (CMIS 1.1 calls these “secondary types”) so the content model used in the book defines lat and long on the cmisbook:image type.

Because not all repositories know how to extract metadata, we’re going to use Apache Tika to do it in our client app.

The getProperties() method does this work. It returns a Map of properties that consists of the type of the object we want to create (“cmisbook:image”), the name of the object (the file name being uploaded), and the latitude and longitude. Here’s what that code looks like:

Map<String, Object> props = new HashMap<String, Object>();

String fileName = file.getName();
System.out.println("File: " + fileName);
InputStream stream = new FileInputStream(file);
try {
  Metadata metadata = new Metadata();
  ContentHandler handler = new DefaultHandler();
  Parser parser = new JpegParser();
  ParseContext context = new ParseContext();

  metadata.set(Metadata.CONTENT_TYPE, FILE_TYPE);

  parser.parse(stream, handler, metadata, context);
  String lat = metadata.get("geo:lat");
  String lon = metadata.get("geo:long");
  stream.close();

  // create a map of properties
  props.put("cmis:objectTypeId",  objectTypeId);
  props.put("cmis:name", fileName);
  if (lat != null && lon != null) {
    System.out.println("LAT:" + lat);
    System.out.println("LON:" + lon);
    // parse the decimal strings directly; going through float adds rounding noise
    props.put("cmisbook:gpsLatitude", new BigDecimal(lat));
    props.put("cmisbook:gpsLongitude", new BigDecimal(lon));
  }
} catch (TikaException te) {
  System.out.println("Caught tika exception, skipping");
} catch (SAXException se) {
  System.out.println("Caught SAXException, skipping");
} finally {
  if (stream != null) {
    stream.close();
  }
}
return props;
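
For comparison, here is how the property-map step might look in a Python client, leaving out the metadata extraction itself. This is a sketch under assumptions: the function name `image_properties` is made up, and the coordinates are expected to arrive as decimal strings such as "52.51387". Parsing them straight into `Decimal` keeps exactly the digits that were extracted.

```python
from decimal import Decimal, InvalidOperation


def image_properties(file_name, lat, lon, type_id="cmisbook:image"):
    """Build the property map for one image, adding GPS values when both are usable."""
    props = {"cmis:objectTypeId": type_id, "cmis:name": file_name}
    try:
        if lat is not None and lon is not None:
            # Parse both values before touching props so a bad value adds neither.
            latitude, longitude = Decimal(lat), Decimal(lon)
            props["cmisbook:gpsLatitude"] = latitude
            props["cmisbook:gpsLongitude"] = longitude
    except InvalidOperation:
        pass  # unparseable coordinates: upload without GPS metadata
    return props
```

If either coordinate is missing or malformed, the map still carries the type and the name, which is the minimum needed to create the document.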

Now we have everything we need to upload the file to the repository: a session, the target folder, and a map of properties for each object being uploaded. All that’s left to do is upload the file.

Upload the File

The first thing the createDocument() method does is to make sure that we have a Map with the minimal set of metadata, which is the object type and the name. It’s conceivable that things didn’t go well in the getProperties() method, and if that is the case, this bit of code makes sure everything is in place:


String fileName = file.getName();

// create a map of properties if one wasn't passed in
if (props == null) {
  props = new HashMap<String, Object>();
}

// Add the object type ID if it wasn't already
if (props.get("cmis:objectTypeId") == null) {
  props.put("cmis:objectTypeId",  "cmis:document");
}

// Add the name if it wasn't already
if (props.get("cmis:name") == null) {
  props.put("cmis:name", fileName);
}

Next we use the file and the object factory on the CMIS session to set up a ContentStream object:

ContentStream contentStream = cmisSession.getObjectFactory().
  createContentStream(
    fileName,
    file.length(),
    fileType,
    new FileInputStream(file)
  );

And finally, the file can be uploaded.

Document document = null;
try {
  document = parentFolder.createDocument(props, contentStream, null);
  System.out.println("Created new document: " + document.getId());
} catch (CmisContentAlreadyExistsException ccaee) {
  document = (Document) cmisSession.getObjectByPath(parentFolder.getPath() + "/" + fileName);
  System.out.println("Document already exists: " + fileName);
}
return document;

Similar to the folder creating logic earlier, it could be that the document already exists, so we use the same find-or-create pattern here.

When I run this locally using a folder that contains five pics I snapped in Berlin, the output looks like this:

Created new folder: workspace://SpacesStore/2f576635-5058-4053-9a61-dad68939fdd2
File: augustiner.jpg
LAT:52.51387
LON:13.39111
Created new document: workspace://SpacesStore/b19755e1-74a2-4c1e-9eb5-a5bfd2c0ebd7;1.0
File: berlin_cathedral.jpg
LAT:52.51897
LON:13.39936
Created new document: workspace://SpacesStore/34aa7b80-9f09-4c07-a040-9aee94debf80;1.0
File: brandenburg.jpg
LAT:52.51622
LON:13.37783
Created new document: workspace://SpacesStore/6c02f8f6-accc-4997-be5c-601bc7131247;1.0
File: gendarmenmarkt.jpg
LAT:52.51361
LON:13.39278
Created new document: workspace://SpacesStore/44ff28e7-782a-46c3-b388-453fd8495472;1.0
File: old_museum.jpg
LAT:52.52039
LON:13.39926
Created new document: workspace://SpacesStore/03a85605-4a66-4f94-b423-82502efbca4a;1.0

Now Run Against Another Vendor’s Repo

What’s kind of cool, and what I think really demonstrates the great thing about CMIS, is that you can run this class against any CMIS repository, virtually unchanged. To demonstrate this, I’ll fire up the Apache Chemistry InMemory Repository we ship with the source code that accompanies the book because it is already configured with a custom content model that includes “cmisbook:image”. As the name suggests, this repository is a reference CMIS server available from Apache Chemistry that runs entirely in-memory.

To run the class against the Apache Chemistry InMemory Repository, we have to change the service URL and the content type ID, like this:

//public static final String CONTENT_TYPE = "D:cmisbook:image";
public static final String CONTENT_TYPE = "cmisbook:image";

//public static final String ATOMPUB_URL = ALFRESCO_API_URL + "alfresco/cmisatom";
public static final String ATOMPUB_URL = ALFRESCO_API_URL + "inmemory/atom";

And when I run the class, my photos get uploaded to a completely different repository implementation.

That’s It!

That’s a simple example, I know, but it illustrates fetching objects, creating new objects, including those of custom types, setting metadata, and handling exceptions all through an industry-standard API. There is a lot more to CMIS and OpenCMIS, in particular. I invite you to learn more by diving in to CMIS & Apache Chemistry in Action!

CMIS: An open API for managing content

Most of the content in a company is completely unstructured. Just think about the documents you collaborate on with the rest of your team throughout the day. They might include things like proposals, architecture diagrams, presentations, invoices, screenshots, videos, books, meeting notes, or pictures from your last company get-together.

How does a company organize all of that content? Often it is scattered across file shares and employee hard drives. It isn’t really organized at all. It’s hard enough to simply find content in that environment, but what about answering questions like:

  • Is this the latest version and how has it changed over time?
  • Which customer is this document related to?
  • Who is allowed to read or make changes to this document?
  • How long are we legally required to keep this document?
  • When I’m done making my change to this document, what is the next step in the process?

To address this, companies will often write content-centric applications that try to put some order to the chaos. But most of our content resides in files, and files can be a pain to work with. Databases can store files up to a certain file size, but they aren’t great for working with audio and video. File systems solve that problem but they alone don’t offer rich functionality like the ability to track complex metadata with each file or the ability to easily full-text index and then run searches across all of your content.

That’s where a content repository comes in. You might hear these referred to as Document Management (DM) systems or Enterprise Content Management (ECM) systems. No matter what you call them, they are purpose-built for making it easier for your company to get a handle on its file-based content.

Here’s the problem for developers, though: There is a lot of repository software out there. Most large companies have more than one up and running in their organization, and every one of them has its own API. It’s rare that these systems exist in a vacuum. They often need to feed and consume business processes and that takes code. So if you are an enterprise developer, and you are trying to integrate some of your systems with your ECM repositories, you’ve got multiple APIs you need to learn. Or, if you are a software vendor, and you are trying to build a solution that requires a rich content repository as a back-end, you either have to choose a specific back-end to support or you have to write adapters to support a handful of repositories.

The solution to this problem is called Content Management Interoperability Services (CMIS). It’s an industry-wide specification managed by OASIS. It describes a domain language, a query language, and multiple protocols for working with a content repository. With CMIS, developers write against the CMIS API instead of learning each repository’s proprietary API, and their applications will work with any CMIS-compliant repository.

The first version of the specification became official in May of 2010. The most recent version, 1.1, became official this past May.

Several developers have been busy writing client libraries, server-side libraries, and tools related to CMIS. Many of these are collected as part of an umbrella open source project known as Apache Chemistry (http://chemistry.apache.org). The most active Apache Chemistry sub-project is OpenCMIS. It includes a Java client library (including Android), multiple servers for testing purposes, and some developer tools, such as a Java Swing-based repository browser called OpenCMIS Workbench. Apache Chemistry also includes libraries for Python, .NET, PHP, and Objective-C.

The tools and libraries at Apache Chemistry are a great way to get started with CMIS. For example, I’ve got the Apache Chemistry InMemory Repository deployed to a local Tomcat server. I can fire up OpenCMIS Workbench and connect to the server using its service URL, http://localhost:8080/chemistry/browser. Once I do that I can navigate the repository’s folder hierarchy, inspecting or performing actions against objects along the way.

The OpenCMIS Workbench has a built-in Groovy console. One of the examples that ships with the Workbench is “Execute a Query”. Here’s what it looks like without the imports:

String cql = "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document"

ItemIterable<QueryResult> results = session.query(cql, false)

results.each { hit ->
    hit.properties.each { println "${it.queryName}: ${it.firstValue}" }
    println "--------------------------------------"
}

println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"

The Apache Chemistry OpenCMIS InMemory Repository ships with some sample data so when I execute the Groovy script, I’ll see something like:

cmis:contentStreamLength: 33216
cmis:name: My_Document-0-1
cmis:objectId: 134
--------------------------------------
cmis:contentStreamLength: 33226
cmis:name: My_Document-1-0
cmis:objectId: 130
--------------------------------------
cmis:contentStreamLength: 33718
cmis:name: My_Document-2-0
cmis:objectId: 105
--------------------------------------
cmis:contentStreamLength: 33617
cmis:name: My_Document-2-1
cmis:objectId: 122
--------------------------------------
cmis:contentStreamLength: 33807
cmis:name: My_Document-2-2
cmis:objectId: 129
--------------------------------------
cmis:contentStreamLength: 33364
cmis:name: My_Document-2-1
cmis:objectId: 128
--------------------------------------
cmis:contentStreamLength: 33506
cmis:name: My_Document-2-1
cmis:objectId: 112
--------------------------------------
cmis:contentStreamLength: 33567
cmis:name: My_Document-2-1
cmis:objectId: 106
--------------------------------------
cmis:contentStreamLength: 33230
cmis:name: My_Document-2-2
cmis:objectId: 107
--------------------------------------
cmis:contentStreamLength: 33774
cmis:name: My_Document-1-1
cmis:objectId: 115
--------------------------------------
cmis:contentStreamLength: 33524
cmis:name: My_Document-2-0
cmis:objectId: 121
--------------------------------------
cmis:contentStreamLength: 33593
cmis:name: My_Document-2-0
cmis:objectId: 111
--------------------------------------
cmis:contentStreamLength: 34152
cmis:name: My_Document-2-2
cmis:objectId: 123
--------------------------------------
cmis:contentStreamLength: 33332
cmis:name: My_Document-0-0
cmis:objectId: 133
--------------------------------------
cmis:contentStreamLength: 33478
cmis:name: My_Document-1-2
cmis:objectId: 116
--------------------------------------
cmis:contentStreamLength: 33541
cmis:name: My_Document-1-2
cmis:objectId: 132
--------------------------------------
cmis:contentStreamLength: 33225
cmis:name: My_Document-2-0
cmis:objectId: 127
--------------------------------------
cmis:contentStreamLength: 33333
cmis:name: My_Document-2-2
cmis:objectId: 113
--------------------------------------
cmis:contentStreamLength: 33698
cmis:name: My_Document-1-0
cmis:objectId: 114
--------------------------------------
cmis:contentStreamLength: 33746
cmis:name: My_Document-0-2
cmis:objectId: 135
--------------------------------------
cmis:contentStreamLength: 33455
cmis:name: My_Document-1-1
cmis:objectId: 131
--------------------------------------
--------------------------------------
Total number: 21
Has more: false
--------------------------------------

So that query returned three properties, cmis:objectId, cmis:name, and cmis:contentStreamLength, of every object in the repository that is of type cmis:document. We could have restricted the query further with a WHERE clause that tested specific property values or even the full-text content of the files.
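As a rough sketch of what a more restricted query could look like, here is a small plain-Java helper that assembles a CMIS Query Language string with an optional property test (LIKE on cmis:name) and an optional full-text test (CONTAINS). The class and method names are made up for illustration and are not part of OpenCMIS, the sample values are assumptions, and the escaping is deliberately simplified. Full-text search also only works when the repository supports it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of OpenCMIS) that assembles a CMIS query string. */
final class CmisQueryBuilder {

    /** Wraps a value in single quotes, escaping backslashes and single quotes with a backslash. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /**
     * Builds a query over cmis:document. Either argument may be null,
     * in which case that condition is left out of the WHERE clause.
     */
    static String documentsWhere(String namePattern, String fullText) {
        List<String> conditions = new ArrayList<>();
        if (namePattern != null) {
            conditions.add("cmis:name LIKE " + quote(namePattern));
        }
        if (fullText != null) {
            conditions.add("CONTAINS(" + quote(fullText) + ")");
        }
        String cql = "SELECT cmis:objectId, cmis:name, cmis:contentStreamLength FROM cmis:document";
        return conditions.isEmpty() ? cql : cql + " WHERE " + String.join(" AND ", conditions);
    }

    public static void main(String[] args) {
        // Sample values only; use names and search terms that exist in your repository.
        System.out.println(documentsWhere("My_Document-2-%", "invoice"));
    }
}
```

The resulting string can be passed to session.query(cql, false) in the same way as the unrestricted query.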

Now I also happen to be running Alfresco, which is an open source ECM repository. The beauty of CMIS is demonstrated by the fact that I can run that exact same Groovy script against Alfresco. I simply have to reconnect using Alfresco’s service URL, which is http://localhost:8080/alfresco/cmisatom (for the AtomPub binding). My local Alfresco repository has many more objects than my OpenCMIS InMemory repository so I won’t list the output here, but the code runs successfully unchanged.

Readers who spend their days enjoying standardized SQL that works across databases or the benefits of ORM tools that abstract their code from any specific relational database will no doubt be unimpressed by this feat. But I promise you that those of us who have to work with ECM repositories like SharePoint, Documentum, FileNet, and Alfresco, sometimes all on the same project, are rejoicing.

The next time you need to integrate with an ECM repository, CMIS should definitely be on your radar. I’ve created a list of CMIS resources to help you.

Five of your favorite Alfresco-related presentations

If views of my presentations on SlideShare are any indication, a whole lot of you are interested in integrating Drupal and Alfresco. Despite the fact that the presentation is four years old, it consistently makes the “most viewed” list out of my uploads. If you are considering Drupal but need something a bit more document-centric to serve up your files as part of that Drupal site, take a look:

With over 12,000 views, it is safe to say there is definitely something to the combination of Alfresco and Drupal.

Another apparent classic is:

Which is kind of scary given its age and brevity. I think the popularity of this is due to the seemingly inexhaustible demand for “getting started” resources for new Alfresco developers.

This one has similar info, but with more details, and is probably a better choice for developers trying to get an extremely high-level overview:

The CMIS API is now the preferred way to interact with the Alfresco repository remotely, and many people use this presentation to get a quick overview:

In fact, I’ll have a CMIS powerhouse panel on Tech Talk Live tomorrow (July 10, 2013). So if you are just getting started with CMIS, please join us.

If you like CMIS but you don’t want to fool around with your own server, you can use Alfresco in the Cloud. This deck gives a CMIS overview and discusses the Alfresco API at a high-level with links to sample code and screencasts:

Thanks to everyone who has made use of these presentations!

CMIS 1.1 is now an approved spec; Here’s a recap of what’s new

With a final, heroic push and get-out-the-vote campaign, the OASIS CMIS Technical Committee (TC), the committee that is responsible for moving the CMIS specification forward, was able to get enough votes on Thursday to ratify the next version of Content Management Interoperability Services (CMIS) as a standard. This is a seriously cool accomplishment for everyone on the TC and the entire Enterprise Content Management (ECM) industry because CMIS establishes an industry standard for working with repositories like Alfresco, Documentum, FileNet, Nuxeo, and SharePoint, and it is important that the spec continues to evolve.

CMIS 1.1 has some exciting new features. Here’s a recap of what’s new…

Browser Binding

A binding is the protocol a client uses to talk to a CMIS server. CMIS 1.0 supported two bindings, Web Services (SOAP) and AtomPub (RESTful XML), the latter being the more performant and the more popular of the two. But if you’ve ever looked at the XML that comes back from a CMIS AtomPub call you know how verbose it can be. The Browser Binding is based on JSON, so the payloads that go between client and server are smaller, making it the fastest of the three bindings. The original purpose of the Browser Binding was to make it easy for those building “single page” web apps or doing other work with CMIS via client-side JavaScript, but I think apps of all types will move to the Browser Binding as quickly as possible simply because it is easier to work with.
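To give a feel for how simple Browser Binding requests are, here is a minimal sketch that builds the GET URL for listing a folder’s children as JSON. The cmisselector and succinct parameters come from the CMIS 1.1 Browser Binding. The class name and the root folder URL in main are assumptions for illustration only. A real client would read the root folder URL from the repository info returned by the service URL rather than hard-coding it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds Browser Binding URLs; not part of any CMIS library. */
final class BrowserBindingUrls {

    /**
     * Returns the URL that asks for the children of the folder at folderPath.
     * Each path segment is percent-encoded; "succinct=true" requests the compact property format.
     */
    static String childrenUrl(String rootFolderUrl, String folderPath) {
        StringBuilder url = new StringBuilder(rootFolderUrl);
        for (String segment : folderPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece produced by a leading slash
            }
            // URLEncoder targets form encoding, so turn "+" back into "%20" for use in a path.
            String encoded = URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
            url.append('/').append(encoded);
        }
        return url.append("?cmisselector=children&succinct=true").toString();
    }

    public static void main(String[] args) {
        // Assumed root folder URL for a local InMemory Repository; yours may differ.
        String root = "http://localhost:8080/chemistry/browser/A1/root";
        System.out.println(childrenUrl(root, "/My Folder"));
    }
}
```

Pasting a URL like this into a browser or curl against a CMIS 1.1 server returns plain JSON, which is much easier to read than the equivalent AtomPub feed.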

Type Mutability

This allows CMIS developers to create and update the repository’s content model with code. Imagine you’re building an Accounts Payable solution. You’re using CMIS because you want your solution to run on top of any CMIS-compliant ECM repository. It is highly likely you will need content types to store objects with metadata specific to your solution. Before CMIS 1.1, you had to ship repository-specific content models and installation and configuration instructions with your app. With CMIS 1.1, developers can simply write install and config logic as part of the solution that will interrogate the repository to determine if any changes need to be made to the content model to support the solution, and if changes are required, implement them.

Secondary Types

Some repositories (like Alfresco) have the concept of free-floating or cross-cutting content types that group together related properties that can be added to object instances in the repository. For example, perhaps you want to define a “client-related” set of properties that can be added to any document in the repository that is related to one of your clients. Not all documents are related to a client, but the ones that are need to be able to refer to a client name or number or something. In Alfresco, these are called “aspects”. CMIS 1.0 didn’t support aspects natively so developers using CMIS to query for or set properties defined in an aspect had to use a workaround. In CMIS 1.1, aspects are supported natively.
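In CMIS 1.1, secondary types are attached to an object through the multi-valued cmis:secondaryObjectTypeIds property. Here is a minimal sketch of the property map for the “client-related” example above. The aspect ID P:sc:clientRelated and the property sc:clientName are hypothetical names invented for this sketch (the “P:” prefix is how Alfresco exposes aspects through CMIS), and the class itself is only an illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: builds CMIS properties for a document carrying a secondary type. */
final class SecondaryTypeExample {

    static Map<String, Object> clientDocumentProperties(String name, String clientName) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("cmis:objectTypeId", "cmis:document");
        props.put("cmis:name", name);
        // Secondary types (aspects in Alfresco) are listed in this multi-valued property.
        // "P:sc:clientRelated" is a made-up aspect ID for this example.
        props.put("cmis:secondaryObjectTypeIds", List.of("P:sc:clientRelated"));
        // "sc:clientName" is a made-up property assumed to be defined by that aspect.
        props.put("sc:clientName", clientName);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(clientDocumentProperties("proposal.pdf", "Acme"));
    }
}
```

With OpenCMIS, a map like this is the sort of thing you would hand to a folder’s createDocument method along with the content stream, assuming the repository actually defines the secondary type.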

New “Item” Type

In CMIS 1.0, Document objects are assumed to have a content stream. Some repositories even require it. In Alfresco, this means if you want to work with a type that inherits from something other than cm:content (or cm:folder), you are out of luck. CMIS 1.1 adds a new base object type called “Item” that represents objects that don’t have a file associated with them.

Bulk Updates

CMIS 1.1 adds a new feature that makes mass changes more performant. Instead of iterating over a list of objects, changing and saving each one, you can define a set of property changes and make those against an entire collection, which is much more efficient.

Append to Content Stream

A challenge with any ECM project is how to move large files into the repository. The new append to content stream feature in CMIS 1.1 allows you to send files to the repository in chunks, which could be a key to addressing that challenge.
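The chunking idea can be sketched in plain Java. In the sketch below, ChunkSink is a hypothetical stand-in for the repository call; the CMIS 1.1 append operation takes a flag indicating whether a chunk is the last one, which is what the boolean models. To keep it short, the content is an in-memory byte array, whereas a real upload of a large file would read from a stream.

```java
import java.util.Arrays;

/** Illustration only: splits content into chunks and flags the final one. */
final class ChunkedUpload {

    /** Hypothetical stand-in for the call that appends a chunk to a document's content stream. */
    interface ChunkSink {
        void append(byte[] chunk, boolean isLastChunk);
    }

    /**
     * Sends content to the sink in pieces of at most chunkSize bytes and
     * returns the number of chunks sent. Empty content sends no chunks.
     */
    static int sendInChunks(byte[] content, int chunkSize, ChunkSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int sent = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, content.length);
            boolean isLast = (end == content.length);
            sink.append(Arrays.copyOfRange(content, offset, end), isLast);
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2500]; // placeholder content
        int chunks = sendInChunks(data, 1024, (chunk, last) ->
                System.out.println(chunk.length + " bytes, last=" + last));
        System.out.println(chunks + " chunks sent");
    }
}
```

Because each chunk is a separate request, a failed transfer only needs to resend from the chunk that failed, assuming the client keeps track of its offset.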

Retentions and Holds

This new feature allows you to set retention periods for a piece of content or place a legal hold on content through the CMIS 1.1 API. This is useful in compliance solutions like Records Management (RM). Honestly, I am not a big fan of this feature. It seems too specific to a particular domain (RM) and I think CMIS should be more general. If you are going to start adding RM features into the spec, why not add Web Content Management (WCM) features as well? And Digital Asset Management (DAM) and so on? I’m sure it is useful, I just don’t think it belongs in the spec.

So that’s what’s new in CMIS 1.1. You can read the authoritative spec for details.

When Will My Favorite Repository Support CMIS 1.1?

That’s up to each vendor. If these features are important to your installation or the solution you are building, you should be making it very clear to your vendor contacts that you want to see these features get priority over other things the engineering team might be working on. As for Alfresco, I can’t make any promises on dates. We’ve had experimental support for the browser binding in place for some time. I think we all want to see the other CMIS 1.1 features in both Alfresco on-premise and in the cloud sooner rather than later, but I don’t know when that will be.

Try It!

You can play with CMIS 1.1 by downloading the OpenCMIS InMemory Server from Apache Chemistry and a client library for your favorite language, or just launch the OpenCMIS Workbench and you’ll see the Browser Binding as an option when you connect. If you need to know more about CMIS and the client libraries, server frameworks, and CMIS development tools available at Apache Chemistry, you should buy the book Jay Brown, Florian Mueller, and I have been working on. It should be in print later this summer but you can get the eBook now. It covers both CMIS 1.0 and CMIS 1.1.

Alfresco Berlin Meetup Agenda

On Friday, May 10, we’ll be having a half-day meetup in Berlin, Germany in conjunction with the Codemotion conference happening at the same time. Everyone is welcome to attend and there is no cost, even if you are not registered for the Codemotion conference. You can register for the meetup here. The agenda will be as follows:

15:00 to 15:15 Welcome (Jeff Potts, Alfresco)
15:15 to 15:45 Introducing the Alfresco API (Jeff Potts)
15:45 to 16:15 Group Discussion: How Are You Using Alfresco? (All)
16:15 to 16:45 SmartWCM (Florian Maul, fme)
16:45 to 17:00 BREAK
17:00 to 17:30 Enhanced Script Import Tooling (Axel Faust, Prodyna)
17:30 to 18:00 Alfresco Workdesk (Bernhard Werner, Alfresco)
18:00 to 18:15 Invitation to Join the Community (Jeff Potts)
18:15 to 19:00 Bratwurst, Beer, & Networking

If you would like to present a 30-minute customer case study on how your organization implemented Alfresco, please let me know.

Earlier in the day I’ll be giving a talk at Codemotion Berlin on CMIS and Apache Chemistry in Action. So, if you are at Codemotion and you want to learn how to use an industry standard API to manage content in ECM repositories like SharePoint, FileNet, and Alfresco, come to my talk.

I hope to see you there!