Tag: Python

Announcement: Apache Chemistry cmislib 0.5.1 now available

The Apache Chemistry project is pleased to announce that cmislib 0.5.1 is now available. Developers can use cmislib to write Python applications against any CMIS-compliant repository such as Alfresco, SharePoint, Nuxeo, and FileNet. You can download the client library from the Apache Chemistry cmislib home page or use setuptools to install the library quickly and easily.

This release adds support for renditions, so if your repository supports things like thumbnails, you can retrieve a list of them for a given object. The new release also lets you pass in arbitrary HTTP headers. That is one way to enable authentication schemes beyond basic authentication, such as OAuth2, which is the authentication mechanism Alfresco in the Cloud uses.
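As a rough illustration of what such a header looks like, here is a minimal sketch that builds a standard OAuth2 bearer-token header (RFC 6750). The helper name and the token value are hypothetical, and obtaining the token is out of scope. I’m deliberately not showing how the dictionary is handed to cmislib; check the cmislib 0.5.1 documentation for the exact parameter.

```python
def bearer_auth_headers(access_token):
    """Build an HTTP header dict carrying an OAuth2 bearer token (RFC 6750).

    Hypothetical helper: the token would come from your own OAuth2 flow.
    """
    if not access_token:
        raise ValueError("an access token is required")
    return {"Authorization": "Bearer " + access_token}


# Placeholder token for illustration only.
headers = bearer_auth_headers("hypothetical-token-123")
print(headers)  # {'Authorization': 'Bearer hypothetical-token-123'}
```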

If you are brand new to CMIS, here are a few links to get you started:

In addition, I’ve been working on an Apache Chemistry and CMIS in Action book with Jay Brown and Florian Mueller. The book is available now through Manning’s early-access program.

Tips on Working with Google Fusion Tables

We wanted to see Alfresco forum users by geography. Google Fusion Tables can plot any geographic location stored in one or more columns on a map. We had used it successfully before for smaller batches of mostly static data, so I decided to see whether it would work well for our forum data. This blog post describes what I did, including some useful tips for working with the Google Fusion Tables API.

Determining the Location

First, I needed a city and country for each forum user. In our forums, users can declare their location, but not everyone does. So I wrote a little Python script that uses the MaxMind GeoLite database to determine a location for each user based on IP address. The script then compares the IP-determined location with the user’s declared location, and if they differ, it asks the person running the script to choose which one is likely to be more accurate. For example, the IP-address-based lookup might come back with “Suriname” while the user’s declared location is “Paramaribo, Suriname”, so you’d choose the latter. The script saves each decision so that it doesn’t have to ask again about the same comparison, on this run or on subsequent runs.
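The script itself isn’t included here, so this is only a minimal sketch of the compare-and-remember step. It assumes the GeoLite lookup has already been reduced to a plain location string (the lookup is not shown). The function names, the JSON decisions file, and the `ask` callback are hypothetical. In real use, `ask` would prompt with `input()`; the demo swaps in a function that always prefers the declared location.

```python
import json
import os


def choose_location(ip_location, declared_location, decisions, ask):
    """Pick the location to use for one forum user.

    ip_location: location derived from the IP lookup (lookup not shown).
    declared_location: the user's self-declared location, or None.
    decisions: dict of earlier answers, keyed on the pair being compared.
    ask: hypothetical callback(ip_location, declared_location) -> chosen str.
    """
    if not declared_location:
        return ip_location
    if not ip_location or ip_location == declared_location:
        return declared_location
    key = ip_location + " | " + declared_location
    if key not in decisions:
        # Only bother the person running the script the first time.
        decisions[key] = ask(ip_location, declared_location)
    return decisions[key]


def load_decisions(path):
    """Read saved decisions so later runs don't ask again."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_decisions(path, decisions):
    """Persist decisions (a hypothetical JSON file) for the next run."""
    with open(path, "w") as f:
        json.dump(decisions, f, indent=2, sort_keys=True)


# Demo: the "operator" always prefers the declared location.
prompts = []


def prefer_declared(ip_loc, declared):
    prompts.append((ip_loc, declared))
    return declared


decisions = {}
first = choose_location("Suriname", "Paramaribo, Suriname", decisions, prefer_declared)
again = choose_location("Suriname", "Paramaribo, Suriname", decisions, prefer_declared)
```

The second call returns the cached answer without prompting, which is the property that keeps repeat runs quiet.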

Loading the Data into Google Fusion Tables with Python

Once I had a city and country for each forum user I had to get those loaded into a Google Fusion Table. I found this Python-based Fusion Tables client and it worked quite nicely.

Here are a few tips that might save you some time when you are working with Google Fusion Tables, regardless of the client-side language…

Don’t Update–Drop, then Add

I started by trying to be smart about updating existing records rather than inserting new ones. But this meant that for each row, I had to do a query to test for the existence of a match and then do an update. This was incredibly slow, especially because you can’t do bulk updates (see next point).

So every time I run an update, the script first clears out the table. That means I load the entire dataset every time there is an update, but that is much faster than the update-if-present-otherwise-insert approach.
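The loader isn’t shown in this post, so here is a minimal sketch of the drop-then-add plan as SQL strings. It assumes Fusion Tables’ SQL-like `DELETE FROM <table>` (no `WHERE` clause, which clears every row) and `INSERT INTO ... VALUES ...` syntax. The quoting rule (single-quoted strings with backslash escapes) is from memory and worth verifying. The table ID and column names are made up, and column names are assumed to be simple identifiers that need no quoting.

```python
def quote_value(value):
    """Quote a value for a Fusion Tables-style SQL statement.

    Assumption: numbers go in bare; strings are single-quoted with
    backslash escaping. Verify against the API version you use.
    """
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def reload_plan(table_id, columns, rows):
    """Return (delete_statement, insert_statements) for a full reload."""
    delete = "DELETE FROM " + table_id  # wipes all rows first
    col_list = ", ".join(columns)
    inserts = [
        "INSERT INTO %s (%s) VALUES (%s)"
        % (table_id, col_list, ", ".join(quote_value(row[c]) for c in columns))
        for row in rows
    ]
    return delete, inserts


rows = [{"user": "jdoe", "location": "Paramaribo, Suriname", "posts": 42}]
delete, inserts = reload_plan("hypothetical_table_id", ["user", "location", "posts"], rows)
```

Because every run starts from an empty table, there’s no per-row existence check, which is exactly the query that made the update-or-insert approach so slow.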

Batch Your Queries

The Google Fusion Tables API supports bulk operations. You can execute up to 500 statements at a time, if I recall correctly. This is a huge time-saver. My script just adds the insert statements to a list, and when it has 500 (or runs out of inserts) it joins the list on “;” and executes the batch with a single call to the Fusion Tables API.

The one drawback, as mentioned in the previous point, is that the API does not support bulk updates, only bulk inserts. But given the performance gain of bulk operations, I don’t mind clearing out the table and re-inserting.
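The batching described above can be sketched as a small generator. The limit of 500 is the figure I recalled, not a verified API constant, so it’s a parameter here. The generator name is hypothetical, and sending each batch is left to whatever Fusion Tables client you use.

```python
def batched_sql(statements, batch_size=500):
    """Yield statements joined on ';', at most batch_size per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch = []
    for stmt in statements:
        batch.append(stmt)
        if len(batch) == batch_size:
            yield ";".join(batch)
            batch = []
    if batch:  # flush the remainder when we run out of inserts
        yield ";".join(batch)


# 1,201 hypothetical inserts become three API calls: 500 + 500 + 201.
stmts = ["INSERT INTO t (n) VALUES (%d)" % i for i in range(1201)]
batches = list(batched_sql(stmts))
```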

Throttle Your Requests

If the script exceeds 30 requests per minute, it is highly likely you will get rate-limited, so it is important to throttle your requests. I found that a 2.5-second wait between queries was fine, and because the queries are batched 500 at a time, the wait really isn’t a big deal.
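Here is one way to enforce that spacing: a minimal sketch that only sleeps for whatever part of the 2.5 seconds hasn’t already passed since the previous request. The `Throttle` class and the fake clock used in the demo are hypothetical; the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Block until at least `interval` seconds have passed since the last call."""

    def __init__(self, interval=2.5, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


# Demo with a fake clock so nothing really sleeps.
class FakeClock:
    def __init__(self):
        self.t = 0.0
        self.sleeps = []

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.sleeps.append(seconds)
        self.t += seconds


fake = FakeClock()
throttle = Throttle(2.5, clock=fake.now, sleep=fake.sleep)
throttle.wait()   # first request: no wait
fake.t += 1.0     # the request took one second
throttle.wait()   # sleeps the remaining 1.5 s
fake.t += 3.0     # a slow request
throttle.wait()   # already past 2.5 s, so no sleep
```

In an upload loop you would call `throttle.wait()` right before each batched API call. A 2.5-second spacing allows roughly 24 requests a minute (60 / 2.5), comfortably under 30.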

Geocoding Takes Time

So the whole thing is pretty slick, but there is one small pain point. Because all rows get dropped every time I load the table, every row has to be geocoded again, and that takes time. I believe there is an API call to ask for the table to be geocoded, but I haven’t found it to work reliably. Instead, I have to go to the table in my browser and tell Fusion Tables to geocode it. This takes a LONG time. For a table of about 10,000 rows it can easily take 45 minutes or more. At least it is something I can kick off and let run. I only update the table once a month; if it were more often, it would be an issue.

Voila!

That’s it! Thanks to Python and Google Fusion Tables, I now have an interactive map of forum users. Not only is the map useful on its own, the table also lets me run geographic queries against it from Python, such as “find me the 20 forum users with more than X posts who work within a 20-mile radius of this spot,” which can be handy for local community outreach.
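The query code isn’t part of this post. As a sketch, the radius search could be expressed in Fusion Tables’ SQL-like syntax, which (as I recall) offered `ST_INTERSECTS` with `CIRCLE(LATLNG(lat, lng), radius)`, where the radius is in meters. Treat those function names as something to verify against the API reference. The table ID, column names, coordinates, and the helper name are all hypothetical, and the “X posts” threshold becomes `min_posts`.

```python
METERS_PER_MILE = 1609.344


def nearby_users_query(table_id, location_col, posts_col, lat, lng,
                       radius_miles=20, min_posts=0, limit=20):
    """Build a Fusion Tables-style SQL string for a radius search.

    Assumes a geocoded location column and a numeric post-count column.
    CIRCLE() is assumed to take its radius in meters, hence the conversion.
    """
    radius_m = radius_miles * METERS_PER_MILE
    return (
        "SELECT * FROM %s WHERE %s > %d AND "
        "ST_INTERSECTS(%s, CIRCLE(LATLNG(%f, %f), %f)) LIMIT %d"
        % (table_id, posts_col, min_posts, location_col, lat, lng, radius_m, limit)
    )


query = nearby_users_query("hypothetical_table_id", "Location", "Posts",
                           32.9, -96.96, min_posts=100)
```

A 20-mile radius works out to 32,186.88 meters, which is the value embedded in the generated `CIRCLE(...)` clause.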

cmislib extension supports Alfresco aspects

I can’t believe I didn’t know about this sooner. It completely passed me by. Patrice Collardez created an extension for cmislib that gives it the capability to work with aspects. Patrice’s version works with cmislib 0.4.1. I cloned it and made the updates necessary for it to work with cmislib 0.5.

What this means is that you can now use Python and cmislib to work with Alfresco aspects. Patrice’s extension adds “addAspect”, “removeAspect” and “getAspects” to Document and Folder objects. It also allows you to call getProperties and updateProperties on Folders and Documents even when those properties are defined in an aspect.

Check it out:

import datetime

import cmislibalf  # Patrice's aspect extension (my clone for cmislib 0.5)

# Assumes `folder` is a cmislib Folder you've already retrieved,
# e.g. folder = repo.getObjectByPath('/someFolder')
fileName = 'sample-whitepaper.txt'  # placeholder name

properties = {}
properties['cmis:objectTypeId'] = "D:sc:whitepaper"
properties['cmis:name'] = fileName

docText = "This is a sample whitepaper document called " + fileName

doc = folder.createDocumentFromString(fileName, properties, contentString=docText, contentType="text/plain")

# Add two custom aspects and set aspect-related properties
doc.addAspect('P:sc:webable')
doc.addAspect('P:sc:productRelated')
props = {}
props['sc:isActive'] = True
props['sc:published'] = datetime.datetime(2007, 4, 1)
props['sc:product'] = 'SomePortal'
props['sc:version'] = '1.1'
doc.updateProperties(props)

Also, if you saw the webinar yesterday you know I showed some Python examples in the shell, but I then switched over to some OpenCMIS Java examples in Eclipse that I included in the custom content types tutorial. I didn’t want my fellow Pythonistas to feel neglected, so I ported those OpenCMIS examples to Python. Grab them here.

The examples assume you also have Patrice’s extension installed (my clone if you are using cmislib 0.5). If you don’t want to use Patrice’s extension for some reason, just comment out the “import cmislibalf” statement as well as the lines in the createTestDoc method that deal with aspects and aspect-defined properties. You should then be able to run the examples in straight cmislib.
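Rather than commenting lines in and out by hand, you could make the aspect calls conditional. This is my own sketch, not how the ported examples are written: the `apply_aspects` helper and the `RecordingDoc` stand-in are hypothetical. It only relies on the `import cmislibalf` statement and the `addAspect`/`updateProperties` calls shown earlier.

```python
try:
    # Patrice's extension; importing it is what gives cmislib objects
    # addAspect/removeAspect/getAspects in the examples.
    import cmislibalf  # noqa: F401
    HAVE_ASPECT_EXTENSION = True
except ImportError:
    HAVE_ASPECT_EXTENSION = False


def apply_aspects(doc, aspect_ids, aspect_props=None, enabled=HAVE_ASPECT_EXTENSION):
    """Add aspects (and aspect-defined properties) only when the extension is present.

    Returns True if anything was applied, False if aspects were skipped.
    """
    if not enabled:
        return False
    for aspect_id in aspect_ids:
        doc.addAspect(aspect_id)
    if aspect_props:
        doc.updateProperties(aspect_props)
    return True


class RecordingDoc:
    """Hypothetical stand-in for a cmislib Document; it only records calls."""

    def __init__(self):
        self.calls = []

    def addAspect(self, aspect_id):
        self.calls.append(("addAspect", aspect_id))

    def updateProperties(self, props):
        self.calls.append(("updateProperties", props))


demo_doc = RecordingDoc()
applied = apply_aspects(demo_doc, ["P:sc:webable", "P:sc:productRelated"],
                        {"sc:isActive": True}, enabled=True)
```

With the extension missing, the same script still runs in straight cmislib and simply skips the aspect work.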

If you don’t have cmislib you can install it by typing “easy_install cmislib”.

Apache Chemistry cmislib 0.4 incubating now available

The Apache Chemistry development team is pleased to announce that the 0.4 incubating release of cmislib, the Python client API for CMIS, is now available for download. You may have to use one of the backup servers until the mirrors fully update. Alternatively, you can use easy_install to install cmislib by typing “easy_install cmislib”.

This release has various fixes and enhancements that the community has contributed since cmislib joined the Apache Chemistry project with its 0.3 release. If you are using Alfresco, you might be interested in an enhancement in cmislib 0.4 that makes it possible to use ticket-based authentication instead of basic auth.

For those who haven’t used it, cmislib makes it easy to work with CMIS-compliant repositories from Python.

Updated Python CMIS library released

I’ve tagged and released a new version of cmislib, the Python CMIS client library. What’s cool about this release is that it is the first one known to work with more than one CMIS provider. Yay for interoperability! The beauty of CMIS, realized! Okay, it isn’t that beautiful yet; it’s still “0.1”, and there are known issues. But I can now say the library works with both Alfresco and IBM FileNet, and that’s a Good Thing.

IBM was a big help with this. Al Brown, one of the CMIS spec leads, turned one of his colleagues, Jay Brown, on to cmislib. Jay called me up and asked, “If I give you access to a FileNet P8 server, can you test cmislib against it?” I was on it faster than you could say, “unittest.main()”.

I think the effort was valuable for all sides. Our little “mini plugfest” turned up issues in my client as well as both CMIS providers. Jay worked hard to chase down everything on the FileNet side. Dave Caruana chased a few down on the Alfresco side as well. Thanks to everyone for the team effort.

Anyway, give the new cmislib release a try and give me your feedback. If you want a feel for how easy it can be to work with CMIS repositories using the cmislib API, check out the documentation or dive right in. Installation is as easy as “easy_install cmislib” (easy_install instructions).

Next up is Nuxeo. Can the open source ECM vendor achieve cmislib Unit Test Greatness faster than Big Blue? We shall see!

cmislib: A CMIS client library for Python

I’ve started a new project on Google Code called cmislib. It is an interoperable client library for CMIS in Python that uses the RESTful AtomPub Binding of a CMIS provider to perform CRUD and query functions on the repository.

I created it for a couple of reasons. First, it’s been bugging me that, unlike our Drupal Alfresco integration, our Django Alfresco integration does not use CMIS. After talking it over with one of our clients we decided it would make more sense to create a more general purpose CMIS API for Python that Django (and any other Python app) could leverage, rather than build CMIS support directly into the Django Alfresco integration.

Second, around the time I was putting together the Getting Started with CMIS tutorial, it struck me that there needed to be an API that didn’t have a lot of dependencies and was very easy to use. Otherwise, it’s too easy to get lost in the weeds and miss the whole point of CMIS: Easily working with rich content repositories, regardless of the underlying implementation.

Even if you’ve never worked with Python before, it is super easy to get started with cmislib. The install takes just three steps, and the API should feel very natural to anyone who’s worked with a content repository before. Check it out.

Install

  1. If you don’t have Python installed already, do so. I’ve only tested on Python 2.6 so unless you’re looking to help test, stick with that.
  2. If you don’t have setuptools installed already, do so. It’s a nice tool to use for installing Python packages.
  3. Once setuptools is installed, type easy_install cmislib

That’s all there is to it. Now you’re ready to connect to your favorite CMIS-compliant repository.

Examples

There’s nothing in cmislib that is specific to any particular vendor. Once you give it your CMIS provider’s service URL and some credentials, it figures out where to go from there. But I haven’t tested with anything other than Alfresco yet, and this thing is still hot out of the oven. If you want to help test it against other CMIS 1.0cd04 repositories I’d love the help.
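I’m not reproducing cmislib’s internals here, but this stdlib sketch shows why a service URL is enough: the AtomPub service document lists the repository’s collections, each tagged with a CMIS collection type (root, query, and so on), and a client follows those links. The namespace URIs are the CMIS 1.0 ones as I understand them; 1.0cd04-era servers may use a different RestAtom namespace. The sample document and example.com URLs are made up and heavily trimmed.

```python
import xml.etree.ElementTree as ET

APP_NS = "http://www.w3.org/2007/app"
# CMIS 1.0 RestAtom namespace; pre-1.0 drafts such as cd04 may differ.
CMISRA_NS = "http://docs.oasis-open.org/ns/cmis/restatom/200908/"


def collection_urls(service_xml):
    """Return {collectionType: href} for the first workspace (repository)."""
    root = ET.fromstring(service_xml)
    workspace = root.find("{%s}workspace" % APP_NS)
    if workspace is None:
        return {}
    urls = {}
    for coll in workspace.findall("{%s}collection" % APP_NS):
        ctype = coll.findtext("{%s}collectionType" % CMISRA_NS)
        if ctype:
            urls[ctype.strip()] = coll.get("href")
    return urls


# Hypothetical, trimmed service document for illustration.
SAMPLE_SERVICE_DOC = """<app:service xmlns:app="http://www.w3.org/2007/app"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <atom:title>Main Repository</atom:title>
    <app:collection href="http://example.com/cmis/root-children">
      <atom:title>Root Collection</atom:title>
      <cmisra:collectionType>root</cmisra:collectionType>
    </app:collection>
    <app:collection href="http://example.com/cmis/query">
      <atom:title>Query Collection</atom:title>
      <cmisra:collectionType>query</cmisra:collectionType>
    </app:collection>
  </app:workspace>
</app:service>"""

urls = collection_urls(SAMPLE_SERVICE_DOC)
```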

Anyway, let’s look at some examples using Alfresco’s public CMIS repository.

  1. From the command line, start the Python shell by typing python and pressing Enter:
    Python 2.6.3 (r263:75183, Oct 22 2009, 20:01:16)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
  2. Import the CmisClient and Repository classes:
    >>> from cmislib.model import CmisClient, Repository
  3. Point the CmisClient at the repository’s service URL:
    >>> client = CmisClient('http://cmis.alfresco.com/s/cmis', 'admin', 'admin')
  4. Get the default repository for the service:
    >>> repo = client.getDefaultRepository()
    >>> repo.getRepositoryId()
    u'83beb297-a6fa-4ac5-844b-98c871c0eea9'
  5. Get the repository’s properties. This for-loop spits out everything cmislib knows about the repo:
    >>> repo.getRepositoryName()
    u'Main Repository'
    >>> info = repo.getRepositoryInfo()
    >>> for k,v in info.items():
    ...     print "%s:%s" % (k,v)
    ...
    cmisSpecificationTitle:Version 1.0 Committee Draft 04
    cmisVersionSupported:1.0
    repositoryDescription:None
    productVersion:3.2.0 (r2 2440)
    rootFolderId:workspace://SpacesStore/aa1ecedf-9551-49c5-831a-0502bb43f348
    repositoryId:83beb297-a6fa-4ac5-844b-98c871c0eea9
    repositoryName:Main Repository
    vendorName:Alfresco
    productName:Alfresco Repository (Community)

Once you’ve got the Repository object you can start working with folders.

  1. Create a new folder in the root. You should name yours something unique:
    >>> root = repo.getRootFolder()
    >>> someFolder = root.createFolder('someFolder')
    >>> someFolder.getObjectId()
    u'workspace://SpacesStore/91f344ef-84e7-43d8-b379-959c0be7e8fc'
  2. Then you can create some content:
    >>> someFile = open('test.txt', 'r')
    >>> someDoc = someFolder.createDocument('Test Document', contentFile=someFile)
  3. If you want, you can dump the properties of the newly created document (this is a partial list):
    >>> props = someDoc.getProperties()
    >>> for k,v in props.items():
    ...     print '%s:%s' % (k,v)
    ...
    cmis:contentStreamMimeType:text/plain
    cmis:creationDate:2009-12-18T10:59:26.667-06:00
    cmis:baseTypeId:cmis:document
    cmis:isLatestMajorVersion:false
    cmis:isImmutable:false
    cmis:isMajorVersion:false
    cmis:objectId:workspace://SpacesStore/2cf36ad5-92b0-4731-94a4-9f3fef25b479
  4. You can also use cmislib to run CMIS queries. Let’s find the doc we just created with a full-text search. (Note that I’m currently seeing a problem with Alfresco in which the CMIS service returns one fewer result than is really there.)
    >>> results = repo.query("select * from cmis:document where contains('test')")
    >>> for result in results:
    ...     print result.getName()
    ...
    Test Document2
    example test script.js
  5. Alternatively, you can get objects by their object ID or their path, like this:
    >>> someDoc = repo.getObjectByPath('/someFolder/Test Document')
    >>> someDoc.getObjectId()
    u'workspace://SpacesStore/2cf36ad5-92b0-4731-94a4-9f3fef25b479'

Set Python loose on your CMIS repository

These are just a few examples meant to give you a feel for the API. There are several other things you can do with cmislib. The package comes with documentation so look there for more info. If you find any problems and you want to pitch in, you can check out the source from Google Code and create issues there as well.

Give this a try and let me know what you think.

[UPDATE: I had the wrong URL for the Alfresco-hosted CMIS service. It’s fixed now.]

Reminder: DFW Alfresco Meet-up is Monday

Don’t forget to sign up for the first-ever DFW Alfresco Meet-up. It’s happening Monday, 3/9, at Ackerman McQueen over in Las Colinas. Plan to arrive around 5:30; we’ll start our first topic at 6:00. We’ll hear about Ackerman McQueen’s recent Alfresco WCM-based project as well as the portal implementation built on Alfresco DM and Django (a Python-based framework) from the folks over at Neiman Marcus.

We’re letting Optaros pick up the tab on food and drinks so if you’re doing an Alfresco project right now or considering it, you need to join us. Come share what you’ve learned with others and maybe leave with a few new ideas as well.

Address and directions are on the sign-up page.