Category: Django

Django is a Python-based web application development framework that’s highly-productive and fun to work with.

Collaborative content creation with Amazon Mechanical Turk

Amazon’s Mechanical Turk has been intriguing to me since I first heard about it. I think it is because the idea of essentially having a workflow with tasks that can be handled by any one of potentially hundreds of thousands of people has mind-blowing potential.

If you’re not familiar, Mechanical Turk (MTurk) is essentially a marketplace that matches up work requests (called HITs) with human workers (called “Turkers”). The work requests are typically very short tasks that require human intelligence like identifying, labeling, and categorizing images or transcribing audio. Amazon is the middleman that matches up HITs with Turkers. From a coding standpoint, your app makes calls to Amazon’s Web Services API to submit requests and to respond to completed work. Turkers monitor the available HITs, select the ones that look interesting to them and then complete the tasks for which they are paid, usually pennies per task.

Sorting through images or performing other simple tasks is one thing, but what about more complex tasks, like, say, writing an article? Here’s a story about some guys who have created a framework called CrowdForge to do just that. CrowdForge is a Django implementation based on research that one of the authors did at Carnegie Mellon. In a nutshell, their approach splits complex problems into smaller problems until they are small enough to be successfully handled by MTurk, then aggregates the results to form the answer to the original problem. It’s Map Reduce applied to human tasks instead of data clusters.

You should read the original post, but to summarize it, the story talks about an experiment that the team did around collaborative content creation. They applied their framework to the task of writing travel articles. They split the task into 36 sub-tasks and gave each sub-task to an author, then aggregated the results into a coherent article. The partitioning, writing, and re-assembly (the “reduce” part of Map Reduce) was all done through Mechanical Turk by CrowdForge. Total cost for each article? About $3.26.

Then, for comparison, they assigned individual authors to write articles on the same topics using the traditional approach of one author per article paying roughly what they paid for the collaboratively created content. When the results were reviewed, the crowd sourced content beat the single author content in terms of quality. It’s important to note that in both cases, authors were Turkers. This wasn’t Mechanical Turk versus Rick Steves. But still, the researchers were able to use Mechanical Turk to break the problem down, perform each task, and then clean up the result, all for about the same cost without sacrificing quality. That’s pretty cool.

As you know, I’m a huge fan of Django, and I think it is more than okay for the presentation tier of a solution like this. But it seems like a workflow engine like Activiti or jBPM would be a better tool for implementing the actual process flow for a framework like CrowdForge because it could potentially mean less coding and maybe more accessibility by business analysts. Imagine using a process modeling tool to lay out your business process and then dropping in a “Mechanical Turk Partition Task” node, graphically connecting it with a “Mechanical Turk Map Task”, and then hooking that to a “Mechanical Turk Reduce Task”. In and around those you’re wiring up email notifications, internal review tasks, etc.

Metaversant has been working with a client who’s doing something very similar. Editors make writing assignments which are outsourced to Mechanical Turk. When the assignments are complete, they are published to one or more channels. Instead of the Django CrowdForge framework, we’re using Alfresco and the embedded jBPM workflow engine. Alfresco stores the content while the jBPM workflow engine orchestrates the process, making calls to Mechanical Turk and the publishing endpoints.

This approach can be generalized to apply to all kinds of problems beyond content authoring. If you are an Alfresco, jBPM, or Activiti user, and you have a business problem that might lend itself to being addressed by a micro task marketplace like Mechanical Turk, let me know. Maybe we can get my client to open source the specific integration between jBPM and Mechanical Turk. If you’ve already done something like this, let me know that too. I’m interested to hear how others might be integrating content repositories and BPM engines with Mechanical Turk.

Join us for CMSGeekUpDFW on 2/24

Every month, a handful of us CMS Geeks from around Dallas-Ft. Worth get together to have a beer or two and talk about content management. This month’s meeting is on Thursday, February 24 at 7:00p. We’re going to be talking about Django, a highly-productive python-based web application framework. If you’re going to be in the area (our meeting spots bounce around–this one will be at Cohabitat in Uptown Dallas) you should join in the discussion. Please RSVP so we know you’re coming.

cmislib: A CMIS client library for Python

I’ve started a new project on Google Code called cmislib. It is an interoperable client library for CMIS in Python that uses the Restful AtomPub Binding of a CMIS provider to perform CRUD and query functions on the repository.

I created it for a couple of reasons. First, it’s been bugging me that, unlike our Drupal Alfresco integration, our Django Alfresco integration does not use CMIS. After talking it over with one of our clients we decided it would make more sense to create a more general purpose CMIS API for Python that Django (and any other Python app) could leverage, rather than build CMIS support directly into the Django Alfresco integration.

Second, around the time I was putting together the Getting Started with CMIS tutorial, it struck me that there needed to be an API that didn’t have a lot of dependencies and was very easy to use. Otherwise, it’s too easy to get lost in the weeds and miss the whole point of CMIS: Easily working with rich content repositories, regardless of the underlying implementation.

Even if you’ve never worked with Python before, it is super easy to get started with cmislib. The install is less than 3 steps and the API should feel very natural to anyone that’s worked with a content repository before. Check it out.


  1. If you don’t have Python installed already, do so. I’ve only tested on Python 2.6 so unless you’re looking to help test, stick with that.
  2. If you don’t have setuptools installed already, do so. It’s a nice tool to use for installing Python packages.
  3. Once setuptools is installed, type easy_install cmislib

That’s all there is to it. Now you’re ready to connect to your favorite CMIS-compliant repository.


There’s nothing in cmislib that is specific to any particular vendor. Once you give it your CMIS provider’s service URL and some credentials, it figures out where to go from there. But I haven’t tested with anything other than Alfresco yet, and this thing is still hot out of the oven. If you want to help test it against other CMIS 1.0cd04 repositories I’d love the help.

Anyway, let’s look at some examples using Alfresco’s public CMIS repository.

  1. From the command-line, start the Python shell by typing python then hit enter.
  2. Python 2.6.3 (r263:75183, Oct 22 2009, 20:01:16)
    GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
  3. Import the CmisClient and Repository classes:
  4. >>> from cmislib.model import CmisClient, Repository
  5. Point the CmisClient at the repository’s service URL
  6. >>> client = CmisClient('', 'admin', 'admin')
  7. Get the default repository for the service
  8. >>> repo = client.getDefaultRepository()
    >>> repo.getRepositoryId()
  9. Get the repository’s properties. This for-loop spits out everything cmislib knows about the repo.
  10. >>> repo.getRepositoryName()
        u'Main Repository'
    >>> info = repo.getRepositoryInfo()
    >>> for k,v in info.items():
        ...     print "%s:%s" % (k,v)
        cmisSpecificationTitle:Version 1.0 Committee Draft 04
        productVersion:3.2.0 (r2 2440)
        repositoryName:Main Repository
        productName:Alfresco Repository (Community)

Once you’ve got the Repository object you can start working with folders.

  1. Create a new folder in the root. You should name yours something unique.
  2. >>> root = repo.getRootFolder()
    >>> someFolder = root.createFolder('someFolder')
    >>> someFolder.getObjectId()
  3. Then, you can create some content:
  4. >>> someFile = open('test.txt', 'r')
    >>> someDoc = someFolder.createDocument('Test Document', contentFile=someFile)
  5. And, if you want, you can dump the properties of the newly-created document (this is a partial list):
  6. >>> props = someDoc.getProperties()
    >>> for k,v in props.items():
    ...     print '%s:%s' % (k,v)
  7. You can also use cmislib to run CMIS queries. Let’s find the doc we just created with a full-text search. (Note that I’m currently seeing a problem with Alfresco in which the CMIS service returns one less result than what’s really there):
  8. >>> results = repo.query("select * from cmis:document where contains('test')")
    >>> for result in results:
    ...     print result.getName()
    Test Document2
    example test script.js
  9. Alternatively, you can also get objects by their object ID or their path, like this:
  10. >>> someDoc = repo.getObjectByPath('/someFolder/Test Document')
    >>> someDoc.getObjectId()

Set Python loose on your CMIS repository

These are just a few examples meant to give you a feel for the API. There are several other things you can do with cmislib. The package comes with documentation so look there for more info. If you find any problems and you want to pitch in, you can check out the source from Google Code and create issues there as well.

Give this a try and let me know what you think.

[UPDATE: I had the wrong URL for the Alfresco-hosted CMIS service. It’s fixed now.]

Book Review: Django 1.0 Web Site Development

I just finished “Django 1.0 Web Site Development” by Ayman Hourieh (2nd Ed). Packt sent me a copy and I was happy to read it. I thoroughly enjoy Python and Django and would like to see it more broadly adopted, so anything that helps people get started is a good thing.

Ayman’s book is clean and crisp. The book is built around a single cohesive example: Building a Delicious-style collaborative bookmark application with Django. That was a great pick for an example app. It’s easy to get your head around yet broad enough to provide good coverage of the topic. I also liked that Ayman threw in some JQuery to add a few AJAX features to the app. It was just the right amount to give you the idea without turning into a book on AJAX or UX development. Readers are similarly spared from wasting too much time on look-and-feel. CSS is kept to a minimum so that the focus remains squarely on Django.

Ayman flows logically from topic to topic. The book starts with a simple example and then gradually adds features until you’ve got a decent little app by the end. Within and between topics, the reader always has a good feel for what’s going on and what’s coming next. Initially, I was surprised that the admin UI–one of the cool time-saving features of Django that comes out of the box–was covered so late in the book, but later I decided that Ayman’s decision to focus on the shell to show API examples and test the app’s back-end was the right way to go. Chapter 12 is pretty weak–I would have traded most of the “ideas for evolving your bookmarking app” content for a discussion around Django on Google App Engine or maybe go deeper on some of the more interesting topics in that chapter that are only briefly covered, but that’s a minor nitpick.

I really enjoyed this book. Can you tell? Part of it is that Python and Django are such a pleasure to work with. The book itself is almost a metaphor: It’s concise (250 pages), well-written, and fun. If you’re new to the framework this book is a good way to see what all of the fuss is about.

Django + Alfresco was a winning combination for retailer’s intranet

Last week I spent some time with one of our clients talking about what it’s been like to live with their Intranet platform based on Django and Alfresco. The conversation got me really excited about what they’ve been able to do since the original implementation and where they are heading.

The client is a well-known, high-end retailer based in Dallas. About a year ago they engaged Optaros to replatform their intranet from a legacy Java portal product to something more agile. They had seen Alfresco and liked it as a core repository, but needed something for the presentation tier (See “Alfresco User Interface: What are my options?“).

The Optaros team worked with the client to consider many options, including open source Java portal servers. The client felt like they needed something lighter and more flexible than a portal server. They were willing to do a lot of the presentation work themselves in exchange for complete design freedom and yet still be enough of a framework to be highly productive. The winning solution turned out to be Django.

Python? No problem.

I was initially worried that introducing a Python-based framework into a Java shop was going to be a problem but they weren’t married to Java. Our team got them up-to-speed quickly and they never looked back. It also helped that the client’s intranet sites were very communication-centric which matched up well with Django’s newspaper heritage.

Here’s how they use the solution in a nutshell:

  • Content owners use Alfresco Explorer to upload HTML chunks, office documents, and images, set metadata, and submit content for review. This triggers any number of rules that automatically process the changed content (e.g., creating thumbnails, extracting metadata, converting images to a consistent type, creating PDFs from office documents).
  • Content owners and reviewers can use Alfresco’s “custom views” to preview the content chunk in the context of the front-end site.
  • Site designers lay out site pages and create components using the Django template system, CSS, JQuery, and other front-end libraries.
  • Content publishers use the Django administration UI to map areas on the site to categories, folders, and objects in the Alfresco repository–Alfresco has no idea where or how the chunks are being used. This means the repository tier is truly decoupled from the presentation tier, allowing the client to reuse content across multiple areas of the site and across multiple sites within the enterprise.
  • Designers leverage a Django tag library to create dynamic areas of a page (e.g., when the page is rendered, retrieve all of the content chunks in this particular category from the repository). Django calls Alfresco web scripts to get and post data. The web scripts respond with serialized Django XML which Django caches and then deserializes into Django objects that the front-end can work with.

Separate concerns, play to strengths

The thing to notice about the Alfresco piece is how it sticks to core Alfresco capabilities: Metadata, rules, search, basic workflows, transformers/extractors, presentation templates, web scripts, DM repository. This is straight out of the Alfresco best practices playbook and aligns the client well with Alfresco product direction. A nice enhancement would be to refactor the Django-Alfresco integration to use CMIS which is something we are considering for the open source version of the integration (Screencast, Code).

Agile intranet, happy team

Since the initial rollout, the client has been able to make changes and roll out new sites quickly and easily thanks to the productivity inherent in the Django framework and the clean separation between the front-end app and the repository. Unexpected benefits the client mentioned were how fast they can add new features to the administrative UI (a core admin UI gets built for you automatically by Django) and the ease with which the development team can stand up a new environment.

The language the client team used to describe their work since the rollout summed it up best. They were using words like “beautiful” and “a real pleasure to work with”. When was the last time you heard those sentiments expressed about a WCM implementation?

Alfresco-Django integration now available on Google Code

The Alfresco-Django code I demo’d in the screencast yesterday is available at Google Code. It includes the core Django integration, the sample site, an AMP file you can use to deploy the web scripts and the sample site bootstrap data to Alfresco, and documentation which you can build using Sphinx.

This should work with Alfresco Labs 3D Stable, Alfresco 3.0.1 Enterprise, and Alfresco 3.1 Enterprise.

My Optaros colleague, Sean Creeley, did most of the work, so thanks, Sean. Obviously, thanks to Justin, JC, and the rest of the Neiman Marcus team as well.

This is the initial public release of this thing so we welcome feedback in all forms, whether that’s suggestions for the roadmap, bug reports/fixes, enhancements, doc, etc. With your help, I think we could make this a really sweet Alfresco front-end development kit.

Screencast: Alfresco Django integration

I’ve created a screencast over at Optaros Labs that shows a simple web site, powered by Django, that pulls all of its content from Alfresco.

At Optaros, we see Django and Alfresco as a powerful combination for building content-centric applications. The integration shown in the screencast is based on work we did for our friends at Neiman Marcus. An open source version of this integration will be available within a week or so.

Drupal, Django, and Alfresco in Chicago

I’ll be in Chicago tomorrow for the Alfresco Meetup. I’ll be speaking during the Barcamp on Alfresco and Drupal integration with CMIS (module, screencast). I’ll also have the Alfresco-Django integration running on my laptop. I may not have time to show Alfresco-Django during my slot, but I’ll be happy to stick around and do informal demos and talk about either integration if you’re interested because I’d like your feedback on it.