Collaborative content creation with Amazon Mechanical Turk

Amazon’s Mechanical Turk has been intriguing to me since I first heard about it. I think it is because the idea of essentially having a workflow with tasks that can be handled by any one of potentially hundreds of thousands of people has mind-blowing potential.

If you’re not familiar, Mechanical Turk (MTurk) is essentially a marketplace that matches up work requests (called HITs) with human workers (called “Turkers”). The work requests are typically very short tasks that require human intelligence like identifying, labeling, and categorizing images or transcribing audio. Amazon is the middleman that matches up HITs with Turkers. From a coding standpoint, your app makes calls to Amazon’s Web Services API to submit requests and to respond to completed work. Turkers monitor the available HITs, select the ones that look interesting to them and then complete the tasks for which they are paid, usually pennies per task.

Sorting through images or performing other simple tasks is one thing, but what about more complex tasks, like, say, writing an article? Here’s a story about some guys who have created a framework called CrowdForge to do just that. CrowdForge is a Django implementation based on research that one of the authors did at Carnegie Mellon. In a nutshell, their approach splits complex problems into smaller problems until they are small enough to be successfully handled by MTurk, then aggregates the results to form the answer to the original problem. It’s Map Reduce applied to human tasks instead of data clusters.

You should read the original post, but to summarize it, the story talks about an experiment that the team did around collaborative content creation. They applied their framework to the task of writing travel articles. They split the task into 36 sub-tasks and gave each sub-task to an author, then aggregated the results into a coherent article. The partitioning, writing, and re-assembly (the “reduce” part of Map Reduce) was all done through Mechanical Turk by CrowdForge. Total cost for each article? About $3.26.

Then, for comparison, they assigned individual authors to write articles on the same topics using the traditional approach of one author per article paying roughly what they paid for the collaboratively created content. When the results were reviewed, the crowd sourced content beat the single author content in terms of quality. It’s important to note that in both cases, authors were Turkers. This wasn’t Mechanical Turk versus Rick Steves. But still, the researchers were able to use Mechanical Turk to break the problem down, perform each task, and then clean up the result, all for about the same cost without sacrificing quality. That’s pretty cool.

As you know, I’m a huge fan of Django, and I think it is more than okay for the presentation tier of a solution like this. But it seems like a workflow engine like Activiti or jBPM would be a better tool for implementing the actual process flow for a framework like CrowdForge because it could potentially mean less coding and maybe more accessibility by business analysts. Imagine using a process modeling tool to lay out your business process and then dropping in a “Mechanical Turk Partition Task” node, graphically connecting it with a “Mechanical Turk Map Task”, and then hooking that to a “Mechanical Turk Reduce Task”. In and around those you’re wiring up email notifications, internal review tasks, etc.

Metaversant has been working with a client who’s doing something very similar. Editors make writing assignments which are outsourced to Mechanical Turk. When the assignments are complete, they are published to one or more channels. Instead of the Django CrowdForge framework, we’re using Alfresco and the embedded jBPM workflow engine. Alfresco stores the content while the jBPM workflow engine orchestrates the process, making calls to Mechanical Turk and the publishing endpoints.

This approach can be generalized to apply to all kinds of problems beyond content authoring. If you are an Alfresco, jBPM, or Activiti user, and you have a business problem that might lend itself to being addressed by a micro task marketplace like Mechanical Turk, let me know. Maybe we can get my client to open source the specific integration between jBPM and Mechanical Turk. If you’ve already done something like this, let me know that too. I’m interested to hear how others might be integrating content repositories and BPM engines with Mechanical Turk.

3 comments

  1. Francois says:

    I was sure that somebody would have done that already 🙂 I think that there are a lot more use cases for workflow + crowdsourcing (look on the e-science side) but workflow engine would not support them right now.

  2. jpotts says:

    What do you see as the limits of workflow engines that keep them from being able to support those use cases?

  3. Sam says:

    Hi Jeff,
    I came across this article today, but I have been looking for such implementation for such a long time: mturk+Alfresco+CMIS+Liferay or Drupal. And your suggestion of using jBPM and Activity makes perfect sense.
    Per your suggestion, it would be great if you could get your client to open source at least the specific integration between jBPM and Mechanical Turk. If not, can you you help me with such a setup?
    Thanks
    Sam

Comments are closed.