Year: 2011

Trying out Activiti: Examples that leverage Alfresco’s new workflow engine

I’ve been playing with Activiti. It’s an open source, BPMN 2.0 compliant business process engine. The project is sponsored by Alfresco, who hired Tom Baeyens and Joram Barrez, the founders of jBPM, to create the Apache-licensed engine (take a look at the rest of Activiti’s all-star cast).

The first thing I did was head over to Activiti’s site and read through the user guide. I followed the tutorial and got a standalone instance of Activiti going with very little fuss. The concepts and terminology aren’t terribly different from jBPM, so if you’ve used jBPM, you’ll be familiar with the basics of Activiti in no time. The user guide is well-written so I urge everyone to start there.

Last week, Alfresco published a preview release of their Community product, labeled 3.4.e. This release, which I stress is only for preview purposes, was made available to let everyone get a first look at Alfresco’s integration of Activiti. If you watched the screencast showing an Alfresco workflow based on Activiti you may have thought, “Gee, that looks just like a jBPM-based workflow,” and you’re right–from a user standpoint, it is nearly identical. The difference, of course, is how the processes are described and the underlying implementation that executes the processes.

The screencast showed that the end users won’t see much of a change. That’s good, but I was anxious to find out how big a deal this transition will be from a developer’s perspective. The 3.4.e release gave me the perfect opportunity to dig in. I decided to take the examples from the Advanced Workflow chapter in the Alfresco Developer Guide (2008, Packt) and make them work with Alfresco’s embedded Activiti engine in 3.4.e. In this post, I’ll talk about how that went and I’ll give you the code so you can try it out yourself.

The code that accompanies this blog post includes the same set of four workflows implemented both in jBPM and Activiti as well as a readme that explains how to install and run everything. I’ll let you inspect that to see what the exact differences are rather than go over them here. Instead, I’ll spend the rest of the post covering the major differences in general.

Before we go any further, I guess we should have a quick terminology discussion. First, in jBPM, everything is a node. Specialized node types handle joins, splits, decisions, wait-states, and sub-processes, or enclose tasks that get assigned to humans. In Activiti (and really, in BPMN) there are essentially events (start, end, timer), tasks, and gateways. Of course, I’m simplifying greatly here–you should read the spec and the Activiti user guide. The important thing to note for people coming from jBPM is that in Activiti a “task” might be something a human does (“userTask”) or it could be automated (“scriptTask”, “serviceTask”, etc.). In jBPM, connections between nodes are called “transitions” while in Activiti they are called “sequenceFlows”.

Designing Processes

I use Eclipse, so the first step was to get the Activiti BPMN 2.0 Designer plug-in working. Installation is well-documented on the Activiti wiki and it installs just like any other Eclipse plug-in, so it went fairly smoothly. I had some sort of dependency conflict that I had to deal with, but nothing major.

All in all, designing processes in Activiti works just like it does in jBPM. The tool is different, but you’re still laying out a business process graphically, connecting steps in the workflow, and setting properties on those objects.

There are some known issues with the Designer that made creating and editing processes painful at times. I’m not going to call every one of those out in this post because this is a preview release–I expected to work through a few bumps. I will warn you of a few to hopefully save you some time:

  • You cannot save the diagram until it is syntactically correct. This means the BPMN 2.0 XML will not get generated until the diagram is correct. On a new process, when the editor complains about the diagram, you’d kind of like to just drop into the XML source and fix what needs fixing. If that’s what you want to do, you have to open the .activiti file in the XML editor, make the change, re-open in the diagram, and then make a change and save to force the generation of the BPMN 2.0 XML.
  • You cannot change things like IDs, names, form keys, and task assignment in the BPMN 2.0 XML. You have to change these in the Activiti diagram. If you change the BPMN 2.0 XML the settings in the Activiti diagram will overwrite the BPMN XML. This doesn’t sound like a big deal until you come across the next issue.
  • There is a known problem enabling the properties for an object in the diagram: clicking an object in the diagram doesn’t refresh the properties view. I worked around it by first clicking some other tab in the properties view, then double-clicking on the object (and sometimes repeating that) until the properties view refreshed with the appropriate property set.

Again, I didn’t expect everything to be fully functional, so I am not complaining. I just want you to have your expectations properly set when you play with this on your own.

I should mention that the overall look-and-feel of the Activiti Designer seems a lot crisper and more visually appealing than the JBoss Graphical Process Designer (GPD) Eclipse plug-in. As an example, I loved the alignment helper rules. And I liked that you can bend sequence flows.

Adding Business Logic to Processes

My goal was to take four Alfresco jBPM processes and port them to Activiti. The first three are variations on Hello World. The fourth is a more real-life process that is used to review and approve whitepapers. In the book, the Publish Whitepaper workflow uses an action to set properties on the approved whitepaper. It also shows how to combine a wait state with a mail action and a web script to allow third parties without direct access to Alfresco to participate in a workflow. For the initial cut at this exercise, I skipped all of that. For now, I really wanted to focus on the basics of the workflow engine. But the wait state idea and the web script interaction are interesting, so I’ll do that later and will provide the update in a future blog post.

Challenge 1: Alfresco JavaScript in automated steps

The first problem I came to was how to handle workflow steps that have no human intervention. In jBPM those steps are implemented as nodes. Alfresco JavaScript can live inside events within the node or on transitions between nodes. Tasks assigned to users are typically enclosed in a task-node. In Activiti, tasks assigned to users are called userTasks. All of Alfresco’s sample Activiti workflows consist entirely of userTasks. But Activiti includes several node types that aren’t user tasks: a scriptTask uses JavaScript or Groovy to implement its logic and a serviceTask delegates to a Java class. My helloWorld processes consist entirely of automated steps, so a scriptTask sounded good to me. The problem was that scriptTask uses Activiti’s JavaScript implementation, not Alfresco’s JavaScript. So doing something simple like invoking the “logger” root object doesn’t work in a scriptTask.

Fine, I thought, I’ll use one of Alfresco’s listener classes to wrap my logger call and stick that listener in the scriptTask. But that didn’t work either because in the current release Alfresco’s listener classes don’t fully implement the interface necessary to run in a scriptTask.

After confirming these issues with the Activiti guys I decided I’d put my Alfresco JavaScript in listeners either on a userTask or on a sequenceFlow (we called those “transitions” in jBPM) depending on what I needed to do. Hopefully at some point we’ll be able to use scriptTask for Alfresco JavaScript because there are times when you need automated steps in your process that can deal with the Alfresco JavaScript root objects you’re used to.

Challenge 2: Processes without user tasks

As I mentioned, my overly simple Hello World examples are nothing but automated steps. I could implement those without userTasks by placing my Alfresco JavaScript on sequenceFlows. But Alfresco complained when I tried to run workflows that didn’t contain at least one user task. I didn’t debug this, and it is possible I could have worked through it, but I decided for now, the Activiti versions of my Hello World examples would all have at least one userTask.

Challenge 3: Known issue causes iBatis exceptions

In 3.4.e, there is a known issue in which user tasks will cause read-only iBatis exceptions unless you set the due date and priority. Search my examples for “ACT-765” to find the workaround.

Challenge 4: Letting a user pick between multiple output paths

Suppose you have a task in which a human must decide whether to “Approve” or “Reject”. In Alfresco jBPM, you’d simply have two transitions and you’d set the label for those transitions in a properties bundle. In Alfresco Activiti that is handled a bit differently. Instead of having two transitions leaving the task, you have a single transition to an “exclusive gateway” (called a “decision”, in polite company). The task presents the “outcome” options–in this case “Approve” and “Reject”–to the user in a dropdown, as if it were any other piece of metadata on the task. Once the user picks an outcome and completes the task, the exclusive gateway checks the outcome value and takes the appropriate sequence flow. This difference will impact your business process logic, your workflow content model, and your end user experience so it is a significant difference.

For comparison, here’s what this looks like in the Alfresco Explorer UI for jBPM:

And here is what it looks like in the Alfresco Explorer UI for Activiti:

So in Explorer, with jBPM, the user can just click “Approve” or “Reject” while in Activiti, the user must make a dropdown selection and then click “Next”.

Here is the same task managed through the Alfresco Share UI for jBPM:

Versus Alfresco Share for Activiti:

Similar to the Explorer differences, in Share, with jBPM, the user gets a set of buttons while with Activiti, the user makes a dropdown selection.

One open question I have about this is how to localize the transition steps for Activiti workflows if the steps are stored as constraints in the content model. On a past client project we implemented a Share-based customization to localize constraint list items but our approach won’t work in Explorer. Maybe the Activiti guys can help me out on that one.
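To make the routing in Challenge 4 concrete, here is a small JavaScript model of what the exclusive gateway does with the outcome the user picked. It is only a sketch of the semantics, not Activiti’s API: the chooseSequenceFlow function, the flow ids, and the “outcome” variable name are all made up for illustration. In a real process the conditions live on the sequenceFlows in the BPMN 2.0 XML.

```javascript
// Illustrative model only: none of these names come from Activiti or Alfresco.
// Each outgoing sequenceFlow carries a condition that is checked against the
// process variables once the user has completed the task.
var outgoingFlows = [
  { id: "flowApproved", condition: function (vars) { return vars.outcome === "Approve"; } },
  { id: "flowRejected", condition: function (vars) { return vars.outcome === "Reject"; } }
];

// An exclusive gateway takes exactly one path: the first flow whose
// condition evaluates to true.
function chooseSequenceFlow(flows, vars) {
  for (var i = 0; i < flows.length; i++) {
    if (flows[i].condition(vars)) {
      return flows[i].id;
    }
  }
  // No condition matched, so the process cannot continue.
  throw new Error("No outgoing sequence flow matched outcome: " + vars.outcome);
}

// The user picked "Reject" in the dropdown and completed the task.
console.log(chooseSequenceFlow(outgoingFlows, { outcome: "Reject" })); // flowRejected
```

The point of the model is that the decision no longer sits on the task itself, as it did with jBPM transitions. The task only records a value, and the gateway that follows it does the branching.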

Exposing Process to the Alfresco User Interface

And that brings us to user interface configuration. Overall, the process is exactly the same. First, you work on your process definition, then you create a workflow content model. Once the workflow content model is in place, you expose it to the user interface through the normal Alfresco user interface configuration approach. For the Explorer client that means web-client-config-custom. For the Share client that means share-config-custom. Labels, workflow titles, and workflow descriptions are localized via properties bundles.

One minor difference is that in jBPM, task names are identical to corresponding type names in your workflow content model. In Activiti, a userTask has an attribute called “activiti:formKey” that is used to map the task to the appropriate content type in the workflow content model.

Assigning Tasks to Users and Groups

The out-of-the-box workflows for both jBPM and Activiti show how to use pickers to let workflow initiators assign users and groups to workflows. My example workflows use hardcoded references rather than pickers so that you’ll have an example of both approaches. In my Hello World examples, I assign the userTask to the workflow initiator. This is done by using the “activiti:assignee” attribute on userTask, like this:

<userTask id="usertask3" name="User Task" activiti:assignee="${initiator.properties.userName}" activiti:formKey="bpm:task">

If you need to use a more complex expression there’s a longer form that uses a “humanPerformer” tag. See the User Guide.

In the Publish Whitepaper example I use pooled group assignment by using the “activiti:candidateGroups” attribute on userTask, like this:

<userTask id="usertask7" name="Operations Review" activiti:candidateGroups="GROUP_Operations" activiti:formKey="scwf:activitiOperationsReview">

Again, if you need to, there’s a longer form that uses a “potentialOwner” tag.

In my jBPM examples I use swimlanes for task assignment. I didn’t get a chance to use the equivalent in Activiti.

Deploying Processes

In standalone Activiti there are multiple options for deploying process definitions to the engine, including uploading a BAR (Business Archive) file into the running engine. I couldn’t find the equivalent of that in Alfresco’s embedded Activiti implementation or the equivalent of the jBPM deployer servlet, so for this exercise I used Spring configuration for both Activiti and jBPM processes. I hope by the time the code goes into Enterprise there will be a dynamic deployment option because that’s really helpful during development.

Workflow Console

Alfresco’s workflow console is a critical tool for anyone doing anything with advanced workflow. It has always been a puzzle to me as to why the workflow console (along with others) can only be navigated to directly using an unpublished URL. That head-scratcher still remains, but rest assured, all of your favorite console commands now work for both jBPM and Activiti workflows.

Summary

I hope this post has given you a small taste of the new Activiti engine embedded in Alfresco. I haven’t spent any time talking about the higher level benefits to Activiti. And there are many more details and features I didn’t have time to go into. My goal was to give all of you who have experience with Alfresco jBPM some start at getting your head around the new option for advanced workflow.

If you haven’t done so, grab a copy of Alfresco 3.4.e, download these examples, and play around. The zip is an Eclipse project that will deploy the workflows and associated configuration to your Alfresco and Share web applications via ant. The included readme file has step-by-step directions for running through each jBPM and Activiti example.

It is entirely possible that I’ve done something boneheaded. If so, do let me know so that all of us can benefit.

Three watershed moments in my career (Hint: One just happened)

I’ve recently made a big shift in the career department. But rather than tell you what it is right off, I want to build up to it. I think it’s kind of a cool story, so if you’ll bear with me, here are the three watershed moments of my career thus far…

Watershed moment #1: Specialization leads to consulting

In 1992, I graduated college and went to work for Texas Instruments working on mainframes. Somehow, I got exposed to Lotus Notes development. I loved it. I dove in deep, eventually leaving for a job where I could be completely focused on Notes. Notes taught me a lot about managing unstructured data and how people collaborate to get work done. I learned that, for me, interesting IT problems are those where humans and systems have to work together to get something done. And it taught me a lot about what a passionate technical community looks like. Ultimately it led to a job at a small, but up-and-coming consulting firm where I would spend the next nine years. That decision to focus on Notes development was a watershed moment.

Watershed moment #2: My blog gets me a job in open source

Fast-forward to 2001. My content management practice was making a shift. Notes was falling out of favor and many of our clients were looking at WCM and DM solutions from large proprietary vendors. We started looking at open source technologies as well, but it was a tough sell to our traditional clients who had never heard of open source, and if they had, were skeptical or even fearful. We started implementing Documentum-based solutions and did that for the next three years, but I continued to dabble in open source. A revolution seemed afoot, but I couldn’t figure out the best way to jump in.

I started blogging in 2001, stopped, then started again in 2002. My rationale was simple: Writing helped me learn. And, for virtually no added cost, I could multiply the benefit by sharing what I learned–particularly with coworkers, but if others got value out of it, that was okay too. The idea that if my writing helped enough people it might help the open source movement in some tiny way was a romantic notion, but seemed remote.

Then I came across Alfresco. In October of 2005 I wrote my first Alfresco-related blog post. It said simply, “Alfresco is an open source enterprise content management solution founded by one of the co-founders of Documentum,” and then included a lengthy excerpt from a Gilbane post on Alfresco’s release candidate. A month later I published a more detailed review of the product. After three or four years of blogging, I was starting to find my voice. Little did I know that I had also found a passion.

By 2006, my firm had been acquired and Alfresco was starting to look like it had legs. I looked back on my past Documentum projects and realized that Alfresco was a viable alternative as the underlying repository in every case. Open source had been around for years but it had been sneaking quietly in the back doors of my clients in the form of operating systems, developer libraries, databases, and tooling. Alfresco, and other commercial open source companies, were poised to crash through the front door with business-facing open source applications. I wanted in. I left my firm to join Optaros, an open source consultancy I had discovered through fellow content management blogger and then Optaros employee, Seth Gottlieb. My blog had gotten me a job working with a technology I loved. That was the second watershed moment.

Watershed moment #3: Wait for it…

My four years at Optaros gave me the opportunity to focus on Alfresco full-time. Not just implementing projects, although there were many. Just as important, I was able to fully engage with the Alfresco community. I wrote blog posts and tutorials. I created add-ons and integrations and released those as open source projects. I wrote a book. I conducted code camps. I attended every event Alfresco ever put on and gave talks at most of those. I didn’t set out to be an evangelist, but that’s what I became. Did it benefit me, Optaros, and later, my own start-up, Metaversant? Of course it did. But, here’s the kicker: Acting in my own self-interest turned out to be a huge benefit to the greater Alfresco community. And I’m not alone. Many people all around the world are participating in the community in all kinds of ways to everyone’s mutual benefit.

Which brings us to the next watershed moment: Alfresco has hired me as their new Chief Community Officer. My mission is essentially to make the Alfresco community an example for all other commercial open source companies to follow. It’s a significant challenge, and I’m going to need your help. Alfresco may sign my check, but I work for the community. Therefore, you’ve got to tell me where we should take this thing. We have our ideas but yours are critical.

What this means

I’ll give specifics on how you can help in a future post. I expect that the specific strategies we undertake together will fall roughly into these buckets:

  • Motivating community members, regardless of skill set or relationship to Alfresco, to engage more deeply in the community
  • Enabling the community with tools, resources, and product enhancements that leverage community contributions
  • Exposing the greatness already existing in the community, whether that’s in the form of contributions that have been made that people just don’t know about or shining a light on community contributors doing awesome things

And, of course, I get to continue to work on my own community contributions like my work with Apache Chemistry, my Google Code projects, the blog, and new stuff I haven’t even thought of yet.

It was a tough decision to put the growth of my content management-focused consulting firm, Metaversant, on hold, but when Alfresco approached me about this opportunity, I had to take it. My career and my passion are already dovetailed. I do what I love, and for that I am very lucky. Who wouldn’t take the opportunity to make that an even tighter fit?

I am very excited about what this means for the community and the importance Alfresco places on its growth and well-being. I hope you are excited too. Actually, “hope” is the wrong word–I need you to be excited. Who’s with me? Ready to pitch in?

Improving Alfresco Share performance by using getChildren

I had a client that was seeing response time in the neighborhood of several seconds for the Alfresco Share document library and data list pages across all of his sites. The client’s Share install had just over 1,000 Share sites. The volume of the data lists in each site was insignificant. This is the story of how we resolved the issue, but note that the resolution may not be appropriate for everyone in all cases.

The Symptoms

The client was seeing slow performance of the document library and data list pages in Share. They noticed that the folder tree in the document library view responded quickly but the actual document list itself took a long time to render. This was happening for all users in all sites.

Looking for a quick resolution, even if that meant treating the symptom rather than the underlying cause, we decided to see if we could optimize the repository tier web script that returns the document library contents so that it performed closer to what we were seeing with the folder tree. I copied the repository tier’s doclist.get web script into our project’s extension directory and started tweaking.

The Bunny Trail

First, I’ll fess up to a mistaken assumption I had: I thought that everything in Alfresco always went through Lucene. I knew that separating out full-text index searches and property searches into Lucene queries and DB queries, respectively, was on the roadmap, but I had it in my head that in 3.4, even a call to something like ScriptNode.getChildren() was ultimately a Lucene index hit. If everything is a Lucene hit, I figured, there had to be a different reason for the folder list control to perform so much better than the document library list.

So, instead of starting with what, in hindsight, would have yielded the most bang for the buck, I started tuning what turned out to be little things. For example, our app didn’t use favorites, so I removed any references to the preferences service. Our app didn’t allow users to check out documents so out went any logic that dealt with that. Goodbye, Google Docs code blocks. Adios, type and aspect checking for types and aspects our app doesn’t support. Farewell, filters. I hardcoded permissions to avoid the lookup. I set created by and modified by values to empty strings to avoid the lookup to the person object. I jettisoned anything that wasn’t crucial to simply producing a list of the folder contents. All of this did speed up the repository-tier web script, but only a little bit. I needed an order of magnitude improvement.

The Lightbulb

Next, I did what I should have done initially: Add some simple log statements to see which part of the code was taking the longest to execute. Of course, it was the query. As it turns out, it is much, much faster to ask a node for its children (which is what the treenode web script does) than it is to do a Lucene search with a PARENT clause that yields the same result set (which is what the doclist web script does). On a dev machine with a small dataset, you don’t notice the difference. But on our integration and prod servers the difference is huge.
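The post doesn’t show the log statements that were used, so the following is just one way to do that kind of timing. The timed helper and slowStep function are hypothetical names, and the sketch runs outside Alfresco with console.log standing in for the logger.

```javascript
// Hypothetical helper, not part of Alfresco: run fn, report how long it took,
// and hand back whatever fn returned.
function timed(label, fn, log) {
  var start = new Date().getTime();
  var result = fn();
  var elapsedMs = new Date().getTime() - start;
  log(label + " took " + elapsedMs + " ms");
  return result;
}

// Stand-in for an expensive step such as running the document list query.
function slowStep() {
  var total = 0;
  for (var i = 0; i < 1000000; i++) {
    total += i;
  }
  return total;
}

// Outside Alfresco, console.log works as the log function.
var total = timed("slowStep", slowStep, function (msg) { console.log(msg); });
```

Inside a web script controller, the third argument could instead be a function that passes the message to Alfresco’s “logger” root object. Wrapping the query and the result-processing loop separately is usually enough to show which one dominates.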

The Fix

The Share document library page uses a YUI data table to produce the list of documents for the currently selected folder. The data table is bound to a web script that lives on the repository tier that is responsible for returning the requested data as JSON. Out-of-the-box, the repository tier web script that returns the document list calls a function called getFilterParams which is responsible for setting up a bunch of query predicates based on the document library filter the user has selected in the Share UI. The script then asks the filterParams object for the Lucene query it needs to run to return the document list. It then uses the search service to invoke the query and return the results.

My optimization was to bypass building and executing the query completely because, in our case, we don’t care about filters. All we want is the list of children in the current folder, and ScriptNode already has a function to do that called getChildren. So instead of performing a Lucene search, we ask the current “root node” for its children. We then iterate over the results and filter out a couple of content types that otherwise would have been excluded had we used the Lucene query instead of getting all children.

Oh man, that did it. The document library went from rendering in 6+ seconds to rendering in less than 1 second.

I gave the data lists web script the same treatment. In that case, our customized Share app still makes use of filters, so the “getChildren bypass” is only used when the “All” filter is selected. When any other filter is selected the original out-of-the-box Lucene query is used.
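The branching described for the data lists script can be sketched roughly as follows. The selectItems function, the runFilterQuery callback, and the "all" filter id are assumptions made for illustration, not names taken from the Share source, and the stand-in objects exist only so the sketch runs outside Alfresco.

```javascript
// Sketch only: selectItems, runFilterQuery and the "all" filter id are
// assumptions made for illustration, not names from the Share source.
function selectItems(parentNode, filterId, runFilterQuery) {
  if (filterId === "all") {
    // Fast path: ask the folder for its children, no Lucene query involved.
    return parentNode.getChildren();
  }
  // Any other filter still needs the original query.
  return runFilterQuery(filterId);
}

// Minimal stand-ins so the sketch runs outside Alfresco.
var fakeParent = {
  getChildren: function () { return ["item-1", "item-2", "item-3"]; }
};
function fakeQuery(filterId) {
  return ["result-for-" + filterId];
}

console.log(selectItems(fakeParent, "all", fakeQuery).length); // 3
console.log(selectItems(fakeParent, "recent", fakeQuery)[0]);  // result-for-recent
```

Keeping the original query behind the same function means the bypass is a single branch that can be deleted later without touching the rest of the controller.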

Now, again, I completely acknowledge that we may have sped up those two cases without resolving the underlying issue, and that addressing the underlying issue might yield a system-wide performance boost. Still, it was good to get the quick fix in place, and it should be easy enough to revert if and when we resolve the underlying index issue, if one exists.

Here’s a code snippet from the custom doclist.get.js controller if you are curious:


if (parsedArgs.path == "")
{
   parentNode = parsedArgs.rootNode;
}
else
{
   parentNode = parsedArgs.rootNode.childByNamePath(parsedArgs.path);
}
// We are iterating over the parent node's children instead of iterating
// over search results...
for each (var node in parentNode.getChildren())
{
   try
   {
      // ...so we need to filter out some system types that would have otherwise been
      // filtered out by the lucene query
      if (node.typeShort == "cm:systemfolder" || node.typeShort == "cm:thumbnail")
      {
         // do nothing. we don't want these.
      }
      else if (node.isContainer || node.typeShort == "app:folderlink")
      {
         folderNodes.push(node);
      }
      else
      {
         documentNodes.push(node);
      }
   }
   catch (e)
   {
      // Possibly an old indexed node - ignore it
   }
}

Book Review: Alfresco 3 Records Management by Dick Weisinger

Packt Publishing sent me a copy of Dick Weisinger’s new book, Alfresco 3 Records Management, to read and review. Now, before I tell you what I thought of the book I have to say that I don’t perceive a huge demand for Alfresco’s Records Management offering and I haven’t heard them talk about it much lately. I don’t know if my perception matches reality and, if it does, why we don’t hear about it more often. An open source, DoD-certified, freely-available Records Management product definitely sounds unique and compelling. It could be such a focused niche that there’s a lot of activity around it that I’m simply not aware of.

Still, I was curious to learn more about the topic and Alfresco’s offering, so I gave it a go.

Let me start by saying that Dick does a fabulous job of defining a topic area and a target audience and sticking to that. The book is 457 pages without the online appendices but it feels concise. I think that’s because it is well organized, clearly written, and flows logically and seamlessly from one chapter to the next.

The book is written for both Records Managers and Software Developers. You’d think that having such disparate audiences would be a problem, but Dick handles it very well. Every chapter starts out with an end-user description of the Records Management functionality and then, when everything’s been covered, shifts to a “How Does it Work?” section that dives into the technical details behind the functionality covered in the first half. Too often, books written for both technical and non-technical audiences munge their material together in a way that frustrates both audiences. In this case, the content is separated cleanly so it works very well. If you are a Records Manager, and you have no desire to peek under the hood, you can read this book and easily skip the “How Does it Work?” section in each chapter without being confused or distracted.

But here’s the other thing that’s going on, which I thought was really cool. Even though the book is about Records Management, Dick’s managed to write a Share customization primer. As I was reading, I was thinking an alternate title for the book could be, “Learning Share Development by Deconstructing the Alfresco Records Management Application” (if you’re not into the whole brevity thing).

Alfresco’s Records Management add-on is actually just a set of repository tier and Share tier customizations. So, if you learn how the Records Management app is built, you learn how to build other Share-based solutions. In the book, each chapter’s “How Does it Work?” section covers a different example of Share functionality and how it works behind the scenes. So, someone who’s interested in learning about Share customizations can read this book from that perspective, and essentially use Records Management as one big sample application.

For example, in Chapter 5 the Records Manager learns how to set up a file plan. In the same chapter, a developer learns how the YUI data table that renders the file plan actually works. In Chapter 9 the Records Manager learns about holds/freezes, retention, and reviews while the developer learns how to configure scheduled jobs and how Share UI actions work. Every chapter works this way–I’m just picking out a couple of examples.

Don’t get me wrong. I’m not saying that you can go from zero to Badass Share Developer with this book alone. That’s not going to happen. But as Dick says in the summary of the last chapter, it’s a good start. The technical sections give you a pointer to the pieces that make up a particular area of functionality. That’s useful if you want to change how Records Management works but you can also use it as an example for adding similar functionality to your own Share-based app.

One thing that frustrates developers trying to customize Share (and Records Management) is the question, “Given what I’m looking at in the browser, where does the code reside that makes it work?” The book shows the technical reader how to go from Share page to template to component to Spring Surf web script, which is something you do over and over when you are first learning how Share is put together. Being pointed in the right direction and being shown the general pattern that the Share developers followed is really valuable.

So, if you’re thinking about implementing Records Management, and you need to know whether or not Alfresco’s offering will fit the bill, this would be a great book for you to read during your evaluation or after the selection has been made and you need to learn how to install and configure the product. If you are the technical person on the implementation team, the book will give you the end-user context as well as a peek under the hood. If you need to customize the product and you’ve never done Share customization work before, you’ll learn how Spring Surf works and, more importantly, given a piece of functionality, you’ll know where to look when you need to change it.

Well done, Dick!

Collaborative content creation with Amazon Mechanical Turk

Amazon’s Mechanical Turk has been intriguing to me since I first heard about it. I think it is because the idea of essentially having a workflow with tasks that can be handled by any one of potentially hundreds of thousands of people has mind-blowing potential.

If you’re not familiar, Mechanical Turk (MTurk) is essentially a marketplace that matches up work requests (called HITs) with human workers (called “Turkers”). The work requests are typically very short tasks that require human intelligence like identifying, labeling, and categorizing images or transcribing audio. Amazon is the middleman that matches up HITs with Turkers. From a coding standpoint, your app makes calls to Amazon’s Web Services API to submit requests and to respond to completed work. Turkers monitor the available HITs, select the ones that look interesting to them and then complete the tasks for which they are paid, usually pennies per task.

Sorting through images or performing other simple tasks is one thing, but what about more complex tasks, like, say, writing an article? Here’s a story about some guys who have created a framework called CrowdForge to do just that. CrowdForge is a Django implementation based on research that one of the authors did at Carnegie Mellon. In a nutshell, their approach splits complex problems into smaller problems until they are small enough to be successfully handled by MTurk, then aggregates the results to form the answer to the original problem. It’s Map Reduce applied to human tasks instead of data clusters.
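
To make the partition/map/reduce idea concrete, here is a minimal sketch in Python. It is not CrowdForge's code and it makes no Mechanical Turk calls: the function names (crowd_article, fake_partition, fake_writer, fake_assembler) are hypothetical, and the "workers" are plain functions standing in for the humans who would pick up each HIT.

```python
from typing import Callable, List


def crowd_article(topic: str,
                  partition: Callable[[str], List[str]],
                  write_section: Callable[[str], str],
                  assemble: Callable[[List[str]], str]) -> str:
    """Split a topic into sub-tasks, hand each to a worker, combine the results."""
    headings = partition(topic)                      # partition: one worker outlines the article
    sections = [write_section(h) for h in headings]  # map: one worker per sub-task
    return assemble(sections)                        # reduce: one worker stitches it together


# Hypothetical stand-ins for human workers.
def fake_partition(topic: str) -> List[str]:
    return [f"{topic}: {part}" for part in ("Getting there", "Where to eat", "What to see")]


def fake_writer(heading: str) -> str:
    return f"{heading}\n(draft paragraph for this heading)"


def fake_assembler(sections: List[str]) -> str:
    return "\n\n".join(sections)


article = crowd_article("Dallas", fake_partition, fake_writer, fake_assembler)
print(article)
```

In a real system each of the three callables would submit a HIT and wait for the result, which is exactly the kind of long-running, asynchronous step a workflow engine is built to manage.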

You should read the original post, but to summarize it, the story talks about an experiment that the team did around collaborative content creation. They applied their framework to the task of writing travel articles. They split the task into 36 sub-tasks and gave each sub-task to an author, then aggregated the results into a coherent article. The partitioning, writing, and re-assembly (the “reduce” part of Map Reduce) were all done through Mechanical Turk by CrowdForge. Total cost for each article? About $3.26.

Then, for comparison, they assigned individual authors to write articles on the same topics using the traditional approach of one author per article, paying roughly what they paid for the collaboratively created content. When the results were reviewed, the crowdsourced content beat the single-author content in terms of quality. It’s important to note that in both cases, authors were Turkers. This wasn’t Mechanical Turk versus Rick Steves. But still, the researchers were able to use Mechanical Turk to break the problem down, perform each task, and then clean up the result, all for about the same cost without sacrificing quality. That’s pretty cool.

As you know, I’m a huge fan of Django, and I think it is more than okay for the presentation tier of a solution like this. But it seems like a workflow engine like Activiti or jBPM would be a better tool for implementing the actual process flow for a framework like CrowdForge because it could potentially mean less coding and maybe more accessibility by business analysts. Imagine using a process modeling tool to lay out your business process and then dropping in a “Mechanical Turk Partition Task” node, graphically connecting it with a “Mechanical Turk Map Task”, and then hooking that to a “Mechanical Turk Reduce Task”. In and around those you’re wiring up email notifications, internal review tasks, etc.

Metaversant has been working with a client who’s doing something very similar. Editors make writing assignments which are outsourced to Mechanical Turk. When the assignments are complete, they are published to one or more channels. Instead of the Django CrowdForge framework, we’re using Alfresco and the embedded jBPM workflow engine. Alfresco stores the content while the jBPM workflow engine orchestrates the process, making calls to Mechanical Turk and the publishing endpoints.

This approach can be generalized to apply to all kinds of problems beyond content authoring. If you are an Alfresco, jBPM, or Activiti user, and you have a business problem that might lend itself to being addressed by a micro task marketplace like Mechanical Turk, let me know. Maybe we can get my client to open source the specific integration between jBPM and Mechanical Turk. If you’ve already done something like this, let me know that too. I’m interested to hear how others might be integrating content repositories and BPM engines with Mechanical Turk.

Book Review: Alfresco 3 Business Solutions by Martin Bergljung

[UPDATED: To remove my comment about the absence of workflow config in Share, which Martin does cover. Sorry about that, Martin]

Packt Publishing sent me a copy of Martin Bergljung’s new book, Alfresco 3 Business Solutions. I just finished reading it, so I thought I’d write a quick review.

Overall, I think it is a good book with a lot of useful information across a variety of topics. The preface says the book is for “systems administrators and business owners”. Inclusion of “business owners” is a stretch–they’d have to be pretty technical to get something out of this book. I think the intent of including “business” in the title and the target audience is to set up the book as more of a solution-oriented look at Alfresco and less of an exhaustive technical how-to.

Bergljung attempts to organize the book around “business solution” focused chapters. For example, if your main concern is letting your users access the repository through file protocols, then Chapter 5, File System Access Solutions, is for you. If you are doing a content migration to Alfresco, Chapter 8, Document Migration Solutions, outlines the different approaches available for doing that. I think this is a good approach for the stated audience and most of the chapters fit the approach.

The book starts out with an overview of the Alfresco platform and various repository concepts. Although there are places that risk going into too much detail too early, that first chapter would be a good read for anyone new to Alfresco. The chapters on authentication and synchronization (Chapter 4) and CIFS/WebDAV (Chapter 5) are very thorough and provide some of the best coverage of those topics I’ve seen in any of the Alfresco books. The vigor with which Bergljung attacked those topics makes me think those areas are a particular passion for him. If you are strict about the target audience, Chapters 4 & 5 are definitely the strongest in the book.

However, at various points, the book strays from its intended audience and starts to go into developer topics. Don’t get me wrong, that’s the most interesting stuff to me, but I think it is potentially confusing (or, at best, superfluous) to system administrators. For example, the end of Chapter 1 covers the underlying database schema. Maybe it is a good idea to discuss what the tables are and how they are used so that a DBA gets a feel for the schema and can tune the database appropriately. But Alfresco’s schema is not public and shouldn’t be accessed directly unless you know what you’re doing and are willing to sign up for the inevitable maintenance down the road when the schema changes without warning. Bergljung gives a soft warning to this effect at the start of the section but the negative effects could have been emphasized more.

There are a few other developer-centric concepts in the book that I just simply don’t agree with. The first is about AMPs. In a couple of places, Bergljung implies that the Java Foundation API and AMPs are somehow dependent on each other. The statement, “The Foundation API is only used when deploying extensions as an AMP,” is just not true. Later, another statement compounds the problem by saying, “AMP extensions require Java instead of JavaScript”, which, again, is not accurate. For some reason, the author is trying to link an API (Java, JavaScript) with a deployment approach (AMPs), even though the two are not related to or dependent on each other at all.

Another piece I don’t agree with is about $TOMCAT_SHARED. To be fair, I’ve seen this in other places and have heard certain Alfresco Engineers encouraging the use of $TOMCAT_SHARED for things I think belong in the web app instead. Regardless of where it comes from, I think it’s really bad advice to tell people to use $TOMCAT_SHARED for anything other than alfresco-global.properties and server- or environment-specific settings. Proponents of $TOMCAT_SHARED will say they like deploying their customizations there for two reasons. First, when Alfresco and Share are deployed in the same Tomcat instance, you can deploy your extensions as one package and both web apps will use it. Second, your extensions go into an extension directory external to the Alfresco WAR, which keeps them well away from Alfresco’s code. In my opinion, both of these are actually reasons NOT to use $TOMCAT_SHARED. Why? As to the Alfresco/Share sharing bullet point, why unnecessarily couple those two web apps together? The Alfresco and Share WARs are built to run on completely separate nodes, which is helpful. We shouldn’t ruin that by making them both rely on the same shared directory.

As for the “keep your extensions away from Alfresco’s” reason, that’s what the extension directory is for. I can keep my customizations separate and still have them reside in Alfresco’s WAR. In fact, most clients I’ve dealt with prefer that because they have IT Operations teams that only want to deal with self-contained WARs. Being a “special case” is not how you win the hearts and minds of your infrastructure team.

Now, with these picked nits out of the way, I should say that there are some developer-oriented topics that were very good. Bergljung has some good Java Foundation API and JavaScript API examples in Chapter 2 covering Node Service and other commonly-used services. And I like the section in Chapter 3 that talks about setting up Hudson for continuous integration. Chapters 9 through 11 provide good coverage of Advanced Workflows, from designing workflows with swimlane diagrams to a lengthy example showing super states, sub-processes, and custom workflow management dashlets.

In keeping with the “solutions” approach, I think I would have combined the portlet chapter and the mobile app/Grails chapter into a single “integration solutions” chapter and talked less about the specific implementation details and more about the touch points: options for integration (Web Services, CMIS, custom web scripts), single sign-on approaches, what’s available out-of-the-box, caveats, etc. Most of this is covered one way or another between the two chapters. It just seems like a common thing people think through is “What’s the best approach for doing X on top of Alfresco” where X is a portal like Liferay, a community platform like Drupal, a mobile app, and so on.

So, if you’re a “business owner” and by that you mean “non-technical end-user”, you’d probably be better off with Munwar’s book, which is definitely end-user focused. If you are a “system administrator” or someone who needs to know the capabilities of Alfresco and how to integrate Alfresco with various touch points (LDAP, Active Directory, portals), it’s definitely worth a read, particularly if you need to deal with external authentication sources or you are responsible for getting CIFS working. Developers will benefit from the API examples and the chapters on Alfresco’s embedded jBPM engine.

Congrats to Martin and Packt. It’s good to see another title (I think we’re up to 8 or 9 now) added to the Alfresco bookshelf.

Top Takeaways from the Alfresco Kickoff

Alfresco kicked off their fiscal year with a meeting last week in Orlando. About 100 Alfresco employees and 50 partners attended two days of Alfresco-led talks on business and technical topics. As an aside, the conference food at the JW Marriott was maybe the best I’ve ever had at any tech event. Lobster Corn Dog. Enough said.

More importantly, the trip helped clarify in my mind Alfresco’s message around “social content management”. I now see it as taking two different forms: “social” content management, where the emphasis is on social features, and “social content” management, where the emphasis is on the kind of content being managed. The first is basically just marketing what they’ve already got: content services, collaboration, task tracking, wikis, and blogs, exposed through a modern user interface that’s closer to the experience most users have come to expect from using consumer-facing sites and services. The idea is that when you add things like comments, tags, and ratings you go from boring, old-school Enterprise Content Management to fun and exciting social content management. Obviously, there’s other stuff going on here about how when people collaborate, they don’t do it in a vacuum, they do it around content.

The second form–social content management–is when you need to manage content that is published to one or more social channels. For example, Marketing might have a press release, a video, and a tweet that all need to go live at the same time. Order matters, and if one step fails, none of the steps should be performed. Alfresco is building a social publishing framework to handle this kind of use case. So, in this example, “social” doesn’t describe the features of the system–it describes the type of content being managed.
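
The “order matters, and if one step fails, none of the steps should be performed” requirement can be sketched as follows. This is a hypothetical illustration in Python, not Alfresco’s publishing framework: the names publish_all and make_step are made up, and the endpoints are simulated with a list. Each step carries an undo action so that a failure part-way through rolls back whatever already went live.

```python
from typing import Callable, List, Tuple

# A step is (name, publish action, undo action). All three are illustrative.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def publish_all(steps: List[Step]) -> bool:
    """Run steps in order; if any step fails, undo the completed ones in reverse."""
    done: List[Step] = []
    for step in steps:
        _, publish, _ = step
        try:
            publish()
            done.append(step)
        except Exception:
            for _, _, unpublish in reversed(done):
                unpublish()
            return False
    return True


live: List[str] = []  # stands in for the external channels


def make_step(name: str, fail: bool = False) -> Step:
    def publish() -> None:
        if fail:
            raise RuntimeError(f"{name} endpoint unavailable")
        live.append(name)

    def unpublish() -> None:
        live.remove(name)

    return (name, publish, unpublish)


ok = publish_all([
    make_step("press release"),
    make_step("video"),
    make_step("tweet", fail=True),  # simulate the last channel failing
])
print(ok, live)
```

Real channels can’t always be rolled back cleanly (a tweet that has been seen has been seen), so a production framework would likely validate every endpoint before publishing anything. The sketch only shows the ordering and compensation logic.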

Alfresco didn’t explicitly differentiate between these two forms of social content management but they have current and future functionality that addresses both.

One of my other goals for the trip was to find out what’s coming in Project Swift, which is the code name for an upcoming release of Alfresco. It sounds like Marketing has the final say on what specific release Swift will be, but after hearing what’s slated for the release, most of us in the room agreed it should be labeled as a major release (4.0). We’ll see.

So what’s going to be in Swift? Lots of cool stuff, but here are the top five technical takeaways from the Project Swift Roadmap that jumped out at me:

#5: CIFS, SharePoint, and FTP will be clusterable. CIFS and SharePoint performance are both issues at one of my clients so this one caught my eye.

#4: New Share extension points are coming in Swift including a framework for custom actions, dialogs, and evaluators. The goal is to reduce the amount of copy-and-paste that goes on during typical customizations of Share and to make upgrades a bit easier.

#3: Alfresco is developing a “social content publishing framework” with publishing endpoints for YouTube, Facebook, Twitter, Drupal, and more to address the social content management use case I described earlier in this post. I like this one a lot because I think a lot of people have this problem and because it leverages Alfresco in exactly the “right” way.

#2: Swift will sport a new Apache-licensed workflow engine called Activiti, which is a separate Alfresco-sponsored open source project founded by the creators of JBoss jBPM, which is currently the workflow engine embedded in Alfresco. With Swift, both engines will exist side-by-side. It sounds like you may be able to have jBPM continue to handle running workflow instances and use Activiti to handle new instances if you want to. Activiti will show up in Community soon for people to start playing with.

#1: Apache Solr will be implemented as an optional, separate shared search server and index. As part of this, Lucene will no longer be updated in the same transaction. Instead, the index will be eventually consistent. This should result in a huge performance gain and easier clustering. You’ll also get better control over what gets indexed. In Swift you’ll be able to configure full-text indexing by things like content type and path. The Solr server will accept CMIS Query Language and Alfresco FTS queries but not the current raw Lucene syntax so it might make sense to start moving your queries over to one of these two options if you anticipate leveraging Solr when it is available. Note that it is possible Alfresco may choose to make the Solr server an Enterprise-only feature. It didn’t sound like a final decision had been reached on that.
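
If “eventually consistent” is unfamiliar, this toy model in Python shows the behavior to expect. It is only a sketch under my own assumptions and says nothing about how Alfresco or Solr implement it: the Repo class and its methods are hypothetical. The point is simply that a commit no longer updates the index, so a search can miss new content until the indexer catches up.

```python
from collections import deque
from typing import Deque, Dict, List


class Repo:
    """Toy repository whose search index is updated outside the commit."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}
        self.index: Dict[str, str] = {}
        self.pending: Deque[str] = deque()

    def commit(self, node_id: str, text: str) -> None:
        # The transaction stores the content and queues it for indexing.
        # The index itself is not touched here.
        self.nodes[node_id] = text
        self.pending.append(node_id)

    def run_indexer(self) -> None:
        # Runs later, separately from any commit.
        while self.pending:
            node_id = self.pending.popleft()
            self.index[node_id] = self.nodes[node_id].lower()

    def search(self, term: str) -> List[str]:
        return sorted(n for n, text in self.index.items() if term.lower() in text)


repo = Repo()
repo.commit("doc-1", "Project Swift roadmap")
before = repo.search("swift")  # indexer has not run yet
repo.run_indexer()
after = repo.search("swift")   # index has caught up
print(before, after)
```

Any customization that writes a node and then immediately searches for it within the same request is the kind of code worth reviewing before moving to this model.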

A Community release of Swift should happen some time in August, but we should start seeing a lot of activity in Subversion starting in April. The Enterprise release is slated for mid-November. I predict some late nights ahead for QA and Engineering between Thanksgiving and Christmas. I know there’s not a huge difference between November and January but I’d love to see Swift go GA before year-end.

One comment about Share customizations: I get asked a lot about when I will be updating the Alfresco Developer Guide to include a chapter on Share. I have most of the SomeCo examples ported to Share in an as-yet-unshared code base but, as you can see from some of the changes coming in Swift, Share is still changing a lot with respect to customizations, so I’ve been hesitant to update the book. If you’re looking for Share examples you should take a look at Will Abson’s Share Extras project on Google Code. He’s got about 18 different examples of varying complexity and type. I believe each one is individually deployable.

Not bad for the price of a couple of days in Orlando.

Join us for CMSGeekUpDFW on 2/24

Every month, a handful of us CMS Geeks from around Dallas-Ft. Worth get together to have a beer or two and talk about content management. This month’s meeting is on Thursday, February 24 at 7:00p. We’re going to be talking about Django, a highly-productive Python-based web application framework. If you’re going to be in the area (our meeting spots bounce around–this one will be at Cohabitat in Uptown Dallas) you should join in the discussion. Please RSVP so we know you’re coming.

Now available: Alfresco Fivestar Ratings add-on for Alfresco Share

A couple of weeks ago I posted a survey asking if anyone saw any value in a five star ratings widget for Alfresco Share. Honestly, it would have only taken one or two positive responses–even if no one needed one, there’s value in it for example’s sake. It turns out about 20 readers of this blog voted positively, so I went ahead and knocked it out.

This Alfresco Share customization makes it possible for any document in the repository to become “rateable”. When a document is rateable, the Alfresco Share user interface will show a clickable five star ratings widget. The stars light up to indicate the average rating for that document. Users simply click one of the stars to post their own rating. When clicked, the widget refreshes itself with the updated average.
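
The rate-then-refresh behavior boils down to something like the following Python sketch. It is not the add-on’s code (the real thing is a Java service, web scripts, and client-side JavaScript), the RateableDocument class is hypothetical, and the one-rating-per-user rule is my assumption for the example rather than a statement about how the add-on stores ratings.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RateableDocument:
    """Toy model of a rateable document: stores star ratings, reports the average."""
    ratings: Dict[str, int] = field(default_factory=dict)  # user -> stars (assumed one per user)

    def rate(self, user: str, stars: int) -> float:
        """Record a rating and return the updated average, as the widget would show it."""
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.ratings[user] = stars
        return self.average()

    def average(self) -> float:
        if not self.ratings:
            return 0.0  # nothing rated yet, so no stars light up
        return sum(self.ratings.values()) / len(self.ratings)


doc = RateableDocument()
doc.rate("alice", 5)
updated = doc.rate("bob", 2)
print(updated)
```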

Here is a short screencast that demonstrates the customization. You’ll want to make it full screen.

To implement this, I took the Someco Ratings Service from the Alfresco Developer Guide, moved it to the Metaversant namespace, and changed the names of my Spring beans and JavaScript root variable. Even though my initial target Alfresco version is 3.3, I didn’t want the code to conflict with Alfresco’s new back-end-only ratings service in 3.4 which uses some of the same names that were in the book. I also changed the JSON that the ratings web scripts use to be closer to what exists in 3.4. That way, when I do make a version that works with 3.4, it could potentially work with either my ratings back-end or Alfresco’s.

I then went to work on the UI side, integrating the widget into Share’s document details page, document library (both Share and repository views), search results page, and document-related dashlets. To go from what was in the book to a working integration I revamped the client-side ratings JavaScript from a set of functions to an actual object. Then, I started injecting my own methods into Alfresco’s client-side object prototypes to drop my widget in where appropriate.

Alfresco is still working to make customizations like this more modular and easier to plug in alongside their code and code from the community. Until then, be aware that if your Alfresco implementation already has customizations that override some of the same web scripts and client-side components this module does, there may be some manual integration needed. If you have an out-of-the-box installation (or a set of customizations that won’t conflict with this one) you can deploy the AMP to the Alfresco WAR and the Share customizations to the Share WAR and you’ll be set.

The Alfresco Fivestar Ratings project lives at Google Code. Feel free to check out the source, try it out, and use it on your projects. If you find a bug, log it, then fix it!

Thoughts on Alfresco’s Recognized Developer Accreditation Test

Alfresco is rolling out a new accreditation program. A certain subset of partners (I’m not sure what the criteria are–ask your rep) can take an online test that validates whether or not you know a thing or two about Alfresco. If you pass, you’re a “Recognized Alfresco Developer”, which ostensibly comes with a secret handshake and a map to the club house.

When I first heard about the program I was a little skeptical. Don’t get me wrong–anything that my company and I can use to help differentiate ourselves from more, um, what’s the word I’m looking for–casual? less-experienced? opportunistic?–partners is a good thing. And, as we continue to grow, I can use accreditation as a potential data point when making hiring decisions. My concern was that the test would lack both depth and breadth and we’d end up where we are now: A bunch of partners of varying capability lumped into the same “Gold” category (or “Platinum” if you fork over the big dough).

After taking the test (and passing–c’mon, was there any doubt?) I feel better about it. The test actually appeared to do a decent job of covering many different aspects of the product including configuration and developer issues we see on real world implementations, so it’s not an easy test. Kudos to Carlos Miguens and the Alfresco Training team–I know it must have been a big project to get the accreditation program pulled together and construct a test that, at least to me, feels like the right level of detail and difficulty.

I do think the test could be improved. I think there was way, way too much emphasis on the WCM product. The value of the 30 or so questions asked on WCM out of the 100 or so total asked is that it really does take someone who’s been around the product a while to get those right. I just think the test is over-weighted toward WCM compared to the proportion of actual “project share” the WCM product gets in real life versus the core repository.

I think there were also too many questions on the Explorer client. Again, maybe the goal is to be slightly biased towards those partners who have been working with Alfresco long enough to have done an Explorer customization or even simply to know certain details about the Explorer UI. But most clients are now using and customizing Share as their primary interface versus Explorer. On the other hand, I think it is a bit early for “Share customization” to be an accreditation test topic, so let’s not get ahead of ourselves!

I also think it would be helpful feedback to give the test taker an idea of what was missed. It doesn’t have to be the exact question (although there were several answers I’d love to vigorously defend if I did indeed miss them) but other similar tests I’ve taken in the past have offered up the “weak” areas so the student could shore those up. Thanks for the passing grade and everything, but, not giving any hint as to what I missed is like starting a great joke with a huge build-up and then dropping dead before you can tell me the punch line.

What I don’t know is how well formal training prepares you for the test. I would hope that, in the aggregate, real world implementers score as well as or better than groups of test takers who are fresh out of Alfresco training but lack on-the-ground experience. Really, it has to be that way for accreditation to be meaningful. I haven’t taken any of Alfresco’s training, so I’m curious to hear feedback on the test from those that have.

If you haven’t taken your test yet, good luck! I wish I could tell you what to study that will help you do well, but, honestly, I can’t think of any one reference that’s going to do the trick. And, really, I guess that’s a good thing.