Month: December 2010

Monitoring folders with Alfresco workflows

This is the second part of a two-part post on some recent work Metaversant did with Alfresco workflows. The first part was a post on Workflow Reporting. It outlined a high-level recipe for creating a workflow dashboard in Share that showed a list of workflows started for a specific Share site and allowed bulk actions (such as “cancel”). In this post, we’ll look at Folder Monitoring which deals with how to automatically start an Alfresco jBPM workflow when a document is dropped into a folder.

The easy answer

The embedded jBPM workflow engine that’s embedded in Alfresco is flexible and powerful. Creating a new process and wiring it into the Alfresco Explorer or Alfresco Share user interface is straightforward once you’ve done it a time or two. If you need a refresher on the details, see “Get your Alfresco ‘flow on“. For the purposes of this discussion, just know that your process definition can have process variables which you can read and set from the user interface and from code within the process definition.

It’s actually really easy to start a workflow when objects are created or updated in a folder. That’s because the Alfresco JavaScript API can run the “start-workflow” action, and Alfresco JavaScript can be invoked from a rule that runs when something is updated in a folder. The JavaScript to start a workflow using this approach looks like this:

var startWorkflowAction = actions.create("start-workflow");
startWorkflowAction.parameters.workflowName = "jbpm$wf:adhoc";
startWorkflowAction.parameters["bpm:assignee"] = assignee;
startWorkflowAction.parameters["bpm:workflowDescription"] = description;
startWorkflowAction.execute(document);

The catch is that when you use that approach, the developer writing the JavaScript has to know at design-time what values to provide to the workflow when it starts up (in this case, the assignee and the description). Ordinarily, when a workflow is manually invoked, the end-user starting the workflow is there to provide those values. If your particular workflow doesn’t require any variables to start, or the variables are known at design-time, the simple rule-invokes-JavaScript approach will work. Otherwise, something more is required.

What we needed for this client was not only the ability to start a workflow when a new object landed in a folder, but also the ability for an end-user to specify parameters for that workflow ahead of time. When discussing the functionality we called it “precompiled workflows” and “workflow templates”, which is pretty descriptive of what we needed. From the user’s perspective, we needed one user (let’s call her a Manager) to be able to say, “When an object is created in this folder, launch this specific workflow with these parameters”. When other users (or systems) create objects in that folder, the workflow needs to launch automatically without further input.

Options considered

As usual, there are a few different ways to go about this. One is the rule-invokes-JavaScript approach discussed earlier. The problem with this is that the parameters have to be specified in the JavaScript and that won’t work for non-technical end-users. This option was discarded early.

The next option we considered was creating a custom action. This almost gets us there. The Manager user can create a rule on a folder and select the custom action. The form service (Share was the main UI in this case) can be configured to let the Manager set the parameters to use when launching the workflow. Non-Manager users can then create objects and workflows will launch automatically.

The problem with this approach is that the list of parameters is not finite–this client had big plans for the workflow engine and they did not want to modify the custom workflow launching action every time they created a workflow that had a new parameter. Similar to the first option, this approach has worked well in the past, but it didn’t fit for this client, so we moved on.

The third option we looked at was a variation on the second. Rather than having the action handler responsible for grabbing parameter values, this approach uses the workflow model itself to persist a “workflow configuration” that the action can point to. The Manager would create a workflow configuration, then configure the rule to say, “When documents are added to this folder, start this workflow with this configuration”.

The workflow configuration could be a custom object. Briefly, we considered actually persisting objects that correspond to the types defined in the workflow model (normally, those aren’t saved anywhere), and I think if we ever revist this option, we may look at that again.

The problem with this approach was that the Manager has to think too much. Am I creating a workflow that’s going to launch other workflows? Okay, then I need to create a “workflow configuration” and configure a rule. Am I just routing something through a workflow? Okay, then I need to use “start workflow” like I normally would. That’s way too confusing. Next!

Implementation

Ultimately, we decided that the easiest route from both an implementation perspective and an end-user perspective was to rely on the business process to be smart enough to tell the difference between a package with a folder and a package with documents. When the workflow is run on a folder, it can iterate over the children of the folder, spawning new workflows and copying its process variables into the newly started workflows. It can then transition to a wait state until something tells it to go check for new children in the folder. When the same workflow is run on a document however, it proceeds down the “non-folder” route and performs the work it normally would.

Using this approach, every workflow is able to run as both a “folder monitor” and a “worker”. That way, when a Manager starts a workflow, she doesn’t have to think about whether she’s starting a “monitor” workflow or a one-time workflow, she just starts the workflow as normal and sets the parameters. The business process does the work of spawning additional workflows when it needs to and passes those parameters along. Now we’re talking!

High-level Recipe

Similar to my previous post on workflow dashboards, I may make this source available at some point, but until then, here are the high-level steps to do it yourself.

Create the jBPM process definition

This will work with any process definition, so I won’t describe anything business-specific here. Instead, I’ll talk about what you would add to your process definition to make this work.

The first thing the business process needs to do is make a decision: Am I running against a folder or a document? If the package contains documents, the workflow continues as it normally would. If, however, it contains a folder, the workflow iterates over the folder’s children and starts a new workflow instance for every child in the package, copying its process variables into the new workflows.

The decision is implemented as JavaScript. If we find one folder, we take the folder route. In this case, the users are only going to run the workflow on one folder at-a-time so we don’t have to worry about what to do if the workflow package contains a mix of documents and folders. The decision JavaScript looks like this:

var flag = false;
for (var i = 0; i < bpm_package.children.length; i++) {
    if (bpm_package.children[i].isContainer) {
        flag = true;
        break;
    }
}
isContainer = flag;

And the transitions for the decision look like this:

<transition to="forkWork" name="toForkWork"></transition>
<transition to="forkMonitor" name="toForkMonitor">
    <condition>#{isContainer == true}</condition>
</transition>

The forkMonitor node is a fork that creates a workflow dashboard task (see previous post) and simultaneously transitions to the spawnWorkflows node. The spawn workflows code is a little too lengthy to include here, but what it does is:

  • Grabs the parameters it needs to pass in to each of the newly started workflows
  • For each document in the workflow package…
  • Checks to make sure the document hasn’t already been processed (more on that later)
  • Checks to make sure the document isn’t already in a workflow
  • Starts the workflow using the out-of-the-box start-workflow action (see the code at the start of the article)

Once the workflows are spawned, the business process transitions to a wait state where the process sits indefinitely until it is told to check the folder for new children.

Create a custom action that will tell the process definition to check for children

How does the workflow know when new children have been added to the folder? I’m so glad you asked. After the workflow spawns new workflows for the children, it transitions to a wait state. To trigger the workflow to move off the wait state, we used a custom rule action. The rule action is set high enough up the folder hierachy that end-users don’t have to worry about it–it automatically inherits onto the folders created below it. The rule action is Java-based, and it takes two parameters: The name of a node and the name of a transition to take.

When a new document is added to a folder, the rule triggers the action. The action grabs the node’s parent and checks to see if it is involved in a workflow. If it is, it tries to find the workflow node named in the parameter, which will be the wait state mentioned in the previous step. If it finds that, it signals the node to take the specified transition. The transition will be to the “spawn workflows” node.

Workflow screenshotThe combination of the wait state and the spawn workflows node in the business process with the custom rule action that signals the wait state creates a cool little “interrupt” loop: The process spawns workflows for folder children, then waits until more children arrive, then spawns workflows for those children, and so on until someone kills the workflow.

Customize the Share user interface to allow workflows to start on folders

Since the dawn of time Alfresco’s UI has not allowed workflows to start on folders, but the workflow engine can handle it. For this approach to work we most definitely have to be able to start workflows on folders. That’s a pretty simple little config tweak to make that happen in Share–just copy the existing assign workflow action link from the document actionSet to the folder actionSet in documentlist.get.config.xml (and document-actions.get.config.xml), like this:

<actionSet id="folder">
    ...
    <action type="action-link" id="onActionAssignWorkflow" permission="permissions" label="actions.document.assign-workflow" />
    ...
</actionSet>

Voila! Now you can run workflows against folders.

Create a “workflow status” aspect to avoid processing docs more than once

The last step is to make sure that documents only run through the process once. Otherwise, every time the process spawns workflows for children in a folder, he’ll start one up for every child in the folder, regardless of whether or not its been through the workflow already. This may be what you want, but this particular client wanted docs to run through the process only once.

To do this, we created a simple aspect with a “workflow status” property. In the last node of the process, the property gets set. When the spawn workflows code runs, it filters out folder children that have the status set.

That’s it!

This approach puts the burden on the workflow designer to use some standard node names and logic in their process definitions. And, it will result in many in-flight workflows (at least one for every folder being monitored), although that shouldn’t be a big deal from a performance perspective (running workflows really aren’t “running”).

The important thing for this client is that it provides a nice way for users to essentially “pre-configure” workflows so that subsequent users can start workflows simply by adding documents to a folder, all without anyone having to learn any new “workflow configuration” constructs. And, workflow designers can easily make their workflows “folder aware” or “templatable”, depending on how you want to look at it, all within the process definition, without having to recompile any custom actions or tweak JavaScript.

Workflow Dashboards in Alfresco Share

Metaversant recently had a client with some interesting requirements around workflow. I thought I’d post what we did here and get a conversation going about the various pluses and minuses of the approach and find out what others have done when faced with similar needs.

There are two main buckets of requirements I want to focus on: Workflow Reporting/Dashboard and Folder Monitoring. I’ll talk about Workflow Reporting/Dashboard in this post and Folder Monitoring in the next.

Workflow Reporting/Workflow Dashboard

Workflow Reporting is something I’ve seen quite often and handled differently each time based on what the client is trying to do. Alfresco is pretty sparse on Workflow Reporting (capturing the data and making it easy to report on) and Workflow Dashboarding (presenting a dashboard of running workflows and/or workflow reports in the user interface). By “sparse” I mean there really is none. Recent releases have seen the addition of the “My Tasks” page, but that is limited to what it sounds like: A page listing the tasks the current user is assigned to.

What most people want when they ask for Workflow Reporting is the ability to capture workflow data both before and after a workflow has completed. This is significant because if you do nothing about it, data about running workflows disappears into the ether when the workflows complete. A Workflow Dashboard is the ability to see all workflows assigned to all users or a subset of users and perhaps some historical via Workflow Reports (maybe time started, current task, time on current task, current actor, etc.).

For this particular project, my client really only cared about running workflows, so we didn’t have to worry about capturing workflow stats before the workflow ended–they just wanted to see all workflows, no matter who started them, in a sortable list with a link to the workflow details and the ability to perform batch operations against the workflows (such as selecting all workflows and canceling). The twist, however, was that the client wanted to scope the list of running workflows to the Alfresco Share site they were started in. So if there were two Share sites, A and B, and Site A’s users had started 24 workflows on documents stored within the site while Site B’s users (which may overlap with Site A) had started 35 workflows on Site B documents, Site A’s Workflow Dashboard should show a list of 24 workflows while Site B’s should show 35.

The Solution

When workflow data needs to be persisted beyond the life of the workflow, we’ll typically just create some objects in the content model to persist the data and we’ll write to those objects from one or more actions defined within the business process definition. In this case, that wasn’t necessary.

The problem boiled down to how to capture the specific Share site a workflow was scoped to and, then, how to query for and display that information. Once that was resolved, the data could be displayed on a custom Share page.

I may post the code at some point, but for now, here’s the high-level recipe:

Step 1: Capture the Site ID in a Process Variable

Alfresco jBPM workflows have process variables that can store metadata about a running workflow. And, the workflow API allows me to query against that metadata. So, the first step was to capture which Share site the user was in when they launched the workflow.

To do this, I used the Form Service to define a hidden field on my workflow’s startup form. Then, I overrode Alfresco’s client-side JavaScript StartWorkflow component to add my own code that finds the hidden field and sets it with the Share site’s unique identifier (the Site ID field).

Initially, I used a straight hidden field for this. Later, I came back and created a new custom component that included not only the hidden field, but some additional markup and client-side logic that pulled back additional context about the Share site for that workflow so that when someone viewed the workflow or managed a task, they would know more about the Share site than just its Site ID.

Step 2: Create a web script that returns the workflow metadata

With the Site ID stored in a process variable, the next step was getting the workflow data to the front-end Share tier so it could be displayed in a dashboard. This involved creating a repository tier web script that accepted the Site ID as an argument and returned the data as JSON. There are some out-of-the-box web scripts related to workflows, but they return tasks assigned to the current user and for this I needed all running workflows for a given site ID, so that required a custom web script.

The JavaScript API has come a long way with respect to workflow, but the ability to apply a filter on task metadata isn’t there yet, so that meant my controller had to be Java-based.

The interesting part of that web script controller looks like this:

WorkflowTaskQuery tasksQuery = new WorkflowTaskQuery();
Map<QName, Object> processCustomProps = new HashMap<QName, Object>();
processCustomProps.put(SomeCoWorkflowModel.PROP_RELATED_SITE_ID, siteId);
tasksQuery.setProcessCustomProps(processCustomProps);
tasksQuery.setTaskName(SomeCoWorkflowModel.TYPE_DASHBOARD_TASK);
tasksQuery.setTaskState(WorkflowTaskState.IN_PROGRESS);
List<WorkflowTask> tasks = workflowService.queryTasks(tasksQuery);

This gives us all of the tasks that have the Site ID we’re looking for, but the Dashboard needs more than that–it needs the Workflow Instance for full context. No problem–the Workflow API can handle it. Here’s where the controller iterates overs the tasks and adds the workflows to a List of WorkflowMetadata objects that will get set on the web script’s model:

for (int i = startIndex; i < endCount; i++) {
WorkflowTask task = tasks.get(i);
WorkflowInstance wf = workflowService.getWorkflowById(task.getPath().getInstance().getId());
workflows.add(new WorkflowMetadata(wf, task));
}

One potential gotcha with this approach is that there has to be a task assigned to an actor in order for the workflow to show up in the WorkflowTaskQuery results. If a workflow were sitting on a wait-state, for example, that workflow wouldn’t be returned by the code above. To work around this, the workflows in this solution always queue up a “Dashboard Task” assigned to the initiator to guarantee that all workflows, whether they are currently sitting on a task node or not, always show up in the workflow dashboard.

The view for this web script simply returns the data as JSON, so I’ll spare you the details.

The other web script I wrote for this piece deletes workflows for a given set of workflow IDs. You’ll see where that comes in next, but, again, the logic of that controller (also Java-based) is very straightforward, so on to the next step!

Step 3: Create a custom Workflow Dashboard page

The workflow tasks are tagged with Share Site IDs and a repository-tier web script is in place that knows how to find those tasks and give back the Workflow Instance data as JSON. The final step is to render that as a dashboard. For this, I used standard Surf framework techniques to create a new page called “Workflows”. The page contains a client-side JavaScript component that renders a YUI DataTable. The data source for the DataTable is the web script created in the previous step.

Step 4: Create actions

The specifics for what someone might want to do to one or more workflows displayed in the dashboard varies. For this project, users needed to be able to view the workflow details. Initiators needed to be able to cancel the workflow. Viewing the workflow details is just a matter of building the right URL. Canceling the workflow is a little more involved–I used client-side JavaScript to make repo-tier web script calls to a custom web script that can cancel one or more workflows given the workflow IDs.

To make it so that only workflow initiators can cancel a workflow, I use the same mechanism that document actions uses: The XML configuration file that defines the cancel workflow action specifies that the user must be the workflow initiator. Then, the client-side JavaScript that builds the dashboard actions checks the workflow data to see if that’s the case and hides the link if the current user is not the initiator.

Here’s the XML for the cancel workflow action:
<toolbar>
<actionSet>
<action type="action-link" id="onActionCancel" permission="initiator" label="menu.selected-items.cancel" />
</actionSet>
</toolbar>

The client-side code that checks action permissions is a direct copy of Alfresco’s out-of-the-box logic that does the same.

The Result

Once fully-assembled, the workflow dashboard looks like this:

Workflow dashboard screenshot

In the screenshot, I’ve got multiple workflows selected and have clicked the “Selected Workflows…” link to show that you can cancel more than one workflow at-a-time.

Now you’ve seen a simple workflow dashboard implementation. My next post will be about launching workflows automatically when objects are dropped into a folder.