Monitoring folders with Alfresco workflows

This is the second part of a two-part post on some recent work Metaversant did with Alfresco workflows. The first part was a post on Workflow Reporting. It outlined a high-level recipe for creating a workflow dashboard in Share that showed a list of workflows started for a specific Share site and allowed bulk actions (such as “cancel”). In this post, we’ll look at Folder Monitoring, which deals with how to automatically start an Alfresco jBPM workflow when a document is dropped into a folder.

The easy answer

The jBPM workflow engine embedded in Alfresco is flexible and powerful. Creating a new process and wiring it into the Alfresco Explorer or Alfresco Share user interface is straightforward once you’ve done it a time or two. If you need a refresher on the details, see “Get your Alfresco ‘flow on”. For the purposes of this discussion, just know that your process definition can have process variables which you can read and set from the user interface and from code within the process definition.

It’s actually really easy to start a workflow when objects are created or updated in a folder. That’s because the Alfresco JavaScript API can run the “start-workflow” action, and Alfresco JavaScript can be invoked from a rule that runs when something is updated in a folder. The JavaScript to start a workflow using this approach looks like this:

var startWorkflowAction = actions.create("start-workflow");
startWorkflowAction.parameters.workflowName = "jbpm$wf:adhoc";
startWorkflowAction.parameters["bpm:assignee"] = assignee;
startWorkflowAction.parameters["bpm:workflowDescription"] = description;
startWorkflowAction.execute(document);

The catch is that when you use that approach, the developer writing the JavaScript has to know at design-time what values to provide to the workflow when it starts up (in this case, the assignee and the description). Ordinarily, when a workflow is manually invoked, the end-user starting the workflow is there to provide those values. If your particular workflow doesn’t require any variables to start, or the variables are known at design-time, the simple rule-invokes-JavaScript approach will work. Otherwise, something more is required.

What we needed for this client was not only the ability to start a workflow when a new object landed in a folder, but also the ability for an end-user to specify parameters for that workflow ahead of time. When discussing the functionality we called it “precompiled workflows” and “workflow templates”, which is pretty descriptive of what we needed. From the user’s perspective, we needed one user (let’s call her a Manager) to be able to say, “When an object is created in this folder, launch this specific workflow with these parameters”. When other users (or systems) create objects in that folder, the workflow needs to launch automatically without further input.

Options considered

As usual, there are a few different ways to go about this. One is the rule-invokes-JavaScript approach discussed earlier. The problem with this is that the parameters have to be specified in the JavaScript and that won’t work for non-technical end-users. This option was discarded early.

The next option we considered was creating a custom action. This almost gets us there. The Manager user can create a rule on a folder and select the custom action. The form service (Share was the main UI in this case) can be configured to let the Manager set the parameters to use when launching the workflow. Non-Manager users can then create objects and workflows will launch automatically.

The problem with this approach is that the set of parameters is open-ended–this client had big plans for the workflow engine and they did not want to modify the custom workflow-launching action every time they created a workflow with a new parameter. Similar to the first option, this approach has worked well in the past, but it didn’t fit for this client, so we moved on.

The third option we looked at was a variation on the second. Rather than having the action handler responsible for grabbing parameter values, this approach uses the workflow model itself to persist a “workflow configuration” that the action can point to. The Manager would create a workflow configuration, then configure the rule to say, “When documents are added to this folder, start this workflow with this configuration”.

The workflow configuration could be a custom object. Briefly, we considered actually persisting objects that correspond to the types defined in the workflow model (normally, those aren’t saved anywhere), and I think if we ever revisit this option, we may look at that again.

The problem with this approach was that the Manager has to think too much. Am I creating a workflow that’s going to launch other workflows? Okay, then I need to create a “workflow configuration” and configure a rule. Am I just routing something through a workflow? Okay, then I need to use “start workflow” like I normally would. That’s way too confusing. Next!

Implementation

Ultimately, we decided that the easiest route from both an implementation perspective and an end-user perspective was to rely on the business process to be smart enough to tell the difference between a package with a folder and a package with documents. When the workflow is run on a folder, it can iterate over the children of the folder, spawning new workflows and copying its process variables into the newly started workflows. It can then transition to a wait state until something tells it to go check for new children in the folder. When the same workflow is run on a document however, it proceeds down the “non-folder” route and performs the work it normally would.

Using this approach, every workflow is able to run as both a “folder monitor” and a “worker”. That way, when a Manager starts a workflow, she doesn’t have to think about whether she’s starting a “monitor” workflow or a one-time workflow; she just starts the workflow as normal and sets the parameters. The business process does the work of spawning additional workflows when it needs to and passes those parameters along. Now we’re talking!

High-level Recipe

Similar to my previous post on workflow dashboards, I may make this source available at some point, but until then, here are the high-level steps to do it yourself.

Create the jBPM process definition

This will work with any process definition, so I won’t describe anything business-specific here. Instead, I’ll talk about what you would add to your process definition to make this work.

The first thing the business process needs to do is make a decision: Am I running against a folder or a document? If the package contains documents, the workflow continues as it normally would. If, however, it contains a folder, the workflow iterates over the folder’s children and starts a new workflow instance for every child in the package, copying its process variables into the new workflows.

The decision is implemented as JavaScript. If we find one folder, we take the folder route. In this case, the users are only going to run the workflow on one folder at a time, so we don’t have to worry about what to do if the workflow package contains a mix of documents and folders. The decision JavaScript looks like this:

var flag = false;
for (var i = 0; i < bpm_package.children.length; i++) {
    if (bpm_package.children[i].isContainer) {
        flag = true;
        break;
    }
}
isContainer = flag;

And the transitions for the decision look like this:

<transition to="forkWork" name="toForkWork"></transition>
<transition to="forkMonitor" name="toForkMonitor">
    <condition>#{isContainer == true}</condition>
</transition>

The forkMonitor node is a fork that creates a workflow dashboard task (see previous post) and simultaneously transitions to the spawnWorkflows node. The spawn workflows code is a little too lengthy to include here in full (a condensed sketch follows the list), but what it does is:

  • Grabs the parameters it needs to pass in to each of the newly started workflows
  • For each document in the monitored folder:
      • Checks to make sure the document hasn’t already been processed (more on that later)
      • Checks to make sure the document isn’t already in a workflow
      • Starts the workflow using the out-of-the-box start-workflow action (see the code at the start of the article)
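
Condensed quite a bit, the spawn logic could look something like the following. This is a sketch, not the client’s actual code: the scwf namespace, the status property, and the workflow name are placeholders, and it assumes the start parameters were stored as process variables (bpm_assignee, bpm_workflowDescription) when the monitor workflow was started.

var folder = bpm_package.children[0]; // the monitored folder is the package's only child
for (var i = 0; i < folder.children.length; i++) {
    var child = folder.children[i];
    // Skip sub-folders, docs already processed, and docs already in a workflow
    if (child.isContainer) continue;
    if (child.properties["scwf:workflowStatus"] != null) continue;
    if (child.activeWorkflows != null && child.activeWorkflows.length > 0) continue;
    var startWorkflowAction = actions.create("start-workflow");
    startWorkflowAction.parameters.workflowName = "jbpm$scwf:someWorkflow";
    // Copy this process's variables into the newly started workflow
    startWorkflowAction.parameters["bpm:assignee"] = bpm_assignee;
    startWorkflowAction.parameters["bpm:workflowDescription"] = bpm_workflowDescription;
    startWorkflowAction.execute(child);
}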

Once the workflows are spawned, the business process transitions to a wait state where the process sits indefinitely until it is told to check the folder for new children.

Create a custom action that will tell the process definition to check for children

How does the workflow know when new children have been added to the folder? I’m so glad you asked. After the workflow spawns new workflows for the children, it transitions to a wait state. To trigger the workflow to move off the wait state, we used a custom rule action. The rule is set high enough up the folder hierarchy that end-users don’t have to worry about it–it automatically inherits onto the folders created below it. The rule action is Java-based, and it takes two parameters: the name of a node and the name of a transition to take.

When a new document is added to a folder, the rule triggers the action. The action grabs the node’s parent and checks to see if it is involved in a workflow. If it is, it tries to find the workflow node named in the parameter, which will be the wait state mentioned in the previous step. If it finds that, it signals the node to take the specified transition. The transition will be to the “spawn workflows” node.
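
To give you an idea of the shape of that action, here is a bare-bones sketch. The class and parameter names are mine, the service setters would be wired in via Spring, and the WorkflowService calls are hedged against the 3.x API, so treat it as an illustration rather than the client’s actual implementation:

import java.util.List;

import org.alfresco.repo.action.ParameterDefinitionImpl;
import org.alfresco.repo.action.executer.ActionExecuterAbstractBase;
import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ParameterDefinition;
import org.alfresco.service.cmr.dictionary.DataTypeDefinition;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.cmr.workflow.WorkflowInstance;
import org.alfresco.service.cmr.workflow.WorkflowPath;
import org.alfresco.service.cmr.workflow.WorkflowService;

public class SignalMonitorActionExecuter extends ActionExecuterAbstractBase {
    public static final String PARAM_NODE_NAME = "nodeName";
    public static final String PARAM_TRANSITION = "transition";

    private NodeService nodeService;
    private WorkflowService workflowService;

    @Override
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef) {
        String nodeName = (String) action.getParameterValue(PARAM_NODE_NAME);
        String transition = (String) action.getParameterValue(PARAM_TRANSITION);

        // The new document's parent is the folder being monitored
        NodeRef folder = nodeService.getPrimaryParent(actionedUponNodeRef).getParentRef();

        // Find active workflows the folder is involved in, then look for the wait state
        for (WorkflowInstance wf : workflowService.getWorkflowsForContent(folder, true)) {
            for (WorkflowPath path : workflowService.getWorkflowPaths(wf.getId())) {
                if (nodeName.equals(path.getNode().getName())) {
                    // Signal the wait state to take the "check for children" transition
                    workflowService.signal(path.getId(), transition);
                }
            }
        }
    }

    @Override
    protected void addParameterDefinitions(List<ParameterDefinition> paramList) {
        paramList.add(new ParameterDefinitionImpl(PARAM_NODE_NAME, DataTypeDefinition.TEXT,
            true, getParamDisplayLabel(PARAM_NODE_NAME)));
        paramList.add(new ParameterDefinitionImpl(PARAM_TRANSITION, DataTypeDefinition.TEXT,
            true, getParamDisplayLabel(PARAM_TRANSITION)));
    }

    public void setNodeService(NodeService nodeService) { this.nodeService = nodeService; }
    public void setWorkflowService(WorkflowService workflowService) { this.workflowService = workflowService; }
}

The action would then be registered as a Spring bean and surfaced in the rule UI like any other custom action.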

Workflow screenshot

The combination of the wait state and the spawn workflows node in the business process with the custom rule action that signals the wait state creates a cool little “interrupt” loop: The process spawns workflows for folder children, then waits until more children arrive, then spawns workflows for those children, and so on until someone kills the workflow.

Customize the Share user interface to allow workflows to start on folders

Since the dawn of time, Alfresco’s UI has not allowed workflows to be started on folders, even though the workflow engine can handle it. For this approach to work, we most definitely have to be able to start workflows on folders. Fortunately, that’s a simple config tweak in Share–just copy the existing assign workflow action link from the document actionSet to the folder actionSet in documentlist.get.config.xml (and document-actions.get.config.xml), like this:

<actionSet id="folder">
    ...
    <action type="action-link" id="onActionAssignWorkflow" permission="permissions" label="actions.document.assign-workflow" />
    ...
</actionSet>

Voila! Now you can run workflows against folders.

Create a “workflow status” aspect to avoid processing docs more than once

The last step is to make sure that documents only run through the process once. Otherwise, every time the process spawns workflows for children in a folder, it’ll start one up for every child in the folder, regardless of whether or not it’s been through the workflow already. This may be what you want, but this particular client wanted docs to run through the process only once.

To do this, we created a simple aspect with a “workflow status” property. In the last node of the process, the property gets set. When the spawn workflows code runs, it filters out folder children that have the status set.
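
In process-definition JavaScript, setting that flag in the final node might look something like this (the scwf aspect and property names here are placeholders that match the spawn sketch earlier in the post):

// In the last node of the process: mark each document as processed
for (var i = 0; i < bpm_package.children.length; i++) {
    var child = bpm_package.children[i];
    if (!child.hasAspect("scwf:statusable")) {
        child.addAspect("scwf:statusable");
    }
    child.properties["scwf:workflowStatus"] = "processed";
    child.save();
}

The spawn workflows code then simply skips any child whose scwf:workflowStatus property is already set.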

That’s it!

This approach puts the burden on the workflow designer to use some standard node names and logic in their process definitions. And, it will result in many in-flight workflows (at least one for every folder being monitored), although that shouldn’t be a big deal from a performance perspective (running workflows really aren’t “running”).

The important thing for this client is that it provides a nice way for users to essentially “pre-configure” workflows so that subsequent users can start workflows simply by adding documents to a folder, all without anyone having to learn any new “workflow configuration” constructs. And, workflow designers can easily make their workflows “folder aware” or “templatable”, depending on how you want to look at it, all within the process definition, without having to recompile any custom actions or tweak JavaScript.

Workflow Dashboards in Alfresco Share

Metaversant recently had a client with some interesting requirements around workflow. I thought I’d post what we did here and get a conversation going about the various pluses and minuses of the approach and find out what others have done when faced with similar needs.

There are two main buckets of requirements I want to focus on: Workflow Reporting/Dashboard and Folder Monitoring. I’ll talk about Workflow Reporting/Dashboard in this post and Folder Monitoring in the next.

Workflow Reporting/Workflow Dashboard

Workflow Reporting is something I’ve seen quite often and handled differently each time based on what the client is trying to do. Alfresco is pretty sparse on Workflow Reporting (capturing the data and making it easy to report on) and Workflow Dashboarding (presenting a dashboard of running workflows and/or workflow reports in the user interface). By “sparse” I mean there really is none. Recent releases have seen the addition of the “My Tasks” page, but that is limited to what it sounds like: A page listing the tasks the current user is assigned to.

What most people want when they ask for Workflow Reporting is the ability to capture workflow data both before and after a workflow has completed. This is significant because if you do nothing about it, data about running workflows disappears into the ether when the workflows complete. A Workflow Dashboard is the ability to see all workflows assigned to all users or a subset of users, and perhaps some historical data via Workflow Reports (maybe time started, current task, time on current task, current actor, etc.).

For this particular project, my client really only cared about running workflows, so we didn’t have to worry about capturing workflow stats before the workflow ended–they just wanted to see all workflows, no matter who started them, in a sortable list with a link to the workflow details and the ability to perform batch operations against the workflows (such as selecting all workflows and canceling). The twist, however, was that the client wanted to scope the list of running workflows to the Alfresco Share site they were started in. So if there were two Share sites, A and B, and Site A’s users had started 24 workflows on documents stored within the site while Site B’s users (which may overlap with Site A) had started 35 workflows on Site B documents, Site A’s Workflow Dashboard should show a list of 24 workflows while Site B’s should show 35.

The Solution

When workflow data needs to be persisted beyond the life of the workflow, we’ll typically just create some objects in the content model to persist the data and we’ll write to those objects from one or more actions defined within the business process definition. In this case, that wasn’t necessary.

The problem boiled down to how to capture the specific Share site a workflow was scoped to and, then, how to query for and display that information. Once that was resolved, the data could be displayed on a custom Share page.

I may post the code at some point, but for now, here’s the high-level recipe:

Step 1: Capture the Site ID in a Process Variable

Alfresco jBPM workflows have process variables that can store metadata about a running workflow. And, the workflow API allows me to query against that metadata. So, the first step was to capture which Share site the user was in when they launched the workflow.

To do this, I used the Form Service to define a hidden field on my workflow’s startup form. Then, I overrode Alfresco’s client-side JavaScript StartWorkflow component to add my own code that finds the hidden field and sets it with the Share site’s unique identifier (the Site ID field).

Initially, I used a straight hidden field for this. Later, I came back and created a new custom component that included not only the hidden field, but some additional markup and client-side logic that pulled back additional context about the Share site for that workflow so that when someone viewed the workflow or managed a task, they would know more about the Share site than just its Site ID.
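
The core of the override is only a few lines of client-side JavaScript that run when the form renders. The field id convention, the property name, and the use of Alfresco.constants.SITE below are assumptions for illustration:

// Find the hidden Site ID field on the startup form and populate it
var siteId = Alfresco.constants.SITE; // the Share site the user is currently in, if any
var hiddenField = YAHOO.util.Dom.get(this.id + "_prop_scwf_relatedSiteId");
if (hiddenField && siteId) {
    hiddenField.value = siteId;
}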

Step 2: Create a web script that returns the workflow metadata

With the Site ID stored in a process variable, the next step was getting the workflow data to the front-end Share tier so it could be displayed in a dashboard. This involved creating a repository tier web script that accepted the Site ID as an argument and returned the data as JSON. There are some out-of-the-box web scripts related to workflows, but they return tasks assigned to the current user and for this I needed all running workflows for a given site ID, so that required a custom web script.

The JavaScript API has come a long way with respect to workflow, but the ability to apply a filter on task metadata isn’t there yet, so that meant my controller had to be Java-based.

The interesting part of that web script controller looks like this:

WorkflowTaskQuery tasksQuery = new WorkflowTaskQuery();
Map<QName, Object> processCustomProps = new HashMap<QName, Object>();
processCustomProps.put(SomeCoWorkflowModel.PROP_RELATED_SITE_ID, siteId);
tasksQuery.setProcessCustomProps(processCustomProps);
tasksQuery.setTaskName(SomeCoWorkflowModel.TYPE_DASHBOARD_TASK);
tasksQuery.setTaskState(WorkflowTaskState.IN_PROGRESS);
List<WorkflowTask> tasks = workflowService.queryTasks(tasksQuery);

This gives us all of the tasks that have the Site ID we’re looking for, but the Dashboard needs more than that–it needs the Workflow Instance for full context. No problem–the Workflow API can handle it. Here’s where the controller iterates over the tasks and adds the workflows to a List of WorkflowMetadata objects that will get set on the web script’s model:

for (int i = startIndex; i < endCount; i++) {
    WorkflowTask task = tasks.get(i);
    WorkflowInstance wf = workflowService.getWorkflowById(task.getPath().getInstance().getId());
    workflows.add(new WorkflowMetadata(wf, task));
}

One potential gotcha with this approach is that there has to be a task assigned to an actor in order for the workflow to show up in the WorkflowTaskQuery results. If a workflow were sitting on a wait-state, for example, that workflow wouldn’t be returned by the code above. To work around this, the workflows in this solution always queue up a “Dashboard Task” assigned to the initiator to guarantee that all workflows, whether they are currently sitting on a task node or not, always show up in the workflow dashboard.
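
In jPDL terms, the dashboard task is just a task node assigned to the initiator swimlane, along these lines (the node, task, and transition names are illustrative):

<task-node name="dashboardTask">
    <task name="scwf:dashboardTask" swimlane="initiator" />
    <transition name="done" to="end" />
</task-node>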

The view for this web script simply returns the data as JSON, so I’ll spare you the details.

The other web script I wrote for this piece deletes workflows for a given set of workflow IDs. You’ll see where that comes in next, but, again, the logic of that controller (also Java-based) is very straightforward, so on to the next step!

Step 3: Create a custom Workflow Dashboard page

The workflow tasks are tagged with Share Site IDs and a repository-tier web script is in place that knows how to find those tasks and give back the Workflow Instance data as JSON. The final step is to render that as a dashboard. For this, I used standard Surf framework techniques to create a new page called “Workflows”. The page contains a client-side JavaScript component that renders a YUI DataTable. The data source for the DataTable is the web script created in the previous step.
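
Stripped of Share’s component scaffolding, the heart of that client-side component is standard YUI 2 DataSource/DataTable wiring against the web script. The URL, JSON field names, and columns below are illustrative, not the actual project code:

// Point a YUI DataSource at the repository-tier web script (proxied through Share)
var dataSource = new YAHOO.util.DataSource(
    Alfresco.constants.PROXY_URI + "someco/workflows?site=" + encodeURIComponent(siteId));
dataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;
dataSource.responseSchema = {
    resultsList: "workflows",
    fields: ["id", "description", "initiator", "startDate"]
};

// Render the table into the page's container div
var columnDefs = [
    { key: "description", label: "Workflow", sortable: true },
    { key: "initiator", label: "Started By", sortable: true },
    { key: "startDate", label: "Started On", sortable: true }
];
var dataTable = new YAHOO.widget.DataTable("workflow-list", columnDefs, dataSource);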

Step 4: Create actions

The specifics for what someone might want to do to one or more workflows displayed in the dashboard varies. For this project, users needed to be able to view the workflow details. Initiators needed to be able to cancel the workflow. Viewing the workflow details is just a matter of building the right URL. Canceling the workflow is a little more involved–I used client-side JavaScript to make repo-tier web script calls to a custom web script that can cancel one or more workflows given the workflow IDs.

To make it so that only workflow initiators can cancel a workflow, I use the same mechanism that document actions uses: The XML configuration file that defines the cancel workflow action specifies that the user must be the workflow initiator. Then, the client-side JavaScript that builds the dashboard actions checks the workflow data to see if that’s the case and hides the link if the current user is not the initiator.

Here’s the XML for the cancel workflow action:

<toolbar>
    <actionSet>
        <action type="action-link" id="onActionCancel" permission="initiator" label="menu.selected-items.cancel" />
    </actionSet>
</toolbar>

The client-side code that checks action permissions is a direct copy of Alfresco’s out-of-the-box logic that does the same.

The Result

Once fully-assembled, the workflow dashboard looks like this:

Workflow dashboard screenshot

In the screenshot, I’ve got multiple workflows selected and have clicked the “Selected Workflows…” link to show that you can cancel more than one workflow at a time.

Now you’ve seen a simple workflow dashboard implementation. My next post will be about launching workflows automatically when objects are dropped into a folder.

Does Alfresco Share need to go on a diet?

When Alfresco first told me about the Surf framework and the plan to build a new, then-unnamed collaborative client on top of Surf, I liked the sound of it. Of course, anything other than JavaServer Faces sounded pretty good at the time.

But things didn’t quite turn out the way I thought they would.

See, what I thought would happen was that Alfresco would release a bunch of “mini” clients–highly-specialized apps for the task at hand. Want RM? Here’s the RM client. Doing some team stuff? Here’s Share. Basic Document Management, here’s the DM client. Web Content Management. Digital Asset Management. You get the point. With all of these sitting on top of Surf, each client app would only have the code that made it unique for that particular use case. It’s like taking one Ritz cracker (Surf) and then having a veritable smorgasbord of delicious ECM toppings to choose from.

The Dagwood Sandwich

Instead, what’s happened is that Alfresco launched their first Surf-based client, Share, for team collaboration, and then, rather than go back to the platter for another cracker, they kept piling on and piling on until that once dainty hors d’oeuvre became a towering Dagwood sandwich.

Let’s face it: Clients love the Share interface. They love it so much they want all of their content-centric apps to be based on it. If the client wants basic customizations–some form tweaks, a new dialog here, a new page there–it isn’t so bad. But the more complex the changes are, the more cruft you have to sift through and either eliminate or work around. A quick perusal of the Share code will turn up tidbits that deal with Records Management, SharePoint, and Google Docs. All of these are optional add-ons to Alfresco, but they have worked their way into the Share client “just in case” someone installs that extension.

Okay, so if Share has too much to serve as an agile base in some cases, why not drop down to the underlying Surf framework? Because sometimes, Surf can be too bare bones. Recently, I did an implementation for a client that was essentially a community solution. We used two customized versions of Share: One for the “admin” interface for the community and the other for the front-facing community itself. Share worked great for the admin interface–not much tweaking was needed there at all. The dashboard, document library, wiki, discussion, and data lists functionality all made sense in the context of administering content. The front-facing community, however, was another story. We didn’t need 80% of what was in Share out-of-the-box. But we didn’t drop down to Surf because we wanted blogs, discussions, and some of the Share-tier web scripts for data lists and whatnot. We knew gathering up all of the dependencies needed to “push down” those features into Surf would be a pain. The solution turned out great, but the ratio of used to unused code is kind of scary.

Alfresco seems too far down the every-new-feature-we-come-up-with-goes-into-Share path at this point. But I wonder if the concept of a “distribution” could apply to Share. This would mean stripping down Share to some sort of bare bones minimum, just slightly bulkier than raw Surf. Then, provide AMPs or Maven builds or scripts or something that developers can use to “build up” Share with only the functionality they need.

Or maybe the solution is to make things that are optional, truly optional. It would be nice, if, through a script or this “tear down, then build up” approach, you could completely remove things like:

  • Sharepoint integration
  • Google Docs integration
  • Records Management
  • Wiki
  • Blogs
  • Discussion
  • Links
  • Anything else that’s not about the document library, data lists, categories, tags, and search.

By “completely remove” I don’t mean “hide from the user”. I mean when I recursively grep the Share web app for “Sharepoint” (for example) I get zero hits.

The goal here is to cut way down on the amount of code developers have to sift through, override, and extend when starting with Share as a base. And, once deployed, reduce the amount of code that has to be maintained and upgraded going forward.

Maybe Alfresco should take a lesson from Drupal. Some would argue that the core of Drupal is already too big, but at least the majority of extensions are in (truly) optional modules. And there are a number of Drupal distributions that take core and bundle different sets of modules for specific use cases. Django has something similar with the Pinax project.

What do you think? Am I just being a picky eater? Is it realistic to think that Alfresco can whittle Share down to a more suitable base for the rest of us to build on?

Book Review: Alfresco 3 Web Services

I’ve just finished reading Alfresco 3 Web Services, by Ugo Cei and Piergiorgio Lucidi, which Packt sent me to read and comment on.

Alfresco 3 Web Services is meant for developers with a very specific focus: Remotely talking to Alfresco. That might mean communicating via SOAP-based Web Services, RESTful web services (Web Scripts), or either protocol through CMIS. Whichever you choose, Ugo and Piergiorgio have you covered. I really like that the authors set out to write such a focused piece and stuck to it–it allows them to go deep on their topic and keeps the chapter length and total book length digestible.

The book is written very clearly and follows a logical progression. It starts out with SOAP-based services (first 5 chapters), including a chapter on .NET, and then moves on to web scripts (3 chapters). After that, the authors discuss AtomPub, which is an interesting, but not critical, background for the remaining 5 chapters of the book which focus on CMIS.

The discussion on web scripts covers both Java and JavaScript controllers, but JavaScript definitely gets more attention. Something developers new to the platform will find helpful is the chapter on FreeMarker. Most of Alfresco’s documentation and the books that are out there (including mine) relegate the FreeMarker API to appendices or just show examples and assume you’ll look up details later when you need it. Because it is typically such a key component of web scripts and other aspects of Alfresco, it was a good call to include it in the book.

The book comes with several small examples and a couple of tie-it-together examples. None of it runs out-of-the-box without some tweaking, which isn’t a surprise given how rapidly things are changing in this area, particularly with regard to CMIS. At the time the book was written, OpenCMIS, the Java API for CMIS available as part of Apache Chemistry, had not been officially released. As I write this, the Chemistry team is about to release their second tagged build. The differences aren’t significant enough to cause confusion–most readers will be able to fiddle with the import statements and make the other changes required to get the sample code running.

I thought there was a little too much time devoted to SOAP-based Web Services, but that’s just personal preference and the fact that nearly all of my clients over the last four years have gone the RESTful route. The authors note that others may have the same preference and they make it easy to skip over those chapters if the reader wants to.

Although the chapters logically progress and build on each other, the code samples don’t–for the most part, each chapter’s code samples are self-contained. For example, I thought it was kind of strange that the code built in Chapter 13 for the CMIS Web Services binding examples isn’t used in Chapter 14 to build the CMIS Wiki example.

Many Alfresco implementation teams are divided into at least back-end and front-end teams, and when the project is large enough, you can definitely have people focused on the middle. This book is perfect for that middleware team or for anyone who’s got a handle on the back- and front-ends and just needs to learn how to stitch them together. Nice job, Ugo and Piergiorgio.

7 mistakes developers make when customizing Alfresco Share

I’ve seen more than my–ahem–fair share of Alfresco Share over the last several months. Many clients feel that their needs are so close to what Share provides out-of-the-box that they can save time and money by starting with Share as the basis for their custom content-centric application. Whether or not that’s a good idea is the subject of another post. This post assumes that, for whatever reason, you find yourself customizing Alfresco’s Share client and wondering what are some of the common pitfalls to avoid. Here are seven. Feel free to add to the list.

1. Ignoring client-side JavaScript minification

Here is a massive understatement for you: Alfresco Share has a lot of client-side JavaScript files. Most, if not all, of these are minified, or compressed, to reduce the size of a given page and increase client-side performance. If you’ve ever looked at the FreeMarker source for one of Alfresco’s pages, you may have seen something like this:

<@script type="text/javascript" src="${page.url.context}/components/blog/blog-common.js"></@script>

It looks like an everyday JavaScript reference but what’s up with that “@script” tag? It’s a FreeMarker macro. It switches out the JavaScript source file for the minified version when debug is turned off and uses the original uncompressed source when debug is turned on, which makes stepping through the client-side JavaScript much more pleasant.

There are two things you need to be aware of here. First, if you find yourself tweaking Alfresco’s client-side JavaScript, you’ll need to remember to deploy both the expanded and minified version of the file. Otherwise, when people turn debug on and off, they’ll see different results. Second, when you create your own client-side JavaScript, you need to minify your own code for the same reason.

You could turn debug on and leave it on (bad idea) or you could use a “normal” script tag and point to the non-minified versions of your JavaScript, but it is really easy to add minification as a part of your build, so you might as well set that up early in the project and you won’t have to worry about it later.

There are several JavaScript compressors out there. Here’s a link to the YUI Compressor. You can drop the JAR into your project and then invoke it from Ant quite easily, along the lines of the sketch below.
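
For instance, a minimal Ant target might look like this (the JAR version and paths are assumptions):

<target name="minify-js">
    <!-- Minify a single file; wrap in a loop or an <apply> task to handle a fileset -->
    <java jar="lib/yuicompressor-2.4.2.jar" fork="true" failonerror="true">
        <arg value="-o" />
        <arg value="${build.dir}/components/my-component-min.js" />
        <arg value="${src.dir}/components/my-component.js" />
    </java>
</target>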

2. Assuming Alfresco and Share are on the same host

When you install Alfresco it deploys a web application in the “/alfresco” context–that’s your repository and the old Alfresco Explorer client–and a second web application in the “/share” context. Depending on what you’re doing you might deploy numerous additional web apps based on Share or Surf.

Regardless of how you choose to deploy, you need to remember that there is no guarantee your app and Alfresco will be on the same machine, app server, or port number. One of the beauties of the Surf architecture is that you can scale it out across multiple app servers and they can all talk to the same (or multiple) Alfresco repository servers. The underlying Surf framework on which Share is based has configuration and helper variables you can leverage to deal with this. You should not be hardcoding “localhost” or any other hostname in your Share code.

3. Incomplete theme customization

Alfresco Share 3.3 has user-selectable themes. As part of your customization effort you can define your own theme and then configure that to be the default. An easy way to create your own theme is to copy one of the out-of-the-box themes and then modify it to suit your needs. The keys to cloning a theme successfully are:

  1. Copy one of the themes other than “Default”
  2. Search and replace references to the old theme name in the new CSS files (login.css, presentation.css, and yui/assets/skin.css)
  3. In the previous step, don’t forget yui/assets/skin.css!

4. Duplicating, rather than extending, Alfresco web scripts

Suppose you want to change something in one of Alfresco’s web scripts. You may be tempted to change the out-of-the-box controller JavaScript or FreeMarker views, but don’t do it. A nice thing about the web script framework is that you can override even just a single file that is part of a web script by placing your version of the file with the same name in the same folder structure under web-extension. This also works on the repository tier, but instead of web-extension you use the “extension” directory.

For example, maybe I want to extend the document-actions config XML in Share with my own settings. I will NOT copy my version over the top of Alfresco’s. Instead, I’ll put my copy in a file named “document-actions.get.config.xml” under WEB-INF/classes/alfresco/web-extension/site-webscripts/org/alfresco/components/document-details. When Alfresco loads the web script, it will use my version of the config.

5. Not using the web-extension directory

While we’re on the topic, all of your custom Share config files go in web-extension under the Share web application. Don’t put them in $TOMCAT_HOME/shared/classes and don’t put them in the Share web app’s classes/alfresco directory. Use the web-extension directory to keep your stuff separate from Alfresco’s. I also recommend doing the same with your client-side files–create a directory called “extension” for your client-side JavaScript, images, CSS, and so-on.

6. Using the same Tomcat server as the Alfresco repository during development

This one isn’t going to cause you problems, but it sure will slow you down. Even if your production Share web app will run on the same Tomcat as the Alfresco WAR, do yourself a favor: While you’re coding, use two Tomcats. On port 8080, you’ll run Alfresco and out-of-the-box Share. On some other port you’ll run a second Tomcat server with your custom Share- or Surf-based web app. That way, when you need to restart your custom Share app, you don’t have to wait for the repository to start back up. You’ll cut way down on the time you spend waiting for Tomcat to restart which, over time, can speed up your development cycle tremendously.

7. Failing to test on Alfresco’s supported browsers

Have I mentioned how much client-side JavaScript there is in Share? Every time you touch Alfresco’s JavaScript or create your own, you’ll need to test it on the browsers your client intends to use. So there are two recommendations here: First, make sure you are testing across Alfresco’s supported browsers. Second, make sure your clients only expect to use Alfresco’s supported browsers. Failure to do both of these can lead to some missed expectations on both sides. The browsers Alfresco supports for 3.3 are on the supported stacks page on the Alfresco web site.

What am I missing? Add a comment with your Alfresco Share street smarts.

Administering Alfresco via JMX at the command line

Jared Ottley has a cool post on using jmxterm from CyclopsGroups.org. It lets you query and operate on Alfresco’s JMX beans from the command line. One of the examples he gives uses JMX to show how many users are logged in to Alfresco, a list of the logged in users, and then how to log off a specific user.

I used this recently to query the state of an Open Office process and then to restart the Open Office converter. It worked great.

JMX is only available in the Enterprise version of Alfresco.

Adobe acquires Day Software for $240 million

I’ll admit it. I did not see this one coming: Adobe announced today that it will acquire Day Software for $240 million USD. (Thanks, @pmonks, for the heads-up tweet).

Honestly, I thought Adobe would acquire Alfresco by the end of last year and I was surprised when it didn’t happen. They had done a big OEM deal making Alfresco part of LiveCycle and they did a gigantic Alfresco implementation as part of standing up Adobe’s acrobat.com site. Heck, Adobe even hosted Alfresco’s community event back in 2008. All small potatoes in the grand scheme of things, I know, but I can’t help but feel like the proud parent whose daughter brought home a keeper, only to find out the guy’s been dating a hottie from Switzerland the whole time.

Day has a FAQ up on their site. As you would expect, Day promises that current customers have nothing to fear and that the products will continue to live on.

Day has some really cool stuff in both their commercial products and their open source projects (Sling, Jackrabbit). I hope the acquisition gives Day a huge injection of resources they are able to invest in the open source side of things.

Congrats to Erik Hansen, David Nuescheler, Kevin Cochrane, and the rest of the Day team!

Alfresco, NOSQL, and the Future of ECM

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millennium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought for a few minutes while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value (see the shell sketch after this list). The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicable. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.

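To make the schema-less point concrete, here is what that looks like in the MongoDB shell (the collection and field names are made up):

// No ALTER TABLE, no migration: the second document simply carries an extra field
db.articles.insert({ title: "First post" });
db.articles.insert({ title: "Second post", author: "jdoe" });
db.articles.find({ author: "jdoe" }); // matches only the second document
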
NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, API’s, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (See the DrupalCon presentation, “MongoDB – Humongous Drupal“, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can, and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMSes will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

Introducing the Alfresco Community Committer Program

It’s been a little over two years since I wrote a blog post entitled, “Is Alfresco the ‘near beer’ of open source?”. In that post, I lamented the fact that the Alfresco code line is entirely closed to community developers and that Alfresco seems unwilling to relinquish any amount of control over the development of their open source product. Writing that post had me a bit riled up, so during the Q&A session at the community meetup in San Jose later that week, I asked John Newton, Alfresco CTO, and former Alfrescan Kevin Cochrane when and if it would ever be different. They said they were “working on it” (See Alfresco pledges to open community by 3.0).

I’m glad to say that, although it took a while, there is now a process by which your code can find its way into the Alfresco code base (Community, and even, potentially, Enterprise). It’s called the Alfresco Community Committer Program (ACCP). The ACCP is a motley crew of volunteers from Alfresco customers and partners around the world. Although not a requirement for membership, I think most of us have developed at least one open source add-on for Alfresco. Our goal is to help community-developed code find its way into the product. Does this mean Alfresco is now as open as “true” open source community projects like Apache and Drupal? No, and honestly, I’m not sure it will ever get there. But Alfresco’s support of the ACCP process is a start. Here’s how the process works.

First step: Nomination to the ACCP Incubator

Today, developers in the community create add-ons, utilities, extensions, language packs and all kinds of software built to work with Alfresco. Some of these might make great additions to the Alfresco product. At a high-level, what the ACCP seeks to do is to act as an on-ramp or incubator for that subset of projects. We want you, real world Alfresco developers and end-users, to nominate community-developed extensions that you find useful and that you would eventually like to see as part of the Alfresco product. The ACCP then reviews these nominations and votes for their inclusion into the incubator. The project’s developers can then decide to leave their code where it is (Google Code, Sourceforge, Alfresco forge, etc.) or they may choose to migrate to the Alfresco-hosted ACCP incubator subversion repository.

Projects accepted to the incubator so far include:

As a side note, it’s great that there are so many community-developed add-ons for Alfresco. But the lack of a central index makes it hard to see what’s available. As a related effort, Nancy Garrity is working on something that would provide a central index, support ratings, etc.

Second step: Community code line

Once a project has been in the incubator for a while, the ACCP may recommend its inclusion as part of Alfresco Community, making it much easier for Alfresco Community users to leverage these add-ons. The exact nature of how these will be made available is still being worked out. You could imagine a “community-extensions” directory under the Alfresco Community subversion root or something similar. For certain types of contributions, maybe the installer could even provide an optional “install community extensions” step. Again, although we have recently voted some projects into the ACCP incubator, none have yet reached Community, so the details of exactly how they will be incorporated into the Community code base are still being settled.

Third step: Enterprise code line

The ACCP may then recommend Enterprise adoption. This step is subject to Alfresco Engineering approval, which may be a significant hurdle for some, but if it happens, the entire Alfresco customer base gets the benefit of Alfresco’s ongoing support of the community-developed code. Note that the Enterprise approval step is the only one where Alfresco employees have a say about how an ACCP project is handled–per our charter, Alfresco employees cannot be voting members of the committee.

How you can get involved

First and foremost, you can nominate an open source Alfresco add-on/extension/customization project. If you want to take an active role on the committee, or know someone who would be a good addition, there are spots available: the committee meets regularly to review and vote on project and committee member nominations. All you have to do is get in touch with me or one of the other regular members of the committee. You’ll find the list on the Alfresco Community Committer Program wiki page.

We’ll be doing a webinar on July 27th to talk about this more and answer questions. Check out the Alfresco events page to register.

Metaversant is up-and-running

First off, thanks so much to my readers, clients, colleagues, and other friends in the community who have provided a wealth of support in terms of well-wishes and congratulations on the forming of my new company, Metaversant. Several people have asked how things are going so here’s a brief update…

Metaversant is up-and-running and I’m as busy as can be. I finally got a web site up and a logo designed, and business cards aren’t far behind. I’ve even got people to give them to, which is an important prerequisite to actually having business cards.

I’m currently billing on an Alfresco Share customization project. I can’t tell you who or exactly what but the pattern will be familiar: A company needs to manage digital assets. Some come from internal sources, some come from external sources, and all need metadata and security applied. The front-end communicates with the Alfresco repository via RESTful Web Script calls while back-office content providers and application administrators use Alfresco Share (customized here and there) to upload assets, set metadata, and manage the business process. It’s a pretty classic pattern and other than extraordinarily tight timeline pressure, it’s going well.

Beyond technical execution, I’ve also conducted some Alfresco training for a pharma client in New England. It was just a quick engagement, but it was fun to help a team that had been playing with Alfresco on their own discover the capabilities of the platform and how they could be applied to their business problems. I also love to see the expression on people’s faces when it hits them: They don’t need Documentum for all things DM any more.

I had a great trip to New York City for the Alfresco Community Meet-up. I don’t know what the official ratio was, but the customers seemed to heavily outnumber the partners at this particular meet-up, which is good for everyone, I think. I caught up with a lot of old friends and met some new ones. I was particularly excited to come across someone who had some plans to leverage cmislib, my client-side CMIS API for Python, a project I’ve sorely neglected this Spring with all of the startup stuff going on. All of this Java code I’ve been writing has me missing Python–I will find time for cmislib soon.

Being fully billable while still having to find new business and take care of everything else about the business is tough, as I knew it would be. I’m loving every minute of it though. One thing I didn’t expect is the helpfulness of friends, former colleagues, and even strangers who have started their own businesses. The entrepreneur community is not unlike the open source community. Everyone loves to talk shop and trade tips and advice. It’s really cool.

My exercise regimen (a generous description) has suffered and it looks like my blogging velocity is on a similar trend. I feel like I’m getting into some regular rhythms though so maybe I can get things back in balance shortly. I’ve got all kinds of things that I need to write about: Alfresco 3.3 Enterprise is out, Alfresco hired the jBPM guys (I totally called it!), and I don’t think I’ve written anything at all about the Alfresco Community Committer Program. It’s going to be a busy Summer.