Category: Content Management

Enterprise Content Management (ECM), Web Content Management (WCM), Document Management (DM): whatever you call it, this category covers market happenings and lessons learned.

Thoughts on workflow and jBPM

I just got back from jBPM training at JBoss. The class itself needs a bit of streamlining, but the instructors are aware of that and are working to improve the offering. The technology, though, is very cool. It really got me thinking about how my own workflow apps have evolved over the years, and it made me even more excited about the upcoming Alfresco 1.4 release (see Roadmap), which will include jBPM as its workflow engine.

Just about every application I’ve developed over the last thirteen years has had some type of workflow. In the early Lotus Notes days, Notes equalled “workflow”. For most Notes applications, that meant a field on a Notes document that kept track of the document’s status. Security on the document (or even on individual fields) would change based on that status field. Moving from state to state was handled by any number of bits of code in actions (macros) or form events. Every aspect of the business process was essentially diffused into every nook and cranny of the Notes database. There was no central definition of the business process; the process was everywhere. Ironically, Lotus Notes, the de facto standard for “workflow” applications, didn’t have a built-in workflow engine as we would identify one today.

Custom techniques for handling workflow evolved over time. In one custom content management application I worked on, for example, we developed our own XML-based state engine to handle workflow, which was a vast improvement over simple field-based state management. It had its own API you could call to move documents through the process, and it supported simple events like sending an email notification. At about that time, third parties began offering add-on products for Lotus Notes and Domino that let you define, manage and execute a process in a similar fashion. Although these products were loosely coupled with the application, you were still tied to the underlying Notes/Domino platform.
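To make the contrast with field-based status tracking concrete, here is a minimal Java sketch of the kind of XML-driven state engine described above. It is not the original engine, whose format and API aren’t described here: the `<process>`, `<state>` and `<transition>` element names, the `XmlStateEngine` class and its `advance` and `onTransition` methods are all hypothetical. The point is that the allowed moves live in one XML document, and notifications hang off a single listener hook instead of being scattered through form events.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical XML-driven state engine: states and transitions are declared in one XML document. */
class XmlStateEngine {
    // state name -> (transition name -> target state name)
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    // callbacks run after every successful move, e.g. to send an email notification
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    XmlStateEngine(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList states = root.getElementsByTagName("state");
            for (int i = 0; i < states.getLength(); i++) {
                Element state = (Element) states.item(i);
                Map<String, String> outgoing = new HashMap<>();
                NodeList links = state.getElementsByTagName("transition");
                for (int j = 0; j < links.getLength(); j++) {
                    Element link = (Element) links.item(j);
                    outgoing.put(link.getAttribute("name"), link.getAttribute("to"));
                }
                transitions.put(state.getAttribute("name"), outgoing);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read process XML", e);
        }
    }

    void onTransition(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    /** Moves a document out of currentState along the named transition and returns the new state. */
    String advance(String currentState, String transitionName) {
        Map<String, String> outgoing =
                transitions.getOrDefault(currentState, Collections.<String, String>emptyMap());
        String target = outgoing.get(transitionName);
        if (target == null) {
            throw new IllegalStateException(
                    "No transition '" + transitionName + "' from state '" + currentState + "'");
        }
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(currentState, target); // fire events only after the move is validated
        }
        return target;
    }
}
```

An email notification would simply be one more listener registered through `onTransition`; the sketch leaves actual mail delivery out.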

Then I began working with Documentum. Documentum’s Workflow Manager (and, later, Business Process Manager) uses a graphical tool to define process templates, which are then instantiated and executed in a runtime (Tomcat) that runs within the context of the overall Documentum content server. It worked pretty well on every project where I used it, but it has a few shortcomings:

  • Although the marketing hype is that any business analyst can create and manage the workflow, this is true only for the simplest of processes. Every workflow I have ever worked on has been complex enough to require automated activities (tasks written in code). In Documentum’s workflow tools, automated tasks are not transparent to the analyst, so working with workflows is inherently a “development-oriented” task. Expecting a non-technical business analyst to change a workflow without significant knowledge of the underlying application just isn’t realistic.
  • The process definition is proprietary. Your two choices for creating and managing process definitions in Documentum are to use Documentum’s graphical tools or to write your own custom code against the Documentum API. On a related note, the definition isn’t human-readable without the tool: you can’t just open a Documentum workflow template in a text editor to see what’s going on.
  • The concepts of lifecycles and processes are separate. To me, the state a document is in (Documentum calls this a “lifecycle”) and the process a document goes through (“workflow”) are so interdependent that they should be modeled together. In Documentum these are two separate concepts. I suppose that can be simplifying: if you only need states, you can use lifecycles without workflows. But I always found it a bit clumsy when dealing with the interactions between the two.
  • The process definitions are tightly coupled with the Documentum platform. The obvious problem with this is that processes defined within Documentum are not portable to other content management systems or workflow engines. At my clients, I also saw this as a source of confusion: when should they use the “embedded” workflow within Documentum versus another workflow product?
  • The framework is limited and cannot be easily extended. Documentum’s workflow is purpose-built for moving documents around, not data. That’s great if you are working with files, but many applications need process functionality that isn’t document-specific. If you’re working with the Documentum workflow engine and just want to route some raw data around, your only real choice is to put it in a “content-less object” and route that around. And because the framework is proprietary, you can’t fix this even if you want to.
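For contrast with the “content-less object” workaround, here is a hedged Java sketch of an engine that lets a process carry plain data as named variables. The `DataProcessInstance` class, the `amount` variable and the `routeExpense` decision are hypothetical; jBPM 3.x offers something comparable through its context instance (`setVariable` and `getVariable`), but this code does not use the jBPM API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical process instance that carries plain data as named variables; no document involved. */
class DataProcessInstance {
    private final Map<String, Object> variables = new HashMap<>();
    private String currentNode;

    DataProcessInstance(String startNode) {
        this.currentNode = startNode;
    }

    void setVariable(String name, Object value) { variables.put(name, value); }

    Object getVariable(String name) { return variables.get(name); }

    String getCurrentNode() { return currentNode; }

    /** Hypothetical decision node: routes on the data itself (an expense amount), not on a file. */
    void routeExpense(double approvalLimit) {
        Number amount = (Number) variables.get("amount");
        currentNode = amount.doubleValue() > approvalLimit ? "manager-approval" : "auto-approved";
    }
}
```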

Now I’m digging into jBPM, and I’m excited about what I’m finding. Loosely coupling the workflow engine with the content management system, basing the process definition on open standards, making the implementation open and extensible, and providing a runtime that requires nothing more than a servlet container and a relational database together yield a robust, flexible workflow engine that addresses many of the shortcomings of embedded, proprietary solutions. jBPM is one example of such a solution, but there are others in the open source world.

jBPM process definitions can be created graphically using an Eclipse plug-in. Because the process definition is expressed in XML, you also have the option of writing it by hand, generating it programmatically, or producing it with any tool that can output XML.
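As an illustration of the “programmatically” option, this Java sketch uses only the JDK’s DOM and `javax.xml.transform` APIs to emit a linear process definition. The element names (`process-definition`, `start-state`, `state`, `end-state`, and `transition` with a `to` attribute) are modeled loosely on jPDL 3.x, but the output omits the jPDL namespace and has not been validated against the jPDL schema, so treat it as a starting point rather than a deployable definition. The `ProcessXmlWriter` class is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Generates a simple, linear process definition whose element names are modeled on jPDL 3.x. */
class ProcessXmlWriter {
    static String linearProcess(String processName, String... stateNames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("process-definition");
            root.setAttribute("name", processName);
            doc.appendChild(root);

            Element previous = node(doc, root, "start-state", "start");
            for (String name : stateNames) {
                transition(doc, previous, name);           // previous node -> this state
                previous = node(doc, root, "state", name);
            }
            transition(doc, previous, "end");              // last state -> end
            node(doc, root, "end-state", "end");

            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.INDENT, "yes");
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter xml = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(xml));
            return xml.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build process XML", e);
        }
    }

    private static Element node(Document doc, Element root, String type, String name) {
        Element node = doc.createElement(type);
        node.setAttribute("name", name);
        root.appendChild(node);
        return node;
    }

    private static void transition(Document doc, Element from, String to) {
        Element t = doc.createElement("transition");
        t.setAttribute("to", to);
        from.appendChild(t);
    }
}
```

Calling `ProcessXmlWriter.linearProcess("document-review", "review", "approve")` yields a start state, two states and an end state, each connected to the next by one transition.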

Don’t like the out-of-the-box jBPM implementation of a “split” or a “join”? No problem: override it with your own logic. In fact, adding your own logic to the process is usually as simple as pointing the event handler for a node or transition at a POJO that implements a simple, often single-method, interface.
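Here is a minimal Java sketch of that pluggable-handler idea, applied to a join. The single-method `JoinHandler` interface is a hypothetical stand-in; as far as I know, jBPM 3.x’s own hook for custom logic is the `ActionHandler` interface with one `execute(ExecutionContext)` method, but that API isn’t used here so the example stays self-contained. `AllBranchesJoin` mimics the usual wait-for-everyone join, while `QuorumJoin` swaps in custom logic that lets the process continue once a set number of parallel branches have finished.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical single-method hook, standing in for a jBPM-style handler interface. */
interface JoinHandler {
    /** Called each time a branch arrives; returns true when the process may continue past the join. */
    boolean arrive(String branchName);
}

/** Default-style behavior: wait until every expected branch has arrived (an AND-join). */
class AllBranchesJoin implements JoinHandler {
    private final Set<String> expected;
    private final Set<String> arrived = new HashSet<>();

    AllBranchesJoin(Set<String> expected) {
        this.expected = new HashSet<>(expected);
    }

    public boolean arrive(String branchName) {
        if (expected.contains(branchName)) {
            arrived.add(branchName);
        }
        return arrived.containsAll(expected);
    }
}

/** Custom behavior: continue once a quorum of distinct branches (e.g. reviewers) has arrived. */
class QuorumJoin implements JoinHandler {
    private final int quorum;
    private final Set<String> arrived = new HashSet<>();

    QuorumJoin(int quorum) {
        this.quorum = quorum;
    }

    public boolean arrive(String branchName) {
        arrived.add(branchName); // a set, so a branch arriving twice is counted once
        return arrived.size() >= quorum;
    }
}
```

Because both classes satisfy the same one-method contract, the engine can be pointed at either one without changing the process definition’s shape.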

Any application can take advantage of the engine. Integration is possible by talking directly to the jBPM API or through more loosely coupled methods such as JMS and web services.

If you are a Documentum customer about to implement a process-centric application, should you ditch Business Process Manager and go with something like jBPM? I’m not ready to draw that conclusion. What gets me excited, though, is knowing that I can implement robust workflow in any application I build by leveraging an open tool like jBPM. And when open source projects like Alfresco incorporate it into their solutions, I don’t feel like I’m giving up anything compared to proprietary competitors. In the case of Alfresco with jBPM versus something like Documentum with its proprietary, embedded workflow engine, it actually feels like I’m gaining functionality.

To learn more about jBPM, check out the jBPM Wiki. Specifically, the Getting Started Guide has everything you should need to start learning.

Alfresco web client customization

I implemented a simple Alfresco web client customization over the weekend. At Optaros, we’ve got two Alfresco repositories–one in North America and one in Europe. Alfresco doesn’t yet offer federated repositories, but we needed some way to make it easier for folks to jump between the two, and give at least a rough feeling of there being one repository, not two.

So, I added a new section to the “Shelf”. The shelf is a little piece of collapsible real estate in the Alfresco web client UI that contains things like recently viewed spaces (i.e., folders) and bookmarks. The new Shelf Item I added is called “Repositories”. It is essentially a list of links that point to all of the Alfresco repositories in your environment. Users can then click the repository name to open the home space for that repository.
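The heart of the shelf item is turning a configured list of repositories into links. This plain-Java sketch shows only that rendering step; the real customization is a JSF component wired in through Alfresco’s web client configuration, which this does not reproduce. The `RepositoryShelfRenderer` class, the CSS class name and the example URLs are hypothetical placeholders.

```java
import java.util.Map;

/** Hypothetical renderer: one link per configured repository, in configuration order. */
class RepositoryShelfRenderer {
    /** Escapes the characters that matter inside HTML text and double-quoted attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities aren't double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Keys are display names, values are the URLs of each repository's home space. */
    static String render(Map<String, String> repositories) {
        StringBuilder html = new StringBuilder("<ul class=\"repositories\">\n");
        for (Map.Entry<String, String> e : repositories.entrySet()) {
            html.append("  <li><a href=\"").append(escape(e.getValue())).append("\">")
                .append(escape(e.getKey())).append("</a></li>\n");
        }
        return html.append("</ul>").toString();
    }
}
```

Passing a `LinkedHashMap` keeps the links in the order the repositories were configured.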

Obviously, it doesn’t implement single sign-on, but at least people can jump between the repositories quickly. And, we should be able to leverage the config in the future to do things like federated search.

This type of customization is a decent way to learn about Alfresco UI customization because it is pretty constrained in scope and yet involves a good cross-section of Alfresco UI config elements like components, tag libs, the configuration service, and actions.

I originally wanted to extend the Alfresco JSPs, but as it turns out, the shelf is implemented in a “parts” JSP that is included in just about every other JSP. The maintenance pain of overriding the config for every out-of-the-box JSP is worse than the potential pain of simply overwriting their JSP with my customized JSP so I chose the latter path. Still, everything else follows their customization model.

According to the forum and this JIRA post, they are aware of the problem that those explicitly-included JSP files cause. No word on when it will be fixed.

I’ll write up the details for how I did the customization and post them here when I get a chance.

A few Alfresco tips

Here are a few Alfresco tips. Nothing earth-shattering here: these issues are documented on the wiki or in the forums. I came across them while getting the 1.3.0 Alfresco WAR running on Tomcat 5.5.17 with MySQL 4.12 on Ubuntu 5.10 (“Breezy Badger”).

Use extension configuration files to override default repository settings. Refer to this wiki page to learn how to do this. If you are running the WAR file only, you can get some sample configuration files from either the source distribution or the Alfresco-Tomcat bundle. In the bundle, they are in /tomcat/shared/classes/alfresco/extension.
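Extension files work as an override layer: anything you set wins, and everything else falls back to the shipped defaults. The Java sketch below illustrates only those semantics with `java.util.Properties` and its defaults chain; it is not how Alfresco itself loads its configuration, the `LayeredConfig` class is hypothetical, and the property values are placeholders (only the `dir.root` and `db.url` keys come from the tips in this post).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical illustration of "extension overrides default" property semantics. */
class LayeredConfig {
    static Properties load(String defaultsText, String extensionText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));
            // getProperty() on 'effective' falls back to 'defaults' for any key the extension omits
            Properties effective = new Properties(defaults);
            effective.load(new StringReader(extensionText));
            return effective;
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```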

If you move your data directory, drop and re-create your Alfresco database. There’s probably a way to avoid this if you need to relocate a production data directory, but if you are just working with test data, dropping and re-creating may be the quickest solution. The data directory stores Lucene full-text indexes as well as user account information. You can specify a location for your data directory using the dir.root property in an extension file.

Add “useServerPrepStmts=false” to your JDBC connection string. If you are using MySQL and you get a Hibernate exception like “Could not execute JDBC batch update…Incorrect arguments to mysql_stmt_execute”, try making this change. You can tweak the default JDBC connection string by overriding the db.url configuration property.
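If you build the connection string in code, or just want to see the shape of the change, this small Java helper (hypothetical, not part of Alfresco or Connector/J) appends a parameter to a JDBC URL, using `?` for the first parameter and `&` for later ones. In a `.properties` file the `&` can be written as-is, but inside an XML configuration file it must be escaped as `&amp;`. The host and database names below are placeholders.

```java
/** Hypothetical helper for adding a connection property to a JDBC URL. */
class JdbcUrls {
    static String withParameter(String url, String name, String value) {
        if (url.contains(name + "=")) {
            return url; // naive check: keep any setting that is already present
        }
        String separator;
        if (url.endsWith("?") || url.endsWith("&")) {
            separator = "";  // URL already ends with a separator
        } else if (url.indexOf('?') >= 0) {
            separator = "&"; // additional parameter
        } else {
            separator = "?"; // first parameter
        }
        return url + separator + name + "=" + value;
    }
}
```

For example, `JdbcUrls.withParameter("jdbc:mysql://localhost/alfresco", "useServerPrepStmts", "false")` returns `jdbc:mysql://localhost/alfresco?useServerPrepStmts=false`, which is the kind of value you would put in `db.url`.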

Links to info on SharePoint Server 2007

Many in the content management industry are curious to see exactly what is going to be included as part of Microsoft Office SharePoint Server 2007. CMSWire has a few links to some Microsoft resources in this recent post.

At Navigator we had at least one client that was betting heavily on the new release. They believed Microsoft’s promise that it would include web content management, records management, workflow and better security. As more and more people get experience with Beta 2, we should start to find out how well Microsoft is going to be able to keep that promise.

In the meantime, if you’re ready to do an MCMS 2002-to-Alfresco conversion, just let me know.

IBM developerWorks evaluates open source CMS

A team at IBM’s developerWorks has written an article on their evaluation of open source content management software. They were looking to build a closed community web site with freely available software and ultimately chose Drupal for the task. The article covers the requirements, design, and selection process, and highlights the customizations they made.

(I came across this while taking a look at Krugle, a code- and technology-centric search engine.)

Alfresco promises better portal integration

Recently, John Newton sent me an email thanking me for my post on Alfresco’s JBoss Portal integration. He said they are looking at providing additional Alfresco portlets in upcoming releases. Being able to use Alfresco as a replacement JCR repository for JBoss Portal is also in the works. Apparently the Liferay-Alfresco bundle is already configured this way, but I haven’t had a chance to take a peek yet.

JBoss Portal and Alfresco

I recently delved into JBoss Portal to put together a demo for a client. I started with JBoss Portal 2.2.1 without Alfresco just to get my feet wet. I was a bit underwhelmed. The documentation was spotty, which I expected. The admin UI was clunky at best and wholly non-functional in some instances (here’s a tip: stick to the XML descriptors and avoid the UI for now).

The bigger problem for my immediate need was that the out-of-the-box CMSPortlet instance couldn’t be easily customized through either XML or the admin interface to show anything but the default content stored in the embedded Jackrabbit (JCR/JSR-170) repository. The problem was that the URL was configured as an initialization parameter instead of a portlet preference, so it could not be changed for an individual portlet instance. To fix that, I snagged the updated CMSPortlet from the 2.4.0 Alpha release and deployed it to my 2.2.1 instance, which worked great.
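The distinction matters because, in a JSR-168 portlet, an init parameter is declared once for the portlet definition, while portlet preferences can be changed per instance (and, where allowed, per user). In real portlet code the two lookups are `PortletConfig.getInitParameter(String)` and `PortletPreferences.getValue(String, String)`; the plain-Java sketch below imitates that lookup order with a hypothetical `ContentPortletConfig` class and a hypothetical `contentUrl` key, so it runs without a portlet container.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a content portlet's configuration lookup. */
class ContentPortletConfig {
    // plays the role of init-params: shared by every instance of the portlet definition
    private final Map<String, String> initParams;
    // plays the role of portlet preferences: editable for this instance only
    private final Map<String, String> preferences = new HashMap<>();

    ContentPortletConfig(Map<String, String> initParams) {
        this.initParams = new HashMap<>(initParams);
    }

    void setPreference(String key, String value) {
        preferences.put(key, value);
    }

    /** A preference wins when present; the init-param is only the fallback default. */
    String contentUrl() {
        return preferences.getOrDefault("contentUrl", initParams.get("contentUrl"));
    }
}
```

With only an init parameter, every instance shows the same content; with a preference, each instance on a page can point somewhere different.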

My next source of frustration was the JBoss-Alfresco bundle. I didn’t know exactly what was going to be included in the bundled instance of JBoss Portal and Alfresco; in hindsight, my expectations were set too high. What I was hoping for was that Alfresco would be configured as the replacement JCR repository for JBoss Portal and that there would be a set of useful portlets that exposed the Alfresco repository to portal users. At a bare minimum, I would have expected an Alfresco search portlet and a trimmed-down “spaces” portlet.

Instead, what’s included is a single “Alfresco Client” portlet that essentially wraps the entire Alfresco UI. The embedded Jackrabbit repository still exists and can be used with the CMSPortlet, but Alfresco isn’t configured out of the box to be used as the content repository for JBoss Portal.

These annoyances can obviously be addressed with code. And because JBoss and Alfresco leverage open standards, that code will be easier to write and maintain. I was just hoping that the bundle would have been more tightly integrated. (In the immediate term, I was hoping for a more powerful demo with less sweat equity.)

As a side note it makes me wonder: Does Alfresco already have these portlets (and other similar types of value-added code) in-house but not easily accessible (or accessible at all) by the community or do they not exist?

This really illustrates the need for services firms to help clients take open source components the “last mile” by adding glue-code, implementing useful add-ons (portlets, integrations, etc.), and beefing up documentation, all of which could and should be injected back into the community in some form or fashion.

High-end document management getting squeezed?

Tony Byrne points out something I’ve recently seen happening at my clients as well: collaborative tools like SharePoint and eRoom are being used for “informal” document management, while high-end tools such as Documentum are used for “formal” document management.

This sounds familiar. For most of the 1990s, my colleagues and I were hardcore Lotus Notes developers. We never saw Documentum as competition. At that time we saw Documentum as a niche solution for high-volume, highly regulated content, or imaging.

The current release of SharePoint lacks “table stakes” functionality for all but the most basic document management needs. The two most critical missing features are document-level security and any semblance of basic workflow. But Microsoft looks to be addressing both of those, and more, with its 2007 release.

As the price gap between SharePoint, eRoom, and open source solutions on one side and high-end ECM platforms on the other increases, and as things like web services make integration less of a headache, could we be seeing a regression in how the market views offerings like Documentum, i.e., as suited only to the 10% to 20% of an enterprise’s document management needs that are truly specialized?