Tag: Alfresco

Workflow Dashboards in Alfresco Share

Metaversant recently had a client with some interesting requirements around workflow. I thought I’d post what we did here and get a conversation going about the various pluses and minuses of the approach and find out what others have done when faced with similar needs.

There are two main buckets of requirements I want to focus on: Workflow Reporting/Dashboard and Folder Monitoring. I’ll talk about Workflow Reporting/Dashboard in this post and Folder Monitoring in the next.

Workflow Reporting/Workflow Dashboard

Workflow Reporting is something I’ve seen quite often and handled differently each time based on what the client is trying to do. Alfresco is pretty sparse on Workflow Reporting (capturing the data and making it easy to report on) and Workflow Dashboarding (presenting a dashboard of running workflows and/or workflow reports in the user interface). By “sparse” I mean there really is none. Recent releases have seen the addition of the “My Tasks” page, but that is limited to what it sounds like: A page listing the tasks the current user is assigned to.

What most people want when they ask for Workflow Reporting is the ability to capture workflow data both before and after a workflow has completed. This is significant because if you do nothing about it, data about running workflows disappears into the ether when the workflows complete. A Workflow Dashboard is the ability to see all workflows assigned to all users or a subset of users, and perhaps some historical data via Workflow Reports (maybe time started, current task, time on current task, current actor, etc.).

For this particular project, my client really only cared about running workflows, so we didn’t have to worry about capturing workflow stats before the workflow ended–they just wanted to see all workflows, no matter who started them, in a sortable list with a link to the workflow details and the ability to perform batch operations against the workflows (such as selecting all workflows and canceling). The twist, however, was that the client wanted to scope the list of running workflows to the Alfresco Share site they were started in. So if there were two Share sites, A and B, and Site A’s users had started 24 workflows on documents stored within the site while Site B’s users (which may overlap with Site A) had started 35 workflows on Site B documents, Site A’s Workflow Dashboard should show a list of 24 workflows while Site B’s should show 35.

The Solution

When workflow data needs to be persisted beyond the life of the workflow, we’ll typically just create some objects in the content model to persist the data and we’ll write to those objects from one or more actions defined within the business process definition. In this case, that wasn’t necessary.

The problem boiled down to how to capture the specific Share site a workflow was scoped to and, then, how to query for and display that information. Once that was resolved, the data could be displayed on a custom Share page.

I may post the code at some point, but for now, here’s the high-level recipe:

Step 1: Capture the Site ID in a Process Variable

Alfresco jBPM workflows have process variables that can store metadata about a running workflow. And, the workflow API allows me to query against that metadata. So, the first step was to capture which Share site the user was in when they launched the workflow.

To do this, I used the Form Service to define a hidden field on my workflow’s startup form. Then, I overrode Alfresco’s client-side JavaScript StartWorkflow component to add my own code that finds the hidden field and sets it with the Share site’s unique identifier (the Site ID field).

Initially, I used a straight hidden field for this. Later, I came back and created a new custom component that included not only the hidden field, but some additional markup and client-side logic that pulled back additional context about the Share site for that workflow so that when someone viewed the workflow or managed a task, they would know more about the Share site than just its Site ID.

Step 2: Create a web script that returns the workflow metadata

With the Site ID stored in a process variable, the next step was getting the workflow data to the front-end Share tier so it could be displayed in a dashboard. This involved creating a repository tier web script that accepted the Site ID as an argument and returned the data as JSON. There are some out-of-the-box web scripts related to workflows, but they return tasks assigned to the current user and for this I needed all running workflows for a given site ID, so that required a custom web script.

The JavaScript API has come a long way with respect to workflow, but the ability to apply a filter on task metadata isn’t there yet, so that meant my controller had to be Java-based.

The interesting part of that web script controller looks like this:

// Match only tasks whose workflow has the given Site ID stored as a process variable
WorkflowTaskQuery tasksQuery = new WorkflowTaskQuery();
Map<QName, Object> processCustomProps = new HashMap<QName, Object>();
processCustomProps.put(SomeCoWorkflowModel.PROP_RELATED_SITE_ID, siteId);
tasksQuery.setProcessCustomProps(processCustomProps);
// Restrict the results to in-progress Dashboard Tasks
tasksQuery.setTaskName(SomeCoWorkflowModel.TYPE_DASHBOARD_TASK);
tasksQuery.setTaskState(WorkflowTaskState.IN_PROGRESS);
List<WorkflowTask> tasks = workflowService.queryTasks(tasksQuery);

This gives us all of the tasks that have the Site ID we’re looking for, but the Dashboard needs more than that–it needs the Workflow Instance for full context. No problem–the Workflow API can handle it. Here’s where the controller iterates over the tasks and adds the workflows to a List of WorkflowMetadata objects that will get set on the web script’s model:

for (int i = startIndex; i < endCount; i++) {
    WorkflowTask task = tasks.get(i);
    // Look up the full workflow instance that owns this task
    WorkflowInstance wf = workflowService.getWorkflowById(task.getPath().getInstance().getId());
    workflows.add(new WorkflowMetadata(wf, task));
}
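The loop above leans on startIndex and endCount without showing where they come from. Here is a minimal sketch of one way to derive them, assuming the web script receives paging arguments that I'll call skipCount and maxItems. Those names, and the PageBounds class itself, are hypothetical and are not taken from the actual controller.

```java
/** Computes safe loop bounds for paging through a list of query results. */
final class PageBounds {
    final int startIndex; // first index to read (inclusive)
    final int endCount;   // index to stop at (exclusive)

    private PageBounds(int startIndex, int endCount) {
        this.startIndex = startIndex;
        this.endCount = endCount;
    }

    /**
     * @param total     number of results returned by the query
     * @param skipCount hypothetical paging argument: how many results to skip
     * @param maxItems  hypothetical paging argument: page size; zero or less means "no limit"
     */
    static PageBounds of(int total, int skipCount, int maxItems) {
        // Clamp the start into [0, total] so an out-of-range request yields an empty page
        int start = Math.max(0, Math.min(skipCount, total));
        // Use long arithmetic so a huge maxItems cannot overflow
        int end = (maxItems <= 0) ? total : (int) Math.min((long) start + maxItems, total);
        return new PageBounds(start, end);
    }
}
```

With bounds computed this way, a request that skips past the end of the list produces an empty page instead of an IndexOutOfBoundsException from tasks.get(i).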

One potential gotcha with this approach is that there has to be a task assigned to an actor in order for the workflow to show up in the WorkflowTaskQuery results. If a workflow were sitting on a wait-state, for example, that workflow wouldn’t be returned by the code above. To work around this, the workflows in this solution always queue up a “Dashboard Task” assigned to the initiator to guarantee that all workflows, whether they are currently sitting on a task node or not, always show up in the workflow dashboard.

The view for this web script simply returns the data as JSON, so I’ll spare you the details.

The other web script I wrote for this piece deletes workflows for a given set of workflow IDs. You’ll see where that comes in next, but, again, the logic of that controller (also Java-based) is very straightforward, so on to the next step!
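To give a feel for how little is involved, here is a rough sketch of the batch logic, not the actual controller. It assumes the workflow IDs arrive as a single comma-separated argument, and it uses a plain Consumer as a stand-in for whatever call the real controller makes against Alfresco's workflow service. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchCancel {
    /** Splits a comma-separated argument into trimmed, non-empty workflow IDs. */
    static List<String> parseIds(String arg) {
        List<String> ids = new ArrayList<>();
        if (arg == null) {
            return ids;
        }
        for (String part : arg.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    /**
     * Attempts to cancel every workflow and returns the IDs that failed.
     * The canceller is a stand-in for the repository's workflow service.
     */
    static List<String> cancelAll(List<String> ids, Consumer<String> canceller) {
        List<String> failed = new ArrayList<>();
        for (String id : ids) {
            try {
                canceller.accept(id);
            } catch (RuntimeException e) {
                // Keep going so one bad ID does not block the rest of the batch
                failed.add(id);
            }
        }
        return failed;
    }
}
```

Returning the failed IDs, rather than stopping at the first error, lets the dashboard report exactly which of the selected workflows could not be cancelled.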

Step 3: Create a custom Workflow Dashboard page

The workflow tasks are tagged with Share Site IDs and a repository-tier web script is in place that knows how to find those tasks and give back the Workflow Instance data as JSON. The final step is to render that as a dashboard. For this, I used standard Surf framework techniques to create a new page called “Workflows”. The page contains a client-side JavaScript component that renders a YUI DataTable. The data source for the DataTable is the web script created in the previous step.

Step 4: Create actions

The specifics for what someone might want to do to one or more workflows displayed in the dashboard varies. For this project, users needed to be able to view the workflow details. Initiators needed to be able to cancel the workflow. Viewing the workflow details is just a matter of building the right URL. Canceling the workflow is a little more involved–I used client-side JavaScript to make repo-tier web script calls to a custom web script that can cancel one or more workflows given the workflow IDs.

To make it so that only workflow initiators can cancel a workflow, I use the same mechanism that document actions use: The XML configuration file that defines the cancel workflow action specifies that the user must be the workflow initiator. Then, the client-side JavaScript that builds the dashboard actions checks the workflow data to see if that’s the case and hides the link if the current user is not the initiator.

Here’s the XML for the cancel workflow action:
<toolbar>
    <actionSet>
        <action type="action-link" id="onActionCancel" permission="initiator" label="menu.selected-items.cancel" />
    </actionSet>
</toolbar>

The client-side code that checks action permissions is a direct copy of Alfresco’s out-of-the-box logic that does the same.
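The real check runs in client-side JavaScript, so the following is only an illustration of the decision it makes, written in Java to match the other code in this post. It assumes an action with no permission attribute is visible to everyone and that "initiator" is the only permission in play. The ActionFilter class and the onActionViewDetails action ID are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ActionFilter {
    /** A configured action and the permission it requires (null means none). */
    static final class Action {
        final String id;
        final String permission;

        Action(String id, String permission) {
            this.id = id;
            this.permission = permission;
        }
    }

    /** Returns the IDs of the actions the current user should see for one workflow. */
    static List<String> visibleActions(List<Action> actions, String currentUser, String initiator) {
        boolean isInitiator = currentUser != null && currentUser.equals(initiator);
        List<String> visible = new ArrayList<>();
        for (Action action : actions) {
            boolean needsInitiator = "initiator".equals(action.permission);
            // Hide initiator-only actions from everyone else
            if (!needsInitiator || isInitiator) {
                visible.add(action.id);
            }
        }
        return visible;
    }
}
```

Hiding the link is a convenience for the user, not a security boundary, so the web script that does the canceling would still need its own check.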

The Result

Once fully-assembled, the workflow dashboard looks like this:

Workflow dashboard screenshot

In the screenshot, I’ve got multiple workflows selected and have clicked the “Selected Workflows…” link to show that you can cancel more than one workflow at a time.

Now you’ve seen a simple workflow dashboard implementation. My next post will be about launching workflows automatically when objects are dropped into a folder.

Does Alfresco Share need to go on a diet?

When Alfresco first told me about the Surf framework and the plan to build a new, then-unnamed collaborative client on top of Surf, I liked the sound of it. Of course, anything other than JavaServer Faces sounded pretty good at the time.

But things didn’t quite turn out the way I thought they would.

See, what I thought would happen was that Alfresco would release a bunch of “mini” clients–highly-specialized apps for the task at hand. Want RM? Here’s the RM client. Doing some team stuff? Here’s Share. Basic Document Management, here’s the DM client. Web Content Management. Digital Asset Management. You get the point. With all of these sitting on top of Surf, each client app would only have the code that made it unique for that particular use case. It’s like taking one Ritz cracker (Surf) and then having a veritable smorgasbord of delicious ECM toppings to choose from.

Instead, what’s happened is that Alfresco launched their first Surf-based client, Share, for team collaboration, and then, rather than go back to the platter for another cracker, they kept piling on and piling on until that once dainty hors d’oeuvre became a towering Dagwood sandwich.

Let’s face it: Clients love the Share interface. They love it so much they want all of their content-centric apps to be based on it. If the client wants basic customizations–some form tweaks, a new dialog here, a new page there–it isn’t so bad. But the more complex the changes are, the more cruft you have to sift through and either eliminate or work around. A quick perusal through the Share code will turn up tidbits that deal with Records Management, SharePoint, and Google Docs. All of these are optional add-ons to Alfresco, but have worked their way into the Share client “just in case” someone installs that extension.

Okay, so if Share has too much to serve as an agile base in some cases, why not drop down to the underlying Surf framework? Because sometimes, Surf can be too bare bones. Recently, I did an implementation for a client that was essentially a community solution. We used two customized versions of Share: One for the “admin” interface for the community and the other for the front-facing community itself. Share worked great for the admin interface–not much tweaking was needed there at all. The dashboard, document library, wiki, discussion, and data lists functionality all made sense in the context of administering content. The front-facing community, however, was another story. We didn’t need 80% of what was in Share out-of-the-box. But we didn’t drop down to Surf because we wanted blogs, discussions, and some of the Share-tier web scripts for data lists and whatnot. We knew gathering up all of the dependencies needed to “push down” those features into Surf would be a pain. The solution turned out great, but the ratio of used to unused code is kind of scary.

Alfresco seems too far down the every-new-feature-we-come-up-with-goes-into-Share path at this point. But I wonder if the concept of a “distribution” could apply to Share. This would mean stripping down Share to some sort of bare bones minimum, just slightly bulkier than raw Surf. Then, provide AMPs or Maven builds or scripts or something that developers can use to “build up” Share with only the functionality they need.

Or maybe the solution is to make things that are optional, truly optional. It would be nice, if, through a script or this “tear down, then build up” approach, you could completely remove things like:

  • SharePoint integration
  • Google Docs integration
  • Records Management
  • Wiki
  • Blogs
  • Discussion
  • Links
  • Anything else that’s not about the document library, data lists, categories, tags, and search.

By “completely remove” I don’t mean “hide from the user”. I mean when I recursively grep the Share web app for “Sharepoint” (for example) I get zero hits.

The goal here is to cut way down on the amount of code developers have to sift through, override, and extend when starting with Share as a base. And, once deployed, reduce the amount of code that has to be maintained and upgraded going forward.

Maybe Alfresco should take a lesson from Drupal. Some would argue that the core of Drupal is already too big, but at least the majority of extensions are in (truly) optional modules. And there are a number of Drupal distributions that take core and bundle different sets of modules for specific use cases. Django has something similar with the pinax project.

What do you think? Am I just being a picky eater? Is it realistic to think that Alfresco can whittle Share down to a more suitable base for the rest of us to build on?

Book Review: Alfresco 3 Web Services

I’ve just finished reading Alfresco 3 Web Services, by Ugo Cei and Piergiorgio Lucidi, which Packt sent me to read and comment on.

Alfresco 3 Web Services is meant for developers with a very specific focus: Remotely talking to Alfresco. That might mean communicating via SOAP-based Web Services, RESTful web services (Web Scripts), or either protocol through CMIS. Whichever you choose, Ugo and Piergiorgio have you covered. I really like that the authors set out to write such a focused piece and stuck to it–it allows them to go deep on their topic and keeps the chapter length and total book length digestible.

The book is written very clearly and follows a logical progression. It starts out with SOAP-based services (first 5 chapters), including a chapter on .NET, and then moves on to web scripts (3 chapters). After that, the authors discuss AtomPub, which is an interesting, but not critical, background for the remaining 5 chapters of the book which focus on CMIS.

The discussion on web scripts covers both Java and JavaScript controllers, but JavaScript definitely gets more attention. Something developers new to the platform will find helpful is the chapter on FreeMarker. Most of Alfresco’s documentation and the books that are out there (including mine) relegate the FreeMarker API to appendices or just show examples and assume you’ll look up details later when you need it. Because it is typically such a key component of web scripts and other aspects of Alfresco, it was a good call to include it in the book.

The book comes with several small examples and a couple of tie-it-together examples. None of it runs out-of-the-box without some tweaking which isn’t a surprise given how rapidly things are changing in this area, particularly with regard to CMIS. At the time the book was written, OpenCMIS, the Java API for CMIS available as part of Apache Chemistry, had not been officially released. As I write this, the Chemistry team is about to release their second tagged build. The differences aren’t significant enough to cause confusion–most readers will be able to fidget with the import statements and the other changes required to get the sample code running.

I thought there was a little too much time devoted to SOAP-based Web Services, but that’s just personal preference and the fact that nearly all of my clients over the last four years have gone the RESTful route. The authors note that others may have the same preference and they make it easy to skip over those chapters if the reader wants to.

Although the chapters logically progress and build on each other, the code samples don’t–for the most part, each chapter’s code samples are self-contained. For example, I thought it was kind of strange that the code built in Chapter 13 for the CMIS Web Services binding examples isn’t used in Chapter 14 to build the CMIS Wiki example.

Many Alfresco implementation teams are divided into at least back-end and front-end teams and when the project is large enough, you can definitely have people focused on the middle. This book is perfect for that middleware team or for anyone who’s got a handle on the back- and front-ends and just needs to learn how to stitch them together. Nice job, Ugo and Piergiorgio.

7 mistakes developers make when customizing Alfresco Share

I’ve seen more than my–ahem–fair share of Alfresco Share over the last several months. Many clients feel that their needs are so close to what Share provides out-of-the-box, that they can save time and money by starting with Share as the basis for their custom content-centric application. Whether or not that’s a good idea is the subject of another post. This post assumes that, for whatever reason, you find yourself customizing Alfresco’s Share client and wondering what are some of the common pitfalls to avoid. Here are seven. Feel free to add to the list.

1. Ignoring client-side JavaScript minification

Here is a massive understatement for you: Alfresco Share has a lot of client-side JavaScript files. Most, if not all, of these are minified, or compressed, to reduce the size of a given page and increase client-side performance. If you’ve ever looked at the FreeMarker source for one of Alfresco’s pages, you may have seen something like this:

<@script type="text/javascript" src="${page.url.context}/components/blog/blog-common.js"></@script>

It looks like an everyday JavaScript reference but what’s up with that “@script” tag? It’s a FreeMarker macro. It switches out the JavaScript source file for the minified version when debug is turned off and uses the original uncompressed source when debug is turned on, which makes stepping through the client-side JavaScript much more pleasant.
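To make the switch concrete, here is a small sketch of the decision the macro makes. It is not the macro's actual implementation, and it assumes the minified file sits alongside the original with a "-min.js" suffix, which is an assumption about the naming convention and not something spelled out above. ScriptSource is a hypothetical name.

```java
final class ScriptSource {
    private static final String JS = ".js";
    private static final String MIN_JS = "-min.js";

    /** Returns the path to serve: the original source in debug mode, otherwise the minified variant. */
    static String resolve(String src, boolean debug) {
        // Leave the path alone in debug mode, for non-JavaScript files, and for already-minified files
        if (debug || !src.endsWith(JS) || src.endsWith(MIN_JS)) {
            return src;
        }
        return src.substring(0, src.length() - JS.length()) + MIN_JS;
    }
}
```

Seen this way, it is clear why a file that exists in only one of the two forms behaves differently depending on the debug setting.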

There are two things you need to be aware of here. First, if you find yourself tweaking Alfresco’s client-side JavaScript, you’ll need to remember to deploy both the expanded and minified version of the file. Otherwise, when people turn debug on and off, they’ll see different results. Second, when you create your own client-side JavaScript, you need to minify your own code for the same reason.

You could turn debug on and leave it on (bad idea) or you could use a “normal” script tag and point to the non-minified versions of your JavaScript, but it is really easy to add minification as a part of your build, so you might as well set that up early in the project and you won’t have to worry about it later.

There are several JavaScript compressors out there; the YUI Compressor is one. You can drop the JAR into your project and then invoke it from Ant quite easily. Ask Google for some examples.

2. Assuming Alfresco and Share are on the same host

When you install Alfresco it deploys a web application in the “/alfresco” context–that’s your repository and the old Alfresco Explorer client–and a second web application in the “/share” context. Depending on what you’re doing you might deploy numerous additional web apps based on Share or Surf.

Regardless of how you choose to deploy, you need to remember that there is no guarantee your app and Alfresco will be on the same machine, app server, or port number. One of the beauties of the Surf architecture is that you can scale it out across multiple app servers and they can all talk to the same (or multiple) Alfresco repository servers. The underlying Surf framework on which Share is based has configuration and helper variables you can leverage to deal with this. You should not be hardcoding “localhost” or any other hostname in your Share code.
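Share handles this through its own endpoint configuration, so the following plain-Java sketch only illustrates the principle: the repository's base URL comes from configuration and every other URL is built relative to it. The RepoEndpoint class and the system property name used in the usage example are hypothetical.

```java
import java.net.URI;

final class RepoEndpoint {
    private final URI base;

    RepoEndpoint(String baseUrl) {
        // Ensure a trailing slash so relative paths resolve beneath the base path
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    /** Reads the base URL from a system property; deliberately has no hardcoded fallback host. */
    static RepoEndpoint fromSystemProperty(String key) {
        String value = System.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing endpoint configuration: " + key);
        }
        return new RepoEndpoint(value);
    }

    /** Builds an absolute URL for a path relative to the configured base. */
    String url(String relativePath) {
        String path = relativePath.startsWith("/") ? relativePath.substring(1) : relativePath;
        return base.resolve(path).toString();
    }
}
```

For example, RepoEndpoint.fromSystemProperty("someco.repo.url").url("/api/sites") fails fast if the property is missing instead of quietly falling back to localhost.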

3. Incomplete theme customization

Alfresco Share 3.3 has user-selectable themes. As part of your customization effort you can define your own theme and then configure that to be the default. An easy way to create your own theme is to copy one of the out-of-the-box themes and then modify it to suit your needs. The keys to cloning a theme successfully are:

  1. Copy one of the themes other than “Default”
  2. Search and replace references to the old theme name in the new CSS files (login.css, presentation.css, and yui/assets/skin.css)
  3. In the previous step, don’t forget yui/assets/skin.css!

4. Duplicating, rather than extending, Alfresco web scripts

Suppose you want to change something in one of Alfresco’s web scripts. You may be tempted to change the out-of-the-box controller JavaScript or FreeMarker views, but don’t do it. A nice thing about the web script framework is that you can override even just a single file that is part of a web script by placing your version of the file with the same name in the same folder structure under web-extension. This also works on the repository tier, but instead of web-extension you use the “extension” directory.

For example, maybe I want to extend the document-actions config XML in Share with my own settings. I will NOT copy my version over the top of Alfresco’s. Instead, I’ll put my copy in a file named “document-actions.get.config.xml” under WEB-INF/classes/alfresco/web-extension/site-webscripts/org/alfresco/components/document-details. When Alfresco loads the web script, it will use my version of the config.
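The "same name, same folder structure" rule is mechanical enough to write down. Here is a sketch for the Share tier, assuming the out-of-the-box file lives on the classpath under alfresco/ and the override goes under alfresco/web-extension/, as in the example above. OverridePath is a hypothetical helper, not part of Alfresco.

```java
final class OverridePath {
    private static final String BASE = "alfresco/";
    private static final String EXTENSION = "alfresco/web-extension/";

    /** Maps a Share classpath resource under alfresco/ to its override location under web-extension. */
    static String forShare(String classpathResource) {
        if (!classpathResource.startsWith(BASE)) {
            throw new IllegalArgumentException("Not under " + BASE + ": " + classpathResource);
        }
        if (classpathResource.startsWith(EXTENSION)) {
            return classpathResource; // already an override location
        }
        // Keep the rest of the folder structure and the file name unchanged
        return EXTENSION + classpathResource.substring(BASE.length());
    }
}
```

Only the web-extension segment is added; everything after it, including the file name, has to match the original exactly or the override will not be picked up.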

5. Not using the web-extension directory

While we’re on the topic, all of your custom Share config files go in web-extension under the Share web application. Don’t put them in $TOMCAT_HOME/shared/classes and don’t put them in the Share web app’s classes/alfresco directory. Use the web-extension directory to keep your stuff separate from Alfresco’s. I also recommend doing the same with your client-side files–create a directory called “extension” for your client-side JavaScript, images, CSS, and so on.

6. Using the same Tomcat server as the Alfresco repository during development

This one isn’t going to cause you problems, but it sure will slow you down. Even if your production Share web app will run on the same Tomcat as the Alfresco WAR, do yourself a favor: While you’re coding, use two Tomcats. On port 8080, you’ll run Alfresco and out-of-the-box Share. On some other port you’ll run a second Tomcat server with your custom Share- or Surf-based web app. That way, when you need to restart your custom Share app, you don’t have to wait for the repository to start back up. You’ll cut way down on the time you spend waiting for Tomcat to restart which, over time, can speed up your development cycle tremendously.

7. Failing to test on Alfresco’s supported browsers

Have I mentioned how much client-side JavaScript there is in Share? Every time you touch Alfresco’s JavaScript or create your own, you’ll need to test it on the browsers your client intends to use. So there are two recommendations here: First, make sure you are testing across Alfresco’s supported browsers. Second, make sure your clients only expect to use Alfresco’s supported browsers. Failure to do both of these can lead to some missed expectations on both sides. The browsers Alfresco supports for 3.3 are on the supported stacks page on the Alfresco web site.

What am I missing? Add a comment with your Alfresco Share street smarts.

Administering Alfresco via JMX at the command line

Jared Ottley has a cool post on using jmxterm from CyclopsGroups.org. It lets you query and operate on Alfresco’s JMX beans from the command line. One of the examples he gives uses JMX to show how many users are logged in to Alfresco, a list of the logged in users, and then how to log off a specific user.

I used this recently to query the state of an OpenOffice process and then to restart the OpenOffice converter. It worked great.
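If you would rather do the same kind of query from code than from jmxterm, the standard javax.management API covers it. This is a minimal sketch that reads a single attribute from a single bean. To stay runnable anywhere, the demo points at the current JVM's own MBean server and a standard JVM bean; the names of Alfresco's beans are not shown here and I have not guessed at them. JmxPeek is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class JmxPeek {
    /** Reads one attribute from one MBean over any JMX connection, local or remote. */
    static Object read(MBeanServerConnection conn, String objectName, String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(objectName), attribute);
    }

    public static void main(String[] args) throws Exception {
        // Demo against this JVM's own MBean server. A remote server would instead be
        // reached through a JMXConnector created with that server's JMX service URL.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println(read(local, "java.lang:type=Runtime", "VmName"));
    }
}
```

Because read() takes an MBeanServerConnection, the same method works unchanged once you swap the local server for a remote connection.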

JMX is only available in the Enterprise version of Alfresco.

Adobe acquires Day Software for $240 million

I’ll admit it. I did not see this one coming: Adobe announced today that it will acquire Day Software for $240 million USD. (Thanks, @pmonks, for the heads-up tweet).

Honestly, I thought Adobe would acquire Alfresco by the end of last year and I was surprised when it didn’t happen. They had done a big OEM deal making Alfresco part of LiveCycle and they did a gigantic Alfresco implementation as part of standing up Adobe’s acrobat.com site. Heck, Adobe even hosted Alfresco’s community event back in 2008. All small potatoes in the grand scheme of things, I know, but I can’t help but feel like the proud parent whose daughter brought home a keeper, only to find out the guy’s been dating a hottie from Switzerland the whole time.

Day has a FAQ up on their site. As you would expect, Day promises that current customers have nothing to fear and that the products will continue to live on.

Day has some really cool stuff in both their commercial products and their open source projects (Sling, Jackrabbit). I hope the acquisition gives Day a huge injection of resources they are able to invest in the open source side of things.

Congrats to Erik Hansen, David Nuescheler, Kevin Cochrane, and the rest of the Day team!

Alfresco, NOSQL, and the Future of ECM

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millennium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.
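The schema-less point is the easiest one to show in a few lines. This sketch uses plain Java maps as stand-ins for documents; it is not the API of MongoDB, CouchDB, or any other store, and SchemalessDemo is a hypothetical name. The idea is simply that two "articles" in the same collection can carry different fields and nothing has to be declared up front.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SchemalessDemo {
    /** Builds a document as a plain map of field name to value; no schema is declared anywhere. */
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> document = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            document.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return document;
    }

    /** Counts the documents that happen to carry the given field. */
    static long countWithField(List<Map<String, Object>> collection, String field) {
        return collection.stream().filter(d -> d.containsKey(field)).count();
    }
}
```

Adding an "author" to one article is just doc("title", "B", "author", "Jeff"); the older articles without an author stay exactly as they were.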

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, APIs, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (see the DrupalCon presentation, “MongoDB – Humongous Drupal”, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can, and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMSs will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

Introducing the Alfresco Community Committer Program

It’s been a little over two years since I wrote a blog post entitled, “Is Alfresco the ‘near beer’ of open source?“. In that post, I lamented the fact that the Alfresco code line is entirely closed to community developers and that Alfresco seems unwilling to relinquish any amount of control over the development of their open source product. Writing that post had me a bit riled up so during the Q&A session at the community meetup in San Jose later that week, I asked John Newton, Alfresco CTO, and former Alfrescan, Kevin Cochrane when and if it would ever be different. They said they were “working on it” (See Alfresco pledges to open community by 3.0).

I’m glad to say that, although it took a while, there is now a process by which your code can find its way into the Alfresco code base (Community, and even, potentially, Enterprise). It’s called the Alfresco Community Committer Program (ACCP). The ACCP is a motley crew of volunteers from Alfresco customers and partners around the world. Although not a requirement for membership, I think most of us have developed at least one open source add-on for Alfresco. Our goal is to help community-developed code find its way into the product. Does this mean Alfresco is now as open as “true” open source community projects like Apache and Drupal? No, and honestly, I’m not sure it will ever get there. But Alfresco’s support of the ACCP process is a start. Here’s how the process works.

First step: Nomination to the ACCP Incubator

Today, developers in the community create add-ons, utilities, extensions, language packs and all kinds of software built to work with Alfresco. Some of these might make great additions to the Alfresco product. At a high level, what the ACCP seeks to do is to act as an on-ramp or incubator for that subset of projects. We want you, real world Alfresco developers and end-users, to nominate community-developed extensions that you find useful and that you would eventually like to see as part of the Alfresco product. The ACCP then reviews these nominations and votes for their inclusion into the incubator. The project’s developers can then decide to leave their code where it is (Google Code, SourceForge, Alfresco forge, etc.) or they may choose to migrate to the Alfresco-hosted ACCP incubator Subversion repository.

Projects accepted to the incubator so far include:

As a side note, it’s great that there are so many community-developed add-ons for Alfresco. But the lack of a central index makes it hard to see what’s available. As a related effort, Nancy Garrity is working on something that would provide a central index, support ratings, etc.

Second step: Community code line

Once a project has been in the incubator for a while, the ACCP may recommend its inclusion as part of Alfresco Community, making it much easier for Alfresco Community users to leverage these add-ons. The exact nature of how these will be made available is still being worked out. You could imagine a “community-extensions” directory under the Alfresco Community Subversion root or something similar. For certain types of contributions, maybe the installer could even provide an optional “install community extensions” step. Again, although we have recently voted some projects into the ACCP incubator, none has reached Community yet, so the details of exactly how those will be incorporated into the Community code base are still open.

Third step: Enterprise code line

The ACCP may then recommend Enterprise adoption. This step is subject to Alfresco Engineering approval, which may be a significant hurdle for some, but if it happens, the entire Alfresco customer base gets the benefit of Alfresco’s ongoing support of the community-developed code. Note that the Enterprise approval step is the only one where Alfresco employees have a say about how an ACCP project is handled–per our charter, Alfresco employees cannot be voting members of the committee.

How you can get involved

First and foremost, you can nominate an open source Alfresco add-on/extension/customization project. If you want to take an active role on the committee or know someone who would be a good addition, there are spots available. So, another way to help out would be to serve on the committee or nominate someone who should. The committee meets regularly to review and vote on project and committee member nominations. All you have to do is get in touch with me or one of the other regular members of the committee. You’ll find the list on the Alfresco Community Contributor Program wiki page.

We’ll be doing a webinar on July 27th to talk about this more and answer questions. Check out the Alfresco events page to register.

Metaversant is up-and-running

First off, thanks so much to my readers, clients, colleagues, and other friends in the community who have provided a wealth of support in terms of well-wishes and congratulations on the forming of my new company, Metaversant. Several people have asked how things are going so here’s a brief update…

Metaversant is up-and-running and I’m as busy as can be. I finally got a web site up, a logo designed, and business cards aren’t far behind. I’ve even got people to give them to, which is an important prerequisite to actually having business cards.

I’m currently billing on an Alfresco Share customization project. I can’t tell you who or exactly what but the pattern will be familiar: A company needs to manage digital assets. Some come from internal sources, some come from external sources, and all need metadata and security applied. The front-end communicates with the Alfresco repository via RESTful Web Script calls while back-office content providers and application administrators use Alfresco Share (customized here and there) to upload assets, set metadata, and manage the business process. It’s a pretty classic pattern and other than extraordinarily tight timeline pressure, it’s going well.

Beyond technical execution I’ve also conducted some Alfresco training for a pharma client in New England. It was just a quick engagement but it was fun to help a team that had been doing some playing with Alfresco on their own discover the capabilities of the platform and how they could be applied to their business problems. I also love to see the expression on people’s faces when it hits them: They don’t need Documentum for all things DM any more.

I had a great trip to New York City for the Alfresco Community Meet-up. I don’t know what the official ratio was but the customers seemed to heavily outnumber the partners at this particular meet-up, which is good for everyone, I think. I caught up with a lot of old friends and met some new ones. I was particularly excited to come across someone who had some plans to leverage cmislib, my client-side CMIS API for Python, a project I’ve sorely neglected this Spring with all of the startup stuff going on. All of this Java code I’ve been writing has me missing Python–I will find time for cmislib soon.

Being fully billable while still having to find new business and take care of everything else about the business is tough, as I knew it would be. I’m loving every minute of it though. One thing I didn’t expect is the helpfulness of friends, former colleagues, and even strangers who have started their own businesses. The entrepreneur community is not unlike the open source community. Everyone loves to talk shop and trade tips and advice. It’s really cool.

My exercise regimen (a generous description) has suffered and it looks like my blogging velocity is on a similar trend. I feel like I’m getting into some regular rhythms though so maybe I can get things back in balance shortly. I’ve got all kinds of things that I need to write about: Alfresco 3.3 Enterprise is out, Alfresco hired the jBPM guys (I totally called it!), and I don’t think I’ve written anything at all about the Alfresco Community Committer Program. It’s going to be a busy Summer.

Review: Professional Alfresco, Wrox

I recently finished reading Professional Alfresco, the new Alfresco book written by some of the Alfresco engineers and John Newton, Alfresco’s CTO. Before I share my thoughts, a few disclaimers: First, I wrote my own book on Alfresco called the Alfresco Developer Guide (Packt, 2008). Second, Wrox provided the book to me free of charge in the hopes that I’d write something about it here. Third, I have a strategic relationship with Alfresco, part of which includes bringing each other business.

Okay, with that out of the way, let’s get to it. Professional Alfresco is a new book written about Alfresco 3.2 by a team of authors with a unique insider perspective: all are Alfresco employees. It’s not an end-user focused book nor is it strictly for developers–it’s actually aimed at several different audiences:

  • Part 1, “Getting to Know Alfresco”, is aimed at IT managers or other folks who might be evaluating Alfresco. It covers the business benefits of Alfresco and provides a high-level overview of the platform.
  • Part 2, “Getting Technical”, looks at the platform’s components and services at a closer level. These chapters are directed at Technical Architects or anyone who’s trying to figure out the technical capabilities of Alfresco.
  • Parts 3 & 4 are aimed squarely at Developers. More specifically, these 6 chapters cover Web Scripts (primarily JavaScript, but a Java example is given) and Alfresco Share. The last 3 chapters of Part 4 provide a step-by-step example of building a Knowledgebase application by customizing Alfresco Share, including a few (brief) pages on the new Form Service. A lot of people are working on Share customization projects these days so many will find this a welcome set of material.

While the majority of the technical how-to in the book is focused on Web Scripts and Share, I particularly liked the chapter on Advanced Workflows. It did a good job of explaining what you can do with the jBPM engine without getting too far into the weeds. The section on the Authentication subsystem, LDAP config, and chaining was also very good, particularly as the subsystem setup is a fairly recent development that not everyone is familiar with.

While I did find a lot to like about the book, there were a few things to pick on. First, if you read the book front-to-back, you’ll notice a significant amount of repetition. I suppose that could be a good thing when one of the three audiences the book is written for picks up the book and goes directly to their area of interest. I wonder, though, if some of it was due to having so many authors collaborating on the writing project. The repetition left me feeling like I was really slogging through the material rather than cutting to the chase.

I thought the content modeling chapter was thorough, but I had to wonder why the author chose to step us through the modelSchema.xsd file instead of providing example content model XML. It’s good to know the content model schema is there if I need it, but I think examples of what I can do with the XML are far more illustrative than walking through the schema.

The form service and Share/Surf aren’t covered in nearly enough detail. Other aspects of the platform simply aren’t addressed at all. I think some of that may be because of timing. The form service, for example, continues to evolve with version 3.3, and when you undertake a project this big, you have to draw a line somewhere. Plus, the focus on web scripts and Share is aligned with where Alfresco is focused right now.

The organization of the book is good. It follows a logical progression through the platform. And I like that the end-to-end Knowledgebase example is placed at the end as a sort of capstone applying the concepts learned earlier in the book. If you’re looking for a tutorial-style book though, you may be frustrated by the amount of theory up-front. It’s just not that kind of book. One side note on organization: Chapter 17 is a bit of an odd duck. It’s got interesting content–the chapter discusses various patterns of Alfresco implementation and integration with other systems. I just thought it was weird that it was at the end of the book instead of in one of the first two parts. Not a huge deal, and I’m glad they included it, even if its placement makes it seem like an afterthought.

Overall, Professional Alfresco is a good book appropriate to several different types of readers. Even though several authors wrote it, and aside from the repetition issue I noted, I didn’t find the transitions between authors very noticeable–the editors did a great job stitching everything together and making it seem like one voice.

The bottom line is that if you are evaluating Alfresco and are trying to understand the architecture of the platform, or if you are a developer focused on web scripts and Share, you’ll find this book to be a valuable resource.