Category: Content Management

Enterprise Content Management (ECM), Web Content Management (WCM), Document Management (DM). Whatever you call it, this category covers market happenings and lessons learned.

Alfresco 3.1 clustering easier with JGroups

Optaros has worked on some of the largest and most complex Alfresco implementations anywhere. Projects where multi-node read-write clusters are required have been particularly challenging. So when Alfresco announced clustering improvements in 3.1 my interest was piqued.

I decided I’d do a simple test: Get a two-node read-write Alfresco 3.1 cluster running using a shared MySQL database and a shared file store (as opposed to a replicated database and a replicated file store). The process is mostly documented here but I thought I’d capture the steps I went through in case someone finds them helpful.

Prepare the virtual machines

If you already have virtual or physical machines ready to go, skip ahead to “Set up the content store & database”.

I already had an Ubuntu server virtual machine image with everything I needed for the test. I upgraded it to Alfresco 3.1, cleared out the repository, and verified that everything was working okay. In order to share my data directory via NFS I did need to use apt-get to install nfs-kernel-server, nfs-common, and portmap, but that’s no big deal.

Once I had the first image all set, it was time to create a second. I’m using Sun’s VirtualBox for virtualization. It doesn’t have a “clone” command in the UI, and you can’t simply do a file copy of the VDI file. Instead, you have to use VBoxManage on the command line. The form of the command that takes the source VDI file name and target file name didn’t work for me, but using the source VDI UUID did:

VBoxManage clonevdi 19a7646e-d5cb-4e01-90fd-2bcd556dc1d5 "Ubuntu Test Server Clone.vdi"

It was weird that I had to use the source UUID instead of the file name, but I got what I wanted.
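
If you need to look up the UUID of a source image, VBoxManage can list the registered disk images along with their UUIDs:

VBoxManage list hdds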

Set up the network

I used VirtualBox “host only” networking for ease of setup. This allowed my host machine to see the images and the images to see each other.

My server image was originally set up to use DHCP. That appeared to be giving Alfresco and JGroups trouble, so I converted the images to use static IP addresses and unique host names, and I updated their hosts files (I didn’t want to set up DNS). That left me with three machines (one host and two virtual machines called node1 and node2) that could ping each other by name.
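
The hosts file entries on each machine would look something like this, using the addresses and host names that show up later in this post:

192.168.56.3    node1.alfresco.jpotts.com    node1
192.168.56.4    node2.alfresco.jpotts.com    node2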

Set up the content store & database

At this point I’ve got two identical Alfresco servers, but each has its own database and data store. For my test, they needed to point to the same database. They also needed to share the content store while keeping their own local Lucene indexes.

For this test I decided to use the database and file system on node1 for both nodes. In real life, that wouldn’t be a good setup because losing node1 would bring down the whole cluster. For a shared db/file system setup, you’d want separate nodes, each clustered, for the db and file system.

My Alfresco content store is in “/srv”. I wanted to use NFS to share the content store with the other nodes in my cluster, so I edited /etc/exports to add a new entry for the “/srv” directory. I used an IP address range here but I could have used explicit host names.

/srv 192.168.56.0/25(rw,no_root_squash,async)

You have to restart the NFS kernel server to make that change take effect:

/etc/init.d/nfs-kernel-server restart

Then, I split out the content and index stores into three directories:

/srv/alfresco-3.1-enterprise
/srv/alfresco-3.1-enterprise-local-index
/srv/alfresco-3.1-enterprise-local-index-backup

And updated custom-repository.properties accordingly:

dir.root=/srv/alfresco-3.1-enterprise
dir.indexes=/srv/alfresco-3.1-enterprise-local-index
dir.indexes.backup=/srv/alfresco-3.1-enterprise-local-index-backup

The second node will access the database remotely, so MySQL needed to know about that:

grant all on alfresco31e.* to 'alfresco31e'@'192.168.56.4' identified by 'alfresco31e' with grant option;

Later it seemed that node1 was accessing MySQL via its static IP address rather than localhost as it used to. Rather than figure out why, or where that’s configured, I just ran the same command as above for node1’s static IP.
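
In other words, another grant along the same lines, this time for node1’s static IP (192.168.56.3 in my setup):

grant all on alfresco31e.* to 'alfresco31e'@'192.168.56.3' identified by 'alfresco31e' with grant option;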

With node1 all set, it was time to give node2 some attention…

My original plan was to NFS mount the node1 data directory as something like “/srv/alfresco-labs-3d-shared” because using the same directory name I would have used on a single node seemed confusing. As it turned out, I think Alfresco must keep track of that data directory name because it complained that my “dir.root” was set incorrectly. So I wound up using the same directory names that I used on node1 and making the same update to custom-repository.properties:

dir.root=/srv/alfresco-3.1-enterprise
dir.indexes=/srv/alfresco-3.1-enterprise-local-index
dir.indexes.backup=/srv/alfresco-3.1-enterprise-local-index-backup

Then I mounted the data directory:

mount 192.168.56.3:/srv/alfresco-3.1-enterprise /srv/alfresco-3.1-enterprise

I didn’t do it, but it would be smart to update /etc/fstab so that the data directory would be automatically mounted on server startup.
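
If you do want the mount to come back after a reboot, an /etc/fstab entry along these lines should do it (the mount options shown are just the defaults; tune them as you see fit):

192.168.56.3:/srv/alfresco-3.1-enterprise  /srv/alfresco-3.1-enterprise  nfs  defaults  0  0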

With that the data directories are all set. Telling node2 to use the database on node1 instead of localhost was a simple custom-repository.properties change:

db.url=jdbc:mysql://node1.alfresco.jpotts.com/alfresco31e

Now node1 and node2 are pointing to the same content store and database, and each have their own Lucene index. The last step was to configure the cluster.

Configure the cluster

Configuring the cluster involved enabling the sample ehcluster-config.xml and making a few small changes to custom-repository.properties.

To enable the ehcluster-config, I copied the ehcluster-config.xml.sample file that came with the sample extensions into my extension directory as ehcluster-config.xml. No other changes were needed in this particular case.
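
On my Tomcat install, that copy amounted to something like the following; your paths may differ, and I’m assuming the sample file sits in the extension directory alongside the other .sample files:

cp /opt/apache/apache-tomcat-5.5.27/shared/classes/alfresco/extension/ehcluster-config.xml.sample \
   /opt/apache/apache-tomcat-5.5.27/shared/classes/alfresco/extension/ehcluster-config.xml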

In custom-repository.properties, you have to assign a cluster name to activate the cluster. The index recovery mode needs to be set to AUTO so the indexes stay in sync:

alfresco.cluster.name=testcluster
index.recovery.mode=AUTO

As of 3.1, Alfresco uses JGroups to discover and coordinate cluster members, and the protocol used for cluster member communication is configurable. The default is UDP, but I couldn’t get that to work, so I changed it to TCP. I also found that I had to list the hosts in my cluster in order for the two nodes to find each other:

alfresco.jgroups.defaultProtocol=TCP
alfresco.tcp.initial_hosts=node1.alfresco.jpotts.com[7800],node2.alfresco.jpotts.com[7800]

As you can see, most of the work was really about networking and data setup. The cluster configuration itself is actually pretty minimal.

Test the cluster

Before starting Tomcat on the two nodes, I enabled a log4j logger so I could see nodes join and leave the cluster:

log4j.logger.org.alfresco.enterprise.repo.cache.jgroups=INFO

After starting up Tomcat, I eventually saw this in catalina.out:

06:24:52,043 INFO [repo.jgroups.AlfrescoJGroupsChannelFactory]
Created JChannelFactory:
Cluster Name: testcluster
Stack Mapping: {DEFAULT=TCP}
Configuration: file:/opt/apache/apache-tomcat-5.5.27/webapps/alfresco/WEB-INF/classes/alfresco/jgroups-default.xml

-------------------------------------------------------
GMS: address is 192.168.56.3:7800
-------------------------------------------------------

When the second node joined the cluster, the first node knew about it:

06:26:21,241 INFO [cache.jgroups.JGroupsKeepAliveHeartbeatReceiver]
New cluster view with additional members:
Last View: null
New View: [192.168.56.3:7800|1] [192.168.56.3:7800, 192.168.56.4:7800]

Once the nodes could see each other it was time to test it out from an end-user perspective. Obviously, in production you’ll have a load-balancer in front of these two nodes. For testing the cluster, though, you want to be able to hit each node specifically. I used two different browsers on the host machine logging in as two different users. There are some short test scenarios on the Alfresco wiki. In addition to those, you might want to:

  • Create, delete, and update content while the second node is shut down. Then start the second node and see if you can navigate to, search for, and read the properties of that content as you would expect. Note that it may take a few seconds for the cache and Lucene index to update.
  • Check out content in one browser and verify that it is checked out on the other.
  • Simultaneously edit content properties.
  • Open the edit properties page in one browser and delete the object in another.

That’s it

In a real-world production environment you often have numerous networking issues to deal with that make this more of a headache, but hopefully this gives you an idea of the basic steps involved and shows how you can get familiar with clustering by setting up your own test cluster on virtual machines.

Keeping your Alfresco web scripts DRY

One of my teammates thoroughly drenched our under-the-desk development server with a giant cup of coffee once. Somehow, disaster was avoided, although the client’s carpet had a nice coffee-stain outline of the server’s footprint long after the app rolled out, which afforded the rest of us endless opportunities to mercilessly haze the clumsy coder.

Keeping your Alfresco web scripts DRY is actually not about making your developers use spill-proof travel mugs, or better yet, virtual machines. DRY is a coding philosophy or principle that stands for Don’t Repeat Yourself. There are all kinds of resources available that go into more detail, and the principle is broader than simply avoiding duplicate code, but that’s what I want to focus on here.

There are three techniques you should be using to avoid repeating yourself when writing web scripts: web script configuration, JavaScript libraries accessed via import, and FreeMarker macros accessed via import.

Web Script Configuration

Added in 3.0, a web script configuration file is an XML file that contains arbitrary settings for your web script. It’s accessible from both your controller and your view. Building on the hello world web script from the Alfresco Developer Guide, you could add a configuration file named “helloworld.get.config.xml” that contains:


<properties>
<title>Hello World</title>
</properties>

You could then access the “title” element from a JavaScript controller using the built-in E4X library:

var s = new XML(config.script);
logger.log(s.title);

And, you could also grab the title from the FreeMarker view:

<title>${config.script["properties"]["title"]}</title>

The web script configuration lets you separate your configuration from your controller logic. And because you can get to it from both the controller and the view, you don’t have to stick configuration info into your model.

This example showed a script-specific configuration, but global configuration is also possible. See the Alfresco Wiki Page on Web Scripts for more details.

JavaScript Import

If your web script controllers are written in JavaScript, at some point you will find yourself writing JavaScript functions that you’d like to share across multiple web scripts. A common example is logic that builds a query string, executes the query, and returns the results. You don’t want to repeat the code that does that across multiple controllers. That’s what the import statement is for.

The syntax is easy:

<import resource="classpath:alfresco/extension/scripts/status.js">

This needs to be the first line of your JavaScript controller file. This example shows a JavaScript library being imported from the classpath, but you can also import from the repository by name or by node reference. See the Alfresco JavaScript API Wiki Page for more details.
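
As a contrived sketch (the function, type, and query details here are made up for illustration), the shared file might wrap the query logic in a function:

// status.js: shared query logic used by several controllers
function findStatusNodes(siteId) {
    // build a Lucene query for status objects in the given site and run it
    var query = "+TYPE:\"optStatus:status\" +@optStatus\\:siteId:\"" + siteId + "\"";
    return search.luceneSearch(query);
}

A controller that imports the file can then call the function and drop the results into its model:

<import resource="classpath:alfresco/extension/scripts/status.js">
model.results = findStatusNodes(args.siteId);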

FreeMarker Import

Of the three, this is the one I see ignored most often. Let’s take the Optaros-developed microblogging component for Share. Its basic data entity is called a “Status” object. So web scripts on the repository tier return JSON that might have a single Status object or a list of Status objects. That’s two different web scripts and two different views, but the difference between a list of objects and a single object is really just the list-related wrapper; in both cases, the individual Status object JSON is identical. I’ve seen people simply copy-and-paste the FreeMarker from the single-object template into the list template and then wrap that with the list markup. That’s bad. If you ever change how a Status object is structured, you’ve got to change it in (at least) two places. (Or it’ll be someone who comes after you who has to do it, which makes it even worse. If you aren’t following the analogy, your redundant code is the coffee stain.)

Instead of duplicating the logic, use a FreeMarker import and define a macro that formats the object. Then you can call the macro any time you need your object formatted. Here’s how it works for Status.

A FreeMarker file called “status.lib.ftl” contains the macros that format Status objects. It lives with the rest of the web script files and looks like this:


<#assign datetimeformat="EEE, dd MMM yyyy HH:mm:ss zzz">
<#--
Renders a status node as a JSON object
-->
<#macro statusJSON status>
<#escape x as jsonUtils.encodeJSONString(x)>
{
"siteId" : "${status.properties["optStatus:siteId"]!''}",
"user" : "${status.properties["optStatus:user"]!''}",
"message" : "${status.properties["optStatus:message"]!''}",
"prefix" : "${status.properties["optStatus:prefix"]!''}",
"mood" : "${status.properties["optStatus:mood"]!''}",
"complete" : ${(status.properties["optStatus:complete"]!'false')?string},
"created" : "${status.properties["cm:created"]?string(datetimeformat)}",
"modified" : "${status.properties["cm:modified"]?string(datetimeformat)}"
}
</#escape>
</#macro>
<#--
Renders a status node as HTML
-->
<#macro statusHTML status>
SiteID: ${status.properties["optStatus:siteId"]!''}<br />
User: ${status.properties["optStatus:user"]!''}<br />
Message: ${status.properties["optStatus:message"]!''}<br />
Prefix: ${status.properties["optStatus:prefix"]!''}<br />
Mood: ${status.properties["optStatus:mood"]!''}<br />
Complete: ${(status.properties["optStatus:complete"]!'false')?string}<br />
Created: ${status.properties["cm:created"]?string(datetimeformat)}<br />
Modified: ${status.properties["cm:modified"]?string(datetimeformat)}<br />
</#macro>

There’s one macro for JSON and another for HTML. It still bothers me that the same basic structure is repeated, but at least they are both in the same file and it feels better knowing that this is the one and only place where the data structure for a Status object is defined.

The FreeMarker that returns Status objects as JSON resides in status.get.json.ftl and looks like this:

<#import "status.lib.ftl" as statusLib/>
{
"items" : [
<#list results as result>
<@statusLib.statusJSON status=result />
<#if result_has_next>,</#if>
</#list>
]
}

You can see the import statement followed by the JSON that sets up the list, and then the call to the FreeMarker macro to format each Status object in the list.

When someone posts a new Status, the new Status object is returned. Rather than repeat the JSON structure for a Status object, the status.post.json.ftl view simply calls the same macro that got called from the GET:

<#import "status.lib.ftl" as statusLib/>
{
"status" : <@statusLib.statusJSON status=result />
}

Now if the data structure for a Status object ever needs to change, it only has to be changed in one place.

Don’t Repeat Yourself

Take a look at your web scripts. Eliminate your duplicate code. And keep a lid on your mocha frappuccino.

Save 15% on Alfresco Developer Guide

Forgive this temporary transgression into a blatant sales pitch, but I’m trying to save you some money. Packt Publishing is offering ECM Architect readers 15% off my book, the Alfresco Developer Guide. To take advantage of the discount:

  1. Visit the Alfresco Developer Guide Book Page
  2. Click the “ADD TO CART” button to add the book to your shopping cart
  3. Enter AlfrescoDG-3117 in the “Promotional Code” field and click the “Update” button. The discounted price should now be reflected in your order.

After your book arrives, if you read it and decide it was the best Packt book published in 2008, you should take a minute to vote for the Alfresco Developer Guide in Packt’s Author of the Year awards. Voting ends May 25.

Highlights from the Alfresco Meetup in Chicago

Other than the drenching I received between the blue line and the hotel (both coming and going) I thoroughly enjoyed my time in Chicago with the Alfresco crew last week. I ran into some Optaros customers, past colleagues, blog and book readers, and even some consultants from another firm who told me they came across some code I wrote for a client many years ago that they now support. (It was a custom Documentum WDK app for a certain carbonated beverage company that was actually pretty cool at the time, but I’d love to re-do it on an open stack).

The day was well spent, with Matt Asay kicking off with a state of the union of sorts, followed by a couple of sessions from Michael Uzquiano, Alfresco product manager. I thought Uzi’s discourse on the macro- and micro-economic climate was interesting, but a bit arduous. Yes, we’re in some tough times and that’s good for open source. Check. Next slide. My favorite part of his talks was the Alfresco roadmap (see Highlights, below).

Ed Wentworth, from Orbitz, gave a cool talk on how they’re using Alfresco. Ed’s team is implementing what I think is a best practice pattern: Services-Oriented Content Management. Orbitz has a lot of dynamic sites built on content chunks, not complete, fully-baked pages. So they leverage Alfresco’s strengths as a core repository, and then expose the repository as a service to the front-end (and to other services). They use the web client only for atypical or administrative interactions, and custom Share components and Surf apps for content authoring.

Orbitz also leverages an “Enterprise Content Bus” made up of a ServiceMix ESB, BPM, and a rules engine (more info on ECB here and here). The ECB is primarily focused on orchestrating content-centric processes like content routing and delivery. Not everyone is going to need an ECB, but if you’ve got anything that resembles a content production pipeline, it can really make your processes much more flexible with less code.

TSG did a cool demo on an open source solution for annotating attachments (screencast). It looks like it is still in dev mode but it’s got a great start and hopefully they’ll continue to enhance this and make it available to the community.

I did the Alfresco-Drupal CMIS integration demo (screencast), which seemed to go over well. Most people in the room acknowledged that they have more than one CMS, so there was interest in something that could enable multiple web sites to talk to one or more back-end CMS repositories using a standard API (CMIS).

At the end of the day a sort of “Everything you wanted to know about Alfresco but were afraid to ask” roundtable formed, which was cool. I tried to demo the Alfresco-Django integration we’ve been working on. It worked on the plane and it works now, but the demo gods chose that very day to expire my Alfresco license, and although the error is hard to miss, I didn’t notice it as the root of my problem until after the meeting. It’s all good now, so if you’ll be at the San Francisco event next week and want to see Django and Alfresco working together, let me know.

Highlights of the Alfresco Talks

  • Alfresco now has 1000 paying customers. Matt says it took Documentum many years to get to that point. With the price disparity, obviously, Alfresco needs many more customers than Documentum to thrive, but still, that’s quite an achievement.
  • There’s a Groovy renderer coming to Surf after 3.2.
  • Speaking of Surf, I’m not sure when it is coming, but Spring WebFlow is now being integrated into Surf, which is much needed.
  • If you’re looking for CMIS resources, CMISDev.org is a new site aimed at pulling a lot of that stuff together.
  • 3.2 Labs targeted for June
  • 3.2 Enterprise targeted for September 2009
  • 3.3 Labs/Enterprise targeted for 1Q 2010
  • 4.0 Labs/Enterprise targeted for 2nd half of 2010

The 3.2 release is shaping up to be pretty major. Here are some of the things planned for 3.2 that caught my ear:

Surf Mobile. This is an addition to the Surf framework that makes it easier to develop Surf apps that work well on the iPhone, although audience members said the demo site was also usable on their Blackberries.

New Form Service. Today you’ve got “property sheets” in the Explorer client which are driven by the content model and the web-client configuration XML, web forms in WCM which are defined using XSD, and custom forms in Surf which aren’t defined using any kind of framework at all. The new form service will provide a single form definition/presentation framework across the entire Alfresco platform. It’s a Good Thing.

IMAP integration. This allows you to use your Outlook client (and Thunderbird and whatever else you use that knows IMAP) to get things into and out of the repository. One use for this is to archive email by dragging-and-dropping mail into mail folders which are backed by Alfresco. Another is to easily snag content from the repo without leaving the mail client. I don’t know if it will be in there, but a great feature would be to allow me to drag-and-drop a piece of Alfresco content onto my message which would then get translated into the download URL rather than a file attachment.

Clustering. Several attendees fell to the ground and wept, overcome with emotion, when Uzi said 3.2 would support real clustering of both the DM and WCM repositories. Clustering is something that has bedeviled complex Alfresco implementations. I’ll maintain my composure until I see this actually working.

Index refactoring. Ever do a search that fails to return a piece of content you know is in the repo? Or, the other way, when you see something that you know was deleted? Do you mark the time it takes to reindex your repository (to fix index inconsistencies) by how many times the moon rises and sets? I have no idea whether any of these issues will get better in 3.2 but hearing “index” and “refactor” in the same sentence makes me hopeful.

All in all, I’d say it was one of the better Alfresco-led meetups. I’m looking forward to reconnecting with many of you next week in San Francisco. I’ll see you there.

OpenText administers Vignette mercy killing

OpenText issued a press release today saying it will buy Vignette for $12.70/share. Vignette has been struggling lately, losing customers to complex and costly upgrades and laying off employees (CMSWire post), so I see this as a mercy killing of sorts. Naturally, OpenText views the acquisition more optimistically, saying it will “extend the breadth of our offerings and further Open Text’s positioning as the leading independent ECM vendor in the marketplace”.

I’ll be curious to see exactly what OpenText plans to do with the technology. I don’t follow them closely, so I’m not sure what they were after. Vignette’s Portal? Their collaboration stuff? I have no idea. I’m sure we’ll see some analysis today in the blogosphere.

I’ll bet we’ll see an increase in Vignette customer churn, as is often the case in these acquisitions, so this should be a good thing in Open Source ECM Land.

Drupal, Django, and Alfresco in Chicago

I’ll be in Chicago tomorrow for the Alfresco Meetup. I’ll be speaking during the Barcamp on Alfresco and Drupal integration with CMIS (module, screencast). I’ll also have the Alfresco-Django integration running on my laptop. I may not have time to show Alfresco-Django during my slot, but I’ll be happy to stick around and do informal demos and talk about either integration if you’re interested because I’d like your feedback on it.

Apache Chemistry: A Proposed Reference CMIS Implementation

Kas Thomas, over at CMSWatch, says a new Apache project is in the works. Chemistry is a reference CMIS implementation that has committers from Day, Alfresco, and Nuxeo. The goal is to provide a vendor-neutral reference implementation and compatibility tests around the proposed CMIS specification.

It looks like this will be a standard set of REST APIs and SOAP services that implement the CMIS spec and hook into a back-end repository. Not surprisingly, the first back-end the Chemistry team plans to support is Apache Jackrabbit, Apache’s reference implementation of the JCR API.

Customizing Alfresco Share: Five things to watch out for

Alfresco Share is a team-centric collaboration tool. It’s really cool and our clients have been reacting very positively to it. When customers see the AJAXalicious UI, a common reaction is to want to take the next 5 projects on their list and “do them on Share”.

In cases where the functional requirements closely resemble team collaboration, that can be a great choice. In others, it’s an abuse of the tool. Just like a lot of things in software and life, just because you can doesn’t mean you should. (Remind me to tell you the story about building a tennis court reservation system in Lotus Notes some time).

Anyway, let’s assume you’ve got a set of requirements that reasonably resembles team-based collaboration, but some of Share’s tools (wiki, blog, document library, calendar, and recently, bookmarks) don’t work exactly the way you need them to. I’m not talking about adding new, self-contained custom components. This is specifically about customizing the out-of-the-box Share components. With that in mind, here are five areas where even simple Share customization efforts could take longer than you might think.

Custom Metadata

In its current incarnation, if you have custom metadata you want to display when looking at document detail, that’s code you have to write. Alfresco’s Mike Hatfield said, via Twitter, that the 3.2 Forms Service will make this better, so that’s good. If your Share sites contain simple documents that use only out-of-the-box metadata, this won’t be an issue for you.

Custom Workflows

Currently, in Share, there are a couple of places where the jBPM workflow engine is used. First, when you invite someone to a site, that kicks off a workflow. Second, you can “assign” an advanced workflow to a document in the document library.

The first issue is that the workflow submission dialog includes only the two out-of-the-box, document-centric workflows, “Ad hoc” and “Review and Approve”. It won’t show any custom workflows you’ve deployed. The workflow modal is not inspecting the workflow UI configuration like the web client does, so even if you got your workflows to show up in that list, the form wouldn’t have the custom workflow metadata you need to launch your custom workflow properly.

When you log in to Share, you’ll see a “My Tasks” dashlet. This gives you hope that maybe that dashlet could manage tasks for any workflow. Unfortunately, it only works with the “invite user” workflow and the two document-centric, OOTB workflows.

Long story short, Share isn’t set up to work with custom workflows out-of-the-box. If you’ve got custom workflows that need to work in the Share context, you’ll need to write your own dialogs for launching the workflow and your own component for managing workflow tasks.

YUI Bubbling Events

Share makes heavy use of YUI Bubbling Events. This results in a great end-user experience–the Share components communicate with each other and refresh themselves via AJAX without page refreshes. But it does mean there’s a bit of a learning curve when following the same pattern to implement your customizations if your team has never worked with the bubbling library before. It can get kind of thick in places.
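
To give a flavor of the pattern (the event name and payload here are invented), one component fires a named event and any other component that cares subscribes to it:

// somewhere in the component that changed something...
YAHOO.Bubbling.fire("statusUpdated", { nodeRef: this.options.nodeRef });

// ...and in a component that needs to react to the change
YAHOO.Bubbling.on("statusUpdated", this.onStatusUpdated, this);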

Incidentally, all of the YUI stuff is part of Share, not Surf, which is the framework used to build Share. If you’re building your own Surf app you’ll need to grab the YUI libraries (or any other libraries you want to use) yourself. It’s the same for the Flash pieces (multi-document upload, document preview). It keeps Surf light, but if you want to incorporate that kind of functionality into your Surf app, some assembly will be required.

Code Sprawl

In Share, every module has at least one JavaScript file. For example, the Document Library has six different JavaScript files weighing in at about 136KB. Sometimes what should be a simple change (adding a button, creating a new modal) requires changes to every one of those files. This, combined with grokking the bubbling events, translates into potentially lengthy development cycles for stuff you wish would be quick.

Theming

The main CSS file for Share is in the themes directory. But changing that will only affect the global dashboard header and the site dashboard header. If you want to change the theme for everything in Share, including individual tools, you have to change each tool’s CSS file. Those CSS files live in the “modules” directory. It would be nice if it were easier to implement site-wide or global themes.

Adding your own Components/Tools

The impact of these issues is lessened somewhat if you are adding your own components or tools instead of customizing what’s already there. It’s easy to write your own dashlets that show up on the global dashboard or the site dashboard. And with a little work, you can write dashlets that talk to each other using YUI Bubbling Events, just like the OOTB dashlets. The area for improvement is in skinning, configuring, and extending the out-of-the-box tools.

Share Your Thoughts on Share

There’s no doubt that Share is a cool application for team-based collaboration. I didn’t expect it to be configurable to the Nth degree right away, and we may be pushing the limits of its intended use. I’m curious to hear from others who have been tweaking the app: Have you worked through these issues? Are there other examples of specific extension points Alfresco could address to make your lives easier?

Alfresco User Interface: What are my options?

People often need to build a custom user interface on top of the Alfresco repository and I see a lot of people asking general questions about how to do it. There are lots of options to consider. Here are four options for creating a user interface on top of Alfresco, at a high level:

Option 1: Use your favorite programming language and/or framework to talk to Alfresco via REST or Web Services. PHP? Python? Java? Flex? Whatever, it’s up to you. The REST API is nice because if you can’t find a URL that does what you need out-of-the-box, you can always roll your own with the web script framework. This option offers the most flexibility and creative freedom, but of course you might end up building constructs or components that you could have gotten “for free” from a higher-level framework. Optaros’ streamlined web client, DoCASU, built on Ext JS, is one freely available example of a custom UI on top of Alfresco, but there are others.
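
For a taste of what Option 1 looks like from any language, the round trip is basically a couple of HTTP calls: grab a ticket from the out-of-the-box login URL, then hit your web script with the ticket. (The “someco/helloworld” URL below is a hypothetical custom web script, not something that ships with Alfresco.)

# get an authentication ticket
curl "http://localhost:8080/alfresco/service/api/login?u=admin&pw=admin"

# call a custom web script, passing the ticket along
curl "http://localhost:8080/alfresco/service/someco/helloworld?alf_ticket=TICKET_12345678"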

Option 2: Use Alfresco’s Surf framework. Alfresco’s Surf framework is just that–it’s a framework. Don’t confuse it with Alfresco Share which is a team-centric collaboration client built on top of Surf. And, don’t assume that just because a piece of functionality is in Share it is available to you in the lower-level Surf framework. You may have to do some extra work to get some of the cool stuff in Share to work in your pure Surf app. Also realize that Surf is brand new and still maturing. You’ll be quickly disappointed if you hold it to the same standard as a more widely-used, well-established framework like Seam or Django. Surf is a good option for quick, Alfresco-centric solutions, especially if you think you might want to leverage Alfresco’s browser-based site assembly tool, Web Studio, at some point in the future. (See Do-it-yourself Alfresco Surf Code Camp).

Option 3: Customize the Alfresco “Explorer” web client. There are varying degrees to which you can customize the web client. On one end of the spectrum you’ve got FreeMarker “presentation templates” followed closely by XML configuration. On the other end of the spectrum you’ve got more elaborate enhancements you can make using JavaServer Faces (JSF). Customizing the Alfresco Explorer web client should only be considered if you can keep your enhancements to an absolute minimum, because:

  1. Alfresco is moving away from JSF in favor of Surf-based clients. The Explorer client will continue to be around, but I wouldn’t expect major efforts to be focused on that client going forward.
  2. JSF-based customizations of the web client can be time-consuming and potentially complex, particularly if you are new to JSF.
  3. For most solutions, you’ll get more customer satisfaction bang out of your coding buck by building a purpose-built, eye-catching, UI designed with your specific use cases in mind than you will by starting with the general-purpose web client and extending from there.

Option 4: Use a portal, community, or WCM platform. This includes PHP-based projects like Drupal (Drupal CMIS Screencast) or Joomla as well as Java-based projects like Liferay and JBoss Portal. This is a good option if you have requirements that match up well with the built-in (or easily added-on) capabilities of those platforms.

It’s worth talking about Java portal servers specifically. I think people are struggling a bit to find The Best Way to integrate Alfresco with a portal. Of course there probably is no single approach that will fit every situation but I think Alfresco (with help from the community) could do more to provide best practices.

Here are the options you have when integrating with a portal:

Portal Option 1: Configure Alfresco to be the replacement JSR-170 repository for the portal. This option seems like more trouble than it is worth. If all you need is what you can get out of JSR-170, you might as well use the already-integrated Jackrabbit repository that most open source portals ship with these days unless you have good reasons not to. I’m open to having my mind changed on this one, but it seems like if you want to use Alfresco and a portal, you’ve got bigger plans that are probably going to require custom portlets anyway.

Portal Option 2: Run Alfresco and the portal in the same JVM (post). This is NOT recommended if you need to scale beyond a small departmental solution and, really, I think with the de-coupling of the web script engine we should consider this one deprecated at this point.

Portal Option 3: Run the Alfresco web script engine and the portal in the same JVM. Like the previous option, this gives you the ability to write web scripts that are wrapped in a portlet but it cuts down on the size of the web app significantly and it frees up your portal to scale independently of the Alfresco repository tier. It’s a fast development cycle once you get it set up. But I haven’t seen great instructions for setting it up yet. Alfresco should document this on their wiki if they are going to support this pattern.

Portal Option 4: Write your own portlets that make services calls. This is the “cleanest” approach because it treats Alfresco like any other back-end you might want to integrate with from the portal. You write custom portlets and have them talk to Alfresco via REST or SOAP. You’ll have to decide how you want to handle authentication with Alfresco.

What about CMIS?

CMIS fits under the “Option 1: Use your favorite programming language” and “Portal Option 4: Write your own portlets” categories. You can make CMIS calls to Alfresco using both REST and SOAP from your own custom code, portlet or otherwise. The nice thing about CMIS is that you can use it to abstract the underlying repository so that (in theory) your front-end code will work with different CMIS-compliant back-ends. Just realize that CMIS isn’t a fully-ratified standard yet and although a CMIS implementation is in the Enterprise version of Alfresco, it isn’t clear to me whether or not you’d be supported if you had a problem. (The last response I saw on this specific question was a Peter Monks tweet saying, “I don’t think so”).

The CMIS standard should be approved by the end of the year, and if Alfresco’s past performance is any indicator of the future, they’ll be the first to market with a production-ready, fully-supported CMIS implementation based on the final spec.

Pick your poison

Those are the options as I see them. Each one has trade-offs. Some may become more or less attractive over time as languages, frameworks, and the state of the art evolve. Ultimately, you’re going to have to evaluate which one fits your situation the best. You may have a hard time making a decision, but you have to admit that having to choose from several options is a nice problem to have.