Category: Content Management

Enterprise Content Management (ECM), Web Content Management (WCM), Document Management (DM): whatever you call it, this category covers market happenings and lessons learned.

Getting involved with a local Alfresco community

Even though there are still two weeks to go in this year’s Alfresco Community Survey, I couldn’t help but start reviewing the 1,200 or so responses we’ve received so far. There are some great insights and suggestions coming through, but there’s one I wanted to jump on right away: it’s clear that a significant portion of the Community would like to see more local, Alfresco-focused, non-marketing gatherings (aka meetups). And I’m right there with you. I think it is extremely important that local groups of people interested in Alfresco are able to get together regularly to share tips and tricks, to network, and to have fun. In this post I want to outline my perspective on events, my plan for local meetups, and some ideas on how to get involved with a local Alfresco community.

Alfresco Community Meetups are different from other events

Alfresco drives many types of events worldwide, including presence at third-party conferences, lunch-and-learns, training, and webinars. We also hold an annual developer conference called Alfresco DevCon. Last year DevCon was in New York and Paris. We’re starting to plan this year’s DevCon; we’re still finalizing cities and dates, and I’ll let you know when that happens.

The events I’ve listed so far are completely driven by Alfresco. But there are several groups around the world that get together and talk about Alfresco on their own. These are grassroots, locally-organized meetups. Some meet more regularly than others. Some are a handful of people getting together for an informal happy hour while others are large groups with formal agendas, name tags, and everything.

In the past, in addition to these locally run meetups, Alfresco has conducted “Community Meetups” that were really more like mini-conferences held in multiple geographies. These were fun and informative events, but they can’t happen with the frequency and scale that locally driven meetups can.

Going forward, I’d like you, the community, to drive local meetups. And I’d like to see these happening more frequently, in more parts of the globe, for technical and non-technical audiences regardless of the Alfresco product they use. I want more people to feel that sense of family that I feel when I walk into a room full of people who share the same hopes, joys, and frustrations with Alfresco.

Local Alfresco communities should be driven by the local community

In short, I don’t want Alfresco to own, control, or constrain local Alfresco communities in any way. Ideally, anywhere there are two or more people who care about Alfresco, a local meetup would form, and those people would get together fairly regularly and, hopefully, grow to include others over time.

Alfresco’s role is to foster and support these local communities. I think we can add value in the following ways:

  • Alfresco can serve as a “connector”, matching up groups of interested community members with people willing to organize the local community
  • Alfresco can supply presentation content and, in some cases, people to deliver it in-person
  • Alfresco can help promote your meetup and drive attendance
  • Alfresco can support communities with Alfresco-branded giveaways and other small incentives

What we lack is the hyper-local perspective into the topics the local community is most interested in, the ability to know all of the cool projects going on in your area, and the feet on the ground to make every meeting a success. That’s where you come in. Local community events shouldn’t be driven by Alfresco’s Marketing team–they should be driven by you, the community, and Alfresco will do everything we can to support you.

So, as part of this, I’ve been reaching out to various communities around the world. If they haven’t met in a while, I’m encouraging them to get together, even if it is an informal meet-and-greet. If it is a group that was just thinking about getting together, I’m asking them to take that first step. And, if it is a group that has been meeting a while, I’m asking what, if anything, you need from me to keep it going.

How can you get involved?

This wiki page is the master list of existing local communities we know about as well as communities that people are interested in forming. If you are participating in a local community or are interested in forming one and that’s not reflected on the list, please update the wiki page.

Take the first step

If you are lucky enough to live near an established community, sign up and attend. If there isn’t a meeting happening any time soon, ask the innocent question, “Why isn’t there a meeting happening any time soon?” Maybe you’ll be the spark that gets it going again.

If you want to organize a meetup, it’s pretty easy. Decide on a time and a place, then let everyone know about it. You can use sites like Meetup.com or Google Groups to facilitate sign-up and collaboration, but that’s not a requirement.

If there isn’t a meetup already organized near you and you’d like to find out if others are interested, go to http://www.meetup.com/Alfresco, search for your city, and add your name to the list.

Decide where to take it from here

That first meeting doesn’t have to be a big production. It isn’t much work to get together and talk about what you are doing with Alfresco. While you’re talking, you may want to:

  • Set a focus. Is the goal to network, to learn from others, or something more specific? For example, I have been talking to multiple communities about organizing Alfresco-focused hack-a-thons/code sprints that would have a goal of creating new or contributing to existing Alfresco community projects.
  • Decide how often you want to get together. Meet too often and you’ll burn out the group. Don’t meet often enough and your group will lose interest. Somewhere in the neighborhood of monthly or quarterly is probably best.
  • Decide on an agenda for future meetings (or whether to have an agenda at all). You might have an end-user focused group that discusses tips/tricks for using the product and walks through case studies. Or, you might have a more technical group that dives into the details of a different part of the platform each meeting.
  • Establish ground rules. Maybe for your group, the rules are there are no rules. Or maybe a couple of common sense ground rules would help. It depends on the focus you’ve set. For example, you might want to ban blatant sales pitches and recruiters.
  • Pick an organizer. Someone needs to be on point for reminding the group about upcoming meetings. If you’ve decided on a more formal sort of group, that person will also need to facilitate setting the agenda and find people to speak. I’d recommend rotating this responsibility every 3 to 6 months, but you can decide.

Keep me posted

If you get a meetup going, I want to know about it so I can support your group in the ways I’ve outlined above. Who knows, maybe I’ll even show up in person at one of your meetings.

Take the 2011 Alfresco Community Survey

When I announced my new role as Alfresco’s Chief Community Officer I mentioned that I would be asking you, the community of end-users, developers, partners, and Alfresco employees, for your input on how to make the Alfresco Community the example for all other commercial open source companies to follow. Obviously, I’ll take feedback in any form I can get it, but what would be great is if you would take 15 minutes to complete this survey. If you complete the survey by May 31, 2011, you could win one of two $250 Amazon gift cards.

Now, I know surveys can be a beating. But the information this one helps me gather will allow us to plan all kinds of great things for the Alfresco community, from events to community tools. So, please speak up and give me your opinion, and I promise to listen and then to push for the changes that matter most to the community. I’ll also summarize and present this data back to the community. If we do this year after year, we can hopefully see some cool trends emerge as we make progress.

Will Abson’s Wonderful World of Dashlets

Back when Alfresco first launched Surf, the framework on which Alfresco Share is based, a handful of us went to Chicago to hang out in a conference room on a ship in the harbor where we did a deep dive on the framework and then came up with proposed add-ons that would leverage it. I was at Optaros at the time. Our add-on was the Alfresco Share microblogging component and we also did some Surf Code Camps. The goal, of course, was to get the word out about Surf and encourage others to develop and contribute Share customizations.

The deep dive was great and the code camps that followed were valuable and well-attended. What I think the approach missed was that you don’t need to be a Surf expert to code some simple dashlets. We were handing out “How to Fly the Space Shuttle” when we probably should have started with “Building and Launching Your First Model Rocket”.

That’s why Will Abson is my current Alfresco community hero. At this year’s Alfresco Kickoff meeting in Orlando, Will showed a project he and a few others have been working on called Share Extras. Share Extras is a collection of small projects ranging from “Hello World” dashlets to custom theme, data list, and document action examples.

For example, the list of what I’d call simple mash-up examples includes things like:

  • Twitter Feed Dashlet – Shows a specific Twitter user’s feed.
  • Twitter Search Dashlet – Shows a Twitter feed based on a hashtag.
  • BBC Weather Dashlet – Shows weather feed from BBC.
  • Flickr Dashlets – Shows Flickr photos in a slideshow.
  • Google Site News – Shows the last ten blog posts from Google News.
  • iCal Feed – Shows entries from an iCal feed.
  • Notice Dashlet – Stores/shows arbitrary text, like what you’d use for a maintenance message or an announcement.
  • Train times – Shows the National Rail train schedule.
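None of these needs deep Surf knowledge. As a rough illustration of how small the starting point is, here is a sketch of the web script descriptor behind a Hello World dashlet. It assumes the usual Surf naming convention, in which a descriptor named `hello-world.get.desc.xml` is paired with a FreeMarker template named `hello-world.get.html.ftl` (not shown). The file name, URL, and description are hypothetical and are not taken from Share Extras.

```xml
<!-- Hypothetical file: hello-world.get.desc.xml, in a site-webscripts folder on Share's classpath -->
<webscript>
  <!-- Title shown when a user customizes a dashboard and adds dashlets -->
  <shortname>Hello World</shortname>
  <description>A minimal example dashlet</description>
  <!-- Families control where the dashlet can be added: site dashboards, user dashboards, or both -->
  <family>site-dashlet</family>
  <family>user-dashlet</family>
  <!-- URL the dashboard calls to render this dashlet (hypothetical path) -->
  <url>/components/dashlets/hello-world</url>
</webscript>
```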

From there, you can move on to more extensive examples. Rather than simply displaying data from public services, these start to store and retrieve data in the underlying Alfresco repository:

  • Site Tags Dashlet – Displays a tag cloud consisting of tags used in your site.
  • Site Poll Dashlet – Uses a custom data list type called Poll to configure a simple poll. Shows results in bar chart.
  • Document Geographic Details – Adds a map using the document’s geocoding metadata just below the permissions section.
  • Sample Data Lists – A simple data list example that lets you capture info on Books (author, title, ISBN).
  • Execute Script Custom Document Action – Shows an example of adding a custom action to the action list that runs server-side JavaScript against a node.

The nice thing is that (almost) every one of these extensions deploys as a self-contained JAR file. Will’s build assumes you are running the repository and the Share web apps in the same container, so it deploys the JAR to $TOMCAT_HOME/shared/lib, but you can obviously tweak that if your configuration is different. The ability to run everything out of a JAR, including what would normally be file-system-based resources like CSS, client-side JavaScript, and images, is a relatively new feature (3.3, I think). It’s much nicer than fooling with AMPs.

Here is a list of my five favorites from the collection:

  • Node Browser – A port of the Explorer client’s node browser to the Share UI. I like this one because it brings an extremely useful developer tool into Share, which is where most of us are spending time these days. It also shows how you can plug your own tools into Share’s admin console.
  • Red Theme – A simple custom theme example. This is on my favorites list because creating a custom theme is something that is requested often and should be easy to do. Follow this example to create your own.
  • Site Geotagged Content Dashlet – Adds a dashlet that shows a map of geotagged content contained in the document library. I can’t help it. I like maps.
  • Site Blog Dashlet – Dashlet that shows site blog posts. This is a favorite because it plugs a hole in the product. If you’re going to use the blog tool in a Share site, you’re going to want to show those posts somewhere and a dashlet makes a lot of sense.
  • Wiki Rich Content – Automatically puts a table of contents at the top of a wiki page based on the headings contained within the page. Also does a nice job with pre-formatted text. This is another example of a feature that should probably be in the core product.

The Google Code project includes screenshots for each of these projects, but it is really easy to check out the code, import the projects into Eclipse, create a build.properties file in your home directory to override the tomcat.home property, and then run “ant hotcopy-tomcat-jar” to deploy one and see it in action for yourself. I tried them all out on Alfresco 3.4d Community and they worked great. I think all but one or two will work on 3.3.

The Share Extras project includes a Sample Project with a folder structure and Ant build that you can clone and use as a starting point for your own development. If you create something cool, you should share it on Google Code and then let me know about it. Or give it to Will and he can add it to his ever-growing pile of cool Share add-on examples.

Trying out Activiti: Examples that leverage Alfresco’s new workflow engine

I’ve been playing with Activiti, an open source, BPMN 2.0-compliant business process engine. The project is sponsored by Alfresco, which hired Tom Baeyens and Joram Barrez, the lead developers of jBPM, to create the Apache-licensed engine (take a look at the rest of Activiti’s all-star cast).

The first thing I did was head over to Activiti’s site and read through the user guide. I followed the tutorial and got a standalone instance of Activiti going with very little fuss. The concepts and terminology aren’t terribly different from jBPM, so if you’ve used jBPM, you’ll be familiar with the basics of Activiti in no time. The user guide is well-written so I urge everyone to start there.

Last week, Alfresco published a preview release of its Community product, labeled 3.4.e. This release, which I stress is only for preview purposes, was made available to give everyone a first look at Alfresco’s integration of Activiti. If you watched the screencast showing an Alfresco workflow based on Activiti, you may have thought, “Gee, that looks just like a jBPM-based workflow,” and you’re right: from a user standpoint, it is nearly identical. The difference, of course, is how the processes are described and the underlying implementation that executes them.

The screencast showed that the end users won’t see much of a change. That’s good, but I was anxious to find out how big a deal this transition will be from a developer’s perspective. The 3.4.e release gave me the perfect opportunity to dig in. I decided to take the examples from the Advanced Workflow chapter in the Alfresco Developer Guide (2008, Packt) and make them work with Alfresco’s embedded Activiti engine in 3.4.e. In this post, I’ll talk about how that went and I’ll give you the code so you can try it out yourself.

The code that accompanies this blog post includes the same set of four workflows implemented both in jBPM and Activiti as well as a readme that explains how to install and run everything. I’ll let you inspect that to see what the exact differences are rather than go over them here. Instead, I’ll spend the rest of the post covering the major differences in general.

Before we go any further, I guess we should have a quick terminology discussion. First, in jBPM, everything is a node. Specialized node types handle joins, splits, decisions, wait states, and sub-processes, and task-nodes enclose tasks that get assigned to humans. In Activiti (and really, in BPMN) there are essentially events (start, end, timer), tasks, and gateways. Of course, I’m simplifying greatly here; you should read the spec and the Activiti user guide. The important thing to note for people coming from jBPM is that in Activiti a “task” might be something a human does (“userTask”) or it could be automated (“scriptTask”, “serviceTask”, etc.). In jBPM, connections between nodes are called “transitions”, while in Activiti they are called “sequenceFlows”.
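A bare-bones process makes the terminology concrete. The following is a sketch of a BPMN 2.0 definition for Activiti with one start event, one userTask, one end event, and sequenceFlows joining them. The process id, name, and target namespace are hypothetical. The assignee and formKey values are borrowed from the Hello World userTask discussed under task assignment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             xmlns:activiti="http://activiti.org/bpmn"
             targetNamespace="http://www.example.com/hypothetical">
  <process id="helloWorldSketch" name="Hello World (sketch)">
    <!-- Event: where the process begins -->
    <startEvent id="start" />
    <!-- What jBPM calls a "transition" is a sequenceFlow in BPMN -->
    <sequenceFlow id="flow1" sourceRef="start" targetRef="usertask1" />
    <!-- A task performed by a human, assigned to the workflow initiator -->
    <userTask id="usertask1" name="User Task"
              activiti:assignee="${initiator.properties.userName}"
              activiti:formKey="bpm:task" />
    <sequenceFlow id="flow2" sourceRef="usertask1" targetRef="end" />
    <!-- Event: where the process finishes -->
    <endEvent id="end" />
  </process>
</definitions>
```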

Designing Processes

I use Eclipse, so the first step was to get the Activiti BPMN 2.0 Designer plug-in working. Installation is well documented on the Activiti wiki, and it installs just like any other Eclipse plug-in, so it went fairly smoothly. I had a dependency conflict to deal with, but nothing major.

All in all, designing processes in Activiti works just like it does in jBPM. The tool is different, but you’re still laying out a business process graphically, connecting steps in the workflow, and setting properties on those objects.

There are some known issues with the Designer that made creating and editing processes painful at times. I’m not going to call every one of those out in this post because this is a preview release–I expected to work through a few bumps. I will warn you of a few to hopefully save you some time:

  • You cannot save the diagram until it is syntactically correct, which means the BPMN 2.0 XML will not be generated until the diagram is correct. On a new process, when the editor complains about the diagram, you’d kind of like to just drop into the XML source and fix what needs fixing. If that’s what you want to do, you have to open the .activiti file in the XML editor, make the change, reopen it in the diagram editor, and then make another change and save to force generation of the BPMN 2.0 XML.
  • You cannot change things like IDs, names, form keys, and task assignment in the BPMN 2.0 XML; you have to change them in the Activiti diagram. If you change the BPMN 2.0 XML, the settings in the Activiti diagram will overwrite it. This doesn’t sound like a big deal until you come across the next issue.
  • There is a known problem enabling the properties for an object in the diagram: clicking an object in the diagram doesn’t refresh the properties view. I worked around it by first clicking some other tab in the properties view, then double-clicking on the object (and sometimes repeating that) until the properties view refreshed with the appropriate property set.

Again, I didn’t expect everything to be fully functional, so I am not complaining. I just want you to have your expectations properly set when you play with this on your own.

I should mention that the overall look-and-feel of the Activiti Designer seems a lot crisper and more visually appealing than the JBoss Graphical Process Designer (GPD) Eclipse plug-in. As an example, I loved the alignment helper rules. And I liked that you can bend sequence flows.

Adding Business Logic to Processes

My goal was to take four Alfresco jBPM processes and port them to Activiti. The first three are variations on Hello World. The fourth is a more realistic process used to review and approve whitepapers. In the book, the Publish Whitepaper workflow uses an action to set properties on the approved whitepaper, and I show how to combine a wait state with a mail action and a web script so that third parties without direct access to Alfresco can participate in a workflow. For this initial cut I skipped all of that, because I really wanted to focus on the basics of the workflow engine. But the wait-state idea and the web script interaction are interesting, so I’ll port them later and cover that in a future blog post.

Challenge 1: Alfresco JavaScript in automated steps

The first problem I came to was how to handle workflow steps that have no human intervention. In jBPM those steps are implemented as nodes. Alfresco JavaScript can live inside events within the node or on transitions between nodes. Tasks assigned to users are typically enclosed in a task-node. In Activiti, tasks assigned to users are called userTasks. All of Alfresco’s sample Activiti workflows consist entirely of userTasks. But Activiti includes several node types that aren’t user tasks: a scriptTask uses JavaScript or Groovy to implement its logic and a serviceTask delegates to a Java class. My helloWorld processes consist entirely of automated steps, so a scriptTask sounded good to me. The problem was that scriptTask uses Activiti’s JavaScript implementation, not Alfresco’s JavaScript. So doing something simple like invoking the “logger” root object doesn’t work in a scriptTask.

Fine, I thought, I’ll use one of Alfresco’s listener classes to wrap my logger call and stick that listener in the scriptTask. But that didn’t work either because in the current release Alfresco’s listener classes don’t fully implement the interface necessary to run in a scriptTask.

After confirming these issues with the Activiti guys I decided I’d put my Alfresco JavaScript in listeners either on a userTask or on a sequenceFlow (we called those “transitions” in jBPM) depending on what I needed to do. Hopefully at some point we’ll be able to use scriptTask for Alfresco JavaScript because there are times when you need automated steps in your process that can deal with the Alfresco JavaScript root objects you’re used to.
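Here is a sketch of that approach, with Alfresco JavaScript attached as a listener on a userTask and on a sequenceFlow. The listener class names below are the ones used in later Alfresco 4.x releases. Treat them as an assumption for 3.4.e and check them against the examples that ship with your build. The element ids and log messages are hypothetical. The fragments belong inside a `<process>` element whose `<definitions>` root declares the `activiti` namespace.

```xml
<!-- Alfresco JavaScript runs when the userTask is created -->
<userTask id="usertask1" name="User Task" activiti:formKey="bpm:task">
  <extensionElements>
    <activiti:taskListener event="create"
        class="org.alfresco.repo.workflow.activiti.tasklistener.ScriptTaskListener">
      <activiti:field name="script">
        <!-- Alfresco root objects such as logger are available here, unlike in a scriptTask -->
        <activiti:string>logger.log("Hello, World!");</activiti:string>
      </activiti:field>
    </activiti:taskListener>
  </extensionElements>
</userTask>

<!-- Alfresco JavaScript runs when this sequenceFlow (a jBPM "transition") is taken -->
<sequenceFlow id="flow2" sourceRef="usertask1" targetRef="end">
  <extensionElements>
    <activiti:executionListener event="take"
        class="org.alfresco.repo.workflow.activiti.listener.ScriptExecutionListener">
      <activiti:field name="script">
        <activiti:string>logger.log("Leaving the user task");</activiti:string>
      </activiti:field>
    </activiti:executionListener>
  </extensionElements>
</sequenceFlow>
```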

Challenge 2: Processes without user tasks

As I mentioned, my overly simple Hello World examples are nothing but automated steps. I could implement those without userTasks by placing my Alfresco JavaScript on sequenceFlows. But Alfresco complained when I tried to run workflows that didn’t contain at least one user task. I didn’t debug this, and it is possible I could have worked through it, but I decided for now, the Activiti versions of my Hello World examples would all have at least one userTask.

Challenge 3: Known issue causes iBatis exceptions

In 3.4.e, there is a known issue in which user tasks will cause read-only iBatis exceptions unless you set the due date and priority. Search my examples for “ACT-765” to find the workaround.

Challenge 4: Letting a user pick between multiple output paths

Suppose you have a task in which a human must decide whether to “Approve” or “Reject”. In Alfresco jBPM, you’d simply have two transitions and you’d set the label for those transitions in a properties bundle. In Alfresco Activiti that is handled a bit differently. Instead of having two transitions leaving the task, you have a single transition to an “exclusive gateway” (called a “decision”, in polite company). The task presents the “outcome” options–in this case “Approve” and “Reject”–to the user in a dropdown, as if it were any other piece of metadata on the task. Once the user picks an outcome and completes the task, the exclusive gateway checks the outcome value and takes the appropriate sequence flow. This difference will impact your business process logic, your workflow content model, and your end user experience so it is a significant difference.
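A sketch of the gateway pattern follows. It assumes the task’s outcome ends up in a process variable named `scwf_reviewOutcome`. That name is hypothetical and follows Alfresco’s convention of exposing a property such as `scwf:reviewOutcome` to Activiti with the colon replaced by an underscore. The sketch also assumes `xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"` is declared on the `<definitions>` root. Depending on how the outcome property is scoped, you may need a task listener that copies the value from the task to the process before the gateway evaluates it.

```xml
<!-- Fragment inside <process>; the "approved" and "rejected" steps are omitted -->
<userTask id="usertask7" name="Operations Review"
          activiti:candidateGroups="GROUP_Operations"
          activiti:formKey="scwf:activitiOperationsReview" />

<!-- A single sequenceFlow leaves the task... -->
<sequenceFlow id="flowToDecision" sourceRef="usertask7" targetRef="reviewDecision" />

<!-- ...and the exclusive gateway takes the first outgoing flow whose condition is true -->
<exclusiveGateway id="reviewDecision" name="Review Decision" />

<sequenceFlow id="approveFlow" sourceRef="reviewDecision" targetRef="approved">
  <conditionExpression xsi:type="tFormalExpression">${scwf_reviewOutcome == 'Approve'}</conditionExpression>
</sequenceFlow>

<sequenceFlow id="rejectFlow" sourceRef="reviewDecision" targetRef="rejected">
  <conditionExpression xsi:type="tFormalExpression">${scwf_reviewOutcome == 'Reject'}</conditionExpression>
</sequenceFlow>
```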

For comparison, here’s what this looks like in the Alfresco Explorer UI for jBPM:

And here is what it looks like in the Alfresco Explorer UI for Activiti:

So in Explorer, with jBPM, the user can just click “Approve” or “Reject” while in Activiti, the user must make a dropdown selection and then click “Next”.

Here is the same task managed through the Alfresco Share UI for jBPM:

Versus Alfresco Share for Activiti:

Similar to the Explorer differences, in Share, with jBPM, the user gets a set of buttons while with Activiti, the user makes a dropdown selection.

One open question I have about this is how to localize the transition steps for Activiti workflows if the steps are stored as constraints in the content model. On a past client project we implemented a Share-based customization to localize constraint list items but our approach won’t work in Explorer. Maybe the Activiti guys can help me out on that one.

Exposing Process to the Alfresco User Interface

And that brings us to user interface configuration. Overall, the process is the same as it is with jBPM. First, you work on your process definition, then you create a workflow content model. Once the workflow content model is in place, you expose it to the user interface through the normal Alfresco user interface configuration approach. For the Explorer client that means web-client-config-custom.xml; for the Share client that means share-config-custom.xml. Labels, workflow titles, and workflow descriptions are localized via properties bundles.
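For Share, a task type is typically exposed through a `task-type` evaluator in share-config-custom.xml. The sketch below assumes the Operations Review task type from the Publish Whitepaper example. The `scwf:reviewOutcome` property is hypothetical, and a real form would usually list more fields.

```xml
<alfresco-config>
  <!-- Form Share renders when a user opens a task of this type -->
  <config evaluator="task-type" condition="scwf:activitiOperationsReview">
    <forms>
      <form>
        <field-visibility>
          <!-- Standard workflow comment field -->
          <show id="bpm:comment" />
          <!-- Hypothetical outcome property defined in the workflow content model -->
          <show id="scwf:reviewOutcome" />
        </field-visibility>
      </form>
    </forms>
  </config>
</alfresco-config>
```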

One minor difference is that in jBPM, task names are identical to corresponding type names in your workflow content model. In Activiti, a userTask has an attribute called “activiti:formKey” that is used to map the task to the appropriate content type in the workflow content model.
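On the content model side, the formKey value is simply the name of a type in the workflow content model. Here is a sketch of such a type. It includes a LIST constraint of the kind mentioned earlier for holding the Approve/Reject outcome. The namespace URI, model name, constraint name, and `scwf:reviewOutcome` property are hypothetical.

```xml
<model name="scwf:workflowModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <imports>
    <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
    <import uri="http://www.alfresco.org/model/bpm/1.0" prefix="bpm" />
  </imports>
  <namespaces>
    <!-- Hypothetical namespace URI -->
    <namespace uri="http://www.example.com/model/workflow/1.0" prefix="scwf" />
  </namespaces>
  <types>
    <!-- Name matches activiti:formKey="scwf:activitiOperationsReview" on the userTask -->
    <type name="scwf:activitiOperationsReview">
      <parent>bpm:workflowTask</parent>
      <properties>
        <!-- Outcome presented to the user as a dropdown of allowed values -->
        <property name="scwf:reviewOutcome">
          <type>d:text</type>
          <constraints>
            <constraint name="scwf:reviewOutcomeOptions" type="LIST">
              <parameter name="allowedValues">
                <list>
                  <value>Approve</value>
                  <value>Reject</value>
                </list>
              </parameter>
            </constraint>
          </constraints>
        </property>
      </properties>
    </type>
  </types>
</model>
```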

Assigning Tasks to Users and Groups

The out-of-the-box workflows for both jBPM and Activiti show how to use pickers to let workflow initiators assign users and groups to workflows. My example workflows use hardcoded references rather than pickers so that you’ll have an example of both approaches. In my Hello World examples, I assign the userTask to the workflow initiator. This is done by using the “activiti:assignee” attribute on userTask, like this:

<userTask id="usertask3" name="User Task" activiti:assignee="${initiator.properties.userName}" activiti:formKey="bpm:task">

If you need to use a more complex expression there’s a longer form that uses a “humanPerformer” tag. See the User Guide.
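For reference, here is the Hello World assignment rewritten in that longer `humanPerformer` form, following the BPMN 2.0 structure described in the Activiti user guide. This is a sketch that reuses the same expression as the attribute form.

```xml
<userTask id="usertask3" name="User Task" activiti:formKey="bpm:task">
  <!-- Long form of activiti:assignee -->
  <humanPerformer>
    <resourceAssignmentExpression>
      <formalExpression>${initiator.properties.userName}</formalExpression>
    </resourceAssignmentExpression>
  </humanPerformer>
</userTask>
```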

In the Publish Whitepaper example I use pooled group assignment by using the “activiti:candidateGroups” attribute on userTask, like this:

<userTask id="usertask7" name="Operations Review" activiti:candidateGroups="GROUP_Operations" activiti:formKey="scwf:activitiOperationsReview">

Again, if you need to, there’s a longer form that uses a “potentialOwner” tag.
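The candidate group assignment would look roughly like this in the `potentialOwner` form. According to the Activiti user guide, a `group(...)` prefix in the formal expression marks the value as a group and a `user(...)` prefix marks a user. Treat this as a sketch.

```xml
<userTask id="usertask7" name="Operations Review" activiti:formKey="scwf:activitiOperationsReview">
  <!-- Long form of activiti:candidateGroups: members of the group can claim the pooled task -->
  <potentialOwner>
    <resourceAssignmentExpression>
      <formalExpression>group(GROUP_Operations)</formalExpression>
    </resourceAssignmentExpression>
  </potentialOwner>
</userTask>
```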

In my jBPM examples I use swimlanes for task assignment. I didn’t get a chance to use the equivalent in Activiti.

Deploying Processes

In standalone Activiti there are multiple options for deploying process definitions to the engine, including uploading a BAR (Business Archive) file into the running engine. I couldn’t find the equivalent of that in Alfresco’s embedded Activiti implementation or the equivalent of the jBPM deployer servlet, so for this exercise I used Spring configuration for both Activiti and jBPM processes. I hope by the time the code goes into Enterprise there will be a dynamic deployment option because that’s really helpful during development.
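Here is a sketch of that Spring configuration. It assumes Alfresco’s `workflowDeployer` parent bean, with an `engineId` property selecting the engine for each definition. The bean id and all file paths are hypothetical placeholders for wherever your process definitions, workflow model, and message bundle live.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <!-- Hypothetical bean id; the parent bean does the actual deployment at startup -->
  <bean id="example.workflowBootstrap" parent="workflowDeployer">
    <property name="workflowDefinitions">
      <list>
        <!-- Activiti version of the process -->
        <props>
          <prop key="engineId">activiti</prop>
          <prop key="location">alfresco/extension/workflows/publishWhitepaper.bpmn20.xml</prop>
          <prop key="mimetype">text/xml</prop>
          <!-- Set to true to force redeployment on startup while developing -->
          <prop key="redeploy">false</prop>
        </props>
        <!-- jBPM version of the same process -->
        <props>
          <prop key="engineId">jbpm</prop>
          <prop key="location">alfresco/extension/workflows/publishWhitepaper/processdefinition.xml</prop>
          <prop key="mimetype">text/xml</prop>
          <prop key="redeploy">false</prop>
        </props>
      </list>
    </property>
    <!-- Workflow content model shared by both versions -->
    <property name="models">
      <list>
        <value>alfresco/extension/model/exampleWorkflowModel.xml</value>
      </list>
    </property>
    <!-- Properties bundle with workflow titles, descriptions, and labels -->
    <property name="labels">
      <list>
        <value>alfresco/extension/messages/exampleWorkflow</value>
      </list>
    </property>
  </bean>
</beans>
```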

Workflow Console

Alfresco’s workflow console is a critical tool for anyone doing anything with advanced workflow. It has always puzzled me why the workflow console (along with the other consoles) can only be reached directly through an unpublished URL. That head-scratcher remains, but rest assured, all of your favorite console commands now work for both jBPM and Activiti workflows.

Summary

I hope this post has given you a small taste of the new Activiti engine embedded in Alfresco. I haven’t spent any time talking about the higher-level benefits of Activiti, and there are many more details and features I didn’t have time to go into. My goal was to give those of you who have experience with Alfresco jBPM a head start on understanding the new option for advanced workflow.

If you haven’t done so, grab a copy of Alfresco 3.4.e, download these examples, and play around. The zip is an Eclipse project that will deploy the workflows and associated configuration to your Alfresco and Share web applications via ant. The included readme file has step-by-step directions for running through each jBPM and Activiti example.

It is entirely possible that I’ve done something boneheaded. If so, do let me know so that all of us can benefit.


Three watershed moments in my career (Hint: One just happened)

I’ve recently made a big shift in the career department. But rather than tell you what it is right off, I want to build up to it. I think it’s kind of a cool story, so if you’ll bear with me, here are the three watershed moments of my career thus far…

Watershed moment #1: Specialization leads to consulting

In 1992, I graduated college and went to work for Texas Instruments working on mainframes. Somehow, I got exposed to Lotus Notes development. I loved it. I dove in deep, eventually leaving for a job where I could be completely focused on Notes. Notes taught me a lot about managing unstructured data and how people collaborate to get work done. I learned that, for me, interesting IT problems are those where humans and systems have to work together to get something done. And it taught me a lot about what a passionate technical community looks like. Ultimately it led to a job at a small but up-and-coming consulting firm where I would spend the next nine years. That decision to focus on Notes development was a watershed moment.

Watershed moment #2: My blog gets me a job in open source

Fast-forward to 2001. My content management practice was making a shift. Notes was falling out of favor and many of our clients were looking at WCM and DM solutions from large proprietary vendors. We started looking at open source technologies as well, but it was a tough sell to our traditional clients who had never heard of open source, and if they had, were skeptical or even fearful. We started implementing Documentum-based solutions and did that for the next three years, but I continued to dabble in open source. A revolution seemed afoot, but I couldn’t figure out the best way to jump in.

I started blogging in 2001, stopped, then started again in 2002. My rationale was simple: Writing helped me learn. And, for virtually no added cost, I could multiply the benefit by sharing what I learned–particularly with coworkers, but if others got value out of it, that was okay too. The idea that if my writing helped enough people it might help the open source movement in some tiny way was a romantic notion, but seemed remote.

Then I came across Alfresco. In October of 2005 I wrote my first Alfresco-related blog post. It said simply, “Alfresco is an open source enterprise content management solution founded by one of the co-founders of Documentum,” and then included a lengthy excerpt from a Gilbane post on Alfresco’s release candidate. A month later I published a more detailed review of the product. After three or four years of blogging, I was starting to find my voice. Little did I know that I had also found a passion.

By 2006, my firm had been acquired and Alfresco was starting to look like it had legs. I looked back on my past Documentum projects and realized that Alfresco was a viable alternative as the underlying repository in every case. Open source had been around for years but it had been sneaking quietly in the back doors of my clients in the form of operating systems, developer libraries, databases, and tooling. Alfresco, and other commercial open source companies, were poised to crash through the front door with business-facing open source applications. I wanted in. I left my firm to join Optaros, an open source consultancy I had discovered through fellow content management blogger and then Optaros employee, Seth Gottlieb. My blog had gotten me a job working with a technology I loved. That was the second watershed moment.

Watershed moment #3: Wait for it…

My four years at Optaros gave me the opportunity to focus on Alfresco full-time. Not just implementing projects, although there were many. Just as important, I was able to fully engage with the Alfresco community. I wrote blog posts and tutorials. I created add-ons and integrations and released those as open source projects. I wrote a book. I conducted code camps. I attended every event Alfresco ever put on and gave talks at most of those. I didn’t set out to be an evangelist, but that’s what I became. Did it benefit me, Optaros, and later, my own start-up, Metaversant? Of course it did. But, here’s the kicker: Acting in my own self-interest turned out to be a huge benefit to the greater Alfresco community. And I’m not alone. Many people all around the world are participating in the community in all kinds of ways to everyone’s mutual benefit.

Which brings us to the next watershed moment: Alfresco has hired me as their new Chief Community Officer. My mission is essentially to make the Alfresco community an example for all other commercial open source companies to follow. It’s a significant challenge, and I’m going to need your help. Alfresco may sign my check, but I work for the community. Therefore, you’ve got to tell me where we should take this thing. We have our ideas but yours are critical.

What this means

I’ll give specifics on how you can help in a future post. I expect that the specific strategies we undertake together will fall roughly into these buckets:

  • Motivating community members, regardless of skill set or relationship to Alfresco, to engage more deeply in the community
  • Enabling the community with tools, resources, and product enhancements that leverage community contributions
  • Exposing the greatness already existing in the community, whether that’s in the form of contributions that have been made that people just don’t know about or shining a light on community contributors doing awesome things

And, of course, I get to continue to work on my own community contributions like my work with Apache Chemistry, my Google Code projects, the blog, and new stuff I haven’t even thought of yet.

It was a tough decision to put the growth of my content management-focused consulting firm, Metaversant, on hold, but when Alfresco approached me about this opportunity, I had to take it. My career and my passion are already dovetailed. I do what I love, and for that I am very lucky. Who wouldn’t take the opportunity to make that an even tighter fit?

I am very excited about what this means for the community and the importance Alfresco places on its growth and well-being. I hope you are excited too. Actually, “hope” is the wrong word–I need you to be excited. Who’s with me? Ready to pitch in?

Improving Alfresco Share performance by using getChildren

I had a client that was seeing response times of several seconds for the Alfresco Share document library and data list pages across all of his sites. The client’s Share install had just over 1,000 Share sites. The volume of the data lists in each site was insignificant. This is the story of how we resolved the issue, but note that the resolution may not be appropriate for everyone in all cases.

The Symptoms

The client was seeing slow performance of the document library and data list pages in Share. They noticed that the folder tree in the document library view responded quickly but the actual document list itself took a long time to render. This was happening for all users in all sites.

Looking for a quick resolution, even if that meant treating the symptom rather than the underlying cause, we decided to try optimizing the repository-tier web script that returns the document library contents, to see if we could get it to perform closer to what we were seeing with the folder tree. I copied the repository tier’s doclist.get web script into our project’s extension directory and started tweaking.

The Bunny Trail

First, I’ll fess up to a mistaken assumption I had: I thought that everything in Alfresco always went through Lucene. I knew that separating out full-text index searches and property searches into Lucene queries and DB queries, respectively, was on the roadmap, but I had it in my head that in 3.4, even a call to something like ScriptNode.getChildren() was ultimately a Lucene index hit. If everything was a Lucene hit, I figured, there had to be a different reason for the folder tree control to perform so much better than the document library list.

So, instead of starting with what, in hindsight, would have yielded the most bang for the buck, I started tuning what turned out to be little things. For example, our app didn’t use favorites, so I removed any references to the preferences service. Our app didn’t allow users to check out documents, so out went any logic that dealt with that. Goodbye, Google Docs code blocks. Adios, type and aspect checking for types and aspects our app doesn’t support. Farewell, filters. I hardcoded permissions to avoid the lookup. I set the created-by and modified-by values to empty strings to avoid looking up the person objects. I jettisoned anything that wasn’t crucial to simply producing a list of the folder contents. All of this did speed up the repository-tier web script, but only a little bit. I needed an order of magnitude improvement.

The Lightbulb

Next, I did what I should have done initially: Add some simple log statements to see which part of the code was taking the longest to execute. Of course, it was the query. As it turns out, it is much, much faster to ask a node for its children (which is what the treenode web script does) than it is to do a Lucene search with a PARENT clause that yields the same result set (which is what the doclist web script does). On a dev machine with a small dataset, you don’t notice the difference. But on our integration and prod servers the difference is huge.
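Here is a minimal sketch of that kind of instrumentation, written as plain JavaScript in ES5 style so it runs in Node and stays close to what a Rhino-based controller accepts. The `timed` helper is hypothetical, not part of Alfresco. Inside a repository-tier controller you would pass a function such as `function (msg) { logger.log(msg); }`, using the `logger` object from Alfresco’s JavaScript API, rather than relying on the `console.log` default.

```javascript
// Hypothetical timing helper: runs fn, reports how long it took, and
// returns fn's result unchanged.
function timed(label, fn, log) {
   var write = log || function (msg) { console.log(msg); };
   var start = new Date().getTime();
   var result = fn();
   write(label + " took " + (new Date().getTime() - start) + " ms");
   return result;
}

// Example: wrap one phase of a controller. The array stands in for the
// query results; the messages array stands in for a real logger.
var messages = [];
var items = timed("query", function () {
   return ["a", "b", "c"];
}, function (msg) { messages.push(msg); });
```

Wrapping each phase (argument parsing, the query, building the result items) separately is what makes the slow phase stand out.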

The Fix

The Share document library page uses a YUI data table to produce the list of documents for the currently selected folder. The data table is bound to a web script that lives on the repository tier that is responsible for returning the requested data as JSON. Out-of-the-box, the repository tier web script that returns the document list calls a function called getFilterParams which is responsible for setting up a bunch of query predicates based on the document library filter the user has selected in the Share UI. The script then asks the filterParams object for the Lucene query it needs to run to return the document list. It then uses the search service to invoke the query and return the results.
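To make the contrast concrete, here is a hypothetical sketch of the kind of Lucene query string a PARENT-based folder listing boils down to. It is not the query getFilterParams actually builds (that one carries more predicates); `buildParentQuery` and the node reference are made up for illustration. In a repository-tier controller a string like this would be handed to the search service, for example via `search.luceneSearch(query)`, whereas the folder tree simply asks the folder node for its children.

```javascript
// Hypothetical: build a Lucene query that returns the direct children of
// one folder while excluding some types, roughly the shape of what a
// PARENT-based document list runs.
function buildParentQuery(parentNodeRef, excludedTypes) {
   var query = '+PARENT:"' + parentNodeRef + '"';
   for (var i = 0; i < excludedTypes.length; i++) {
      // Each excluded type becomes a prohibited (-) clause.
      query += ' -TYPE:"' + excludedTypes[i] + '"';
   }
   return query;
}

var luceneQuery = buildParentQuery("workspace://SpacesStore/abc-123",
                                   ["cm:thumbnail", "cm:systemfolder"]);
// '+PARENT:"workspace://SpacesStore/abc-123" -TYPE:"cm:thumbnail" -TYPE:"cm:systemfolder"'
```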

My optimization was to bypass building and executing the query completely because, in our case, we don’t care about filters. All we want is the list of children in the current folder, and ScriptNode already has a function to do that called getChildren. So instead of performing a Lucene search, we ask the current “root node” for its children. We then iterate over the results and filter out a couple of content types that the Lucene query would otherwise have excluded.

Oh man, that did it. The document library went from rendering in 6+ seconds to rendering in less than 1 second.

I gave the data lists web script the same treatment. In that case, our customized Share app still makes use of filters, so the “getChildren bypass” is only used when the “All” filter is selected. When any other filter is selected the original out-of-the-box Lucene query is used.
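The branching is simple enough to sketch. This is an assumption-laden illustration rather than the actual controller: the filter id `"all"`, `listDataItems`, and the `runFilterQuery` callback are hypothetical names, and the fake objects at the end only exist so the sketch runs outside Alfresco.

```javascript
// Sketch of the "getChildren bypass" for data lists: only the unfiltered
// view skips the query.
function listDataItems(filterId, listNode, runFilterQuery) {
   if (filterId === "all") {
      // Unfiltered view: ask the list node for its children, no search needed.
      return listNode.getChildren();
   }
   // Every other filter keeps the original query-based path.
   return runFilterQuery(filterId, listNode);
}

// Stand-ins so the sketch runs outside Alfresco.
var fakeList = { getChildren: function () { return ["item1", "item2"]; } };
var queriesRun = [];
function fakeQuery(filterId, listNode) {
   queriesRun.push(filterId);
   return ["item2"];
}
```

Keeping the original query path for every other filter is what makes the change easy to revert: delete the `if` block and behavior returns to the out-of-the-box version.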

Now, again, I fully acknowledge that we may have sped up those two cases without resolving the underlying issue, and that addressing that issue might produce a system-wide performance boost. Still, it was good to get the quick fix in place, and it should be easy enough to revert if and when we resolve the underlying index issue, if one exists.

Here’s a code snippet from the custom doclist.get.js controller if you are curious:


if (parsedArgs.path == "")
{
   parentNode = parsedArgs.rootNode;
}
else
{
   parentNode = parsedArgs.rootNode.childByNamePath(parsedArgs.path);
}
// We are iterating over the parent node's children instead of iterating
// over search results...
// ("for each...in" is a Rhino extension that iterates over values, not keys)
for each (var node in parentNode.getChildren())
{
   try
   {
      // ...so we need to filter out some system types that would otherwise have been
      // filtered out by the Lucene query
      if (node.typeShort == "cm:systemfolder" || node.typeShort == "cm:thumbnail")
      {
         // do nothing. we don't want these.
      }
      else if (node.isContainer || node.typeShort == "app:folderlink")
      {
         // folders and folder links (folderNodes is declared earlier in the controller)
         folderNodes.push(node);
      }
      else
      {
         // everything else is treated as a document
         documentNodes.push(node);
      }
   }
   catch (e)
   {
      // Possibly an old indexed node - ignore it
   }
}

Book Review: Alfresco 3 Records Management by Dick Weisinger

Packt Publishing sent me a copy of Dick Weisinger’s new book, Alfresco 3 Records Management, to read and review. Now, before I tell you what I thought of the book, I have to say that I don’t perceive a huge demand for Alfresco’s Records Management offering, and I haven’t heard Alfresco talk about it much lately. I don’t know if my perception matches reality and, if it does, why we don’t hear about it more often. An open source, DoD-certified, freely-available Records Management product definitely sounds unique and compelling. It could be such a focused niche that there’s a lot of activity around it that I’m simply not aware of.

Still, I was curious to learn more about the topic and Alfresco’s offering, so I gave it a go.

Let me start by saying that Dick does a fabulous job of defining a topic area and a target audience and sticking to that. The book is 457 pages without the online appendices but it feels concise. I think that’s because it is well organized, clearly written, and flows logically and seamlessly from one chapter to the next.

The book is written for both Records Managers and Software Developers. You’d think that having such disparate audiences would be a problem, but Dick handles it very well. Every chapter starts out with an end-user description of the Records Management functionality and then, when everything’s been covered, shifts to a “How Does it Work?” section that dives into the technical details behind the functionality covered in the first half. Too often, books written for both technical and non-technical audiences munge their material together in a way that frustrates both audiences. In this case, the content is separated cleanly, so it works very well. If you are a Records Manager and you have no desire to peek under the hood, you can read this book and easily skip the “How Does it Work?” section in each chapter without being confused or distracted.

But here’s the other thing that’s going on, which I thought was really cool. Even though the book is about Records Management, Dick’s managed to write a Share customization primer. As I was reading, I was thinking an alternate title for the book could be, “Learning Share Development by Deconstructing the Alfresco Records Management Application” (if you’re not into the whole brevity thing).

Alfresco’s Records Management add-on is actually just a set of repository tier and Share tier customizations. So, if you learn how the Records Management app is built, you learn how to build other Share-based solutions. In the book, each chapter’s “How Does it Work?” section covers a different example of Share functionality and how it works behind the scenes. So, someone who’s interested in learning about Share customizations can read this book from that perspective, and essentially use Records Management as one big sample application.

For example, in Chapter 5 the Records Manager learns how to set up a file plan. In the same chapter, a developer learns how the YUI data table that renders the file plan actually works. In Chapter 9 the Records Manager learns about holds/freezes, retention, and reviews while the developer learns how to configure scheduled jobs and how Share UI actions work. Every chapter works this way–I’m just picking out a couple of examples.

Don’t get me wrong. I’m not saying that you can go from zero to Badass Share Developer with this book alone. That’s not going to happen. But as Dick says in the summary of the last chapter, it’s a good start. The technical sections give you a pointer to the pieces that make up a particular area of functionality. That’s useful if you want to change how Records Management works but you can also use it as an example for adding similar functionality to your own Share-based app.

One thing that frustrates developers trying to customize Share (and Records Management) is the question, “Given what I’m looking at in the browser, where does the code reside that makes it work?” The book shows the technical reader how to go from Share page to template to component to Spring Surf web script, which is something you do over and over when you are first learning how Share is put together. Being pointed in the right direction and being shown the general pattern that the Share developers followed is really valuable.

So, if you’re thinking about implementing Records Management, and you need to know whether or not Alfresco’s offering will fit the bill, this would be a great book for you to read during your evaluation or after the selection has been made and you need to learn how to install and configure the product. If you are the technical person on the implementation team, the book will give you the end-user context as well as a peek under the hood. If you need to customize the product and you’ve never done Share customization work before, you’ll learn how Spring Surf works and, more importantly, given a piece of functionality, you’ll know where to look when you need to change it.

Well done, Dick!

Collaborative content creation with Amazon Mechanical Turk

Amazon’s Mechanical Turk has been intriguing to me since I first heard about it. I think it is because the idea of essentially having a workflow with tasks that can be handled by any one of potentially hundreds of thousands of people has mind-blowing potential.

If you’re not familiar, Mechanical Turk (MTurk) is essentially a marketplace, with Amazon as the middleman, that matches up work requests (called HITs, short for Human Intelligence Tasks) with human workers (called “Turkers”). The work requests are typically very short tasks that require human intelligence, like identifying, labeling, and categorizing images or transcribing audio. From a coding standpoint, your app makes calls to Amazon’s Web Services API to submit requests and to respond to completed work. Turkers monitor the available HITs, select the ones that look interesting to them, and then complete the tasks, for which they are paid, usually pennies per task.
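As a rough illustration of what “submitting a request” involves, here is a hypothetical helper that turns a small task into the parameters for MTurk’s CreateHIT operation. The parameter names follow the CreateHIT API; the task shape, the helper name, and the default durations are illustrative assumptions, and the question XML is only a placeholder.

```javascript
// Hypothetical helper: build CreateHIT request parameters for one small task.
function buildHitParams(task) {
  return {
    Title: task.title,
    Description: task.description,
    Reward: task.rewardUsd.toFixed(2),     // the reward is sent as a string, e.g. "0.05"
    MaxAssignments: task.workers ?? 1,     // how many Turkers may complete this HIT
    AssignmentDurationInSeconds: 10 * 60,  // time allowed once a Turker accepts
    LifetimeInSeconds: 24 * 60 * 60,       // how long the HIT stays listed
    Question: task.questionXml             // question document supplied by the caller
  };
}

const params = buildHitParams({
  title: "Tag this photo",
  description: "Choose the labels that best describe the photo.",
  rewardUsd: 0.05,
  questionXml: "<QuestionForm/>" // placeholder only; a real HIT needs a valid question document
});
```

With the AWS SDK for JavaScript v3, an object like this would be wrapped in a `CreateHITCommand` and sent through an `MTurkClient` from `@aws-sdk/client-mturk`. Trying it against the MTurk requester sandbox first avoids paying real workers while you debug.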

Sorting through images or performing other simple tasks is one thing, but what about more complex tasks, like, say, writing an article? Here’s a story about some guys who have created a framework called CrowdForge to do just that. CrowdForge is a Django implementation based on research that one of the authors did at Carnegie Mellon. In a nutshell, their approach splits complex problems into smaller problems until they are small enough to be successfully handled by MTurk, then aggregates the results to form the answer to the original problem. It’s Map Reduce applied to human tasks instead of data clusters.
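Here is a toy sketch of that recursive split, solve, and merge shape, with ordinary functions standing in for the Turkers. None of this is CrowdForge’s actual code (which is a Django app); `solve` and the `steps` object are hypothetical, and the “article” is just a list of section topics.

```javascript
// Toy sketch of the partition/map/reduce idea: keep splitting until a piece
// is small enough for one worker, then merge the answers back together.
function solve(task, steps) {
  if (steps.isSmallEnough(task)) {
    return steps.work(task);                                     // one worker, one small piece
  }
  const parts = steps.partition(task);                           // split the problem
  return steps.merge(parts.map((part) => solve(part, steps)));   // combine the answers
}

// Example: each worker writes one sentence per section topic.
const steps = {
  isSmallEnough: (topics) => topics.length <= 1,
  partition: (topics) => {
    const mid = Math.ceil(topics.length / 2);
    return [topics.slice(0, mid), topics.slice(mid)];
  },
  work: (topics) => topics.map((t) => `Paragraph about ${t}.`).join(" "),
  merge: (pieces) => pieces.join(" ")
};

const article = solve(["History", "Food", "Getting around"], steps);
```

In the real thing, each of `partition`, `work`, and `merge` would be a paid HIT rather than a function call, which is where the per-article cost comes from.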

You should read the original post, but to summarize it, the story talks about an experiment that the team did around collaborative content creation. They applied their framework to the task of writing travel articles. They split the task into 36 sub-tasks and gave each sub-task to an author, then aggregated the results into a coherent article. The partitioning, writing, and re-assembly (the “reduce” part of Map Reduce) was all done through Mechanical Turk by CrowdForge. Total cost for each article? About $3.26.

Then, for comparison, they assigned individual authors to write articles on the same topics using the traditional approach of one author per article, paying roughly what they paid for the collaboratively created content. When the results were reviewed, the crowdsourced content beat the single-author content in terms of quality. It’s important to note that in both cases, authors were Turkers. This wasn’t Mechanical Turk versus Rick Steves. But still, the researchers were able to use Mechanical Turk to break the problem down, perform each task, and then clean up the result, all for about the same cost without sacrificing quality. That’s pretty cool.

As you know, I’m a huge fan of Django, and I think it is more than okay for the presentation tier of a solution like this. But it seems like a workflow engine like Activiti or jBPM would be a better tool for implementing the actual process flow for a framework like CrowdForge, because it could potentially mean less coding and perhaps more accessibility for business analysts. Imagine using a process modeling tool to lay out your business process and then dropping in a “Mechanical Turk Partition Task” node, graphically connecting it with a “Mechanical Turk Map Task”, and then hooking that to a “Mechanical Turk Reduce Task”. In and around those you’re wiring up email notifications, internal review tasks, etc.

Metaversant has been working with a client who’s doing something very similar. Editors make writing assignments which are outsourced to Mechanical Turk. When the assignments are complete, they are published to one or more channels. Instead of the Django CrowdForge framework, we’re using Alfresco and the embedded jBPM workflow engine. Alfresco stores the content while the jBPM workflow engine orchestrates the process, making calls to Mechanical Turk and the publishing endpoints.
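A rough sketch of that orchestration in plain JavaScript, under loud assumptions: `runAssignment` and every service function are hypothetical stand-ins, not the client’s code or the jBPM API. In the real process the Mechanical Turk step would be a wait state, since a HIT can take hours to complete; here everything runs synchronously so the order of calls is easy to see.

```javascript
// Hypothetical orchestration of one writing assignment: outsource it,
// store the result, then publish to each channel.
function runAssignment(assignment, services) {
  const hitId = services.postHit(assignment);            // send the assignment to Mechanical Turk
  const text = services.collectResult(hitId);            // completed work comes back
  const docId = services.store(assignment.title, text);  // the repository keeps the content
  return assignment.channels.map((channel) => services.publish(channel, docId));
}

// Fake services that just record what happened.
const calls = [];
const services = {
  postHit: (a) => { calls.push(`post ${a.title}`); return "HIT-1"; },
  collectResult: (hitId) => { calls.push(`collect ${hitId}`); return "Draft text"; },
  store: (title, text) => { calls.push(`store ${title}`); return "doc-1"; },
  publish: (channel, docId) => { calls.push(`publish ${docId} to ${channel}`); return channel; }
};

const published = runAssignment({ title: "Spring recipes", channels: ["web", "newsletter"] }, services);
```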

This approach can be generalized to apply to all kinds of problems beyond content authoring. If you are an Alfresco, jBPM, or Activiti user, and you have a business problem that might lend itself to being addressed by a micro task marketplace like Mechanical Turk, let me know. Maybe we can get my client to open source the specific integration between jBPM and Mechanical Turk. If you’ve already done something like this, let me know that too. I’m interested to hear how others might be integrating content repositories and BPM engines with Mechanical Turk.

Book Review: Alfresco 3 Business Solutions by Martin Bergljung

[UPDATED: To remove my comment about the absence of workflow config in Share, which Martin does cover. Sorry about that, Martin]

Packt Publishing sent me a copy of Martin Bergljung’s new book, Alfresco 3 Business Solutions. I just finished reading it, so I thought I’d write a quick review.

Overall, I think it is a good book with a lot of useful information across a variety of topics. The preface says the book is for “systems administrators and business owners”. Inclusion of “business owners” is a stretch–they’d have to be pretty technical to get something out of this book. I think the intent of including “business” in the title and the target audience is to set up the book as more of a solution-oriented look at Alfresco and less of an exhaustive technical how-to.

Bergljung attempts to organize the book around “business solution” focused chapters. For example, if your main concern is letting your users access the repository through file protocols, then Chapter 5, File System Access Solutions, is for you. If you are doing a content migration to Alfresco, Chapter 8, Document Migration Solutions, outlines the different approaches available for doing that. I think this is a good approach for the stated audience and most of the chapters fit the approach.

The book starts out with an overview of the Alfresco platform and various repository concepts. Although there are places that risk going into too much detail too early, that first chapter would be a good read for anyone new to Alfresco. The chapters on authentication and synchronization (Chapter 4) and CIFS/WebDAV (Chapter 5) are very thorough and provide some of the best coverage of those topics I’ve seen in any of the Alfresco books. The vigor with which Bergljung attacked those topics makes me think those areas are a particular passion for him. If you are strict about the target audience, Chapters 4 & 5 are definitely the strongest in the book.

However, at various points, the book strays from its intended audience and starts to go into developer topics. Don’t get me wrong, that’s the most interesting stuff to me, but I think it is potentially confusing (or, at best, superfluous) to system administrators. For example, the end of Chapter 1 covers the underlying database schema. Maybe it is a good idea to discuss what the tables are and how they are used so that a DBA gets a feel for the schema and can tune the database appropriately. But Alfresco’s schema is not public and shouldn’t be accessed directly unless you know what you’re doing and are willing to sign up for the inevitable maintenance down the road when the schema changes without warning. Bergljung gives a soft warning to this effect at the start of the section, but the negative effects could have been emphasized more.

There are a few other developer-centric concepts in the book that I simply don’t agree with. The first is about AMPs. In a couple of places, Bergljung implies that the Java Foundation API and AMPs are somehow dependent on each other. The statement, “The Foundation API is only used when deploying extensions as an AMP,” is just not true. Later, another statement compounds the problem by saying, “AMP extensions require Java instead of JavaScript”, which, again, is not accurate. For some reason, the author links an API (Java or JavaScript) with a deployment approach (AMPs), when the two are not related to or dependent on each other at all.

Another piece I don’t agree with is about $TOMCAT_SHARED. To be fair, I’ve seen this in other places and have heard certain Alfresco Engineers encouraging the use of $TOMCAT_SHARED for things I think belong in the web app instead. Regardless of where it comes from, I think it’s really bad advice to tell people to use $TOMCAT_SHARED for anything other than alfresco-global.properties and server- or environment-specific settings. Proponents of $TOMCAT_SHARED will say they like deploying their customizations there for two reasons. First, when Alfresco and Share are deployed in the same Tomcat instance, you can deploy your extensions as one package and both web apps will use it. Second, your extensions go into an extension directory external to the Alfresco WAR, which keeps them well away from Alfresco’s code. In my opinion, both of these are actually reasons NOT to use $TOMCAT_SHARED. Why? As for the first reason, why unnecessarily couple those two web apps together? The Alfresco and Share WARs are built to run on completely separate nodes, which is helpful. We shouldn’t ruin that by making them both rely on the same shared directory.

As for the “keep your extensions away from Alfresco’s” reason, that’s what the extension directory is for. I can keep my customizations separate and still have them reside in Alfresco’s WAR. In fact, most clients I’ve dealt with prefer that because they have IT Operations teams that only want to deal with self-contained WARs. Being a “special case” is not how you win the hearts and minds of your infrastructure team.

Now, with these picked nits out of the way, I should say that there are some developer-oriented topics that were very good. Bergljung has some good Java Foundation API and JavaScript API examples in Chapter 2 covering Node Service and other commonly-used services. And I like the section in Chapter 3 that talks about setting up Apache Hudson for continuous integration. Chapters 9 through 11 provide good coverage of Advanced Workflows, from designing workflows with swimlane diagrams to a lengthy example showing super states, sub-processes, and custom workflow management dashlets.

In keeping with the “solutions” approach, I think I would have combined the portlet chapter and the mobile app/Grails chapter into a single “integration solutions” chapter and talked less about the specific implementation details and more about the touch points: options for integration (Web Services, CMIS, custom web scripts), single sign-on approaches, what’s available out-of-the-box, caveats, etc. Most of this is covered one way or another between the two chapters. It just seems like a question people commonly work through is, “What’s the best approach for doing X on top of Alfresco?”, where X is a portal like Liferay, a community platform like Drupal, a mobile app, and so on.

So, if you’re a “business owner” and by that you mean “non-technical end-user”, you’d probably be better off with Munwar’s book, which is definitely end-user focused. If you are a “system administrator” or someone who needs to know the capabilities of Alfresco and how to integrate Alfresco with various touch points (LDAP, Active Directory, portals), it’s definitely worth a read, particularly if you need to deal with external authentication sources or you are responsible for getting CIFS working. Developers will benefit from the API examples and the chapters on Alfresco’s embedded jBPM engine.

Congrats to Martin and Packt. It’s good to see another title (I think we’re up to 8 or 9 now) added to the Alfresco bookshelf.

Top Takeaways from the Alfresco Kickoff

Alfresco kicked off their fiscal year with a meeting last week in Orlando. About 100 Alfresco employees and 50 partners attended two days of Alfresco-led talks on business and technical topics. As an aside, the conference food at the JW Marriott was maybe the best I’ve ever had at any tech event. Lobster Corn Dog. Enough said.

More importantly, the trip helped clarify in my mind Alfresco’s message around “social content management”. I now see it as taking two different forms: content management that is social, and the management of social content. The first is basically just marketing what they’ve already got: content services, collaboration, task tracking, wikis, and blogs, exposed through a modern user interface that’s closer to the experience most users have come to expect from consumer-facing sites and services. The idea is that when you add things like comments, tags, and ratings you go from boring, old-school Enterprise Content Management to fun and exciting social content management. There’s more to it, of course: people don’t collaborate in a vacuum, they collaborate around content.

The second form, managing social content, is what you need when content is published to one or more social channels. For example, Marketing might have a press release, a video, and a tweet that all need to go live at the same time. Order matters, and if one step fails, none of the steps should be performed. Alfresco is building a social publishing framework to handle this kind of use case. So, in this case, “social” doesn’t describe the features of the system; it describes the type of content being managed.
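One way to approximate “if one step fails, none of the steps should be performed” is sketched below. It is not how Alfresco’s publishing framework works (I haven’t seen its design); it swaps in a plain compensation approach: publish in order and, on failure, unpublish whatever already went live, newest first. `publishAll` and the fake steps are hypothetical.

```javascript
// Sketch: publish items in order; if one fails, take down the ones that
// already went live. Separate channels cannot share one transaction, so
// "all or nothing" is approximated with compensating unpublish calls.
function publishAll(steps) {
  const done = [];
  for (const step of steps) {
    try {
      step.publish();
      done.push(step);
    } catch (err) {
      // Undo in reverse order of publication.
      for (const prev of done.reverse()) prev.unpublish();
      return { ok: false, failedAt: step.name, error: err.message };
    }
  }
  return { ok: true, published: done.map((s) => s.name) };
}

// Fake channels that record what happened; the tweet is set up to fail.
const log = [];
const makeStep = (name, fails = false) => ({
  name,
  publish() {
    if (fails) throw new Error(`${name} rejected`);
    log.push(`publish ${name}`);
  },
  unpublish() { log.push(`unpublish ${name}`); }
});

const result = publishAll([makeStep("press release"), makeStep("video"), makeStep("tweet", true)]);
```

Compensation is only as good as each channel’s ability to take something back down; a tweet that was live for a few seconds may still have been seen. Validating every step up front, before publishing anything, narrows that window.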

Alfresco didn’t explicitly differentiate between these two forms of social content management but they have current and future functionality that addresses both.

Another purpose of the trip was to find out what’s coming in Project Swift, the code name for an upcoming release of Alfresco. It sounds like Marketing has the final say on which release number Swift will carry, but after hearing what’s slated for the release, most of us in the room agreed it should be labeled as a major release (4.0). We’ll see.

So what’s going to be in Swift? Lots of cool stuff, but here are the top five technical takeaways from the Project Swift Roadmap that jumped out at me:

#5: CIFS, SharePoint, and FTP will be clusterable. CIFS and SharePoint performance are both issues at one of my clients so this one caught my eye.

#4: New Share extension points are coming in Swift including a framework for custom actions, dialogs, and evaluators. The goal is to reduce the amount of copy-and-paste that goes on during typical customizations of Share and to make upgrades a bit easier.

#3: Alfresco is developing a “social content publishing framework” with publishing endpoints for YouTube, Facebook, Twitter, Drupal, and more to address the social content management use case I described earlier in this post. I like this one because I think many people have this problem and because it leverages Alfresco in exactly the “right” way.

#2: Swift will sport a new Apache-licensed workflow engine called Activiti. Activiti is a separate, Alfresco-sponsored open source project founded by the creators of JBoss jBPM, which is currently the workflow engine embedded in Alfresco. With Swift, both engines will exist side-by-side. It sounds like you may be able to let jBPM continue to handle running workflow instances while Activiti handles new instances, if you want to. Activiti will show up in Community soon for people to start playing with.

#1: Apache Solr will be implemented as an optional, separate, shared search server and index. As part of this, the index will no longer be updated in the same transaction as the repository change. Instead, the index will be eventually consistent. This should result in a huge performance gain and easier clustering. You’ll also get better control over what gets indexed: in Swift you’ll be able to configure full-text indexing by things like content type and path. The Solr server will accept CMIS Query Language and Alfresco FTS queries but not the current raw Lucene syntax, so it might make sense to start moving your queries over to one of these two options if you anticipate leveraging Solr when it is available. Note that Alfresco may choose to make the Solr server an Enterprise-only feature. It didn’t sound like a final decision had been reached on that.
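If you want to start moving queries now, here is a small hypothetical sketch of building CMIS QL in JavaScript. `cmisLiteral` and `folderQuery` are made-up helpers; `IN_FOLDER` and `cmis:name` are standard CMIS QL, and the folder id is assumed to be whatever object id your CMIS binding reports for the folder. The query is roughly the CMIS counterpart of a Lucene `PARENT` clause.

```javascript
// Escape a value for a single-quoted CMIS QL string literal: backslash and
// single quote are both escaped with a backslash.
function cmisLiteral(value) {
  return "'" + String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Documents directly inside one folder, optionally matching an exact name.
function folderQuery(folderId, name) {
  let query = `SELECT * FROM cmis:document WHERE IN_FOLDER(${cmisLiteral(folderId)})`;
  if (name !== undefined) {
    query += ` AND cmis:name = ${cmisLiteral(name)}`;
  }
  return query;
}

const cmisQuery = folderQuery("workspace://SpacesStore/abc-123", "O'Brien notes.txt");
```

Only string-literal escaping is handled here; full-text `CONTAINS()` expressions have their own escaping rules on top of that.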

A Community release of Swift should happen sometime in August, but we should start seeing a lot of activity in Subversion starting in April. The Enterprise release is slated for mid-November. I predict some late nights ahead for QA and Engineering between Thanksgiving and Christmas. I know there’s not a huge difference between November and January but I’d love to see Swift go GA before year-end.

One comment about Share customizations: I get asked a lot about when I will be updating the Alfresco Developer Guide to include a chapter on Share. I have most of the SomeCo examples ported to Share in an as-yet-unshared code base but, as you can see from some of the changes coming in Swift, Share is still changing a lot with respect to customizations, so I’ve been hesitant to update the book. If you’re looking for Share examples you should take a look at Will Abson’s Share Extras project on Google Code. He’s got about 18 different examples of varying complexity and type. I believe each one is individually deployable.

Not bad for the price of a couple of days in Orlando.