Month: June 2014

June 26, 2014

Alfresco Anti-Patterns: When You Probably Shouldn’t Use Alfresco

There are plenty of write-ups listing what Alfresco can do. I thought it might be instructive to list the things people often try to use Alfresco for but shouldn’t. I’ve got five examples in my list. The first two are common mistakes people make during product selection. The last three are more architectural.

Anti-Pattern #1: Dynamic Web Content Management (like Drupal or WordPress)

I think this is happening less, but every once in a while I’ll still see people trying to compare Alfresco to dynamic WCM platforms like Drupal or WordPress. Alfresco has very little in common with systems like these. If you install Alfresco and expect it to serve up a pretty web site out-of-the-box with downloadable themes and tons of modules or widgets you can use to add features to your web site, you’ll be disappointed. This isn’t a shortcoming of the tool; it’s just not what it was built for.

There are plenty of people who use Alfresco to manage assets that are eventually served up to the web. They’ll use Alfresco Share or a custom UI as the “administrative” interface for managing content. Then, they’ll push that content out to some other system on the presentation tier (Saks Fifth Avenue and New York Philharmonic are two examples).

There are partners who have created WCM solutions on top of Alfresco (see Crafter). Solutions like that leverage the power of Alfresco as a content repository and then add in the missing pieces, which are mostly about presentation layer, site building, and content creation.

The bottom line is if you find yourself comparing out-of-the-box Alfresco to systems like Drupal or WordPress you have made a mistake in your evaluation.

Anti-Pattern #2: Full-featured wiki, portal, blog, forums, or calendar

I’ve encountered several people looking to replace major collaboration systems in their IT footprint with Alfresco. Maybe they’ve decided to use Alfresco for document management, but they want to see what else they might be able to replace. They have a wiki they want to replace, they see Alfresco has a wiki. Problem solved, right? This is where box-checking against a feature list gets you into trouble.

Alfresco is a document management repository with a powerful embedded workflow engine. Alfresco Share, the web client that sits on top of Alfresco, is great for basic document management, processes around documents, and team collaboration.

For teams and projects, Alfresco Share uses a “site” metaphor to keep everything related to that team or project together. Each site has a dashboard. Out-of-the-box “dashlets” can be used to summarize or highlight information stored in the site. Out-of-the-box, everyone sees the same dashboard for a site, which is configured by a site manager. Unlike in a portal, there is no easy way for a power user to specify through the UI which dashlets should be restricted to which users or groups of users. So, although dashlets look like “portlets”, Alfresco Share doesn’t really have much else in common with portals. If what you really want is a full-blown portal server you should look at something like Liferay or eXo.

Each site can also be configured with a number of collaborative tools such as discussions, blog, wiki, and calendar. These are more than adequate to facilitate most of what a team, project, or department needs. But none of them individually are going to replace full-featured, standalone systems. If you need the power of a full wiki, install MediaWiki. If you need a blog server, install WordPress. And so on.

Those are two where I see people making adjustments in their expectations early in the product evaluation phase. Now let’s look at a few that may not get uncovered until an architect or developer gets involved…

Anti-Pattern #3: Highly relational solutions

Alfresco relies on three main pillars to deliver its functionality: The file system, a search engine (Lucene or Solr), and a relational database. But you won’t be touching any of those directly. Instead, you’ll work with an abstraction which is simply, “the repository”.

Don’t be misled by the inclusion of a relational database as one of its dependencies. It is there to manage metadata. As you start to customize Alfresco to meet your specific requirements, you’ll define the content model. Alfresco will do the work of reading your content model and storing metadata for instances of those content types in the database.

Objects in the repository can be related to each other through “associations”. These are essentially pointers between one or more objects. There are a couple of challenges with these. First, they cannot easily be queried. You can ask an object for its associations and then you can iterate over those, but you cannot do a traditional “join” across objects.

For example, suppose you have a “whitepaper” object that has an association to one or more “product” objects. You cannot execute a single query that says “Give me all whitepapers containing the word ‘performance’ that are associated with the product named ‘Acme Widget’”.
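To make the limitation concrete, here is a minimal in-memory sketch in Python. It doesn’t use any Alfresco API; the `nodes` dictionary and the functions `full_text_search` and `whitepapers_for_product` are made up purely for illustration. It shows the two-step shape the work takes when you can’t join: run the text search first, then walk each hit’s associations in code.

```python
# Toy stand-in for a repository: node id -> node. This is NOT Alfresco code;
# every name here is hypothetical and exists only to illustrate the idea.
nodes = {
    "p1": {"type": "product", "name": "Acme Widget"},
    "p2": {"type": "product", "name": "Acme Gadget"},
    "w1": {"type": "whitepaper", "text": "Tuning performance", "assocs": ["p1"]},
    "w2": {"type": "whitepaper", "text": "Performance at scale", "assocs": ["p2"]},
    "w3": {"type": "whitepaper", "text": "Install guide", "assocs": ["p1"]},
}


def full_text_search(word):
    """Step 1: the part a search engine can answer on its own."""
    return [
        node_id
        for node_id, node in nodes.items()
        if node["type"] == "whitepaper" and word.lower() in node["text"].lower()
    ]


def whitepapers_for_product(word, product_name):
    """Step 2: filter the hits by following each one's associations in code."""
    matches = []
    for node_id in full_text_search(word):
        targets = (nodes[t] for t in nodes[node_id]["assocs"])
        if any(t["name"] == product_name for t in targets):
            matches.append(node_id)
    return matches


result = whitepapers_for_product("performance", "Acme Widget")
print(result)
```

The second step costs one association lookup per search hit, so the filtering cost grows with the number of hits rather than with the number of whitepapers you actually want back.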

One way people work around this is to de-normalize their data, then implement code that keeps it in sync. In this example, you could add a multi-value property on the whitepaper object that would store the names of the products a whitepaper is related to. Then you’d be able to run that example query.

If the name stored on the product object changes, your code would trigger an update on all corresponding whitepapers to keep the product name in sync. If you have a small number of such relationships with a reasonable number of objects on either side of the relationship this is fine, but you can see how it might quickly get out of hand.
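Here is a rough sketch of that de-normalize-and-sync pattern, again as plain Python rather than Alfresco code. The `products` and `whitepapers` dictionaries, the `product_names` property, and all three functions are assumptions invented for the example. In a real system the sync step would be whatever custom code you hook up to run when a product changes.

```python
# Toy model of the de-normalization workaround. This is NOT Alfresco code;
# all names are hypothetical.
products = {"p1": {"name": "Acme Widget"}}
whitepapers = {
    "w1": {"text": "Tuning performance", "product_ids": ["p1"], "product_names": []},
    "w2": {"text": "Install guide", "product_ids": ["p1"], "product_names": []},
}


def sync_product_names(product_id):
    """Recopy product names onto every whitepaper related to product_id."""
    touched = 0
    for paper in whitepapers.values():
        if product_id in paper["product_ids"]:
            paper["product_names"] = [products[p]["name"] for p in paper["product_ids"]]
            touched += 1
    return touched


def rename_product(product_id, new_name):
    """Changing one product fans out into one write per related whitepaper."""
    products[product_id]["name"] = new_name
    return sync_product_names(product_id)


def find_whitepapers(word, product_name):
    """With the names copied onto the whitepaper, one pass answers the query."""
    return [
        paper_id
        for paper_id, paper in whitepapers.items()
        if word.lower() in paper["text"].lower() and product_name in paper["product_names"]
    ]


sync_product_names("p1")
before = find_whitepapers("performance", "Acme Widget")
updated = rename_product("p1", "Acme Widget Pro")
after_old_name = find_whitepapers("performance", "Acme Widget")
after_new_name = find_whitepapers("performance", "Acme Widget Pro")
```

Notice that `rename_product` returns the number of whitepapers it had to rewrite. With two whitepapers that is trivial; with a popular product and thousands of related documents, a single rename becomes thousands of updates.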

So if your underlying data is highly-relational, don’t try to force it into an Alfresco content model. Instead, move the relational data to a database and use Alfresco only for the content pieces.

Anti-Pattern #4: JSON/XML object store

It’s really common to store chunks of JSON or XML as content in Alfresco. For example, maybe you have some data that isn’t expressed well as name-value pairs. Or maybe the content you need to manage just happens to be in one of those formats. But if that’s all you need to persist in the repository you really ought to be asking yourself why you are using Alfresco when there are many lighter-weight, more scalable technologies that are purpose-built for this.

One limitation of storing JSON or XML as content in Alfresco is that the repository has no semantic understanding of the content. For example, suppose you have a book object that is represented by JSON and you store that JSON as content. It’s likely that the JSON would contain properties like “title”, “author”, or “ISBN”. Out-of-the-box, none of those will be queryable by property. Alfresco will simply attempt to full-text index the content like any other content stream. It doesn’t understand the difference between “title” and “author” because that meaning is embedded in the content itself, not the object. The same is true for XML.

You can work around this by setting up metadata extractors to grab data out of the JSON or XML and store it in properties on the object. Then, you can query the object’s properties through Alfresco. But if all of your objects are similarly-structured it might make more sense to use a document-oriented NoSQL repository or an XML database instead. When you store a JSON document in something like Elasticsearch, Couch, or MongoDB, no extra work is necessary because those systems natively understand JSON.
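The extraction idea can be sketched in a few lines of Python. This is only the concept, not an Alfresco metadata extractor: a real one would be written against Alfresco’s own extension points. The sample JSON, the `book:` property names, and the functions `extract_properties` and `find_by_property` are all hypothetical.

```python
import json

# Hypothetical content stream for a "book" object.
book_json = '{"title": "Example Book", "author": "A. Writer", "ISBN": "000-0-00-000000-0"}'

# Which JSON keys to copy, and the (made-up) property names to copy them to.
FIELD_TO_PROPERTY = {"title": "book:title", "author": "book:author", "ISBN": "book:isbn"}


def extract_properties(content):
    """Parse the JSON and lift selected keys into a flat property map."""
    data = json.loads(content)
    return {
        prop: data[field]
        for field, prop in FIELD_TO_PROPERTY.items()
        if field in data
    }


# The object now carries both the raw content and queryable properties.
book = {"content": book_json, "properties": extract_properties(book_json)}


def find_by_property(objects, prop, value):
    """A property query: exact match on a named field, not a text search."""
    return [o for o in objects if o["properties"].get(prop) == value]


hits = find_by_property([book], "book:author", "A. Writer")
```

The mapping table is the part you end up maintaining: every time the JSON structure gains a field you care about, the mapping and the content model both have to change, which is exactly the extra work a JSON-native store avoids.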

Anti-Pattern #5: Storing lots of content-less objects

A content-less object is an object that lacks a content stream. It’s common to have one or two types of content-less objects in your Alfresco-based solution because there are usually good reasons to have objects that don’t have a file associated with them. Maybe you are storing some configuration as properties on an object, for example. But if you need to store nothing but content-less objects, you are throwing away many of the benefits, like full-text search, transformations, and file-based protocols, that you get from a repository like Alfresco that is built specifically for managing file-based content.

If you just need to store objects that have properties but no file-based content, you might be better off with a document-oriented NoSQL repository or a key-value store.

Summary

As I mentioned at the start of the post, there are a lot of cases where Alfresco makes sense and you can find many of these around the net. The goal of this post was to list common misconceptions or even misuses of Alfresco that can cost you time and money.

Any time you invest in a platform you’ll find corner cases that the platform wasn’t meant to address and you can often work around those with code. What you don’t want to do, though, is have your entire system be a corner case relative to the platform’s sweet spot. That’s no fun for anybody.

June 13, 2014

How I successfully studied for the Alfresco Certified Engineer Exam

Back in March I blogged about why I took the Alfresco Certified Administrator exam (post). Today I passed the Alfresco Certified Engineer exam. I took it for the same reasons I took the ACA exam, as outlined in that post, so in this post, I thought I’d share how I studied for the test.

Let me start off with a complaint: There is nowhere I could find that describes which specific version of Alfresco the test covers. This wasn’t that big of a deal for the ACA exam, but for the ACE exam, I felt a little apprehensive not knowing.

I know Alfresco probably doesn’t want to lock the exam version to an Alfresco version. But the blueprint really needs to give people some idea. Ultimately, I decided 4.1 was a safe bet.

I can’t tell you what was on the test, but I can tell you how I studied.

First, review the blueprint

The exam blueprint is the only place that gives you hints as to what’s on the test. If you look at the blueprint, you’ll see that the test is divided into five areas: Architectural Core, Repository Customization, Web Scripting, UI Customization, and Alfresco API.

The blueprint breaks down each of those five areas into topics, but they are still pretty broad. Some of them helped me figure out what to review and some of them didn’t. For example, under Architectural Core, topics like “Repository”, “Subsystems”, and “Database” were too vague to be that helpful in guiding my study plans.

Next, identify your focus areas

Looking at the blueprint, most of those topics have been in the product since the early days and haven’t changed much. I figured I could take the test cold and pass those. But Share Configuration and Customization has changed here and there between releases. With a lot of different ways to do things, and ample opportunity for testing around minutiae, I figured this would be where I’d need to spend most of my study time. I also wanted to spend time reviewing the various APIs listed under Architectural Core because I typically just look those up rather than commit the details to memory.

To validate where I thought my focus areas should be I took the sample test on the blueprint page, which was helpful.

Now, study

For Architectural Core, I spent most of my time reviewing the list of public services in the Foundation API found in Appendix A of the Alfresco Developer Guide, the JavaScript API (also in Appendix A as well as the official documentation), and the Freemarker Templating API documentation.

For the Repository Customization I figured I had most of that down cold and just spent a little time reviewing Activiti BPM XML and associated workflow content models. The workflow tutorial on this site is one place with sample workflows to review and obviously the out-of-the-box workflows are also good examples.

According to the blueprint, the UI Customization section is now focused entirely on Alfresco Share, so I didn’t spend any time reviewing Alfresco Explorer customization. Instead, I read through the Share Configuration and Share Customization sections of the documentation. There are now tutorials on Share Customization in the Alfresco docs so I went through those again just to make sure everything was fresh. The Share configuration examples in my custom content types tutorial are another resource.

The Alfresco API section consists of questions about the Alfresco REST API and CMIS. This is only 5% of the test so I spent no time reviewing this. I also ignored Web Scripts, figuring my existing knowledge was good enough.

After studying the resources in my focus areas I took the sample test once more. It’s always the same set of questions, so taking it repeatedly isn’t a great way to prove your readiness, but at least you know you won’t miss those questions if they show up on the real test.

Feel ready? Go for it

If you get paid to work with Alfresco, you really ought to take this exam (and the ACA exam). Obviously, what I’ve reviewed here is a study plan for someone who has significant experience with the platform doing real world projects. If you are new to Alfresco you’ll have to adjust your plan and preparation time accordingly. Better yet, get a few projects under your belt first. I think it would be tough for someone with no practical experience to pass the test with any amount of study time, which is the whole point.

So there you go, that’s how I studied. Your mileage will vary based on what your focus areas need to be. Now go hit the books!