Tag: Optaros

Packt Author of the Year Award

I mentioned it on Twitter yesterday but I definitely wanted to spend more than 140 characters saying thanks. If you missed the tweet, what I’m talking about is that Packt announced that I won the Author of the Year Award for the Alfresco Developer Guide.

Earlier this year they had a nomination process which resulted in readers choosing me as a finalist. Getting that far was really cool and I definitely want to say thanks to the readers of the book and the blog for making that happen.

The next step was a round of judging. This took a while, and I can’t imagine the process of getting a panel of judges, with different backgrounds and full-time jobs (I assume), to read six technical books covering a wide range of topics (Alfresco, Drupal, Ext JS, JavaScript, and SOA) and then making sense of the feedback. Definitely a big thanks to Packt and the panel of judges who worked hard to make this happen.

The book was certainly a team effort. I had a great team of technical reviewers, and they were instrumental. I think it’s also important to recognize my firm, Optaros, because the project couldn’t have happened without their support and encouragement. The entire Alfresco team also got behind it in one way or another and that was important to the success as well.

Alright, enough said. I’ll quit before I start working my way down the family tree. Maybe Twitter’s length restriction really is the best thing for an acceptance speech. Thanks, again, to everyone involved!

Understanding the differences between Alfresco’s repository implementations

People new to Alfresco are often unaware of the existence of two different repository implementations within the product. One, which I’ll call the “DM Store”, is the classic store, the one that’s been used by Alfresco since the beginning. The other, the “WCM Store” or, as it is often referred to in API-speak, the “AVM Store”, was born with the addition of the Alfresco WCM product offering. Whether you are doing document management or web content management, you use the same Explorer client, but under the covers, your content lives in two very different types of repositories.

The Alfresco story on why a second repository implementation was created is that the engineers writing WCM didn’t believe the DM store could provide the kind of versioning, branching, and layering functionality they needed (hence the AVM acronym, which stands for Advanced Versioning Manager), so they created an entirely new repository implementation to support WCM.

Why does this matter, apart from being a possible topic of conversation at your next get-together (“Healthcare is easy to fix. Do you think Alfresco will ever unify their two repository implementations?”)? It matters because the “two sides” of Alfresco are not equivalent in terms of functionality, and depending on what you need to do, you may find yourself performing unnatural acts to work around the disparity.

Many projects will be completely unaffected by the differences between Alfresco DM and Alfresco WCM. But it is important to know what these differences are when you first begin to plan your solution to avoid uncomfortable conversations between you and your customer when you realize you’ve made a bad assumption.

I’ll assume you know the high-level capabilities of both Alfresco DM and Alfresco WCM. Obviously there are some things one product can do that the other can’t, and that’s by design (sandboxes and virtualization in WCM, for example). What’s more important to understand are the subtle (and sometimes not-so-subtle) differences between the two. Here’s the list:

Content Modeling. Alfresco DM uses a proprietary XML-based description of the content model while Alfresco WCM uses XML Schema. On the surface this isn’t a big deal, but it does mean if your repository contains a mix of DM- and WCM-stored data, you won’t have a single model that defines it all and you could possibly have duplication between the two.

Custom Content Types. In Alfresco DM, when you create content, you tell Alfresco what its content type is. If you’ve extended the out-of-the-box model, you can have any number of business-specific content types with your own custom metadata. In Alfresco WCM, custom content types are not supported. In WCM, your content type is your web form. Interestingly, although the “Type” dropdown is shown in the “Create Web Content” dialog, and it will contain custom content types you’ve defined using the Alfresco DM model, your selection will not be honored. All AVM content is created as an instance of the “avm:plaincontent” content type no matter what you select. However, although you must do it through an API call, you can apply custom aspects to AVM content.
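For illustration, here is a minimal sketch of what applying a custom aspect to an AVM asset might look like from a JavaScript controller or action, using the JavaScript API’s avm root object. The store name, path, and the “sc:webable” aspect are made-up examples, so treat this as a starting point rather than a drop-in solution:

// Sketch only: apply a custom aspect to an AVM asset via the JavaScript API.
// The store name, path, and "sc:webable" aspect are hypothetical.
var asset = avm.lookupNode("mysite--jpotts:/www/avm_webapps/ROOT/content/article1.xml");
if (asset != null) {
    asset.addAspect("sc:webable");
    asset.properties["sc:published"] = new Date();
    asset.save();
}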

User Interface Configuration. Alfresco DM uses a proprietary XML-based configuration file to define the “property sheets” that display metadata in the Alfresco Explorer client for a given content type or aspect. Alfresco WCM uses the embedded Chiba XForms engine to inspect the XML Schema (XSD) and automatically create a web form that will produce data that conforms to the XSD. XSD annotations can be used to influence the presentation of the form fields. One outcome of this is that it is much easier to localize things like property labels in Alfresco DM than it is in Alfresco WCM.

User Interface Extension. If you need to change how the Alfresco Explorer client behaves, there are some things you can do through XML, but advanced customizations will require JavaServer Faces (JSF) development. Alfresco DM and WCM both use the same Explorer client so this applies to both (See “Alfresco User Interface: What are my options?”). However, if you need to change how the web form engine works, you may need to write new Chiba XForms widgets. For instance, Optaros developed a web form used to describe points and regions on Google Maps. That kind of thing requires you to understand how to extend Chiba.

Structured (XML) data entry. Data entered in an Alfresco WCM web form is saved as XML that conforms to the XSD you’ve defined. There is no similar facility for capturing data as XML available within Alfresco DM. At one point the Community code line had “ECM Forms” which was essentially WCM web forms for the DM side of the house, but that’s disappeared in the latest Community release. On the DM side, when you edit metadata you are editing object properties whose values get stored in the database, not as XML.
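To make the contrast concrete, here is a minimal sketch of what “editing metadata” looks like on the DM side with the JavaScript API; the folder and file names are only examples:

// Sketch: in DM, editing metadata means updating node properties, not capturing XML.
// The path below is made up for illustration.
var doc = companyhome.childByNamePath("Press Releases/launch-announcement.txt");
doc.properties["cm:title"] = "Launch Announcement";
doc.properties["cm:description"] = "Updated via script";
doc.save();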

Transformations. You can use either Freemarker or XSLT to transform Alfresco web form XML into other formats. That transformation is defined as part of the web form configuration which you do within the Explorer client. In Alfresco DM, transformations are more about binary file transformations (DOC to PDF or GIF to PNG, for example). If you want to do Freemarker or XSLT transformations on XML content stored in Alfresco DM, you’ll need to write that yourself (an Action would do the trick). If you want to do DM-style transformations on binary files in Alfresco WCM, that’s not out-of-the-box. You’ll have to do that using the API.
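If you do roll your own on the DM side, a script-backed action is one way to go. Here is a rough sketch, assuming a FreeMarker template stored in the Data Dictionary and an XML document as the node the action runs against; the paths and names are illustrative only:

// Sketch: transform an XML node with a FreeMarker template and save the result
// as a sibling HTML file. The template path and output naming are made up.
var template = companyhome.childByNamePath("Data Dictionary/Presentation Templates/article-to-html.ftl");
var output = document.processTemplate(template);  // "document" is the node the action operates on

var rendition = document.parent.createFile(document.name + ".html");
rendition.content = output;
rendition.save();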

Rule actions. Alfresco DM allows you to configure rules on folders to trigger actions (out-of-the-box or custom) to operate against newly-added, updated, or deleted documents. Alfresco WCM does not support rule actions at all.

Auditing. Alfresco DM has a granular auditing sub-system. You can configure it to audit just about anything you want. Anything except WCM. You can audit web project creation, but not changes to individual web assets within a web project. At least not out-of-the-box.

Object-level permissions. In Alfresco DM you can assign users and groups to roles at the folder and file level. In Alfresco WCM, the UI will only let you go as low as the web project level. The API supports more granular security but you have to implement that yourself with custom code.
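As a rough idea of the building blocks for that custom code, these are the permission methods the JavaScript API exposes on nodes. The group, user, and path below are hypothetical, and you should verify how these calls behave against AVM nodes in your particular version:

// Sketch: granular, object-level permissions via the JavaScript API.
// "GROUP_web_editors", "jdoe", and the path are made-up examples.
var asset = companyhome.childByNamePath("Some Folder/some-file.txt");
asset.setInheritsPermissions(false);                   // stop inheriting from the parent
asset.setPermission("Consumer", "GROUP_web_editors");  // grant a role to a group
asset.setPermission("Collaborator", "jdoe");           // or to a specific user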

Search. Everything in Alfresco DM is full-text indexed and searchable. In Alfresco WCM, only the Staging Sandbox of each web project is indexed. You can do a search from your user sandbox but you’re really searching the Staging Sandbox. If you have any content you’ve created in your user sandbox that you have not yet committed to Staging, web project search won’t find it. Another limitation is that you cannot search across web projects. That search box that’s visible in the far upper right-hand corner of the Alfresco Explorer client is the Alfresco DM search; it won’t find anything in any of your web projects.
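For reference, this is roughly what that DM search looks like from a JavaScript web script controller; the query string is only an example:

// Sketch: full-text plus path search against the DM repository.
var results = search.luceneSearch('+PATH:"/app:company_home//*" +TEXT:"press release"');
for (var i = 0; i < results.length; i++) {
    logger.log(results[i].name);
}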

Advanced Workflow. Alfresco DM and Alfresco WCM use the same JBoss jBPM workflow engine so there’s no functional difference between what you can do with workflow on either side. The only catch is that in Alfresco DM, all deployed workflows show up in the “Start Advanced Workflow” dialog whereas in WCM, you have to tell Alfresco which deployed workflows are okay to use for WCM. That’s covered in the Alfresco Developer Guide and on the wiki.

File protocols. CIFS and FTP are the only two file protocols supported by both Alfresco DM and Alfresco WCM. Other protocols supported by Alfresco DM, such as WebDAV, inbound SMTP, and IMAP, are not supported by Alfresco WCM.

Deployment. Some people use Alfresco DM to manage content that is published to the web because they don’t need the additional features WCM offers, or they have some other reason to export content to another server. Unfortunately, Alfresco DM does not yet offer a deployment component like the one in Alfresco WCM. If you want to export content from Alfresco DM to some other destination in a systematic way, you’ll have to roll your own solution.

As John and Paul said, “It’s getting better”

Some of these differences will become less drastic in coming releases. For example, Alfresco is implementing a new form service that will be used to define the content model and user interface across the entire product line, so that helps. The WCM deployment functionality is also being refactored and will ultimately work for both DM and WCM. And at every community event Alfresco talks about “repository unification” as a goal for the future, although in terms of software releases the timeline is light-years away.

As I said, depending on what you’re doing these differences may not affect you at all. Just make sure you don’t assume that a given feature is available everywhere, and make sure you’ve made a conscious decision about what content to put in which repository (DM or WCM) based on your requirements.

Yet another reason to love Open Source Content Management

Man, I don’t miss delivering solutions on top of Documentum. After reading Laurence Hart’s post on Documentum Developer Edition, I’m reminded how much I take for granted working exclusively in the open source content management world.

Laurence’s post was intended to discuss the ins and outs of Documentum’s efforts to make it easier for developers, and, as usual, he’s done a good job of that. But it also underscores the benefits enjoyed by those who work in open source land. In case you don’t know how good you’ve got it, my open source brothers and sisters, check it out:

Developers working with closed source ECM vendors have to pay to get the software

As Laurence points out,

“There are lots of independent consultants out there that have trouble keeping-up with the technology because they can’t afford to become partners for the requisite fee.”

If you are a developer looking to go deep on closed source software, you have no choice but to pay. There’s no other way to get access to the software. Sometimes you can’t even get access to the documentation or the bug database without a paid-up partner account (or a client that lets you use theirs).

[UPDATE: Jerry Silver, from EMC, points out that the Documentum Developer Edition is a free download. My original post made it sound like you had to be part of the partner program to obtain the download.]

With open source, the barrier to entry is much lower. You pay nothing to get the software. It’s all about the time and energy you put into learning the product and implementing cool solutions.

To be fair, commercial open source vendors often charge partner fees as well, but the bottom line is that it costs nothing to get started with the code.

Developers working with closed source ECM vendors struggle with giant developer footprints

I feel sorry for Laurence’s laptop:

“The complete Development install calls for 3GB of RAM (after a 1.7+GB download).  That is no small thing for a development laptop.  It needs to be on a newer machine.  If you can move the database service to a different box, that will make your life easier.”

Oh dear. A 1.7GB download for a developer setup? Am I downloading a VM image or a content management server? Let’s look at Alfresco for a comparison. Assuming you are starting from scratch, and assuming you are going to go full-on with the Alfresco platform, your total download is right around 300MB. That includes:

  • Alfresco SDK
  • Alfresco WAR
  • Alfresco WCM (Deployment listener and add-on to core repo)
  • Apache Tomcat
  • Sun JDK
  • MySQL (Server and connector)

All of which runs comfortably in 2GB of RAM and won’t even cause your fan to kick on in 4GB.

Developers working with closed source ECM vendors have less choice

Optaros consultants are now split fairly evenly in their choice of OS across Windows, Mac OS X, and some flavor of Linux. Some people prefer MySQL and some prefer PostgreSQL. Mostly we use Eclipse for Java development but everyone’s got a preference. I use Tomcat for everything locally while others like JBoss. The point is, developers want to use their tools the way they want to. It’s not a stubbornness thing; it’s an efficiency thing.

Within my CMS I want the same flexibility. I want to tweak settings. I want to name my database what I want. I want the flexibility to deploy across as many (or as few) nodes as I need to. From Laurence’s post, it sounds like Documentum clearly falls down here.

Developers working with closed source ECM vendors can’t see the code

It’s obvious, I know. For developers who work with open source it is extremely natural to use the CMS source code when debugging or for reference. You don’t even think about it; it’s just there and you use it. Imagine the frustration of someone who works with a closed source CMS and has to routinely decompile classes to figure out what’s going on. That truly sucks. What good is a “Developer Edition” that doesn’t come with source code?

Partner defections from closed source are on the rise

I’ve seen recent announcements from multiple partners who were previously exclusive to closed source vendors but are now adding open source to their partner list. This reflects increasing demand from customers who are realizing the business value of open source, especially in tough economic times, as well as partners’ desire to make up for sagging demand in the proprietary world. But could it also be that more firms are realizing how much more productive and pleasant it is to work with open source content management?

Help your employer/client see the light

Open source ECM technologies like Alfresco, Drupal, Liferay, Lucene, and many others are now at or beyond parity with their closed source equivalents. If you are a developer who’s sick of the shackles closed source CMS places on you, why not suggest exploring open source alternatives?

Alfresco Developer Guide source reorg and 3.2 Community update

[UPDATE: Added a link to the source code that works with 3.2 Enterprise]

I originally wrote the Alfresco Developer Guide source code for Alfresco 2.2 Enterprise and Alfresco 3 Labs. The code was pretty much the same regardless of which one you were running. For things that did happen to be different, I handled those with separate projects: one for community-specific stuff and one for enterprise-specific stuff. This was pretty much limited to minor web script differences for the “client extensions” projects and LDAP configuration differences for the “server extension” project.

With the release of 3.2 Community, I realized:

  • The number of different flavors of Alfresco any given reader might be running is going up, not down. Who knows when 2.2 Enterprise will be sunset.
  • It is no longer as easy as “Enterprise” versus “Labs/Community” because multiple releases of the same flavor are prevalent (2.2E, 3.0E, and 3.1E, for example).
  • Tagging my code in Subversion by chapter alone is no longer enough; I need to tag by chapter and by Alfresco version.
  • Sending the publisher the code one chapter at a time and expecting them to manage updates and decide how to organize all of the chapter code was a bad idea.

So, I’ve done some work to make this better (reorg the projects, restructure the download files). I’ve also tested the example code from each chapter against the latest service packs for all releases since 2.2 Enterprise. That includes making some small updates to get the examples running on 3.2 Community.

You can now download either all of the source for every version I tested against, or, download the source that works for a specific version. It may take the official download site at Packt a while to get the new files, so here are links to download them from my site:

Alfresco Developer Guide example source code for…

  • Alfresco 2.2 Enterprise (~5.3 MB, Download)
  • Alfresco 3.0 Labs (~5.6 MB, Download)
  • Alfresco 3.0 Enterprise (~5.7 MB, Download)
  • Alfresco 3.1 Enterprise (~5.6 MB, Download)
  • Alfresco 3.2 Community (~5.7 MB, Download)
  • Alfresco 3.2 Enterprise (~5.9 MB, Download)
  • All of the above, combined (~28.1 MB, Download)

Hopefully this makes it easier for you to grab only what you need and makes it clear that each Eclipse project contains only what’s needed to work with that version of Alfresco. Deployment is easier too. Most of the time, it’s just the “someco-client-extensions” project that you deploy.

Now that I’ve got everything structured like I want it, as new versions of Alfresco are released, it should be much easier to keep up.

ECM vendors have their heads in the cloud, can you see through the fog?

The hype around cloud computing has reached a fever pitch, so it is natural that ECM vendors try to take advantage of it as much as they can. Some examples from the open source ECM world:

  • Alfresco always seems to be partnering with one cloud vendor or another. I went to a brief session on Alfresco, GoGrid, and ParaScale earlier this year. (As an aside, those GoGrid cycling socks, which I thought were a strange giveaway at the time, are awesome).
  • At the end of last year eZ Publish announced a partnership with Mamut to provide eZ as SaaS.
  • Just last week Nuxeo announced a cloud edition of its product.

Clearly, ECM vendors are busy figuring out how to take advantage of the cloud. But what does it mean for ECM to be “in the cloud”? When might it work for you?

Cirrus, Stratus, or Cumulonimbus

The first thing you need to realize is that when people say “cloud” they often mean very different things. Generally, there are three types of clouds: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS).

Software-as-a-Service (SaaS) is the same model that’s been around for years but has lately taken advantage of the cloud moniker. Google Apps and Salesforce.com are the big SaaS players but there are SaaS offerings for all kinds of business applications, including content management.

The allure of SaaS ECM is the same as that of SaaS in general:

  • Lower up-front costs
  • Someone else gets to worry about running and scaling the infrastructure
  • Depending on the vendor, you may only have to pay for what you use

The challenges of SaaS ECM include things like:

  • The ability to do heavy customization and complex workflows
  • Ease of integration with other systems
  • Client perceptions (and real issues) around data security
  • Data portability/vendor lock-in

Open Source CM vendors Nuxeo and eZ Systems have SaaS offerings as do proprietary vendors such as SpringCM, CrownPeak, Clickability, and PaperThin, to name a few. Beyond just general-purpose document and content management, I think you’ll also see vendors build verticalized SaaS offerings on top of hosted content management technology.

The next type of cloud is Platform-as-a-Service (PaaS). The two best examples of PaaS are Google App Engine (GAE) and Salesforce.com’s force.com platform. With PaaS, you provide the code and the PaaS provider does the rest. Of course this means your code has to follow certain standards and is often subject to limitations, but the beauty is that you get a completely custom solution without worrying about any of the infrastructure.

I like GAE. For certain applications, the benefits of instantaneous, global scale far outweigh the limitations of the platform. But I don’t expect ECM vendors that would do well in SaaS or IaaS clouds to do much with PaaS. You can’t take an Alfresco or a Drupal and run it on a PaaS cloud. I do think we will see PaaS-native content management systems. For example, I’ve seen apps in the Salesforce.com AppExchange that are basically tools for building a web site that’s tightly integrated with Salesforce.com. I think you’ll also see solutions that leverage a PaaS for certain components or sub-systems.

The third type of cloud is Infrastructure-as-a-Service (IaaS). An IaaS cloud is about providing virtual servers on-demand. Examples include things like Amazon’s EC2, Rackspace Cloud, and GoGrid. With these services you can instantly provision as many servers as you need. What you do with them is up to you. When you’re done, you turn them off. Specifics vary but you are essentially billed for CPU time.

The way people leverage IaaS differs. Some people will provision a server and install their ECM software of choice and stop there. Other than dealing with different file storage approaches of various IaaS vendors, this is really no different than running your own virtual servers. So when someone says they are running XYZ CMS “in the cloud” and it turns out to be a single node on a virtual machine, I can barely stifle a yawn. It’s fast and convenient to set up, yes, but technically it’s pretty boring.

The more interesting way to use ECM in an IaaS cloud is to leverage the ability of the infrastructure to scale on-demand. That’s the real value of “the cloud” after all. For example, at Optaros we run an IaaS-hosted solution called OView that syndicates content and content-centric applications to web sites. When a client places that content or app on Yahoo’s home page we get a huge spike in traffic. We run the solution on Amazon EC2 images and we use RightScale to dynamically provision additional nodes when traffic warrants.

The degree to which a specific ECM vendor can operate in a dynamically-scaled infrastructure varies greatly. Simply “running in the cloud” is easy. Scaling your ECM infrastructure automagically is harder.

What do you really need?

If the list of SaaS benefits has a lot of appeal to you and the challenges and potential limitations aren’t much of a bother, SaaS ECM might be worth evaluating. This will most likely be a better fit for clients with limited IT resources and simple to moderate requirements around ECM.

On the IaaS front, if it is just an issue of externally-hosting your ECM infrastructure, make sure the cloud is what you want. The best use case for the cloud is when demand is temporary or unpredictable with huge spikes. I would argue that for your core ECM infrastructure demand is neither temporary nor unpredictable.

If “scale” is your issue, I would challenge you to think about exactly what needs to be scaled. If it is just content delivery of static content, maybe you could get by with a CDN. If your content management system can separate authoring from dynamic delivery of content, maybe only the dynamic content delivery mechanism needs to be able to scale quickly.

You might have certain processes (large-scale video transcoding, for example, or other types of periodic batch processing) that you could leverage the cloud for without cloud-enabling your entire ECM infrastructure. Acquia's hosted spam filtering service, Mollom, and their newly released hosted search offering are two examples where only specific pieces of your infrastructure are off-loaded to the cloud.

If it turns out that you need to scale the whole ball of wax, fine, it can be done, but have a good reason.

ECM in the cloud is, um, cloudy

The cloud as a style of computing is exciting. The cloud as a “feature” is potentially confusing. ECM vendors are going to do what they can to have it somewhere “on the box”. But it’s not something you can simply check off. The next time you hear an ECM vendor say “cloud-ready”, ask them what they mean. Then figure out whether or not that has any relevance at all to your real requirements.

Is the cloud on your horizon? Let me know if/how the cloud relates to your ECM strategy.

Alfresco-Django integration now available on Google Code

The Alfresco-Django code I demo’d in the screencast yesterday is available at Google Code. It includes the core Django integration, the sample site, an AMP file you can use to deploy the web scripts and the sample site bootstrap data to Alfresco, and documentation which you can build using Sphinx.

This should work with Alfresco Labs 3D Stable, Alfresco 3.0.1 Enterprise, and Alfresco 3.1 Enterprise.

My Optaros colleague, Sean Creeley, did most of the work, so thanks, Sean. Obviously, thanks to Justin, JC, and the rest of the Neiman Marcus team as well.

This is the initial public release of this thing so we welcome feedback in all forms, whether that’s suggestions for the roadmap, bug reports/fixes, enhancements, doc, etc. With your help, I think we could make this a really sweet Alfresco front-end development kit.

Screencast: Alfresco Django integration

I’ve created a screencast over at Optaros Labs that shows a simple web site, powered by Django, that pulls all of its content from Alfresco.

At Optaros, we see Django and Alfresco as a powerful combination for building content-centric applications. The integration shown in the screencast is based on work we did for our friends at Neiman Marcus. An open source version of this integration will be available within a week or so.

Alfresco 3.1 clustering easier with JGroups

Optaros has worked on some of the largest and most complex Alfresco implementations anywhere. Projects where multi-node read-write clusters are required have been particularly challenging. So when Alfresco announced clustering improvements in 3.1 my interest was piqued.

I decided I’d do a simple test: Get a two-node read-write Alfresco 3.1 cluster running using a shared MySQL database and a shared file store (as opposed to a replicated database and a replicated file store). The process is mostly documented here but I thought I’d capture the steps I went through in case someone finds them helpful.

Prepare the virtual machines

If you already have virtual or physical machines ready to go, go on to “Set up the content store & database”.

I already had an Ubuntu server virtual machine image with everything I needed for the test. I upgraded it to Alfresco 3.1, cleared out the repository, and verified that everything was working okay. In order to share my data directory via NFS I did need to use apt-get to install nfs-kernel-server, nfs-common, and portmap, but that’s no big deal.

Once I had the first image all set it was time to create a second. I’m using Sun’s VirtualBox for virtualization. It doesn’t have a “clone” command in the UI and you can’t simply do a file copy of the VDI file. Instead, you have to use VBoxManage on the command line. The form of the command that uses the source VDI file name and target file name didn’t work, but using the source VDI UUID did:

VBoxManage clonevdi 19a7646e-d5cb-4e01-90fd-2bcd556dc1d5 "Ubuntu Test Server Clone.vdi"

It was weird that I had to use the source UUID instead of the file name, but I got what I wanted.

Set up the network

I used VirtualBox “host only” networking for ease of setup. This allowed my host machine to see the images and the images to see each other.

My server image was originally set up to use DHCP. That appeared to be giving Alfresco and JGroups trouble so I converted the images to use static IP addresses, unique host names, and updated hosts files (I didn’t want to set up DNS). That left me with three machines (one host and two virtual machines called node1 and node2) that could ping each other by name.

Set up the content store & database

At this point I’ve got two identical Alfresco servers, but each has its own database and data store. For my test, they needed to point to the same database. They also needed to share the content store, but each needed its own local Lucene index.

For this test I decided to use the database and file system on node1 for both nodes. In real life, that wouldn’t be a good setup because losing node1 would bring down the whole cluster. For a shared db/file system setup, you’d want separate nodes, each clustered, for the db and file system.

My Alfresco content store is in “/srv”. I wanted to use NFS to share the content store with the other nodes in my cluster, so I edited /etc/exports to add a new entry for the “/srv” directory. I used an IP address range here but I could have used explicit host names.

/srv 192.168.56.0/25(rw,no_root_squash,async)

You have to restart the nfs-kernel to make that change take effect:

/etc/init.d/nfs-kernel-server restart

Then, I split out the content and index stores into three directories:

/srv/alfresco-3.1-enterprise
/srv/alfresco-3.1-enterprise-local-index
/srv/alfresco-3.1-enterprise-local-index-backup

And updated custom-repository.properties accordingly:

dir.root=/srv/alfresco-3.1-enterprise
dir.indexes=/srv/alfresco-3.1-enterprise-local-index
dir.indexes.backup=/srv/alfresco-3.1-enterprise-local-index-backup

The second node will access the database remotely, so MySQL needed to know about that:

grant all on alfresco31e.* to 'alfresco31e'@'192.168.56.4' identified by 'alfresco31e' with grant option;

Later it seemed that node1 was accessing MySQL via its static IP address rather than localhost as it used to. Rather than figure out why or where that’s config’d, I just ran the same command as the above for node1’s static IP.

With node1 all set, it was time to give node2 some attention…

My original plan was to NFS mount the node1 data directory as something like “/srv/alfresco-labs-3d-shared” because using the same directory name I would have used on a single node seemed confusing. As it turned out, I think Alfresco must keep track of that data directory name because it complained that my “dir.root” was set incorrectly. So I wound up using the same directory names that I used on node1 and making the same update to custom-repository.properties:

dir.root=/srv/alfresco-3.1-enterprise
dir.indexes=/srv/alfresco-3.1-enterprise-local-index
dir.indexes.backup=/srv/alfresco-3.1-enterprise-local-index-backup

Then I mounted the data directory:

mount 192.168.56.3:/srv/alfresco-3.1-enterprise /srv/alfresco-3.1-enterprise

I didn’t do it, but it would be smart to update /etc/fstab so that the data directory would be automatically mounted on server startup.

With that the data directories are all set. Telling node2 to use the database on node1 instead of localhost was a simple custom-repository.properties change:

db.url=jdbc:mysql://node1.alfresco.jpotts.com/alfresco31e

Now node1 and node2 are pointing to the same content store and database, and each has its own Lucene index. The last step was to configure the cluster.

Configure the cluster

Configuring the cluster involved enabling the sample ehcluster-config.xml and making a few small changes to custom-repository.properties.

To enable the ehcluster-config, I copied the ehcluster-config.xml.sample file that came with the sample extensions into my extensions directory as ehcluster-config.xml. No other changes were needed in this particular case.

In custom-repository.properties, you have to assign a cluster name to activate the cluster. The index recovery mode needs to be set to AUTO so the indexes stay in sync:

alfresco.cluster.name=testcluster
index.recovery.mode=AUTO

As of 3.1, Alfresco uses JGroups to discover and coordinate cluster members. The protocol used for cluster member communication is configurable. The default is UDP, but I couldn’t get that to work, so I changed it to TCP. I also found that I had to list the hosts in my cluster in order for the two nodes to find each other:

alfresco.jgroups.defaultProtocol=TCP
alfresco.tcp.initial_hosts=node1.alfresco.jpotts.com[7800],node2.alfresco.jpotts.com[7800]

As you can see, most of the work was really about networking and data setup. The cluster configuration itself is actually pretty minimal.

Test the cluster

Before starting Tomcat on the two nodes, I enabled a log4j logger so I could see nodes join and leave the cluster:

log4j.logger.org.alfresco.enterprise.repo.cache.jgroups=INFO

After starting up Tomcat, I eventually saw this in catalina.out:

06:24:52,043 INFO [repo.jgroups.AlfrescoJGroupsChannelFactory]
Created JChannelFactory:
Cluster Name: testcluster
Stack Mapping: {DEFAULT=TCP}
Configuration: file:/opt/apache/apache-tomcat-5.5.27/webapps/alfresco/WEB-INF/classes/alfresco/jgroups-default.xml

-------------------------------------------------------
GMS: address is 192.168.56.3:7800
-------------------------------------------------------

When the second node joined the cluster, the first node knew about it:

06:26:21,241 INFO [cache.jgroups.JGroupsKeepAliveHeartbeatReceiver]
New cluster view with additional members:
Last View: null
New View: [192.168.56.3:7800|1] [192.168.56.3:7800, 192.168.56.4:7800]

Once the nodes could see each other it was time to test it out from an end-user perspective. Obviously, in production you’ll have a load-balancer in front of these two nodes. For testing the cluster, though, you want to be able to hit each node specifically. I used two different browsers on the host machine logging in as two different users. There are some short test scenarios on the Alfresco wiki. In addition to those, you might want to:

  • Create, delete, and update content while a second node is shut down. Start the second node and see if you can navigate to, search for, and read the properties of content as you would expect. Note that it may take a few seconds for the cache and Lucene index to update.
  • Check out content in one browser and verify that it is checked out on the other.
  • Simultaneously edit content properties.
  • Open the edit properties page in one browser and delete the object in another.

That’s it

In a real-world production environment you often have numerous networking issues to deal with that make this more of a headache, but hopefully this gives you an idea of the basic steps involved and shows you how to get familiar with the process by setting up your own test cluster using virtual machines.

Keeping your Alfresco web scripts DRY

One of my teammates thoroughly drenched our under-the-desk development server with a giant cup of coffee once. Somehow, disaster was avoided, although the client’s carpet had a nice coffee-stain outline of the server’s footprint long after the app rolled out which afforded the rest of us endless opportunities to mercilessly haze the clumsy coder.

Keeping your Alfresco web scripts DRY is actually not about making your developers use spill-proof travel mugs, or better yet, virtual machines. DRY is a coding philosophy or principle that stands for Don’t Repeat Yourself. There are all kinds of resources available that go into more detail, and the principle is broader than simply avoiding duplicate code, but that’s what I want to focus on here.

There are three techniques you should be using to avoid repeating yourself when writing web scripts: web script configuration, JavaScript libraries accessed via import, and FreeMarker macros accessed via import.

Web Script Configuration

Added in 3.0, a web script configuration file is an XML file that contains arbitrary settings for your web script. It’s accessible from both your controller and your view. Building on the hello world web script from the Alfresco Developer Guide, you could add a configuration file named “helloworld.get.config.xml” that contains:


<properties>
<title>Hello World</title>
</properties>

You could then access the “title” element from a JavaScript controller using the built-in E4X library:

var s = new XML(config.script);
logger.log(s.title);

And, you could also grab the title from the FreeMarker view:

<title>${config.script["properties"]["title"]}</title>

The web script configuration lets you separate your configuration from your controller logic. And because you can get to it from both the controller and the view, you don’t have to stick configuration info into your model.

This example showed a script-specific configuration, but global configuration is also possible. See the Alfresco Wiki Page on Web Scripts for more details.

JavaScript Import

If your web script controllers are written in JavaScript, at some point you will find yourself writing JavaScript functions that you’d like to share across multiple web scripts. A common example is logic that builds a query string, executes the query, and returns the results. You don’t want to repeat the code that does that across multiple controllers. That’s what the import statement is for.

The syntax is easy:

<import resource="classpath:alfresco/extension/scripts/status.js">

This needs to be the first line of your JavaScript controller file. This example shows a JavaScript library being imported from the classpath, but you can also import from the repository by name or by node reference. See the Alfresco JavaScript API Wiki Page for more details.
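As a concrete (and hypothetical) sketch of the pattern, the imported status.js might expose a shared query helper that any controller can call; the type and property names echo the Status example discussed below, but the function itself is made up:

// scripts/status.js -- shared library (sketch)
function getStatusNodes(siteId) {
    // Build the query, run it, and return the results so every controller
    // that needs Status objects reuses the same logic.
    var query = '+TYPE:"optStatus:status" +@optStatus\\:siteId:"' + siteId + '"';
    return search.luceneSearch(query);
}

A controller such as status.get.js then imports the library and stays tiny:

<import resource="classpath:alfresco/extension/scripts/status.js">
// The import must be the first line; after that, just use the shared function.
model.results = getStatusNodes(args.siteId);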

FreeMarker Import

Of the three, this is the one I see ignored most often. Let’s take the Optaros-developed microblogging component for Share. Its basic data entity is called a “Status” object. So web scripts on the repository tier return JSON that might have a single Status object or a list of Status objects. That’s two different web scripts and two different views, but the difference between a list of objects and a single object is really just the list-related wrapper; in both cases, the individual Status object JSON is identical. I’ve seen people simply copy-and-paste the FreeMarker from the single-object template into the list template and then wrap it with the list markup. That’s bad. If you ever change how a Status object is structured, you’ve got to change it in (at least) two places. (Or it’ll be someone who comes after you who has to do it, which makes it even worse. If you aren’t following the analogy, your redundant code is the coffee stain.)

Instead of duplicating the logic, use a FreeMarker import and define a macro that formats the object. Then you can call the macro any time you need your object formatted. Here’s how it works for Status.

A FreeMarker file called “status.lib.ftl” contains the macros that format Status objects. It lives with the rest of the web script files and looks like this:


<#assign datetimeformat="EEE, dd MMM yyyy HH:mm:ss zzz">
<#--
Renders a status node as a JSON object
-->
<#macro statusJSON status>
<#escape x as jsonUtils.encodeJSONString(x)>
{
"siteId" : "${status.properties["optStatus:siteId"]!''}",
"user" : "${status.properties["optStatus:user"]!''}",
"message" : "${status.properties["optStatus:message"]!''}",
"prefix" : "${status.properties["optStatus:prefix"]!''}",
"mood" : "${status.properties["optStatus:mood"]!''}",
"complete" : ${(status.properties["optStatus:complete"]!'false')?string},
"created" : "${status.properties["cm:created"]?string(datetimeformat)}",
"modified" : "${status.properties["cm:modified"]?string(datetimeformat)}"
}
</#escape>
</#macro>
<#--
Renders a status node as HTML
-->
<#macro statusHTML status>
SiteID: ${status.properties["optStatus:siteId"]!''}<br />
User: ${status.properties["optStatus:user"]!''}<br />
Message: ${status.properties["optStatus:message"]!''}<br />
Prefix: ${status.properties["optStatus:prefix"]!''}<br />
Mood: ${status.properties["optStatus:mood"]!''}<br />
Complete: ${(status.properties["optStatus:complete"]!'false')?string}<br />
Created: ${status.properties["cm:created"]?string(datetimeformat)}<br />
Modified: ${status.properties["cm:modified"]?string(datetimeformat)}<br />
</#macro>

There’s one macro for JSON and another for HTML. It still bothers me that the same basic structure is repeated, but at least they are both in the same file and it feels better knowing that this is the one and only place where the data structure for a Status object is defined.

The FreeMarker that returns Status objects as JSON resides in status.get.json.ftl and looks like this:

<#import "status.lib.ftl" as statusLib/>
{
"items" : [
<#list results as result>
<@statusLib.statusJSON status=result />
<#if result_has_next>,</#if>
</#list>
]
}

You can see the import statement followed by the JSON that sets up the list, and then the call to the FreeMarker macro to format each Status object in the list.

When someone posts a new Status, the new Status object is returned. Rather than repeat the JSON structure for a Status object, the status.post.json.ftl view simply calls the same macro that got called from the GET:

<#import "status.lib.ftl" as statusLib/>
{
"status" : <@statusLib.statusJSON status=result />
}

Now if the data structure for a Status object ever needs to change, it only has to be changed in one place.

Don’t Repeat Yourself

Take a look at your web scripts. Eliminate your duplicate code. And keep a lid on your mocha frappuccino.

Save 15% on Alfresco Developer Guide

Forgive this temporary transgression into a blatant sales pitch, but I’m trying to save you some money. Packt Publishing is offering ECM Architect readers 15% off my book, the Alfresco Developer Guide. To take advantage of the discount:

  1. Visit the Alfresco Developer Guide Book Page
  2. Click the “ADD TO CART” button to add the book to your shopping cart
  3. Enter AlfrescoDG-3117 in the “Promotional Code” field and click the “Update” button. The discounted price should now be reflected in your order.

After your book arrives, if you read it and decide it was the best Packt book published in 2008, you should take a minute to vote for the Alfresco Developer Guide in Packt’s Author of the Year awards. Voting ends May 25.