Tag: WCM

Quick look at Acquia Reservoir, a Headless Drupal Distribution

Drupal is a very popular open source Web Content Management system. One of its key characteristics is that it owns both the back-end repository where content is stored and the front-end where content is rendered. In CMS parlance this is typically called a “coupled” CMS because the front-end and the back-end are coupled together.

Historically, the coupled nature of Drupal was a benefit most of the time because it facilitated a fast time-to-market. In many cases, customers could just install Drupal, define their content types, install or develop a theme, and they had a web site up-and-running that made it easy for non-technical content editors to manage the content of that web site.

But as architectural styles have shifted to “API-first” and Single Page Applications (SPAs) written in client-side frameworks like Angular and React and with many clients finding themselves distributing content to multiple channels beyond web, having a CMS that wants to own the front-end becomes more of a burden than a benefit, hence the rise of the “headless” or “de-coupled” CMS. Multiple SaaS vendors have sprung up over the last few years, creating a Content-as-a-Service market which I’ve blogged about before.

Drupal has been able to expose its content and other operations via a RESTful API for quite a while. But in those early days it was not quite as simple as it could be. If you have a team, for example, that just wants to model some content types, give their editors a nice interface for managing instances of those types, and then write a front-end that fetches that content via JSON, you still had to know a fair amount about Drupal to get everything working.

Last summer, Acquia, a company that provides enterprise support for Drupal headed up by Drupal founder, Dries Buytaert, released a new distribution of Drupal called Reservoir that implements the “headless CMS” use case. Reservoir is Drupal, but most of the pieces that concern the front-end have been removed. Reservoir also ships with a JSON API module that exposes your content in a standard way.

I was curious to see how well this worked so I grabbed the Reservoir Docker image and fired it up.

The first thing I did was create a few content types. Article is a demo type provided out-of-the-box. I added Job Posting and Team Member, two types you’d find on just about any corporate web site.

My Team Member type is simple. It has a Body field, which is HTML text, and a Headshot field, which is an image. My Job Posting type has a plain text Body field, a Date field for when the job was posted, and a Status field which has a constrained list of values (Open and Closed).

With my types in place I started creating content…

Something that jumped out at me here was that there is no way to search, filter, or sort content. That’s not going to work very well as the number of content items grows. I can hear my Drupal friends saying, “There’s a module for that!”, but that seems like something that should be out-of-the-box.

Next, I jumped over to the API tab and saw that there are RESTful endpoints for each of my content types that allow me to fetch a list of nodes of a given type, specific nodes, and the relationships a node has to other nodes in the repository. POST, PATCH, and DELETE methods are also supported, so this is not just a read-only API.

Reservoir uses OAuth to secure the API, so to actually test it out, I grabbed the “Demo app” client UUID, then went into Postman and did a POST against the /oauth/token endpoint. That returned an access token and a refresh token. I grabbed the access token and stuck it in the authorization header for future requests.

Here’s an example response for a specific “team member” object.

My first observation is that the JSON is pretty verbose for such a simple object. If I were to use this today I’d probably write a Spring Boot app that simplifies the API responses further. As a front-end developer, I’d really prefer for the JSON that comes back to be much more succinct. The front-end may not need to know about the node’s revision history, for example.

Another reason I might want my front-end to call a simplified API layer rather than call Drupal directly is to aggregate multiple calls. For example, in the response above, you’ll notice that the team member’s headshot is returned as part of a relationship. You can’t get the URL to the headshot from the Team Member JSON.

If you follow the field_headshot “related” link, you’ll get the JSON object representing the headshot:

The related headshot JSON shown above has the actual URL to the headshot image. It’s not the end of the world to have to make two HTTP calls for every team member, but as a front-end developer, I’d prefer to get a team member object that has exactly what I need in a single response.

One of the things that might help improve this is support for GraphQL. Reservoir says it plans to support GraphQL, but in the version that ships on the Docker image, if you try to enable it, you get a message that it is still under development. There is a GraphQL Drupal module so I’m sure this is coming to Reservoir soon.

Many of my clients are predominantly Java shops–they are often reluctant to adopt technology that would require new additions to their toolchain, like PHP. And they don’t always have an interest in hiring or developing Drupal talent. Containers running highly-specialized Drupal distributions, like Reservoir, could eventually make both of these concerns less of an issue.

In addition to Acquia Reservoir, there is another de-coupled Drupal Distribution called Contenta, so if you like the idea of running headless Drupal, you might take a look at both and see which is a better fit.

Using Elasticsearch to more effectively target dynamic content

Photo credit: viZZZual.com
Photo credit: viZZZual.com

One of my clients came to me with a problem: Despite being a much-admired Fortune 500 company that leads its competitors in the travel industry in customer satisfaction and profitability, their web site, through which the vast majority of their revenues flow, was still mostly static. That by itself is not a huge problem, but they felt like they weren’t able to target content based on their customers’ needs and interests as well as they could with a more dynamic content engine.

It just so happened they were about to re-implement their site from mostly server-side to mostly client-side which is a huge undertaking. They figured that would be a pretty good time to add a dynamic content service to the mix, so they called me.

From Static to Dynamic

The diagram below depicts the high-level setup before the introduction of the content service.

Original ArchitectureThis is pretty standard for sites like this. The Marketing Team edits content in a Content Management System (CMS), which in this case is Interwoven. Through various processes, binary files (mostly images), system data (things like lists of destinations and hotels), and content fragments are published out of Interwoven to destinations accessible by the e-commerce application.

A content fragment is literally a piece of content. It might be a promotion of some sort. Or it could be some text that gets used as part of a banner. The challenge using this setup is that content fragments are static files that live on the file system. If you want to show a different fragment based on something you know about the user you have to generate every permutation you might want ahead-of-time, publish them all, then use logic in the application to decide which one to use.

One obvious way to address this is to publish content fragments in a relational database and then code the front-end app to query for the right content. That wasn’t appropriate here for a few reasons:

  1. The front-end is being migrated to a collection of Single Page Applications (SPA’s) written in JavaScript. It’s easier for those pages to call a RESTful API to get JSON back. Yes, you could still do that with a relational database and a service tier, but the client was looking for something a little more JSON-native.
  2. The structure of the content changes over time. We wanted to be able to accept any kind of content fragment the Marketing Team or SPA developers could think of and not have to worry about migrating database schemas.
  3. The anticipated style of queries needed to find appropriate content fragments was more like what you’d expect from a search engine and less like what you might put in a SQL query–we needed to be able to say, “Here is some context, now return the most appropriate set of content fragments for the situation,” and be able to use relevancy scoring to help determine what comes back.

So relational databases were ruled out in favor of document-oriented NoSQL repositories. Ultimately, Elasticsearch was selected because of its ease of clustering, high performance, unified REST API, availability of commercial support, and add-ons such as Shield, Marvel, and Watcher that make it easier to integrate with the rest of the enterprise.

Introduction of a Content Delivery Service

The first thing we did was stand up an Elasticsearch cluster, load some test data, and beat the heck out of it (see “Using JMeter to Test Elasticsearch“). Once we were satisfied it would be able to handle more than the expected load we moved on to the service.

The Content Delivery Service sits between Elasticsearch and the front-end applications. Its purpose is to abstract away Elasticsearch specifics and to protect the cluster by providing a simple, read-only REST API. It also enforces some light business logic such as making sure that only content that is currently effective according to its publication and expiration date is returned.

The diagram below shows the content infrastructure augmented with Elasticsearch and the content delivery service.

Content Delivery ServiceAs seen in the diagram, Interwoven is still the source of record and the primary way Marketing manages their content. But now, content fragments and system data are published to Elasticsearch. The front-end Single Page Apps ask the Content Delivery Service for content based on some set of context. The content is returned as a collection of JSON objects. The SPAs then take those objects and format them as needed.

Content Objects are Pure Content

A key concept worth emphasizing is that a content object is pure content. It contains no markup. It might have some properties that describe how it is expected to be used, but it is completely lacking in implementation. This has several benefits:

  1. Content objects returned by the Content Delivery Service can be used across any and all channels (such as mobile) rather than being specific to a single channel (such as web).
  2. Within a given channel the same object can have many different presentations.
  3. Responsibilities are cleanly separated: The content service provides content. The front-end applications style and present the content for consumption.

This was a bit of a departure from how things used to be done. In the bad old days presentation was always getting mixed up with content which severely limits reuse.

Micro-services Provide Administrative Features

I mentioned earlier that the Content Delivery Service is read-only. And in my previous diagram I showed Interwoven talking directly to the Elasticsearch cluster. In reality, we don’t let anyone talk directly to the Elasticsearch cluster. Instead, all writes have to go through the Content Management Service. This ensures that we know exactly what is going into the cluster and who is putting it there.

The other role the Content Management Service plays is JSON validation. When new types of content objects are developed we use JSON Schema to codify the structure. When a person or system posts a content object to the Content Management Service, the service validates the object against its JSON Schema before storing it in Elasticsearch.

In addition to the Content Management Service we also implemented a Scheduled Job Service. As the name suggests, it is used to perform administrative tasks on a schedule. For instance, maybe content needs to be reindexed from one cluster to another in a lower environment. Or maybe content needs to be fetched from a third-party and written to the cluster. The Job Service is able to talk to either the Content Management Service or Elasticsearch directly, depending on the task it needs to execute.

All of the administrative services are independently deployed web applications that sit behind an API Gateway. The Gateway leverages the Netflix Zuul Proxy. It is responsible for authenticating against LDAP and creating a shared session in redis. It gives the content admin team a single URL to hit and isolates authentication logic in a single place.

The diagram below shows the fully-realized picture.

Administrative ServicesA few key components aren’t on the diagram. We use Shield to protect the Elasticsearch cluster. Shield also makes it easy to configure SSL for node-to-node communication and provides out-of-the-box LDAP integration. With Shield we can map LDAP groups to roles and then grant roles various privileges on our Elasticsearch cluster and its indices.

We use Watcher to monitor cluster health and job failures that may happen in the Scheduled Job Service. The client has their own enterprise alerting and monitoring solution, but Watcher gives the content management team a flexible, powerful tool for keeping track of things at a level that is probably more granular than what the enterprise ops team cares about.

Ready for the Future

With Elasticsearch and a few relatively small services on top of that, this travel giant now has what it needs to provide its customers with a more customized online experience. Content can be targeted to the users it is most appropriate for using any kind of context the Marketing team can come up with. As the front-end commerce app evolves, new types of content objects can be added easily and be served to the front-end with no schema or service changes required. And it’s all built on commercially-supported open source software.

Alfresco, NOSQL, and the Future of ECM

Alfresco wants to be a best-in-class repository for you to build your content-centric applications on top of. Interest in NOSQL repositories seems to be growing, with many large well-known sites choosing non-relational back-ends. Are Alfresco (and, more generally, nearly all ECM and WCM vendors) on a collision course with NOSQL?

First, let’s look at what Alfresco’s been up to lately. Over the last year or so, Alfresco has been shifting to a “we’re for developers” strategy in several ways:

  • Repositioning their Web Content Management offering not as a non-technical end-user tool, but as a tool for web application developers
  • Backing off of their mission to squash Microsoft SharePoint, positioning Alfresco Share instead as “good enough” collaboration. (Remember John Newton’s slide showing Microsoft as the Death Star and Alfresco as the Millenium Falcon? I think Han Solo has decided to take the fight elsewhere.)
  • Making Web Scripts, Surf, and Web Studio part of the Spring Framework.
  • Investing heavily in the Content Management Interoperability Services (CMIS) standard. The investment is far-reaching–Alfresco is an active participant in the OASIS specification itself, has historically been first-to-market with their CMIS implementation, and has multiple participants in CMIS-related open source projects such as Apache Chemistry.

They’ve also been making changes to the core product to make it more scalable (“Internet-scalable” is the stated goal). At a high level, they are disaggregating major Alfresco sub-systems so they can be scaled independently and in some cases removing bottlenecks present in the core infrastructure. Here are a few examples. Some of these are in progress and others are still on the roadmap:

  • Migrating away from Hibernate, which Alfresco Engineers say is currently a limiting factor
  • Switching from “Lucene for everything” to “Lucene for full-text and SQL for metadata search”
  • Making Lucene a separate search server process (presumably clusterable)
  • Making OpenOffice, which is used for document transformations, clusterable
  • Hiring Tom Baeyens (JBoss jBPM founder) and starting the Activiti BPMN project (one of their goals is “cloud scalability from the ground, up”)

So for Alfresco it is all about being an internet-scalable repository that is standards-compliant and has a rich toolset that makes it easy for you to use Alfresco as the back-end of your content-centric applications. Hold that thought for a few minutes while we turn our attention to NOSQL for a moment. Then, like a great rug, I’ll tie the whole room together.

NOSQL Stores

A NOSQL (“Not Only SQL”) store is a repository that does not use a relational database for persistence. There are many different flavors (document-oriented, key-value, tabular), and a number of different implementations. I’ll refer mostly to MongoDB and CouchDB in this post, which are two examples of document-oriented stores. In general, NOSQL stores are:

  • Schema-less. Need to add an “author” field to your “article”? Just add it–it’s as easy as setting a property value. The repository doesn’t care that the other articles in your repository don’t have an author field. The repository doesn’t know what an “article” is, for that matter.
  • Eventually consistent instead of guaranteed consistent. At some point, all replicas in a given cluster will be fully up-to-date. If a replica can’t get up-to-date, it will remove itself from the cluster.
  • Easily replicate-able. It’s very easy to instantiate new server nodes and replicate data between them and, in some cases, to horizontally partition the same database across multiple physical nodes (“sharding”).
  • Extremely scalable. These repositories are built for horizontal scaling so you can add as many nodes as you need. See the previous two points.

NOSQL repositories are used in some extremely large implementations (Digg, Facebook, Twitter, Reddit, Shutterfly, Etsy, Foursquare, etc.) for a variety of purposes. But it’s important to note that you don’t have to be a Facebook or a Twitter to realize benefits from this type of back-end. And, although the examples I’ve listed are all consumer-facing, huge-volume web sites, traditional companies are already using these technologies in-house. I should also note that for some of these projects, scaling down is just as important as scaling up–the CouchDB founders talk about running Couch repositories in browsers, cell phones, or other devices.

If you don’t believe this has application inside the firewall, go back in time to the explosive growth of Lotus Notes and Lotus Domino. The Lotus Notes NSF store has similar characteristics to document-centric NOSQL repositories. In fact, Damien Katz, the founder of CouchDB, used to work for Iris Associates, the creators of Lotus Notes. One of the reasons Notes took off was that business users could create form-based applications without involving IT or DBAs. Notes servers could also replicate with each other which made data highly-available, even on networks with high latency and/or low bandwidth between server nodes.

Alfresco & NOSQL

Unlike a full ECM platform like Alfresco, NOSQL repositories are just that–repositories. Like a relational database, there are client tools, API’s, and drivers to manage the data in a NOSQL repository and perform administrative tasks, but it’s up to you to build the business application around it. Setting up a standalone NOSQL repository for a business user and telling them to start managing their content would be like sticking them in front of MySQL and doing the same. But business apps with NOSQL back-ends are being built. For ECM, projects are already underway that integrate existing platforms with these repositories (See the DrupalCon presentation, “MongoDB – Humongous Drupal“, for one example) and entirely new CMS apps have been built specifically to take advantage of NOSQL repositories.

What about Alfresco? People are using Alfresco and NOSQL repositories together already. Peter Monks, together with others, has created a couple of open source projects that extend Alfresco WCM’s deployment mechanism to use CouchDB and MongoDB as endpoints (here and here).

I recently finished up a project for a Metaversant client in which we used Alfresco DM to create, tag, secure, and route content for approval. Once approved, some custom Java actions deploy metadata to MongoDB and files to buckets on Amazon S3. The front-end presentation tier then queries MongoDB for content chunks and metadata and serves up files directly from Amazon S3 or Amazon’s CloudFront CDN as necessary.

In these examples, Alfresco is essentially being used as a front-end to the NOSQL repository. This gives you the scalability and replication features on the Content Delivery tier with workflow, check-in/check-out, an explicit content model, tagging, versioning, and other typical content management features on the Content Management tier.

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can and they will. NOSQL-native CMS apps will be developed (some already exist). And existing CMS’s will evolve to take advantage of NOSQL back-ends in some form or fashion, similar to the Drupal-on-Mongo example cited earlier.

What does this mean for Alfresco and ECM architecture in general?

Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications. The challenge for Alfresco and other ECM players is whether or not they can achieve the kind of scale and replication capabilities NOSQL repositories offer today before NOSQL can catch up with a new breed of Content Management solutions built expressly for a world in which content is everywhere, user and data volumes are huge and unpredictable, and servers come and go automatically as needed to keep up with demand.

If Alfresco and the overwhelming majority of the rest of today’s CMS vendors are able to meet that challenge with their current relational-backed stores, NOSQL simply becomes an implementation choice for CMS vendors. If, however, it turns out that being backed by a NOSQL repository is a requirement for a modern, Internet-scale CMS, we may see a whole new line-up of players in the CMS space before long.

What do you think? Does the fundamental architecture prevalent in today’s CMS offerings have what it takes to manage the web content in an increasingly cloud-based world? Will we see an explosion of NOSQL-native CMS applications and, if so, will those displace today’s relational vendors or will the two live side-by-side, potentially with buyers not even knowing or caring what choice the vendor has made with regard to how the underlying data is persisted?

Top Five Alfresco Roadmap Takeaways

Now that the last of the Alfresco Fall meetups has concluded in the US, I thought I’d summarize my takeaways. Overall I thought the events were really good. The informative sessions were well-attended. Everyone I talked to was glad they came and left with multiple useful takeaways.

Everyone has their own criteria for usefulness–for these events my personal set of highlights tend to focus on the roadmap. So here are my top five roadmap takeaways from the Washington, D.C., Atlanta, and LA meetups.

1. Repository unification strategy revealed

Now we know what Alfresco plans to do to resolve the “multiple repository” issue. In a nutshell: Alfresco will add functionality to the DM repository until it is on par with the AVM (See “What are the differences…“). What then? The AVM will continue to be supported, but if I were placing bets, I would not count on further AVM development past that point.

This makes a lot of sense to me. We do a lot of “WCM” for people using the Alfresco DM repository, especially when Alfresco is really being leveraged as a core repository. It also makes sense with Alfresco’s focus on CMIS (see next takeaway) because you can’t get to the AVM through CMIS.

2. CMIS, CMIS, CMIS

Clearly, CMIS is an important standard for Alfresco. (In fact, one small worry I have is that Alfresco seems to need CMIS more than any of the other players behind the standard, but I digress). Alfresco wants to be the go-to CMIS repository and believes that CMIS will be the primary way front-ends interact with rich content repositories. They’ve been on top of things by including early (read “unsupported”) implementations of the draft CMIS specification in both the Community and Enterprise releases, but there a number of other CMIS-related items on the roadmap:

  • When the CMIS standard is out of public review, Alfresco will release a “CMIS runtime”. Details are sketchy, but my hunch is that Alfresco might be headed toward a Jackrabbit/Day CRX model where Alfresco’s CMIS runtime would be like a freely-available reference CMIS repository (Alfresco stripped of functionality not required to be CMIS compliant) and the full Alfresco repository would continue as we know it today. All speculation on my part.
  • Today deployments are either FSR (Alfresco-to-file system) or ASR (Alfresco AVM to Alfresco AVM). The latter case is used when you have a front-end that queries Alfresco for its content but you want to move that load off of your primary authoring server. In 3.2, the deployment service has gotten more general, so it’s one deployment system with multiple extensible endpoint options (file system, Alfresco AVM, CouchDB, Drupal, etc.). Alfresco will soon add AVM-to-CMIS deployment. That means you can deploy from AVM to the DM repository. Does it mean you can deploy to any CMIS repository? Not sure. If not, that might be a worthwhile extension.
  • One drawback to using DM for WCM currently is that there is not a good deployment system to move your content out of DM. It’s basically rsync or roll-your-own. On the roadmap is the ability to deploy from DM instead of AVM. This is one of the features the DM needs to get it functionally equivalent to what you get with the AVM. I wouldn’t expect it until 4.0.

3. Shift in focus to developers

Alfresco WCM has always been a decoupled system. When you install Alfresco WCM you don’t get a working web site out-of-the-box. You have to build it first using whatever technology you want, and then let Alfresco manage it. So, unlike most open source CMS’, it’s never been end-user focused in the sense of, “I’m a non-technical person and I want a web site, so I’m going to install Alfresco WCM”. Don’t expect that to change any time soon. Even Web Studio, which may not ever make it to an Enterprise release, is aimed at making Surf developers productive, not your Marketing team.

Alfresco is realizing that many people discard the Alfresco UI and build something custom, whether for document management, web content management, or some other content-centric use case. To make that easier, Alfresco is going to rollout development tools like Eclipse plug-ins, Maven compatibility, and Spring Roo integration (Uzi’s Spring Roo Screencast, Getting Started with Spring Roo ).

Alfresco has also announced that web scripts, web studio, and the Surf framework will be licensed under Apache and there were allusions to “making Surf part of Spring” or “using Surf as a Tiles replacement”. I haven’t seen or heard much from the Spring folks on this and I noticed these topics were softened between DC and LA, but that could have just been based on who was doing the speaking (see “What do you think of Alfresco’s multi-event approach?“).

Essentially what’s going on here is that Alfresco wants all of your future content-centric apps and even web sites to be “CMIS applications”, and Alfresco believes it can provide the best, most productive development platform for writing CMIS apps.

4. Stuff that may never happen but would be cool if it did

This is a grab bag of things that are being considered for the roadmap, but are far enough out to be uncertain. Regardless of if/when, these are sometimes a useful data point for where the product is headed directionally.

  • Native XML support. Right now Alfresco can manage XML files, obviously, but, unlike a native XML database like eXist or MarkLogic, the granularity stops with the file. Presumably, native XML support would allow XML validation, XPath and XQuery expressions running against XML file content, and better XSLT support.
  • Apache Solr. I think the goal here is to get better advanced search capability such as support for faceted search, which is something Solr knows how to do.
  • Repository sharding. This would be the ability to partition the repository along some (arbitrary?) dimension. Sharding is attractive to people who have very, very large repositories and want to distribute the data load across multiple physical repositories, yet retain the ability to treat the federation as one logical repo.

5. Timeline

Talk to Alfresco if you need this to be precise, but here’s the general idea of the timeline through 4.0 based on the slides I saw:

  • 3.2 Enterprise 12/2009
  • CMIS 1.0 Release Spring 2010
  • 3.3 Enterprise 1H 2010
  • 4.0 Enterprise 12/2010 (more likely 2011)

Thanks, Alfresco, and everyone who attended

Lastly, thanks to Nancy Garrity and the rest of the team that put these events together. I enjoyed presenting on Alfresco-Drupal in Atlanta and giving the Alfresco Best Practices talk (Alfresco Content Community login required).

I always enjoy the informal networking that happens at these events. There’s such a diverse group of experience levels, use cases, and businesses–it makes for interesting conversations. And, as usual, thanks to the book and blog readers who approached me. It always makes me happy to hear that something on your project was better for having read something I wrote. It was good meeting you all and I’m looking forward to the next get-together.

Understanding the differences between Alfresco’s repository implementations

People new to Alfresco are often unaware of the existence of two different repository implementations within the product. One, which I’ll call the “DM Store”, is the classic store, the one that’s been used by Alfresco since the beginning. The other, the “WCM Store” or, as it is often referred to in API-speak, the “AVM Store”, was born with the addition of the Alfresco WCM product offering. Whether you are doing document management or web content management, you use the same Explorer client, but under the covers, your content lives in two very different types of repositories.

The Alfresco story on why a second repository implementation was created is that the Engineers writing WCM didn’t believe the DM store was capable of providing the kind of support for versioning, branching, and layering functionality they needed (hence, the AVM acronym, which stands for Advanced Versioning Manager) so they created an entirely new repository implementation to support WCM.

Why does this matter, apart from being a possible topic of conversation at your next get-together (“Healthcare is easy to fix. Do you think Alfresco will ever unify their two repository implementations?”)? It matters because the “two sides” of Alfresco are not equivalent in terms of functionality and depending on what you need to do, you may find yourself performing unnatural acts to work around the disparity.

Many projects will be completely unaffected by the differences between Alfresco DM and Alfresco WCM. But it is important to know what these differences are when you first begin to plan your solution to avoid uncomfortable conversations between you and your customer when you realize you’ve made a bad assumption.

I’ll assume you know the high-level capabilities of both Alfresco DM and Alfresco WCM. Obviously there are some things one product can do that the other can’t that are by design (sandboxes and virtualization in WCM, for example). What’s more important to understand are the subtle (and sometimes not-so subtle) differences between the two. Here’s the list and a table that summarizes, if you are into the whole brevity thing:

Content Modeling. Alfresco DM uses a proprietary XML-based description of the content model while Alfresco WCM uses XML Schema. On the surface this isn’t a big deal, but it does mean if your repository contains a mix of DM- and WCM-stored data, you won’t have a single model that defines it all and you could possibly have duplication between the two.

Custom Content Types. In Alfresco DM, when you create content, you tell Alfresco what its content type is. If you’ve extended the out-of-the-box model, you can have any number of business-specific content types with your own custom metadata. In Alfresco WCM, custom content types are not supported. In WCM, your content type is your web form. Interestingly, although the “Type” dropdown is shown in the “Create Web Content” dialog, and it will contain custom content types you’ve defined using the Alfresco DM model, your selection will not be honored. All AVM content is created as an instance of the “avmplaincontent” content type no matter what you select. However, although you must do it through an API call, you can apply custom aspects to AVM content.

User Interface Configuration. Alfresco DM uses a proprietary XML-based configuration file to define the “property sheets” that display metadata in the Alfresco Explorer client for a given content type or aspect. Alfresco WCM uses the embedded Chiba XForms engine to inspect the XML Schema (XSD) and automatically create a web form that will produce data that conforms to the XSD. XSD annotations can be used to influence the presentation of the form fields. One outcome of this is that it is much easier to localize things like property labels in Alfresco DM than it is in Alfresco WCM.

User Interface Extension. If you need to change how the Alfresco Explorer client behaves, there are some things you can do through XML, but advanced customizations will require JavaServer Faces (JSF) development. Alfresco DM and WCM both use the same Explorer client so this applies to both (See “Alfresco User Interface: What are my options?”). However, if you need to change how the web form engine works, you may need to write new Chiba XForms widgets. For instance, Optaros developed a web form used to describe points and regions on Google Maps. That kind of thing requires you to understand how to extend Chiba.

Structured (XML) data entry. Data entered in an Alfresco WCM web form is saved as XML that conforms to the XSD you’ve defined. There is no similar facility for capturing data as XML available within Alfresco DM. At one point the Community code line had “ECM Forms” which was essentially WCM web forms for the DM side of the house, but that’s disappeared in the latest Community release. On the DM side, when you edit metadata you are editing object properties whose values get stored in the database, not as XML.

Transformations. You can use either Freemarker or XSLT to transform Alfresco web form XML into other formats. That transformation is defined as part of the web form configuration which you do within the Explorer client. In Alfresco DM, transformations are more about binary file transformations (DOC to PDF or GIF to PNG, for example). If you want to do Freemarker or XSLT transformations on XML content stored in Alfresco DM, you’ll need to write that yourself (an Action would do the trick). If you want to do DM-style transformations on binary files in Alfresco WCM, that’s not out-of-the-box. You’ll have to do that using the API.

Rule actions. Alfresco DM allows you to configure rules on folders to trigger actions (out-of-the-box or custom) to operate against newly-added, updated, or deleted documents. Alfresco WCM does not support rule actions at all.

Auditing. Alfresco DM has a granular auditing sub-system. You can configure it to audit just about anything you want. Anything except WCM. You can audit web project creation, but not changes to individual web assets within a web project. At least not out-of-the-box.

Object-level permissions. In Alfresco DM you can assign users and groups to roles at the folder and file level. In Alfresco WCM, the UI will only let you go as low as the web project level. The API supports more granular security but you have to implement that yourself with custom code.

Search. Everything in Alfresco DM is full-text indexed and searchable. In Alfresco WCM, only the Staging Sandbox of each web project is indexed. You can do a search from your user sandbox but you’re really searching the Staging Sandbox. If you have any content you’ve created in your user sandbox that you have not yet committed to Staging, web project search won’t find it. Another limitation is that you cannot search across web projects. That search box that’s visible in the far upper right-hand corner of the Alfresco Explorer client is the Alfresco DM search–it won’t find anything in any of your web projects.

Advanced Workflow. Alfresco DM and Alfresco WCM use the same JBoss jBPM workflow engine so there’s no functional difference between what you can do with workflow on either side. The only catch is that in Alfresco DM, all deployed workflows show up in the “Start Advanced Workflow” dialog whereas in WCM, you have to tell Alfresco which deployed workflows are okay to use for WCM. That’s covered in the Alfresco Developer Guide and on the wiki.

File protocols. CIFS and FTP are the only two file protocols supported by both Alfresco DM and Alfresco WCM. Similar protocols supported by Alfresco DM such as WebDAV, inbound SMTP, and IMAP, are not supported by Alfresco WCM.

Deployment. Some people use Alfresco DM to manage content that is published to the web because they don’t need the additional features WCM offers, or they have some other reason to export content to another server. Unfortunately, Alfresco DM does not yet offer a deployment component like the one in Alfresco WCM. If you want to export content from Alfresco DM to some other destination in a systematic way, you’ll have to roll your own solution.

As John and Paul said, “It’s getting better”

Some of these differences will become less drastic in coming releases. For example, Alfresco is implementing a new form service that will be used to define the content model and user interface across the entire product line, so that helps. The WCM deployment functionality is also being refactored and will ultimately work for both DM and WCM. And at every community event Alfresco talks about “repository unification” as a goal for the future, although the timeline is lightyears away in terms of software releases.

As I said, depending on what you’re doing these differences may not affect you at all. Just make sure you don’t assume that a given feature is available everywhere, and make sure you’ve made a conscious decision about what content to put in which repository (DM or WCM) based on your requirements.

Alfresco 3.1 Enterprise is a significant release for WCM users

I don’t typically write a post every time a vendor releases a new version of software but Alfresco 3.1 Enterprise, which became available for download on Tuesday, is a significant release. Users of 2.2, who had previously been asked not to upgrade to 3.0, should now upgrade to either 2.2 SP3 or 3.1. All of the fixes in 2.2 SP3 are included in the 3.1 release. If you are currently on 2.2 SP3, the biggest reason to move to 3.1 would be to take advantage of Alfresco Share. Conservative users may decide to wait until 3.1.1.

One big change with 3.1 is that modified items are now merged with the Staging sandbox asynchronously. That means when submitting modified items, users are immediately returned to the sandbox list and will still see the modified items until they are committed. In a cursory test of this, I was surprised at how long it took for those changes to leave my Modified Items list. Maybe the polling interval is configurable.

This change affects the out-of-the-box WCM submission workflow. So if you created your own custom WCM submission workflow, you’re going to need to make some changes. The required changes are documented in the release notes, or you can take a look at the new submission workflow to get the gist.

Another new feature of Alfresco 3.1 is the REST API for WCM. A lot of URLs you probably already created on your own are included. The new API includes things like adding users to web projects, creating user sandboxes, creating, updating, and deleting assets, submitting assets, and more. Even if you already built these yourself, you should take a look to see if these meet your needs. Why continue to maintain your own custom code?

The 3.1 release also marks a shift in Alfresco’s “two flavors” approach. According to John Newton (post), Alfresco is looking for ways to entice large Enterprise users to migrate from Labs to Enterprise. So they’ve created functionality that they feel only appeals to large Enterprises and are making it available only to people paying for subscriptions. This includes things like monitoring (JMX, Hyperic plug-in), proprietary database extensions, and clustering.

Newton says 100% of the source code will still be available for both releases and that fixes made in Enterprise will be made available in the next Labs release (although he didn’t say how long Labs releases will lag behind Enterprise).

Other noteworthy 3.1 fixes or enhancements include:

  • No one (not even admin) can write to a Staging sandbox in WCM.
  • Share now includes a “Links” component (which means I don’t have to finish coding the “Bookmarks” component we started but never finished). There are numerous other Share enhancements around Calendar, rich text editing, and previewing.
  • Actions can now have AND/OR conditions and can trigger on property values.
  • A new group called ALFRESCO_ADMINISTRATORS has been created that makes it much easier to designate administrator users other than admin.

See the release notes for a full list of Jira tickets addressed by the release.

Reminder: DFW Alfresco Meet-up is Monday

Don’t forget to sign-up for the first ever DFW Alfresco Meet-up. It’s happening Monday, 3/9 at Ackerman McQueen over in Las Colinas. Plan to arrive around 5:30 and we’ll start our first topic at 6:00. We’ll hear about Ackerman McQueen’s recent Alfresco WCM-based project as well as the portal implementation built on Alfresco DM and Django (a Python-based framework) from the folks over at Neiman Marcus.

We’re letting Optaros pick up the tab on food and drinks so if you’re doing an Alfresco project right now or considering it, you need to join us. Come share what you’ve learned with others and maybe leave with a few new ideas as well.

Address and directions are on the sign-up page.