Category: Open Source

Why Alfresco follows the same strategy as closed source commercial software companies

My friend and colleague, Peter Löfgren, recently wrote a blog post on what he sees as the two possible strategies Alfresco could pursue. He describes the two approaches as follows:

Vertical: You try to get as many Community installs as possible converted to paying Enterprise customers. Convert from the bottom up, if you so like.

Horizontal: You try to get as many people as possible to run Alfresco, even if they run the free Community version. A fair share will always want commercial support to have professional backing. A broad approach, where you gain market presence and self-sustained marketing.

Peter argues that, to date, Alfresco has used a vertical approach, which is really about pushing Enterprise Edition and begrudgingly acknowledging Community Edition as an acceptable alternative only when the client can’t currently justify the expense of an Enterprise Edition license.

Peter observes that the vertical approach pits Alfresco Enterprise Edition squarely against Community Edition, making it very tempting for Alfresco salespeople to badmouth Community Edition because they often see it as cannibalizing their revenue. I actually don’t see this happening much anymore–salespeople know that denigrating Community Edition is counter-productive because customers are savvy enough to know it is the same software, and a good portion of the sales team understands the bigger picture.

Before I go any further, I should refer you to a blog post I wrote about a year ago called “The plain truth about Alfresco’s open source ethos”. In it, I argue that Alfresco’s marketing strategy isn’t the community’s concern, and that the company is basically a “normal” software company that won’t ever be the dogmatic open source company many of us wish it was.

But Peter brought it up and I like discussing such things, so I’ll ignore my own advice and provide my take on why I think Alfresco will continue to ignore the horizontal strategy and will continue to basically act like any other traditional software company…

Here’s my take

First, let’s compare Alfresco with another commercial open source company, Elastic. Elastic is the company behind the popular search engine Elasticsearch as well as a variety of related big data and analytics tools such as Logstash, Kibana, and Beats. Like Alfresco, Elastic makes money from support, and their incentive to get people to pay for support is to offer a set of products they make available only to paying customers. Unlike Alfresco, Elastic ships a single distribution of their products. For a given release of Elasticsearch, for example, there is no difference between what a paying and non-paying customer downloads and runs. It’s just that if you want some additional value-add on top of what’s freely-available, you have to pay.

So this is an example of a horizontal pursuit as Peter describes it. The reason it is working for Elastic, though, is that their offerings are much more horizontal than Alfresco’s. Elasticsearch is more popular and more widely used in all kinds of use cases. Their download stats are impressive and increasing steadily.

Alfresco, on the other hand, is niche software. It is quite narrowly focused on document management. Yes, there are a lot of use cases within that, but it isn’t something that you see embedded in all sorts of applications like you would a database, a workflow engine, or a search engine. I suspect download stats are flat or maybe even decreasing, although this is a bit of an apples/oranges comparison as the two products are in different phases of maturity and adoption.

The other issue is one of leadership. Elastic’s CEO, Steven Schuurman, exudes open source. He was a co-founder of SpringSource, for example. Both he and Shay Banon, the creator of Elasticsearch, have said repeatedly that Elastic will always be open source and that they’ll never have an Enterprise-only distribution of their core software. That’s the kind of leadership I expect from a commercial open source company.

Contrast that with the current leadership at Alfresco. When former CEO John Powell announced his retirement, the board could have chosen someone with open source credibility like Elastic’s Steven Schuurman. Instead, they went with Doug Dennerline. He and his lieutenants have next to zero open source credibility or experience. It is clear they were brought on solely to take the company public. For them, open source is not a driving part of their worldview. Instead, their focus is simply to build a software company and take it public, employing whatever strategy gives them and their shareholders the biggest revenue numbers year after year. (I don’t mean to paint this in an overly negative light–it is what it is. I’m just trying to point out the stark contrast in motivation and philosophy between the two leadership teams.)

Unfortunately, a horizontal strategy does not necessarily equate to those kinds of numbers. With Red Hat as the notable exception, it is hard to find a commercial open source company with the kind of financials that Doug, the board, and investors are looking for.

Do the math. Let’s assume there are 50,000 installs of Alfresco Community Edition. I have no reason to believe that is accurate–this is just an exercise. What kind of conversion rate would you expect? You’ll probably guess too high, forgetting that Alfresco is now very expensive, even for modest installations, and that the company is still working to add more differentiation in its paid offering compared to the free product. Let’s use 2%. So that’s 1,000 paying customers, which is roughly the number John Newton disclosed publicly in a keynote several years ago. It’s probably higher now, but remember that there is attrition we haven’t accounted for and those customers have to be earned year after year.

Now, what do you think the average sales price of Alfresco is across all of their paying customers? Again, just spit-balling, let’s say it is $100k annually. Multiply that by 1,000 and that’s “only” a company with $100 million in annual revenue. If you’re looking for a $1 billion IPO, that’s not enough. (If EMC sells Documentum to someone for 10x revenue I’ll have to update this post, but I think I’m pretty safe.)

In a horizontal strategy, those are your levers: Total installs, conversion rate, attrition, average sales price. For example, using the horizontal approach, to increase revenue from $100 million to $300 million you would have to triple the number of Community Edition installs from 50,000 to 150,000. Alternatively, you could keep CE installs steady at 50,000 and instead triple your conversion rate from 2% to 6%. Which seems easier?
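
To make the lever math concrete, here is a minimal sketch in Python, using the same made-up numbers as above, that compares the two paths to $300 million:

def annual_revenue(installs, conversion_rate, avg_sale_price, attrition=0.0):
    """Toy model: revenue from converting Community Edition installs."""
    paying_customers = installs * conversion_rate * (1 - attrition)
    return paying_customers * avg_sale_price

# Baseline: 50,000 CE installs, 2% conversion, $100k average annual sale
print(annual_revenue(50000, 0.02, 100000))   # 100000000.0 -> $100M

# Path 1: triple the install base
print(annual_revenue(150000, 0.02, 100000))  # 300000000.0 -> $300M

# Path 2: hold installs steady and triple the conversion rate
print(annual_revenue(50000, 0.06, 100000))   # 300000000.0 -> $300M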

My bet is that rather than increasing total Community Edition installs, Alfresco will find it easier to increase the conversion rate by increasing differentiation between the two products, cutting attrition by implementing “customer success programs” and consulting, and continuing to put upward pressure on the average sales price by charging more for the core product and finding new paid add-ons to sell.

The horizontal approach Peter advocates may be the one we all wish would work, but I think that ship has sailed.

Using Elasticsearch to more effectively target dynamic content


One of my clients came to me with a problem: Despite being a much-admired Fortune 500 company that leads its competitors in the travel industry in customer satisfaction and profitability, their web site, through which the vast majority of their revenues flow, was still mostly static. That by itself is not a huge problem, but they felt like they weren’t able to target content based on their customers’ needs and interests as well as they could with a more dynamic content engine.

It just so happened they were about to re-implement their site from mostly server-side to mostly client-side, which is a huge undertaking. They figured that would be a pretty good time to add a dynamic content service to the mix, so they called me.

From Static to Dynamic

The diagram below depicts the high-level setup before the introduction of the content service.

Original Architecture

This is pretty standard for sites like this. The Marketing Team edits content in a Content Management System (CMS), which in this case is Interwoven. Through various processes, binary files (mostly images), system data (things like lists of destinations and hotels), and content fragments are published out of Interwoven to destinations accessible by the e-commerce application.

A content fragment is literally a piece of content. It might be a promotion of some sort. Or it could be some text that gets used as part of a banner. The challenge with this setup is that content fragments are static files that live on the file system. If you want to show a different fragment based on something you know about the user, you have to generate every permutation you might want ahead of time, publish them all, then use logic in the application to decide which one to use.

One obvious way to address this is to publish content fragments in a relational database and then code the front-end app to query for the right content. That wasn’t appropriate here for a few reasons:

  1. The front-end is being migrated to a collection of Single Page Applications (SPAs) written in JavaScript. It’s easier for those pages to call a RESTful API to get JSON back. Yes, you could still do that with a relational database and a service tier, but the client was looking for something a little more JSON-native.
  2. The structure of the content changes over time. We wanted to be able to accept any kind of content fragment the Marketing Team or SPA developers could think of and not have to worry about migrating database schemas.
  3. The anticipated style of queries needed to find appropriate content fragments was more like what you’d expect from a search engine and less like what you might put in a SQL query–we needed to be able to say, “Here is some context, now return the most appropriate set of content fragments for the situation,” and be able to use relevancy scoring to help determine what comes back.

So relational databases were ruled out in favor of document-oriented NoSQL repositories. Ultimately, Elasticsearch was selected because of its ease of clustering, high performance, unified REST API, availability of commercial support, and add-ons such as Shield, Marvel, and Watcher that make it easier to integrate with the rest of the enterprise.

Introduction of a Content Delivery Service

The first thing we did was stand up an Elasticsearch cluster, load some test data, and beat the heck out of it (see “Using JMeter to Test Elasticsearch”). Once we were satisfied it would be able to handle more than the expected load we moved on to the service.

The Content Delivery Service sits between Elasticsearch and the front-end applications. Its purpose is to abstract away Elasticsearch specifics and to protect the cluster by providing a simple, read-only REST API. It also enforces some light business logic such as making sure that only content that is currently effective according to its publication and expiration date is returned.
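
To give a feel for the kind of query the delivery service runs, here is an illustrative sketch in Python. The endpoint, index, and field names are invented for this example, not the client’s actual schema:

import datetime
import requests

now = datetime.datetime.utcnow().isoformat()

query = {
    "query": {
        "bool": {
            # Context from the calling page drives relevancy scoring...
            "must": [
                {"match": {"tags": "beach destinations"}}
            ],
            # ...while the effective-date rules are hard filters.
            "filter": [
                {"range": {"publicationDate": {"lte": now}}},
                {"range": {"expirationDate": {"gte": now}}}
            ]
        }
    },
    "size": 5
}

resp = requests.post("http://localhost:9200/content/fragment/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"])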

The diagram below shows the content infrastructure augmented with Elasticsearch and the content delivery service.

Content Delivery Service

As seen in the diagram, Interwoven is still the source of record and the primary way Marketing manages their content. But now, content fragments and system data are published to Elasticsearch. The front-end Single Page Apps ask the Content Delivery Service for content based on some set of context. The content is returned as a collection of JSON objects. The SPAs then take those objects and format them as needed.

Content Objects are Pure Content

A key concept worth emphasizing is that a content object is pure content. It contains no markup. It might have some properties that describe how it is expected to be used, but it is completely lacking in implementation. This has several benefits:

  1. Content objects returned by the Content Delivery Service can be used across any and all channels (such as mobile) rather than being specific to a single channel (such as web).
  2. Within a given channel the same object can have many different presentations.
  3. Responsibilities are cleanly separated: The content service provides content. The front-end applications style and present the content for consumption.

This was a bit of a departure from how things used to be done. In the bad old days, presentation was always getting mixed up with content, which severely limits reuse.

Micro-services Provide Administrative Features

I mentioned earlier that the Content Delivery Service is read-only. And in my previous diagram I showed Interwoven talking directly to the Elasticsearch cluster. In reality, we don’t let anyone talk directly to the Elasticsearch cluster. Instead, all writes have to go through the Content Management Service. This ensures that we know exactly what is going into the cluster and who is putting it there.

The other role the Content Management Service plays is JSON validation. When new types of content objects are developed we use JSON Schema to codify the structure. When a person or system posts a content object to the Content Management Service, the service validates the object against its JSON Schema before storing it in Elasticsearch.
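
As a rough sketch of that validation step (assuming Python and the jsonschema package; the schema and the sample fragment are invented for illustration):

import jsonschema

# Hypothetical schema for a "promo" content fragment
promo_schema = {
    "type": "object",
    "properties": {
        "type": {"enum": ["promo"]},
        "headline": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "publicationDate": {"type": "string"},
        "expirationDate": {"type": "string"}
    },
    "required": ["type", "headline", "publicationDate", "expirationDate"]
}

fragment = {
    "type": "promo",
    "headline": "Save 20% on spring sailings",
    "tags": ["promotion", "spring"],
    "publicationDate": "2015-03-01T00:00:00",
    "expirationDate": "2015-05-31T00:00:00"
}

# Raises jsonschema.ValidationError if the object does not conform;
# the real service rejects the POST instead of indexing the document.
jsonschema.validate(fragment, promo_schema)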

In addition to the Content Management Service we also implemented a Scheduled Job Service. As the name suggests, it is used to perform administrative tasks on a schedule. For instance, maybe content needs to be reindexed from one cluster to another in a lower environment. Or maybe content needs to be fetched from a third-party and written to the cluster. The Job Service is able to talk to either the Content Management Service or Elasticsearch directly, depending on the task it needs to execute.

All of the administrative services are independently deployed web applications that sit behind an API Gateway. The Gateway leverages the Netflix Zuul Proxy. It is responsible for authenticating against LDAP and creating a shared session in Redis. It gives the content admin team a single URL to hit and isolates authentication logic in a single place.

The diagram below shows the fully-realized picture.

Administrative Services

A few key components aren’t on the diagram. We use Shield to protect the Elasticsearch cluster. Shield also makes it easy to configure SSL for node-to-node communication and provides out-of-the-box LDAP integration. With Shield we can map LDAP groups to roles and then grant roles various privileges on our Elasticsearch cluster and its indices.

We use Watcher to monitor cluster health and job failures that may happen in the Scheduled Job Service. The client has their own enterprise alerting and monitoring solution, but Watcher gives the content management team a flexible, powerful tool for keeping track of things at a level that is probably more granular than what the enterprise ops team cares about.

Ready for the Future

With Elasticsearch and a few relatively small services on top of that, this travel giant now has what it needs to provide its customers with a more customized online experience. Content can be targeted to the users it is most appropriate for using any kind of context the Marketing team can come up with. As the front-end commerce app evolves, new types of content objects can be added easily and be served to the front-end with no schema or service changes required. And it’s all built on commercially-supported open source software.

Using Hubot and Watcher to automate Elasticsearch admin tasks via chat

Almost all of my client work is remote. For many projects, that means chat is an essential communication tool. When you and your team essentially live in a chat window, it’s nice when your tools can participate in the conversation. Luckily, it’s pretty easy to wire this up. Let me show you how I did it for a recent Elasticsearch project.

Openfire: An open source chat server

Today, hosted chat services like Slack and HipChat get all of the attention. The approach I outline in this blog post will work with those tools too, but on this particular project we’re running an open source chat server on-prem called Openfire. Openfire has been around for a long time. I like it because it is open source, easy to install, and will run anywhere you can run Java.

Because it implements an open protocol called XMPP (aka Jabber) there are a variety of chat clients that will work with it. The same community behind Openfire produces a client called Spark, and some of my teammates use that, but most of the time I use Adium on my Mac.

If you need help installing Openfire, take a look at the docs.

Inbound and outbound integration with Elasticsearch

Once your chat solution is working, it’s time to integrate it with Elasticsearch. For my requirements I needed two “directions” for this integration. First, I wanted to be able to interrogate one or more of my Elasticsearch clusters from within chat. This “outbound” integration requires a “bot”. There are many open source bots to choose from, as well as examples of bot scripts working with Elasticsearch. I’ll cover both shortly.

The other direction I needed was “inbound”–I wanted my Elasticsearch cluster to be able to tell the chat server when something is wrong with the cluster. This requires something to monitor the health of the cluster (we use Watcher, a paid add-on from Elastic) and a web hook that can use the chat server API to send messages.

Let me cover the outbound implementation–the bot–first. Then I’ll talk about Watcher and the web hook which make up the inbound implementation.

Hubot: An open source chat bot from GitHub

There are a number of chat bots out there. I went with Hubot from GitHub. Hubot is based on Node.js and Hubot scripts are written in CoffeeScript. However, if you are new to Node or CoffeeScript there are plenty of examples out there, so don’t let that stop you from using Hubot.

I used this blog post to get Hubot working. However, there were a few gotchas I should point out:

  • I had to use an old version of Node.js (0.10.23). The newer version was having a lot of trouble with one of its dependencies and I got tired of fooling with it.
  • The blog post lists some Linux dependencies you need to install, but it leaves one out that’s critical: libicu. On CentOS this is libicu-devel and on Ubuntu it is libicu-dev.
  • The blog post specifies some environment variables that need to be set. If you are running Hubot with Openfire, the HUBOT_XMPP_ROOMS variable needs to be set to the fully-qualified conference room name. For example, if the Hubot username is “hubot” running on a host named “grumpy” the variable should be set to hubot@conference.grumpy.
  • You may have to set HUBOT_XMPP_HOST to the hostname of your Openfire server.

Other than that, you should be able to use that blog post to get Hubot and Openfire working.

Hubot and Elasticsearch

There are Hubot scripts that do all sorts of stuff. One of the fun things about adding a bot to your chatroom is to have it do something silly. Maybe every time someone uses the word “Dude” the bot throws out a quote from the Big Lebowski, for example. So you’ll see lots of stuff like that. But there are also more useful examples out there. Here is the one I started with. The hubot-elasticsearch script knows how to use the Elasticsearch API to spit out information about nodes, indices, allocation, and settings. And it allows you to alias your clusters so you don’t have to constantly tell the bot what your URL endpoints are.

Out of the box, the hubot-elasticsearch project is not compatible with Shield, but it’s a decent start. I made a small tweak to get it to work with Watcher, which I’ll cover next.

Watcher: Monitoring and Alerting for Elasticsearch

This particular client is a paying customer of Elastic, which means they are entitled to paid-only add-ons such as Shield (secures the cluster) and Watcher (for monitoring and alerting).

Watcher is pretty handy and we’re glad to have it, but if you aren’t able to use it for some reason, writing your own tool for running tasks on a schedule isn’t too tough. I wrote something similar using Spring MVC and Quartz, for example. You just need something that will periodically interrogate the cluster and then take some action based on some condition. But if you are an Elastic customer there’s no need to build it. The rest of the post assumes that’s the case.

I’ll let you read the Watcher docs to learn more, but at a high level, a watch consists of a trigger, an input, a condition, and an action. The trigger is the schedule. The input might be an Elasticsearch query or the response from some random HTTP endpoint. The condition looks at the input and then decides whether or not action is needed. The action taken might be to send an email, create some data in Elasticsearch, or invoke a web hook.
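
To make that anatomy concrete, here is a minimal sketch of registering a watch, written in Python against the Watcher 1.x-era _watcher endpoint. The schedule, input, and condition values are invented for illustration:

import requests

watch = {
    # Trigger: evaluate this watch every five minutes
    "trigger": {"schedule": {"interval": "5m"}},
    # Input: fetch cluster health over HTTP
    "input": {"http": {"request": {
        "host": "localhost", "port": 9200, "path": "/_cluster/health"}}},
    # Condition: only fire the actions when the cluster reports red
    "condition": {"compare": {"ctx.payload.status": {"eq": "red"}}},
    # Actions: the "notify_chat" webhook, shown in full below
    "actions": {
        "notify_chat": {
            "webhook": {
                "method": "POST", "host": "localhost", "port": 8008,
                "path": "/chat",
                "headers": {"Content-Type": "application/json"},
                "body": "cluster_health alert: DEV cluster appears RED."
            }
        }
    }
}

resp = requests.put("http://localhost:9200/_watcher/watch/cluster_health",
                    json=watch)
print(resp.json())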

For my needs, the web hook action is perfect–if one of my watch conditions is met, like maybe something goes wrong with my cluster and the cluster state goes to red, Watcher will invoke my web hook which will post a message in the chat room. Here’s what the action part of my watch definition looks like:

"actions": {
    "notify_chat": {
        "webhook": {
            "method": "POST",
            "host": "localhost",
            "port": 8008,
            "path": "/chat",
            "headers": {
                "Content-Type": "application/json"
            },
            "body": "cluster_health alert: Someone needs to look at the DEV cluster. It appears to be in a RED state."
        }
    }
}

Watcher can have any number of actions listed for a given watch. In this case, I’m using a single “webhook” action called “notify_chat” that does a POST to a URL running on port 8008. That URL could be anything, and it can include basic authentication.

Web Hook: Spring Boot, Spring MVC, and Smack

I’ve been using Spring Boot lately when I need to knock out a quick RESTful API. In this case, I just needed something to listen to the “/chat” end point. When it is called, the code grabs the message posted to it and uses the Smack API to connect to the chat room and post the message. This webapp is probably less than 10 lines of code and Spring Boot packages it up nicely for me.

If you need help with this part take a look at the Smack API Multi User Chat docs.
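
The client’s implementation was Java (Spring Boot plus Smack), but the shape of the thing is simple enough to sketch. Here is a rough Python/Flask equivalent, purely for illustration, with the actual chat posting stubbed out:

from flask import Flask, request

app = Flask(__name__)

def post_to_chat(message):
    # Stub: the real service used the (Java) Smack API to join the
    # Openfire chat room and post the message there
    print("to chat room:", message)

@app.route("/chat", methods=["POST"])
def chat():
    # Watcher POSTs the alert text as the request body
    post_to_chat(request.get_data(as_text=True))
    return "", 204

if __name__ == "__main__":
    app.run(port=8008)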

Tweaking the bot to allow watch acknowledgement

Watcher can throttle or suppress actions based on a time period (“Don’t tell me about this condition again for 5 minutes,” for example) or explicit acknowledgement. If a watch is triggered that uses explicit acknowledgement, I want to be able to acknowledge that from within chat. You already saw that the Elasticsearch Hubot script can talk to the cluster. It’s pretty easy to tweak the script to allow Watcher acknowledgement.

First, I added a function called “ackWatch” that actually does the work of acknowledging the watch:

ackWatch = (msg, watch_id, alias, callback) ->
  cluster_url = _esAliases[alias]

  if cluster_url == "" || cluster_url == undefined
    msg.send("Do not recognize the cluster alias: #{alias}")
  else
    msg.send("Acknowledging watch: #{watch_id}")
    # PUT to Watcher's acknowledge endpoint for the given watch
    msg.http("#{cluster_url}/_watcher/watch/#{watch_id}/_ack")
      .put() (err, res, body) ->
        callback(body)

Then, I added the regular expression that the bot should be listening for:

robot.hear /elasticsearch ack (.*) (.*)/i, (msg) ->
  # Ignore anything the bot itself says
  return if msg.message.user.name is robot.name

  ackWatch msg, msg.match[1], msg.match[2], (text) ->
    msg.send text

With that in place, any user in the chat room can acknowledge a watch by typing, “hubot: elasticsearch ack some_watch some_alias” where some_watch is the ID of a watch and some_alias is the nickname for the cluster you’re talking about (like “dev”, “qa”, or “prod”, for example).

Putting it all together: A short demo

With all of this in place, my Elasticsearch clusters can tell the team when something interesting is going on and the team can acknowledge that alert and do preliminary investigation by interrogating the cluster, all from the comfort of their chat window.

The video below shows this working. In it, I create a simple watch that invokes a web hook to post a message to the chat room when a watch condition is met.

The demo uses a simple example where the alert is triggered when the test index is a certain size. But you could easily wire up any watch to the same action, such as when your cluster state goes red or when CPU or RAM reach a certain threshold.

This was relatively simple to put together, but hopefully you can see how you could build on this to automate all kinds of things related to monitoring, alerting, and administration of your Elasticsearch cluster from chat.

Would the commercial open source software you depend on survive a zombie apocalypse?

Pick a commercial open source software product that’s important to you. Now imagine that there’s been a global zombie attack. Fortunately, after much heroics and stylized cinematic violence, the attack is eventually thwarted. Unfortunately, it wasn’t soon enough. Everyone who receives a paycheck from the company behind your favorite commercial open source product is now a zombie in the steadfast pursuit of acquiring brains for food–they couldn’t care less about your sev one support ticket. The question is this: Can the open source software project survive without its commercial backer?

My blog readership consists of multiple audiences. For some of you, the question posed is interesting because your company depends on that software to run its business. For you, this is a “sole source supplier” problem with the primary concern being: If the software vendor goes away, is there another place I can get the same software?

Some of you are like me–you provide professional services around commercial open source software. The commercial company produces the software you deliver to your clients but may also provide other types of assistance such as marketing, business development, and training.

Still other readers might be classified as “community members” who give time and resources toward the project. Members of the first two groups are often members of this group as well. It is this group–the community–that becomes critical in answering this survivability question as we shall soon see.

Okay, so back to the scenario. Zombies attack. Zombies subdued. Sadly, all of the employees of our commercial open source software vendor are no more. Why is the question of survivability even a question? Remember, we are talking about open source software with a primary commercial backer. Your organic open source projects are safe from zombie attacks, for the most part, because there is no single company responsible for their development. For organic open source projects with a large, healthy community, the load of building and maintaining software is distributed across potentially hundreds or thousands of individuals employed by many different companies. Commercial open source, on the other hand, has a single company that pumps significant dollars into engineering, research and development, sales, and marketing. If that company goes away, there’s a clear risk to the project. Stakeholders need to understand this risk.

For the rest of this analysis, I’m going to ignore the very real investments commercial open source vendors make around marketing, business development, community management, and training. These things matter, of course, but they don’t matter at all unless you ship code. As the smoke clears from the zombie attack, the single most important thing facing the surviving stakeholders is getting out the next release.

Suppose our zombie-stricken company sold support for what was already an extremely successful open source project. There are many people who regularly make very significant contributions to the code base and documentation. There are lots of people answering questions in forums and on IRC. That healthy community means there is a viable population of non-zombies who will obviously mourn the loss of their former collaborators, but will no doubt bravely carry on.

Now, instead, imagine a commercial vendor who is really open source in license only–they lack a community entirely. There are exactly zero people outside of the company who know anything at all about where to get the code and how to build and test it, let alone the intricate details of how it works. It’s clear this project is doomed. Maybe someone would fork the code base and start a new company, but that would be tough. Unlike what MariaDB did after the Oracle-MySQL acquisition, our scenario doesn’t allow a new company to bootstrap with experienced founders or engineers–at least not ones who were employed by the vendor at the time of the zombie attack.

From looking at these two extreme ends of the spectrum it is clear that the thing that allows the project to continue is the viability of the project as an independent open source project and that depends on the health and makeup of the community.

To answer this question–whether or not a commercial open source project could operate as an independent organic open source project–there are several aspects of the current community to assess:

Pre-commercial open source success. If the software project was a successful organic open source project before the commercial company appeared, this is a good indicator that if that company were to go away, the software project would survive.

Governance model. Is the software and associated trademarks under the control of an independent foundation? If so, that’s good–a lot of the details and processes will already be taken care of. Formal governance of this sort is designed around ensuring viability independent of a single commercial backer.

Number of external committers. How many committers are employed by the commercial company? If the answer is 100% that’s bad for survivability. In our scenario anyone working for the company is now a zombie which means there will be fewer experienced people to work on the next release.

Number of external contributions. What kind of contributions have typically come from the community? If it is low volume or mostly insignificant patches, this is another negative indicator. If the project stands a chance of survival post zombie attack it needs significant activity from independent contributors because that provides a population of people who are familiar with the technical details of the product.

Install base. A product installed in a large number of big companies would be ideal but even a few large “anchor” companies would be sufficient. The surviving community will need to pool its resources to move the project forward. If it is strategic to stakeholders with deep pockets, those stakeholders may be willing to invest in the software’s future.

Product complexity. The simpler, the better. When a product gets complex, its engineering team starts to specialize. A large community could do that too, but if the product is complex and the surviving community is too small, there may not be enough people who can go deep enough on every part of the system to maintain the entire thing. A complex product also requires more resources to test and build.

Upstream dependencies. If the product is a relatively thin veneer on a handful of well-known upstream dependencies, that’s good. It means you might be able to depend on some of those upstream communities for help. However if the system is complex and has a lot of smaller dependencies, it means the surviving community has a lot to learn and the upstream communities may lack the resources to be of much help.

Downstream dependencies. If the software is used by many other projects downstream, that’s a good thing because it significantly increases the number of people interested in keeping the project alive.

Company diaspora. How much turnover has there been over the life of this company’s engineering department? Low turnover means there will be fewer people in the market who know the deep dark secrets of the code base.

Community makeup. This is closely related to the “external contributions” aspect. If the community is mostly “power users” that’s a problem for survivability. What you need is a community that has a significant number of individuals who are passionate and technical enough to dedicate time and resources to moving the software forward. It obviously helps if the software is strategic to their employers.

You might assess your favorite commercial open source software and realize that it isn’t viable without commercial backing. That doesn’t mean it’s a bad company or a bad product. What it does mean depends on your perspective:

If you depend on the software for your business…

  • And you are evaluating the software for purchase, you cannot give “availability of source code” a much higher score against other alternatives because one of the benefits you derive from that–the ability to continue forward should the company go away–is less likely to be realized. (see “Why Clients Choose Open Source”).
  • And you already have the software installed, have a backup plan in mind should something happen to the vendor, just like you would a proprietary vendor. If you deem a major event to be likely, consider moving to an alternative now.

If you are part of the commercial open source software project’s community…

  • Push for changes in the aspects identified above that would make the project more viable as an independent project. Realize, however, that not all vendors are enlightened (or empowered) enough to execute these changes.
  • Be okay with the fact that you are dedicating time and energy to something that depends mostly (or entirely) on its commercial backing. This means not getting bent out of shape when the commercial company does something to further its commercial interests. Or don’t be okay with it and move on.
  • Assess your zombie attack survivability as a community and identify initiatives that address areas of weakness.

If you are the commercial open source software vendor…

  • Be careful how you market your open source-ness. If your software fails the zombie survivability test but you tout open source as a major advantage, your marketing message may fall flat.
  • Be open to ideas that could give your software the ability to outlive you. This makes it more attractive to customers and community members alike. It makes your open source claim more genuine.

Obviously the specific threat of a zombie attack is far-fetched. But the risk of a commercial open source company being acquired, folding, or radically shifting their business model is quite real. Just because “commercial open source” has “open source” in it, does not magically give it a vibrant community that can fulfill the vital engineering and quality assurance role that the commercial company often provides. Even a vibrant community may lack the proper make-up to step in should it need to.

Regardless of which role you play, if you are a stakeholder in a commercial open source software product, it is important you assess this risk and have a plan should the need arise.

How my teen-aged son became a Mozilla contributor

Jeff and Justin zip-lining during Mozilla Work Week in Whistler
I’m in Orlando for Mozilla Work Week, which is an event that happens a couple of times a year aimed at bringing the organization’s far-flung employees and contributors together to collaborate on projects.

But I’m not here because of my own code contributions–I’m here as a chaperone. My 17 year-old son, Justin, earned his third trip to Work Week as an invited guest by dedicating time and code towards Mozilla’s mission of a free and open web.

Mozilla has thousands of contributors worldwide. Only a handful get invited to Work Week, so this is a Proud Open Source Dad Moment® for me, but I thought I’d share his story in the off chance that it motivates you or your kids to participate.

A few years ago Justin asked how he could get more involved in an open source project. He had just finished Google Code-In, which matches up high school students with open source projects. It’s a cool program, but it only lasts a few months and the projects he contributed to were somewhat esoteric. He was looking for two things in a project: it needed to mean something to him personally so that it would hold his interest, and it needed to offer a technical ramp that would help grow his skills.

I told him to take a look at Mozilla. My own experience contributing to Mozilla was that they were good at getting contributors aligned with projects based on their skills and interests (check out their signup page).

At the time, Justin was new to coding. I had taught him Python but he wasn’t feeling confident enough to put those skills to use on a public project. So he started reviewing and editing technical documentation. This was a perfect place to start because it offered a chance to learn more about the culture of the organization and its tools and processes while also being exposed to the dizzying array of products, acronyms, and technologies in play at Mozilla.

(For more on teaching kids to code, see “Kids these days: Learning to code then and now”.)

Then one day I walked into Justin’s room. He was hammering away at his laptop. “What are you working on?”, I asked. Without looking up he replied, “I’m writing some automated test scripts in Python using Selenium.” He was clearly “in the zone” so I just said, “Oh, cool,” and headed back down the hall, but I was pretty pumped–my kid was committing code.

Justin had found his way onto the Web Quality Assurance team. They make sure all of Mozilla’s sites are running correctly. This led to a cool father-son moment. I had been contributing some code to Mozillians, which is one of the sites Justin’s Web QA team was responsible for. Justin came to me with a problem: He could improve the efficiency of his tests if a small change was made to one of the site’s pages. He was reluctant to make the change but he knew I could do it quickly, so I did, and we ended up spending an afternoon hanging out in his room, squashing bugs and testing.

As his confidence and skills grew, so did his influence and responsibilities within Mozilla. He became a “vouched” Mozillian and, later, was added to some of the internal systems reserved for trusted contributors, and was ultimately granted the ability to commit code directly to Web QA’s projects. He participated in as many of the team’s online meetings as his school schedule would allow, began mentoring other contributors, and was given his own project to run.

This gradual increase in trust and recognition that occurs over time as a contributor spends time with a project is a key part of what makes open source work. He and I had discussed “open source” as a concept a lot, but it didn’t really sink in until he became part of it, and it has been wonderful to watch.

Something that’s been particularly instructive is how Mozilla treats its contributors. During a family vacation to San Francisco we toured both of Mozilla’s offices. Every person we were introduced to stopped what they were doing and thanked us. We felt special and appreciated.

This attention extends to code contributions. When I made my first contribution I worked with a mentor of sorts who helped me understand the workflow and coding standards for the project. If he didn’t hear from me in a while he’d shoot me a note to see if I still had time to work on the project. His continued interest made me feel like my contributions mattered, even if they were small.

Justin’s mentor went even further, offering career advice and making personal introductions. He’s even been helping Justin find internship opportunities.

When I think about everything his Mozilla experience has given him, I’m amazed. Of course he’s been able to apply his coding skills in real-world applications. But there is also a set of important technical skills that are hard to teach in a classroom, like how to be productive in a multi-developer git workflow. Perhaps more crucial are the softer skills, like how to give constructive feedback to teammates or how to pitch an idea. And the cool thing is that while he’s soaking up all of this, he’s helping further Mozilla’s mission, which is something both of us believe in.

If you know someone who believes in a free and open web, and they have the time and interest to get involved and to stick with it, encourage them to sign up and get involved. There’s plenty to do for all skill levels and interests.

Kids these days: Learning to code then and now

This little beauty is an Epson HX-20. It’s one of the first laptop computers. What makes it special to me is that its 4-line LCD screen is where I took my first coding steps in the early 1980s.

My Dad was an IT guy. He’d bring home the Epson so he could work remotely. I’d been to his office before and had spent hours playing hide-and-seek in the data center with my sister or hucking write protect rings from mag tapes across the room like frisbees. To this day a raised floor and the smell of chilled, vaguely plastic, air brings back fond memories. But what he did all day at work was completely abstract to me until he brought that little Epson home.

Learning those first few lines of BASIC removed a lot of the mystery about how computers worked. It was clear I could tell the computer to do anything I wanted and it would follow those instructions to the letter. I quickly caught the bug and eventually got my own computer (We were a staunch Atari family–sorry Commodore fans!).

Fast forward 30 years…

Now I work in software and my own son is learning to code. When he was 12 or so he was running Linux on his laptop. Then he became interested in coding so I taught him Python. He has since picked up JavaScript, some HTML/CSS, and is now learning Java in high school.

When I think about his experience learning to code compared to mine, there are some similarities:

  • Exposure. He has access to various devices (laptops, PCs, Macs, tablets) and operating systems at home and at school.
  • Curiosity. More than using these devices for entertainment, he’s curious about how they work. I see plenty of kids with their faces buried in a tablet for hours on end. But if they don’t care to know how those apps or that game or that web site came to be they may not be ready to dive into coding.
  • Encouragement. What’s great is that not only does he have his Dad and Grandpa encouraging him to dive in, he’s got lots of teachers who understand the importance of code. (More on the curriculum in a minute).

So those are the ingredients that were the same for both of us. What’s different for this generation of kids learning to code? A lot, and it’s not just about ever-increasing computing power at lower and lower prices.

There are a ton of great resources kids have to help them learn to code these days. Let me call out some of them. I’ll put them in three buckets: Tools, Curriculum, and Communities.

Tools
When I learned to code the tools were virtually non-existent. I had a computer, the BASIC programming language, and a reference manual. Now there are all kinds of freely-available, cross-platform tools that are great for teaching kids to code. Here are a few.

For many kids, Lego Mindstorms is a great entry point into coding. This is coding that feels like play. Mindstorms comes with a drag-and-drop programming app that you use to build apps as if you were assembling building blocks. The app is then downloaded to the robot and boom–the kid sees their own creation obeying their commands.

If you don’t like the drag-and-drop IDE there are several traditional coding environments for a variety of languages that work with Mindstorms.

Scratch was the first coding tool my son was exposed to in school. Hosted at MIT, Scratch was purpose-built for teaching kids to code. Its graphical interface is similar to Mindstorms in some respects.

One of the things I love about Scratch is that kids can share their projects and use the projects of others as a starting point for their own creations. It’s never too early to start teaching kids about open source-style collaboration!

Alice is a story-telling tool. Kids manipulate 3D objects on a canvas. By giving instructions to the objects in the virtual world, kids learn the basics of object oriented programming.

Jeroo is another tool that teaches kids programming by manipulating objects in a virtual world. In this one you have an island. On the island are these mammals–Jeroos–who like to eat flowers. Coding lessons involve instantiating instances of Jeroos and getting them to eat instances of flowers.

Whereas Scratch, Alice, and Jeroo are similar to multiple object-oriented languages, Greenfoot is about teaching kids a specific language: Java. Working in Greenfoot, students write standard Java to build all kinds of programs. It’s a nice tool for learning Java before moving into a full-fledged IDE.

The Raspberry Pi is a tiny computer and a code-learning tool all wrapped into one for $25–$35. About the size of a credit card, you plug this little guy into a keyboard, mouse, and monitor and you’ve got yourself a Linux box with 512 MB of RAM that’s ready for code. Python, Java, C, C++, Ruby, and Scratch are all installed by default.

Curriculum
I grew up in a small rural town. In the mid-80s, my high school computer science curriculum consisted of a year of typing (on these things called typewriters) and a year of programming. The programming class was divided into a semester of typing and 10-keying (this time, on a computer) and a semester of BASIC.

In our school system programming starts fairly early in 7th grade using tools that don’t really feel like coding and then is offered every year, gradually getting more complex throughout high school.

Here is the curriculum by grade. I’ll include rough age equivalents so my readers outside the U.S. can map it to their grade levels:

  • 7th Grade (~13 years old): Technology, Tools: HTML, Scratch, Alice
  • 8th Grade (~14 years old): Principles of Communication in Business & Technology, Tools: HTML, Scratch, Alice
  • 8th Grade (~14 years old): Engineering, Tools: Google Sketch-Up, Lego MindStorms, Vex Robotics
  • 9th Grade (~15 years old): Pre-AP Computer Science, Tools: Scratch, Alice, Jeroo, Greenfoot (Java)
  • 10th Grade (~16 years old): AP Computer Science, Tools: Eclipse (Java)

Unfortunately, that’s it. Nothing is offered beyond AP Computer Science.

Communities
When I was a kid there was no Internet, CompuServe was out of my financial reach, and Bulletin Board Systems were just taking off. That meant learning to code (especially in a small town) was largely a solitary pursuit.

With ubiquitous network connectivity, today’s kids have instant access to a globally connected community of learning resources, mentors, and others who are also learning to code. Here are a few examples of resources and communities my son has found helpful:

  • Codecademy is a community of educators and students. There are courses on Python, JavaScript, Ruby, HTML, and others, all available for free.
  • Khan Academy is another free learning community that offers computer programming.
  • Once kids know a little bit about programming, if they are interested in learning Python, my son found Google’s Python Class very useful.

The Role of Open Source in Teaching Kids to Code

Open source software plays a huge role in teaching kids to code. All of the tools I’ve mentioned in this blog post are open source and freely-available. But, more importantly, kids can dig into the hundreds of thousands of open source projects that are out there to see how they work and, eventually, to participate in those projects by writing documentation, helping with QA, and coding bug fixes. Open source doesn’t care how old you are. Mozilla, in particular, has been a wonderful and supportive project that my son has been participating in.

One of the ways high school students can get introduced to open source is through the Google Code-In. Each Fall various open source projects create a bunch of tasks they need done. Students participating in the Code-In work on these tasks. Tasks might be things like writing documentation, creating unit tests, working on a web site, you name it.

At the end of the Code-In, the open source projects select the students who made the most and best contributions and those students (and a parent) are flown to San Francisco for a tour of the Googleplex.

Encourage Your Kid and/or Somebody Else’s Kid

Today kids are exposed to computing nearly all day, every day. A small percentage of them will be curious enough to want to learn how it works. You can use some of the tools and resources I’ve listed here to encourage them to dive in and you can get them plugged in to open source and code-learning communities on the Internet so they can dig deeper on their own.

The hardware, software, and network have changed a lot since I was a kid, but one thing hasn’t changed: A wonderful world of discovery and opportunity awaits for young coders.

This week, December 9 – December 15, is Computer Science Education Week. They are encouraging people to spend an hour teaching someone to code. They’ve got tutorials ready to go. All they need is your time!

What tools, resources, or communities have your kids leveraged as they’ve learned to code? How does your local high school’s computer science curriculum differ from this one? Let me know!

5 drivers of your decision to pay for commercial support

DISCLOSURE: I work for a commercial Open Source Software company called Alfresco that earns revenue by selling commercial support for the software we freely distribute under an OSI-approved license.

A recent CMSWire post cited a study commissioned by a proprietary software vendor to show that Open Source Software is of lesser quality than proprietary. The person funding the survey is quoted as saying, “You certainly can’t use Open Source for something that’s the lifeblood of the company.”

I commented to one of my colleagues that it seemed like this guy just stepped out of a time machine from the 90’s and is still spouting anti-Open Source Software FUD that was dispelled years ago. And because the entire planet, except for this one guy, realizes that Open Source Software in fact runs the mission critical operations of many companies of all sizes and, basically, most of the Internet, the cloud, mobile phones, tablets, and countless other electronic devices, I shall not spend any more time on his nonsense here.

What the post did do, however, was to get me thinking about how companies make the decision to pay for commercial support of the Open Source Software they have deployed in their companies.

In talking it over with my colleague, Richard Esplin, we’ve come up with five drivers that companies think through when they decide whether or not to pay for commercial support of the Open Source Software they use. You can think of this like a scorecard: the higher a software package scores across these dimensions, the more likely it is that the company will pay for support.

First, a couple of assumptions. Let’s assume that a company has the budget to pay for support. Companies and non-profits that don’t have that budget don’t have the luxury of this decision–they make do. And, let’s further assume that commercial support for a given piece of Open Source Software can be obtained within that budget. Not all OSS has a commercial support option, or one that fits within that budget.

Now, the company is left with a decision: Do we pay for support or not?

Richard and I argue that this decision comes down to these five drivers:

Mission criticality

This one is simple: The more mission critical a piece of Open Source Software is to the business, the more it makes sense to pay someone to help keep that software running.

Switching Costs

If something goes wrong with a library I’m using as a developer, I may be able to find a similar library or I’ll just code around it. However, if my Open Source CRM, ERP, or ECM system fails, it is a different story. Those migrations don’t happen overnight. The higher the switching costs, then, the greater the need for someone to be there to help push through any issues that do arise.

Adoption
Are you one of millions of people using a particular piece of Open Source Software or one of ten? One of the benefits of Open Source Software over proprietary software is that “with enough eyes, all bugs are shallow”. If there is a huge population of people using the software, that means it has been used for a variety of use cases on a variety of platforms. The risk of a problem caused by using the software in a new and unique way is lessened. And a thriving community not only helps with finding and fixing bugs, it can also make it easier and cheaper to get trained up and to get ideas around approaches or to discover new use cases by engaging with that community.

If, however, the Open Source Software is narrowly-adopted, a lot of the benefit of the network effect is lessened. In that case, rather than crowd sourcing support, a company pays someone to do it.

Complexity
Open Source Software runs the gamut of complexity, from relatively simple libraries and small web applications to operating systems and databases and entire software suites. A company’s willingness to pay for support for software comprised of a few thousand lines of code that performs a small number of functions is going to be vastly different than that of a package comprised of millions of lines of code with hundreds of moving parts. Also consider some Open Source Software projects that are built on top of multiple layers of more foundational components, which are also Open Source.

The more complicated the stack (large code base, lots of moving parts, lots of layers and dependencies), the more it makes sense to pay a single vendor with deep expertise in the entire stack.

Cost of Self-Support

The final driver is the cost of supporting the software internally. In this case, I mean “cost” in the economic sense of the word, to be inclusive of the real dollars it costs to hire, train, and employ people with expertise in that software but also in terms of the opportunity cost experienced when a company spends their own fiscal and human resources on supporting the software instead of on other things. For a huge company with a deep bench of under-utilized people with technical skills in a certain area, for example, the cost of self-support may be relatively cheap. Companies where that is not the case are more willing to pay someone else to support the software because it is extremely costly for them to do it on their own.

Weigh Each Driver to Make a Decision

You can see how a strong score in one of these areas might offset a weaker score in others. For example, maybe you are using an Open Source Software package that is widely Adopted and has low Switching Costs but is of relatively high Complexity. Even if the Cost of Self-Support is high relative to what commercial support would cost, you might still not pay for support because you judge that the benefits of wide Adoption and low Switching Costs outweigh everything else.

Mission Criticality has the potential to trump the other drivers, though. A company whose existence depends on a piece of software that is completely unsupported appears reckless, no matter how low the actual risks may be when a package scores low on the other dimensions. Commercial support works like an insurance policy in this case.
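
If you want to make the scorecard idea literal, here is a toy sketch in Python. Every weight and score below is invented purely for illustration; a real decision deserves more nuance:

DRIVERS = ["mission_criticality", "switching_costs", "adoption_gap",
           "complexity", "cost_of_self_support"]

def support_score(scores, weights):
    """Weighted sum: higher means a stronger case for paying for support."""
    return sum(scores[d] * weights[d] for d in DRIVERS)

# Mission criticality gets the heaviest weight because it can trump the rest
weights = {"mission_criticality": 3.0, "switching_costs": 1.5,
           "adoption_gap": 1.0, "complexity": 1.0,
           "cost_of_self_support": 1.5}

# Score each driver 1-5 for the package in question ("adoption_gap" is
# inverted: 5 means narrowly adopted, so less crowd-sourced support)
scores = {"mission_criticality": 5, "switching_costs": 4, "adoption_gap": 2,
          "complexity": 4, "cost_of_self_support": 3}

print(support_score(scores, weights))  # 31.5 for this hypothetical package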

All software breaks. Companies have to decide how they will deal with that when it happens. Companies may weight each of these drivers differently, but I think these five drivers are a good model of the decision at hand. And isn’t that the cool thing about Open Source Software? A company gets to make their own decision whether or not they want to pay for commercial support and from whom they want to buy it. I think that’s pretty awesome.

Will Abson’s Wonderful World of Dashlets

Back when Alfresco first launched Surf, the framework on which Alfresco Share is based, a handful of us went to Chicago to hang out in a conference room on a ship in the harbor where we did a deep dive on the framework and then came up with proposed add-ons that would leverage it. I was at Optaros at the time. Our add-on was the Alfresco Share microblogging component and we also did some Surf Code Camps. The goal, of course, was to get the word out about Surf and encourage others to develop and contribute Share customizations.

The deep dive was great and the code camps that followed were valuable and well-attended. What I think the approach missed was that you don’t need to be a Surf expert to code some simple dashlets. We were handing out “How to Fly the Space Shuttle” when we probably should have started with “Building and Launching Your First Model Rocket”.

That’s why Will Abson is my current Alfresco community hero. At this year’s Alfresco Kickoff meeting in Orlando (notes), Will showed a project he and a few others have been working on called Share Extras. Share Extras is a collection of small projects ranging from “Hello World” dashlets to custom theme, data list, and document action examples.

For example, the list of what I’d call simple mash-up examples includes things like:

  • Twitter Feed Dashlet – Shows a specific Twitter user’s feed.
  • Twitter Search Dashlet – Shows a Twitter feed based on a hashtag.
  • BBC Weather Dashlet – Shows weather feed from BBC.
  • Flickr Dashlets – Shows Flickr photos in a slideshow.
  • Google Site News – Shows the last ten blog posts from Google News.
  • iCal Feed – Shows entries from an iCal feed.
  • Notice Dashlet – Stores/shows arbitrary text, like what you’d use for a maintenance message or an announcement.
  • Train times – Shows the National Rail train schedule.

From there, you can move on to more extensive examples. For example, rather than simply displaying data from public services, these examples start to store/retrieve data in the underlying Alfresco repository:

  • Site Tags Dashlet – Displays a tag cloud consisting of tags used in your site.
  • Site Poll Dashlet – Uses a custom data list type called Poll to configure a simple poll. Shows results in a bar chart.
  • Document Geographic Details – Adds a map using the document’s geocoding metadata just below the permissions section.
  • Sample Data Lists – A simple data list example that lets you capture info on Books (author, title, ISBN).
  • Execute Script Custom Document Action – Shows an example of adding a custom action to the action list that runs server-side JavaScript against a node.

The nice thing is that (almost) every one of these extensions deploys as a self-contained JAR file. Will’s build assumes you are running the repository and Share web apps in the same container, so it deploys the JAR to $TOMCAT_HOME/shared/classes/lib, but you can obviously tweak that if your configuration is different. The ability to run everything out of a JAR, including what would normally be file system-based resources like CSS, client-side JavaScript, and images, is a relatively new feature (3.3, I think). It’s much nicer than fooling with AMPs.

Here is a list of my five favorites from the collection:

  • Node Browser – A port of the Explorer client’s node browser to the Share UI. I like this one because it brings an extremely useful developer tool into Share, which is where most of us are spending time these days. It also shows how you can plug your own tools into Share’s admin console.
  • Red Theme – A simple custom theme example. This is on my favorites list because creating a custom theme is something that is requested often and should be easy to do. Follow this example to create your own.
  • Site Geotagged Content Dashlet – Adds a dashlet that shows a map of geotagged content contained in the document library. I can’t help it. I like maps.
  • Site Blog Dashlet – Dashlet that shows site blog posts. This is a favorite because it plugs a hole in the product. If you’re going to use the blog tool in a Share site, you’re going to want to show those posts somewhere and a dashlet makes a lot of sense.
  • Wiki Rich Content – Automatically puts a table of contents at the top of a wiki page based on the headings contained within the page. Also does a nice job with pre-formatted text. This is another example of a feature that should probably be in the core product.

The Google Code project includes screenshots for each of these projects, but it is really easy to check out the code, import the projects into Eclipse, create a file in your home directory to override the tomcat.home property, then run “ant hotcopy-tomcat-jar” to deploy one and see it in action for yourself. I tried them all out on Alfresco 3.4d Community and they worked great. I think all but one or two will work on 3.3.
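If you want to try that, the override is just a properties file. I am assuming the common Ant convention of a build.properties in your home directory; check the project’s build.xml for the exact file name it reads.

    # ~/build.properties (assumed name; confirm against the Share Extras build)
    tomcat.home=/opt/tomcat

Then, from within one of the project directories:

    ant hotcopy-tomcat-jar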

The Share Extras project includes a Sample Project with a folder structure and Ant build that you can clone and use as a starting point for your own development. If you create something cool, you should share it on Google Code and then let me know about it. Or give it to Will and he can add it to his ever-growing pile of cool Share add-on examples.

Now available: Alfresco Fivestar Ratings add-on for Alfresco Share

A couple of weeks ago I posted a survey asking if anyone saw any value in a five-star ratings widget for Alfresco Share. Honestly, it would have taken only one or two positive responses; even if no one needed one, there’s value in it for example’s sake. It turns out about 20 readers of this blog voted positively, so I went ahead and knocked it out.

This Alfresco Share customization makes it possible for any document in the repository to become “rateable”. When a document is rateable, the Alfresco Share user interface shows a clickable five-star ratings widget. The stars light up to indicate the average rating for that document. Users simply click one of the stars to post their own rating. When clicked, the widget refreshes itself with the updated average.

Here is a short screencast that demonstrates the customization. You’ll want to make it full screen.

To implement this, I took the Someco Ratings Service from the Alfresco Developer Guide, moved it to the Metaversant namespace, and changed the names of my Spring beans and JavaScript root variable. Even though my initial target Alfresco version is 3.3, I didn’t want the code to conflict with Alfresco’s new back-end-only ratings service in 3.4, which uses some of the same names that were in the book. I also changed the JSON that the ratings web scripts use to be closer to what exists in 3.4. That way, when I do make a version that works with 3.4, it could potentially work with either my ratings back-end or Alfresco’s.
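For illustration only, the exchange between the widget and the ratings web scripts has roughly this shape. The field names are my guesses at a 3.4-style format, not the actual wire format:

    Request body when a user posts a rating (hypothetical):
        {"rating": 4, "ratingScheme": "fiveStarRatingScheme"}

    Response the widget uses to refresh itself (hypothetical):
        {"averageRating": 3.7, "ratingsCount": 12}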

I then went to work on the UI side, integrating the widget into Share’s document details page, document library (both Share and repository views), search results page, and document-related dashlets. To go from what was in the book to a working integration, I revamped the client-side ratings JavaScript from a set of functions into an actual object. Then I started injecting my own methods into Alfresco’s client-side object prototypes to drop my widget in where appropriate.

Alfresco is still working to make customizations like this more modular and easier to plug in alongside their code and code from the community. Until then, be aware that if your Alfresco implementation already has customizations that override some of the same web scripts and client-side components this module does, there may be some manual integration needed. If you have an out-of-the-box installation (or a set of customizations that won’t conflict with this one) you can deploy the AMP to the Alfresco WAR and the Share customizations to the Share WAR and you’ll be set.
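If you have never applied an AMP before, the Module Management Tool that ships with Alfresco handles the repository side. The file names and paths here are placeholders:

    # alfresco-mmt.jar ships with the Alfresco distribution; paths are examples
    java -jar alfresco-mmt.jar install fivestar-ratings.amp /opt/tomcat/webapps/alfresco.war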

The Alfresco Fivestar Ratings project lives at Google Code. Feel free to check out the source, try it out, and use it on your projects. If you find a bug, log it, then fix it!

Apache Chemistry cmislib 0.4 incubating now available

The Apache Chemistry development team is pleased to announce that the 0.4 incubating release of cmislib, the Python client API for CMIS, is now available for download. You may have to use one of the backup servers until the mirrors fully update. Alternatively, you can install cmislib with easy_install by typing “easy_install cmislib”.

This release has various fixes and enhancements that the community has contributed since cmislib joined the Apache Chemistry project with its 0.3 release. If you are using Alfresco, you might be interested in an enhancement in cmislib 0.4 that makes it possible to use ticket-based authentication instead of basic auth.

For those who haven’t used it, cmislib makes it easy to work with CMIS-compliant repositories from Python.
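If you haven’t tried it, a minimal session looks something like this. The URL and credentials are placeholders; for Alfresco 3.x the CMIS AtomPub service typically lives under /alfresco/service/cmis.

    # A quick cmislib session; URL and credentials are placeholders
    from cmislib import CmisClient

    client = CmisClient('http://localhost:8080/alfresco/service/cmis', 'admin', 'admin')
    repo = client.defaultRepository
    print(repo.name)

    # List the children of the repository's root folder
    for child in repo.rootFolder.getChildren():
        print(child.name)

    # Run a CMIS query and print the matching documents' names
    for result in repo.query("SELECT * FROM cmis:document WHERE cmis:name LIKE 'test%'"):
        print(result.name)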