Alfresco Acquired by Private Equity Firm

Alfresco Software, Inc. announced today that it is being acquired by private equity firm Thomas H. Lee Partners (“THL”).

Alfresco had been rumored to be on the block for quite some time. There was speculation that cash was dwindling, but I have no way to confirm that was the case.

To be sure, Alfresco has a vision for the future that is going to require resources, and making yourself available to be purchased is one way to get resources. A private equity acquisition couldn’t have been their first choice, but there just didn’t seem to be any other companies out there for whom acquiring Alfresco made any kind of sense.

How Will This Affect You?

Private equity firms are investors–they buy companies with the express intent of selling them later at a profit. This is very different from Alfresco being acquired by a larger tech company or a competitor in the ECM space, or from raising funds through an IPO.

If you are a customer and you are aligned with Alfresco’s future vision of content services running in the cloud, decomposed into a set of independent services, then you should be hopeful that THL also believed in that vision, and that this acquisition will make it more likely to materialize. Day-to-day, I suspect nothing much will change for customers in the short-term.

If you run Community Edition, or your content management needs are not quite as visionary, cross your fingers and hope that the new investors don’t completely ignore the parts of the business that don’t have a direct and immediate impact on their return on investment.

Private equity firms don’t buy and hold. The clock is now ticking on THL’s investment. Look for a renewed emphasis on anything and everything that can pump up Alfresco’s attractiveness to future suitors or public investors.

Future of Alfresco Share Remains Foggy After DevCon

CORRECTION: The original version of this post attributed comments to John Knowles. John wasn’t at DevCon. The comments should have been attributed to Mark Heath, VP of Product Development. Also, the ADF announcement was at BeeCon 2016 in Brussels. Sorry for the mistake and thanks to alert readers for the correction.

This week Alfresco held a conference for its developer community in Lisbon, Portugal. Alfresco has been very focused on its new Alfresco Application Development Framework (ADF) in terms of both marketing and engineering, and that was reflected in this year’s conference program.

However, there has been a lot of confusion and concern amongst customers and the rest of the community regarding the future of Share, the out-of-the-box web client that ships with Alfresco. In this post, I’m going to focus on why there is confusion, what, if anything, got cleared up during the conference, and speculate on what might happen going forward.

Summary of Customers’ Concerns

Alfresco Share was originally built using a proprietary framework called Surf. It was immediately controversial because even at that time (roughly 2009) there were widely-used frameworks that Alfresco could have chosen to build upon, but didn’t.

Fast forward to BeeCon 2016 in Brussels when Alfresco announced it would build a new framework featuring components based on the popular AngularJS framework. This was a welcome announcement because it painted an appealing vision of a future where a broader community of developers would be able to develop applications using well-known frameworks and established skills. But it also caused concern because, for the seven years prior, customers had been configuring and extending Alfresco Share in a myriad of ways ranging from small tweaks to massive custom applications. With Alfresco building a new developer framework, it seemed unlikely that Share, built on the old, proprietary framework, would have much of a future.

Another concern is what a customer can expect in terms of base functionality when they install Alfresco. Alfresco Share was created when the company was going after Microsoft SharePoint, so it includes basic document management as well as some light collaboration features. A central question customers have is whether Alfresco will eventually replace Alfresco Share with something else, and, if so, whether it will address the “light collaboration” use case. Until this week Alfresco was largely silent on this point.

What We Learned about the Future of Share at DevCon

During the conference, Richard Esplin, one of Alfresco’s product managers, showed a slide that confirmed what had previously been speculated: Alfresco Share will be deprecated–some day. Most found this unsurprising, but it was the first time Alfresco had made a public statement to that effect.

This was touched upon again by Thomas DeMeo, Alfresco’s VP of Product Management. During the closing Q & A session he answered a question about the future of Share by saying (paraphrasing), “Will there be another Share? No there will not be another Share. But as the ADF continues to evolve we will release more components which could be used to build all kinds of apps”. I think some people heard, “There will be no Share replacement”, but I interpreted this as “There will not be a feature-by-feature port of Alfresco Share to ADF called Share” and my interpretation was confirmed by multiple high-level Alfresco employees, although I did not speak directly to Thomas about this.

What happened next seemed to reinforce the “Share is going away without a replacement” view. Mark Heath, who is VP of Product Development at Alfresco, said something like, “We want to be a platform company. We do not want to develop applications. We want to be the platform and let you guys develop applications.” Again, I am paraphrasing and was unable to find Heath to get a clarification, but discussions with employees indicate that’s pretty clearly how he feels.

So the messaging around the future of Share continues to be a bit of a mess. What we do know is that Share will go away some day, but we don’t know when. It could be years. What we also don’t know is what, if anything, will take its place.

What Might Happen Next

When Alfresco introduced Share, there was already a web client called Explorer. Just like Share, many customers had extended and customized Explorer. To ease that transition, Alfresco kept both clients around for a long time before Explorer was eventually retired. There is no reason to think Alfresco will behave any differently this time around.

I realize Alfresco wants to be a platform company. But that doesn’t mean it can get away with providing only a library of components and a couple of example applications, unless it wants to radically alienate its existing customer base and go after a completely different market than it serves now. Maybe that will be what happens over many years, but I don’t see it happening abruptly. So there will have to be some sort of Share replacement, even if Thomas doesn’t want to call it that and despite the fact that developing and supporting applications may not be ideal for a platform company.

Can you imagine implementing Alfresco for a customer and then saying, “Okay, everything is installed and working great. But before you can actually use it for anything, you’ll need to use these components to assemble an application that does what you want.” It would be like buying a car, except it only comes as a chassis, an engine, and four tires.

Alfresco points out that they are already providing at least two example applications built with the ADF. Those are helpful for developers, but a short time-to-value demands that a production-ready, supported, configurable, and extensible client be made available to customers out-of-the-box.

I suspect Alfresco will realize this and will ultimately provide it. If the past is prologue, the current “Example Content App” might evolve to be that thing.

If that does not happen, one or more of the following will happen:

  • Customers will cling to Alfresco Share for as long as possible and may ultimately delay its deprecation by threatening to not renew their support subscription unless Share support is continued.
  • Partners will start developing competing front-ends (funded by their clients). Of course alternative front-ends already exist, but you’ll see this increase, big-time.
  • The community might step up and organize around a true open source project that aims to approximate Alfresco Share, either with ADF or with their own components. I floated this idea on Twitter during the conference and it sparked a lot of discussion.
  • The Alfresco Share code base could fork. If Alfresco decides to end support for Alfresco Share before customers are ready, which I find highly unlikely, people who need it could carry it forward. A slight variation on this would be if Alfresco volunteered to make Share a community project as they’ve done with other products for which they’ve dropped support.
  • Customers could decide to migrate to some other vendor’s product.

There are many customers who don’t use Share at all. I suspect some within Alfresco believe that, because many of their biggest clients don’t use Share anyway, it wouldn’t be a big deal to sunset it without a replacement. I’m hoping there is a stronger contingent that realizes it’s not that simple and that a wide variety of customers use the platform. Alfresco can’t afford to walk away from customers who can’t or don’t want to develop and support their own custom apps for simple document management or light collaboration use cases.

The bottom line is that you should not count on Alfresco Share being around forever. This will take years to unfold, but we should all wrap our heads around that fact now and plan accordingly.

Photo Credit: Mark Gunn, CC by 2.0

Alfresco announces major changes coming in the 6.0 release

Last week, Alfresco sent a notice to its customers (link requires a support login) outlining some major changes coming with 6.0, the next major release of the software.

The general theme of these changes appears to be that of streamlining and simplifying the platform, both in terms of functionality it offers and in terms of how it is deployed and run.

I’m glad to see this happening–the platform is littered with good ideas that were implemented, iterated upon once or twice, and then abandoned (I’m looking at you, Web Quick Start). There is never a good time to make such significant changes to a shipping product, but it was probably past time to do this.

The high-level goals probably won’t surprise you. They can be summarized as: Better Integration, Containerization, and Enhanced REST API. The devil is in the details, and there are a few items that don’t fit neatly into those categories, so let me run down each of the major items and give you my take.

Bye bye binary installers

Right off the bat, Alfresco says they are going to eliminate the binary installers and replace them with Docker containers. The idea is to make it easier to deploy across environments and data centers, whether those are on-premises or in the cloud.

My take: I like Docker and I know there are people running Alfresco in containers, but this isn’t going to be that interesting to most of my current clients, at least in the near-term, because they aren’t yet ready to use containers in production. Instead, it’s going to cause pain because we’re going to have to manually install Alfresco rather than use the binary installer. On the other hand, there are many things that need to be done for a production install that the binary installer does not cover, so the installer wasn’t really buying us much anyway.
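For teams that are ready for containers, the switch is not much typing. Here is a rough, hypothetical sketch; the image name and the path to the content store inside the container are placeholders for whatever Alfresco ends up publishing (or whatever you build yourself):

# image name and internal data path are placeholders
docker run -d --name alfresco -p 8080:8080 \
  -v alf_data:/path/to/alf_data \
  <your-alfresco-image>

The named volume keeps the content store outside the container, so it survives when the container is rebuilt or upgraded.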

Was nice knowing you, WAS

Alfresco is going to make the repository directly executable without the need for a separate servlet container. This means JBoss, WebSphere, and WebLogic will no longer be supported. The announcement says that support for running within Tomcat will also be dropped in the future, though it does not say when.

My take: I rarely see those “enterprise-y” app servers used any more, especially in the context of Alfresco. Most people run on Tomcat. But those running on Tomcat rarely run anything else besides Alfresco in the same Tomcat instance, so the value that Tomcat brings is pretty low. Making the repo directly executable nets out to a Good Thing.

Sayonara CIFS

It’s funny–CIFS is often the killer feature clients cite as the thing that made it possible for Alfresco to be their shared drive replacement. But it also seems to be a huge source of headaches. Alfresco says security vulnerabilities have made continuing with it untenable. Customers should switch to AOS for Windows clients and WebDAV for non-Windows clients. AOS is shipped with Community Edition, but it is not open source.

My take: So many problems, so tough to debug. Good riddance. I will say that AOS has not been without its share of problems, and with AOS not being open source, we’re completely reliant upon Alfresco here.

So long Solr 1

Alfresco 6.0 will leverage Solr 6.0. Solr 1 is being dropped and Solr 4 is sticking around to help customers migrate to Solr 6.

My take: Alfresco has been playing catch-up with Solr for a long time, so anything that keeps it moving forward is good.

Disconnect the Share Connector

The Share Connector made it possible to integrate standalone Activiti (aka, “Process Services”) with Alfresco Share. This gave you the full power of a separate Activiti instance while letting users continue to use the document management and collaboration UI they were familiar with (Alfresco Share). Now Alfresco is essentially saying, if you need both, write it yourself using the ADF.

My take: Alfresco Share is essentially no longer being developed, so why continue to maintain a connector between Activiti (aka, “Process Services”) and Share? Agreed, it doesn’t make sense. And the more I look at the ADF, the more apparent it is that the true value of the framework is only realized if you are using both Process Services and Content Services. Otherwise, you end up stripping out a bunch of stuff you don’t need. But that’s another blog post.

Multi-Tenancy doesn’t live here anymore

Alfresco is dropping multi-tenancy support unless you have an OEM agreement.

My take: Ah, multi-tenancy. You think you want it until you see the list of what you have to give up to use it. It’s a feature even Alfresco didn’t use in their own cloud product. Now it’s official.

Deprecation Appreciation

Okay, those are the things that are going away. But the notice also included a list of deprecations. Web Quick Start, Alfresco in the Cloud, and Share Site Components will all be deprecated with Alfresco 6.0. All three are examples of good ideas that never really “made the turn”. I suspect Alfresco in the Cloud will prove to be the most problematic in terms of transition, depending on what Alfresco replaces it with.

Share Site Components means that Site Blogs, Site Calendars, Site Data Lists, Site Links, and Site Discussion Forums are going to be considered deprecated in the 6.0 release. Alfresco says if you need those features you should rely on integrating with other products or writing your own custom code.

Most projects I work on don’t use blogs, calendars, links, or discussions. I have a client or two that are using Alfresco as a team collaboration tool, but even then, the use of those tools is fairly sparse within the company. But lots of people use Data Lists, so that one may raise an eyebrow or two.

A Data List is just a collection of content-less objects of the same type. To Do Lists and Contact Lists are simple examples but people use Data Lists for all kinds of things. The Share UI doesn’t deal with content-less objects very well–you’re supposed to be collaborating on documents so when an object does not have a content stream it looks a little ugly. The Data List UI in Alfresco Share is an answer to that. But Alfresco is encouraging the creation of custom applications with the ADF (or without) so if you need something like a Data List, the repository still supports it–it’s just up to you to develop the UI.
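To make that concrete, here is a hedged sketch of what such a custom UI might do behind the scenes: create a content-less item of a hypothetical custom type (acme:todoItem, with a made-up acme:dueDate property) under a folder using the public v1 REST API. Your own ADF (or other) front-end would then list and edit those items:

import requests

# placeholder host, credentials, and folder id
BASE = 'https://alfresco.example.com/alfresco/api/-default-/public/alfresco/versions/1'
FOLDER_ID = 'id-of-the-folder-holding-the-list'

# create an object with no content stream, just a type and properties
resp = requests.post(
    BASE + '/nodes/' + FOLDER_ID + '/children',
    auth=('admin', 'admin'),
    json={
        'name': 'Call the vendor back',
        'nodeType': 'acme:todoItem',                   # hypothetical custom type
        'properties': {'acme:dueDate': '2018-03-01'},  # hypothetical property
    },
)
resp.raise_for_status()
print(resp.json()['entry']['id'])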

Dropping these lesser-used tools is okay. It does seem weird that they are being dropped before Share is officially discontinued. Why not just let them stick around until Share is dead? I suspect Alfresco knows it will be a long, long time before they can officially kill Share, so they might as well trim what they’ve got to lessen the ongoing burden and to “purify” the platform of anything that’s not strictly about content.

Developer Direction

The notice closes out with some advice for people customizing the platform which can be summarized as: Use the platform as a black box. Rather than customizing Share, write a custom application that calls Alfresco’s services. Rather than integrating Alfresco with other systems “from the inside, out” by writing code that runs in the same process and leverages the foundational, native Alfresco API, put that logic in external applications that get what they need from Alfresco via REST. This is sound advice and I think many of us made that shift a while ago.
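As a minimal sketch of that outside-in style, here is how an external Python script might list what sits under the repository root using the public v1 REST API instead of in-process code (the host and the admin credentials are placeholders):

import requests

BASE = 'https://alfresco.example.com/alfresco/api/-default-/public/alfresco/versions/1'

resp = requests.get(BASE + '/nodes/-root-/children', auth=('admin', 'admin'))
resp.raise_for_status()

# print the name of each child of the repository root
for entry in resp.json()['list']['entries']:
    print(entry['entry']['name'])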

I should note that while Alfresco would like you to use ADF to build your custom content-centric applications, you’d be wise to assess that decision carefully. Depending on exactly what you are doing you may be better off without it.

There are some confusing statements about workflow. Here’s what the announcement says (emphasis is theirs): “In order to make it easier to design, deploy, and maintain custom workflows, in a future release we will be providing a platform-wide workflow service using Alfresco Process Services (powered by Activiti). This will replace the use of embedded Activiti for custom workflows. Future custom workflows will be implemented external to the Content Repository and will leverage the REST APIs of Alfresco Content Services.”

That sounds to me like they are removing embedded Activiti from the repository. However, in the next bullet, they say, “ACS workflows are intended to automate the management of content items within the Content Repository and APIs for custom workflows will continue to be available with subscriptions to Alfresco Content Services.” That sounds like there will still be a way to do workflows within Alfresco, but it isn’t clear whether or not that will still be Activiti or something else.

This was a long post, but, as you can see, they had a lot to announce. On the whole, I think they are positive, necessary changes. Small customers that are trying to use Alfresco for a lot of things, particularly collaboration, may feel some pain as they will be forced to look elsewhere for that kind of functionality. Very large customers, who often leverage Alfresco only for the repository and almost always have a custom front-end, may not be affected at all, and to the extent they already use containers, may benefit.

Regardless of customer size, we’ll all benefit from a more svelte platform that Alfresco and the community are better able to support.

Photo Credit: Richard Summers, CC BY 2.0

My initial experience with Antsle, a virtual machine appliance

I love virtual machines and containers because they make it easy to isolate the applications and dependencies I’m using for a particular project. Tools like Docker, VirtualBox, and Vagrant are indispensable for most of my projects and I’m still using them, but in this post I’ll describe a product called Antsle, which has given me additional flexibility and has freed up some local resources.

My daily developer workstation is a MacBook Pro with 16 GB of RAM and a 500 GB SSD. From a memory and CPU perspective, it can handle running a handful of virtual machines simultaneously without a problem. But disk space is starting to be an issue.

I use Vagrant and Ansible to make virtual machine provisioning repeatable–I can delete any VM at any time without remorse because I can always recreate it easily. But I get tired of continually cleaning up machines and pruning back base boxes just to reclaim space.

I decided to do something about it. My options were:

  • When Apple releases a MacBook Pro that can take 32 GB of RAM, buy that with at least 1 TB SSD, then continue with my current toolset.
  • Buy a Mac Pro or some other desktop to use exclusively for virtual machines.
  • Buy or build an actual server and set it up with virtualization. Something like this, for example.
  • Use AWS for my development virtual machines.

Then I came across a little company based out of San Diego called Antsle. Antsle builds virtualization appliances. What makes their product attractive to me versus buying a workstation or server or building my own is that:

  • The machines have no fan or other moving parts–they are completely silent. The case acts as a heat sink.
  • The machines are energy-efficient. The docs say mine will run at 45 watts.
  • They are built on Linux with standard virtualization technology (LXC and KVM) plus some additional optimizations from Antsle.
  • They are ready-to-go out-of-the-box, saving me the time and effort of building my own solution.

I really like using AWS, and I think for production workloads, no one, not even your own internal IT data center, can do it cheaper or more securely. Plus the breadth of their service offering is nuts. But for my modest developer needs, I’m pretty sure I’ll break even within a year, and that’s not counting the productivity gain of not having to wait for instances to spin up or having to fool with the complexity of the AWS console.

So, after that analysis, I was ready to buy. The biggest struggle was deciding which model to buy and whether or not to do any upgrades. I went for an Ultra, which has an 8-core 2.4 GHz Intel processor, 32 GB of ECC RAM, and two 1 TB Samsung 850 EVO SSDs. The drives are mirrored, so that’s 1 TB of usable space. I could have expanded the RAM to 64 GB and increased the storage up to 16 TB, but it was hard to justify the added expense based on my needs.

My Antsle arrived last week and I’ve been pretty happy with it so far. I’ve got a set of “base” images created so that I can easily instantiate new machines based on typical components and configuration. For example, I have an image for every recent Alfresco release. When I need to work on one for a client project or to help someone in the forums, I can just clone one of my base images and start it up. I can let it run as long as I want without worrying about cost, and then kill it or keep it around as needed.

Here is a summary of my experience, thus far:

  • No setup necessary. I plugged it in, started it up, and was starting up machines in minutes.
  • Creating machines from templates, cloning machines, taking snapshots, and startup/shutdown happens very quickly.
  • Templates and instantiated machines take up less space than I would have thought, which is great. So far, I’m glad I stuck with the base storage option.
  • I haven’t pegged the CPU yet, but I have seen it spike briefly to as high as 50%, and that was when I was only running a single VM. I continue to see brief spikes here and there, but since I won’t have too many machines under load at any given time, I’m not that worried about it yet.
  • Documentation seems thorough and helpful. The company has been really responsive and helpful so far as well. They responded to a minor billing issue quickly and resolved it without a fuss.
  • I noticed when you clone a machine that has a bridged network adapter, the MAC address doesn’t change. You have to drop and re-add the NIC if you want a new MAC address, otherwise DHCP will assign it the same IP address as the original machine. This isn’t a big deal once you know the behavior.
  • I had to change the vm.max_map_count setting to make Elasticsearch happy, which is a typical setup task for Elastic (see the snippet after this list). It took me a minute to realize that this needs to be done on the Antsle host and applies to all guests–it cannot be done on the individual VM, at least for LXC.
  • There does not appear to be a way to tag or comment on virtual machines. Additionally, the name you assign to each image is fixed-length and fairly short. So I’m somewhat concerned that, as my library grows, I’ll start to lose track of what’s installed on which machine. AntMan, the management console, seems to be evolving fairly rapidly so maybe this will change in a future release.
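For the Elasticsearch point above, the change is the usual sysctl tweak, except that it has to happen on the Antsle host itself, where it then applies to every LXC guest. Something along these lines:

# run on the Antsle host, not inside the guest
sudo sysctl -w vm.max_map_count=262144

Add the same setting to /etc/sysctl.conf (or a file under /etc/sysctl.d) if you want it to survive a reboot.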

I’ve also created a few videos if you want to see it in action.

This video is the unboxing.

This video shows an Ubuntu and a CentOS image being created and then configured for bridged networking.

This video shows how image templates work and gives you a little bit of a feel for the performance using a real-world app (in this case, Alfresco running on CentOS) while other machines are running simultaneously (one Mail/LDAP machine and a four-node Elastic cluster).

Alfresco Software resurrects DevCon

Encouraged by the success of the independently-organized, developer-focused BeeCon conference, and seeking to continue its renewed focus on developers, Alfresco has decided to resurrect its own annual developer-focused event. This week Alfresco announced that DevCon will be held January 16 – 18 in Lisbon, Portugal.

Alfresco had previously given up on big, annual events, deciding instead to focus on smaller, one-day events in local markets around the globe. These were primarily sales and marketing events focused on lead generation and did not include an open call for papers.

When the annual events were discontinued, the community stepped in. The Order of the Bee, a global community of Alfresco enthusiasts independent of Alfresco Software, Inc., held two successful conferences in 2016 and 2017. These were low-budget, non-profit affairs with a very high signal-to-noise ratio.

Despite being organized independently by the community, the Order of the Bee events were still heavily supported by Alfresco. The company paid for high-level sponsorships and sent many engineers, John Newton, and other staff to give talks at both Order of the Bee conferences.

The company’s interest in annual events isn’t the only thing to have come back around lately. In the early days, the company was very focused on developers. The repository was pitched as a key foundational technology for content-centric applications. Over time that focus blurred as the company tried to move up-market towards “solutions” and the marketing focus turned to business buyers. But the pendulum has swung back again, centered mainly around the Alfresco Application Development Framework (ADF), a set of components meant to make it easier for developers to build content- and process-centric applications. So it is no surprise that Alfresco would be interested in being the primary driver behind an annual developer-centric event.

The resurrection of DevCon by Alfresco should be good for the community as long as the event is able to hold on to its community and developer focus. Alfresco has invited The Order of the Bee to help with conference planning, so that will help. And the community and developer outreach roles within Alfresco are now filled by former community members, which also increases its chance of success, at least from a community perspective.

The Alfresco community has always been strongest in Europe. For now, Lisbon is the only date announced. If it is successful, it’s possible we could see a North American date later in 2018, or perhaps they will alternate continents every other year.

The Call for Papers is open now. But if you want to speak, you’d better hurry. The deadline for submissions is Monday, October 23, 2017.

Secure your Alfresco server

One of the services that Metaversant provides is called a Health Check, which is exactly like it sounds: We review clients’ Alfresco installations from bottom to top and make recommendations for improvement.

A surprisingly large number of those assessments consistently show that people are not doing enough to secure their environment. In the past, I might have added, “especially for those with an externally-facing server” but, honestly, at this point, you really ought to be treating the entire network as an untrusted, hostile environment, even if it is behind a firewall.

First, let’s start with the bare minimum. These will be obvious to many of you, but believe me, they are not universally applied.

Change the default admin password to a random string

Most everyone changes the admin password, but many people make poor choices as to what that password is. Do what you should be doing with your personal passwords: Change the admin password to a randomized string.
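Any password manager’s generator works for this, or you can do it from the command line. For example:

openssl rand -base64 32

Store the result in your password manager, not in a plain-text file on the server.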

Once you change it, don’t share it in clear-text email or text messages. Use PGP to encrypt emails that include secrets. If you must use a mobile device to share secrets, use Signal.

Secure your server traffic with HTTPS

If you aren’t going to encrypt the traffic to your server then you might as well skip all of the advice in this entire post and treat your repository as public information. If that sounds like a bad idea, then you must encrypt your traffic to prevent passwords from being exposed in clear text.

The wonderful service, Let’s Encrypt, makes quality SSL certificates available to everyone for free. Yes, you have to renew them more often than paid certificates but you can automate that fairly easily with EFF’s certbot. In fact, once you establish the web proxy in front of Tomcat, securing your traffic with Let’s Encrypt is as easy as running the certbot script if you have a public-facing server.
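As a rough sketch, assuming nginx is the proxy in front of Tomcat and using a made-up hostname, the initial certificate request and a renewal test look something like this:

sudo certbot --nginx -d alfresco.example.com
sudo certbot renew --dry-run

certbot updates the nginx configuration to serve the certificate and, depending on how it was installed, typically schedules automatic renewal as well.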

Do not run Alfresco as root

If someone does compromise Alfresco you want to limit the damage they can do. If Alfresco is running as root, they can wreak havoc on your server.

I often see Alfresco running as root on installations where someone has simply stood up a server, switched to root, and then run the installer. When you install as root, the Alfresco service will be set up to run as root. If you later try to run as a non-root user, the alfresco.sh script might complain. You can fix this by editing the script. But you can avoid the problem altogether by installing as a non-root user in the first place.

Sometimes Alfresco will be set up to run as a non-root user and then someone will unknowingly start the server as root. When this happens you have to stop the server, recursively fix all of the permissions on the files that root touched, and then restart as the non-root user.
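Here is an illustrative sketch, assuming an installation under /opt/alfresco (your path will differ): create a dedicated account, repair ownership if root has touched the files, and start the server as that account:

# one time: create a dedicated service account
sudo useradd -r -m -s /bin/bash alfresco

# recursively fix ownership after an accidental start as root
sudo chown -R alfresco:alfresco /opt/alfresco

# start Alfresco as the non-root user
sudo -u alfresco /opt/alfresco/alfresco.sh start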

Disable unused protocols

This one is about reducing your attack surface. One of the nice things about Alfresco is the wide number of options you have for getting information in and out of the repository. That’s great, but if you aren’t using, for example, FTP, then why leave FTP enabled? That’s a potential place an attacker could find a toehold. Purposefully review each of the protocols that Alfresco supports and disable those that are not being used.
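For example, on an installation where only HTTP(S), AOS, and WebDAV are needed, alfresco-global.properties might include entries along these lines (property names can vary a bit between versions, so check the documentation for yours):

ftp.enabled=false
cifs.enabled=false
imap.server.enabled=false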

Re-generate the Solr certificate

Alfresco and Solr are separate web applications. Regardless of whether or not these web apps are running in the same Tomcat server, different Tomcat servers, or even different machines, they use HTTP to communicate with each other. The communication between Solr and Alfresco is encrypted, by default. The Solr web application is secured using certificate-based client authentication. But, by default, the certificate Solr uses for both encryption and authentication is the one that Alfresco generated and shipped with the product. This means that, by default, if someone can get to your Solr port they can search your entire repository because the public has easy access to that Alfresco-generated, default client certificate.

To fix this, either make sure no one can hit the Solr port (8443, by default) or re-generate the certificate. Or both. For more info on how to re-generate the Solr certificate, see the docs.
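Restricting access to the port can be as simple as a host firewall rule while you work on the certificate; for example, with ufw on Ubuntu and the default port:

sudo ufw deny 8443/tcp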

Stay current

I see an alarmingly high number of people running ancient versions of Alfresco. Often this is because the effort to upgrade can be fairly intense, especially if there are a lot of customizations to deal with. Like any significant piece of software, there have been a number of vulnerabilities discovered and resolved in Alfresco over the years. Staying on an old release could put your installation at risk.

Those are certainly the most common security issues I come across. I would consider these to be the minimum set of best practices.

Recently, I’ve had an increase in the number of clients asking about adding Two-Factor Authentication to Alfresco Share. There are a few options for doing this:

  1. Loftux offers a module that implements two-factor authentication using Authy. There is a cost associated both with the Loftux add-on and the Authy service.
  2. Contezza offers an add-on that uses Google Authenticator. In this case, the add-on has a cost but Google Authenticator is free. Google Authenticator may not be the right choice for everyone, though.
  3. There are also community projects that have done some work in this area, including a very old add-on that works with Yubikeys. Yubikeys are pretty cool, but the obvious drawback is that you have to distribute and manage the physical keys to your users.

Finally, no discussion of Alfresco and security would be complete without mentioning my friend and former colleague, Toni Blyx. The guy knows his stuff. His “Security Best Practices” presentation from Alfresco Summit 2014 is an important read.

Photo Credit: “Vintage Bank Vault” by Brook Ward, CC BY-NC 2.0

Apache Chemistry cmislib 0.6.0 released

It has been far too long since our last Apache Chemistry cmislib release, but we finally managed to get one out. The new release, 0.6.0, features support for the browser binding as well as many fixes contributed by the community.

If you make no changes to your code, the library will continue to use the Atom Pub binding by default. But the browser binding, which communicates with CMIS 1.1-compliant repositories using HTML forms and JSON, is often preferable because it may be more performant than the XML-based Atom Pub binding.

To use the new browser binding, import it, then pass it to the CmisClient constructor, like this:

from cmislib import CmisClient
from cmislib.browser.binding import BrowserBinding
client = CmisClient('http://localhost:8081/chemistry/browser',
   'admin',
   'admin',
   binding=BrowserBinding())

From there everything works like it always has.
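For instance, a quick smoke test against the (made-up) URL above might look like the following, and these are the same calls you would make with the Atom Pub binding:

repo = client.defaultRepository
print(repo.name)
print(repo.id)

# create, then clean up, a test folder under the root
folder = repo.rootFolder.createFolder('cmislib-smoke-test')
folder.delete()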

For more information, please see the docs. If you have issues, please file a Jira with as much detail as possible, including the vendor and version of the repository you are working with. And if you have a fix, include that in your Jira. Contributions are welcome!

Just for fun: Docker Swarm on my 4-node Raspberry Pi cluster

I recently spent some time standing up a four-node Raspberry Pi cluster running Docker and Docker Swarm. I had no real practical reason to do this–it just sounded fun. And it was!

Docker is a technology that allows you to package an application together with a thin operating system layer and its dependencies into a portable unit called a container, which can run anywhere. Docker Swarm establishes a cluster of hosts which can be used to run one or more Docker-based containers. You tell Docker Swarm which containers you want to run and how many of each, and it takes care of allocating those containers to machines, provisioning them, starting them up, and keeping them running in case of failure.

For example, suppose you have an application that is comprised of a web server, an application server, a database, and a key-value store. Docker can be used to package up each of those tiers into containers. The web server container has a thin operating system, the web server, and your front-end code. The application server has a thin operating system, the application server, and your business logic. And so on.

That alone is useful. Containers can run anywhere–local developer machines, on-prem physical hardware, virtualized hardware, or in the cloud. Because the applications and the operating system they run on are packaged together as containers I don’t have to worry about installing and configuring the infrastructure plus the code every time a new instance is needed. Instead I just fire up the containers.

With Docker Swarm I can say, “Here is a fleet of servers. Here are my containers that make up my stack. Make sure I always have 6 web servers, 3 app servers, 2 databases, and 3 key-value stores running at all times. I don’t care which of the servers you use to do that, just make it happen.” And Docker Swarm takes care of it.
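To give a feel for what that looks like in practice, here is a rough sketch using the standard Docker CLI. The token and IP address are placeholders, and nginx simply stands in for whatever image makes up your own stack:

# on the manager node
docker swarm init

# on each worker, using the token that 'swarm init' prints
docker swarm join --token <token> <manager-ip>:2377

# back on the manager: run six replicas of a web container across the cluster
docker service create --name web --replicas 6 -p 80:80 nginx

# scale up or down at any time
docker service scale web=3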

This works surprisingly well on Raspberry Pi. Sure, you could do it on beefier hardware, but it’s pretty fun to do it with machines no bigger than a pack of cards. I used a mix of Raspberry Pi models: one 2B+ and three Model 3Bs, but I’ve also seen it done with Pi Zeros, which are even smaller.

The examples I’ll reference in the links below do simple things like install a node-based RESTful service that keeps track of a counter stored in Redis. But once you do that, it is easy to see how you could apply the same technique to other problems.

If you want to try it yourself, here are some resources that I found helpful:

If you don’t already have multiple Raspberry Pis set up, here is a shopping list (with Amazon links):

I already had a 2B+ sitting around so I used it with three Model 3s. The performance difference between the 2B and the 3B was significant, though, so if I do much more with this cluster I will replace the Model 2 with another Model 3. My existing 2B+ has a Sense HAT attached to it, which, among other things, gives me a nice 8×8 RGB LED matrix for displaying messages and status indicators.

When it is all put together, it looks like this:

Last year I used my Raspberry Pi as part of a hands-on class I gave to some elementary school students for Hour of Code. I haven’t settled on what I might do for them this year or whether or not that will leverage my new cluster, but it is handy to have Docker running on my Pi’s because I can set stuff up, tear it down, and relocate it much more easily.

Storj.io: An open source, massively distributed object store and API for developers

I’ve been playing with a new object storage solution that’s kind of cool. It’s called Storj. Before I describe how it works, let me start by comparing it to a more familiar solution.

Probably the best-known example of object storage is Amazon S3. It allows you to define buckets and then upload files into those buckets. Amazon charges you based on how much you store and how much you transfer, plus a small amount for API requests. There are three tiers of storage based on frequency of access and pricing varies by region, but for discussion purposes let’s say it is about $0.023 per GB per month. To store 500 GB would cost about $138 per year (500 GB × $0.023 × 12 months) before transfer fees.

For that $138 you can be sure that Amazon is replicating your data across multiple facilities and devices. Amazon says that S3 offers 99.999999999% durability. That’s pretty impressive.

But one consideration with using S3 or any other traditional cloud storage solution is that your data is sitting in data centers owned by a single vendor. Of course you could take steps to replicate that data to other providers, but that is kind of a pain. Even then you will still end up with your data sitting behind a relatively small number of vendors, none of whom are really geared toward transparency and openness.

Storj.io was built to address this problem. It’s an open source, distributed object storage platform. Like S3, the model consists of buckets and files in those buckets. The difference is in how your data is stored. When you upload a file to Storj, your file is broken into small pieces called shards, encrypted using keys you hold, and then uploaded to several nodes around the world.

Here’s where it gets really interesting. The nodes that store data are not owned by a single entity. Instead, nodes are run by “storage farmers”. Disk farming is kind of like cryptocurrency mining, but instead of solving mathematical computations to earn coins, farmers receive micropayments based on how much of their space gets utilized. Storj actually leverages the Ethereum blockchain to make this work, and if you are interested in the nitty-gritty details, you should check out the whitepaper.

A farmer might be an individual with 50 GB of spare disk space or it could be an organization with lots and lots of space. You don’t know and you don’t really care. Their space gets selected based on a number of factors, which includes things like stability of the node, bandwidth, and total space available. If a farmer tries to tamper with any data on their node they get dropped and they don’t get paid.

Right now Storj is offering 25 GB of free space for one year. After that, their current pricing is $0.015 per GB per month. So using my 500 GB example, that’s $90 per year (500 GB × $0.015 × 12 months) without transfer fees. And if you have some extra storage sitting around, you could become a farmer and offset your costs a little bit.

To be clear, Storj is a tool for developers. After signing up you’ll get presented with a GUI for creating buckets, but when it’s time to start moving data into those buckets you’ll need the API. Right now there is a NodeJS library or you can use a command-line tool provided by a native installer.

This service definitely looks promising, but it is important to know that it is still early. One thing to think about is what happens if farmers start dropping out of the network. When your file is split into shards, each shard is copied to multiple nodes. In my quick test, shards were spread across five nodes, which is plenty to give me confidence that I will be able to get my file back.

If a node drops offline, it is supposed to trigger a replication of your shard to another farmer using one of the remaining good nodes. This works great unless all of the farmers who hold one of your shards drop at once, but with 19,000 farmers and climbing, and assuming your shards are always on multiple nodes, the chances of that happening seem very, very low. The docs say that Storj is working on rolling out additional mirroring strategies. And, you can always use the API to ask Storj which nodes your file is sharded across. It looks like you can make an API call to move a shard yourself, but I haven’t tried that yet.

One last thing to point out is that this is an open source project. You are welcome to contribute. You can even grab the software and run a completely private Storj network, if you want.

I feel like some of my clients are still getting used to the idea of putting their data in the cloud. And some like one throat to choke. A distributed cloud like this may be a tougher sell for conservative customers, even if the security and the durability are there. Still, I love the concept. What do you think?

Have you tried the serverless framework?

Last year I was working on a POC. The target stack was to be as close to 100% native AWS as possible. That’s when I came across Serverless. Back then it was still in beta, but I was really happy with it. After the POC was over I moved on to other things. A couple of days ago I was reminded how useful the framework is, so I thought I’d share some of those thoughts here.

Before I continue, a few words about the term “serverless”. In short, it gets some folks riled up. I don’t want to debate whether or not it’s a useful term. What I like about the concept is that, as a developer, I can focus on my implementation details without worrying as much about the infrastructure the code is running on. In a “serverless” setup, my implementation is broken down into discrete functions that get instantiated and executed when invoked. Of course, there are servers somewhere, but I don’t have to give them a moment’s thought (nor do I have to pay to keep them running, at least not directly).

If your infrastructure provider of choice is AWS, functions run as part of a service offering called Lambda. If you want to expose those functions as RESTful endpoints, you can use the AWS API Gateway. Of course your Lambda functions can make calls to other AWS services such as Dynamo DB, S3, Simple Queue Service, and so on. For my POC, I leveraged all of those. And that’s where the serverless framework really comes in handy.

Anyone that has done anything with AWS knows it can often take a lot of clicks to get everything set up right. The serverless framework makes that easier by allowing me to declare my service, the functions that make up that service, and the resources those functions leverage, all in an easy-to-edit YAML file. Once you get that configuration done, you just tell serverless to deploy it, and it takes care of the rest.

Let’s say you want to create a simple service that returns some JSON. Serverless supports multiple languages including JavaScript, Python, and Java, but for now I’ll do a JavaScript example.

First, I’ll bootstrap the project:

serverless create --template aws-nodejs --path echo-service

The serverless framework creates a serverless.yml file and a sample function in handler.js that echoes back a lot of information about the request. It’s ready to deploy as-is. So, to test it out, I’ll deploy it with:

serverless deploy -v

Behind the scenes, the framework creates a CloudFormation template and makes the AWS calls necessary to set everything up on the AWS side. This requires your AWS credentials to be configured, but that’s a one-time thing.
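If you haven’t configured credentials yet, the framework can store them for you with a one-time command like the following (the key and secret are placeholders for your own IAM user’s values):

serverless config credentials --provider aws --key <access-key-id> --secret <secret-access-key>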

When the serverless framework is done deploying the service and its functions, I can invoke the sample function with:

serverless invoke -f hello -l

Which returns:

{
    "statusCode": 200,
    "body": "{\"message\":\"Go Serverless v1.0! Your function executed successfully!\",\"input\":{}}"
}

To invoke that function via a RESTful endpoint, I’ll edit the serverless.yml file and add an HTTP event handler, like this:

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get

And then re-deploy:

serverless deploy -v

Now the function can be hit via curl:

curl https://someid999.execute-api.us-east-1.amazonaws.com/dev/hello

In this case, I showed an HTTP event triggering the function, but functions can be triggered by other events as well, such as an object being uploaded to S3, a message being posted to an SNS topic, or a schedule. See the docs for a complete list.
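For example, the same hello function could be wired to a schedule and to uploads landing in a hypothetical S3 bucket with configuration roughly like this:

functions:
  hello:
    handler: handler.hello
    events:
      - schedule: rate(1 hour)
      - s3:
          bucket: my-upload-bucket
          event: s3:ObjectCreated:*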

To add additional functions, just edit handler.js and add a new function, then edit serverless.yml to update the list of functions.

Lambda functions cost nothing unless they are executed. AWS offers a generous free tier. Beyond the first million requests in a month it costs $0.20 per million requests (pricing).

I should also mention that if AWS is not your preferred provider, serverless also works with Azure, IBM, and Google.

Regardless of where you want to run it, if you’ve got 15 minutes you should definitely take a look at Serverless.