Content-as-a-Service Review: Cloud CMS

The next vendor in my Content-as-a-Service (CaaS) round-up is Cloud CMS. If you missed my overview of the CaaS market, you might want to read that first.

Cloud CMS has been around since 2010. It was founded by Michael Uzquiano, formerly of Alfresco, Epicentric, and Vignette. This year they brought on a new CEO, Malcom Teasdale, who, prior to joining Cloud CMS, ran Rothbury Software, an Alfresco partner that was ultimately purchased by Ixxus. Malcom spent some time at Interwoven earlier in his career. So these two have a lot of content management experience and that’s evident in their product. (Disclosure: I’ve known Michael and Malcom for several years but I do not currently have a business relationship with Cloud CMS).

My CaaS overview described functionality you’ll find in all CaaS offerings. Similar to my posts on Prismic.io and Contentful, in this post on Cloud CMS I’ll focus on some select areas of the Cloud CMS offering:

  • User interface for content authors
  • Creating content types
  • Working with content via the API
  • Security
  • Pricing

Cloud CMS is an extensible platform with a long list of features. This is more than just a place to stick content and an API to get it back out–this is a full-featured CMS that happens to be running as SaaS in the cloud. But I’m primarily interested in it for its ability to act like a CaaS offering, which to me means a more pragmatic, stripped down approach to content management, so this review is going to ignore a lot of the Cloud CMS features that go beyond that. Perhaps I’ll come back to those in a future blog post.

User Interface for Content Authors

Before I describe the Cloud CMS user interface let me tell you a little about how Cloud CMS stores content. A Cloud CMS account has one or more repositories. How you use your repository is up to you–you might have one repository per project, for example.

Within a repository there are one or more branches. A branch works just like it does in source code control–it is a view of the repository. Each repository has a “master” branch, but you can have as many branches in a repository as you want. Branches can be areas where individuals work on content in isolation, or they can be for teams. Branches can never be deleted.

Within a branch are one or more changesets. Adding content to a branch (or making any change) takes place against a changeset–you’re never actually changing anything that was written previously.

And, finally, at the object level you have nodes. Everything in Cloud CMS is stored as a node. If you have a blog post, that’s a node. If your blog post points to an image, the image is also a node. All nodes are typed. There are out-of-the-box types but you can also define your own. I’ll talk more about that in a bit.

Cloud CMS provides a web user interface, built around the notion of projects, that helps teams organize content creation. For the most part, your users see your repository through the lens of a project. They don’t have to understand the underlying structure of the Cloud CMS repository.

The first thing you see when logging in to Cloud CMS is a dashboard. From here you can manage your projects, tasks, members, and workflows. For my review I created a project called “My Cloud CMS Project”.

[Screenshot: Cloud CMS home page]

Drilling into the project displays the project’s dashboard. Clicking on Documents shows a hierarchical view of folders and documents. Multiple files can be dragged and dropped, which triggers an upload to the folder. As you can see in the screenshot below, thumbnails are automatically created and displayed in the document list.

[Screenshot: Cloud CMS project documents]

Clicking a specific document opens the details page for that object, where users can edit the properties of the object, make comments, assign tags, start workflows, and perform a number of other actions.

[Screenshot: Cloud CMS view project document]

So that’s how content authors manage content in Cloud CMS. Let’s take a look at how the content model is defined.

Creating Content Types

Content type definitions are created using JSON Schema. You can define types either in the Cloud CMS UI, by typing your JSON Schema into a web form, or through the API. I really like the ability to define a content type simply by typing the JSON Schema that defines it. Of course, the trade-off is that you have to know how to define a type using JSON Schema. Fortunately, there is documentation that can help.
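
Since the definitions are just JSON Schema, a type can be sketched in a few lines. The example below is my own illustration rather than something lifted from the Cloud CMS documentation, so treat the qname conventions and property names as assumptions to verify against the docs:

{
    "_qname": "custom:article",
    "_type": "d:type",
    "title": "Article",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "title": "Title"
        },
        "body": {
            "type": "string",
            "title": "Body"
        },
        "publishDate": {
            "type": "string",
            "format": "date-time",
            "title": "Publish Date"
        }
    }
}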

Content modeling in Cloud CMS is much more advanced than what I’ve seen in other CaaS offerings. Three examples of that are: hierarchical content models, aspect-orientedness, and role-based form definitions.

The first one, hierarchical content models, is easy to grasp: your types can inherit from parent types. This can potentially save you time setting up your content model if your types lend themselves to being organized in a hierarchy.

The second one, aspect-orientedness, requires a little more explanation. A content model is aspect-oriented when it can define bundles of properties that can be “attached” to objects, regardless of an object’s underlying type. In Alfresco these are called “aspects”, in CMIS they are called “secondary types”, and in Cloud CMS they are called “features”.

For example, you might have a set of metadata that you want to track for anything that is “client-related” like the client’s name, industry, size, and primary contact. In Cloud CMS you define that set of metadata as the “client-related” feature, and then any object can become “client-related” just by adding the client-related feature–the bundle of properties having to do with a client–to the object.
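
To make that concrete, here is a hedged sketch of what a “client-related” feature definition might look like. Features are defined with JSON Schema just like types; the qname and property names below are hypothetical:

{
    "_qname": "custom:clientRelated",
    "_type": "d:feature",
    "title": "Client Related",
    "type": "object",
    "properties": {
        "clientName": { "type": "string" },
        "industry": { "type": "string" },
        "size": { "type": "number" },
        "primaryContact": { "type": "string" }
    }
}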

The third advanced content model-related feature in Cloud CMS is role-based forms. When you create instances of a content type, Cloud CMS looks at the schema and generates a default form, similar to how other CaaS offerings work. What’s different is that Cloud CMS allows you to define multiple forms for the same content type. You can tie forms to specific roles so that different people see different authoring forms depending on their role. This is helpful when you have different types of people who need to edit the same content–you can use role-based forms to structure the form the way it makes the most sense for each type of user.

By the way, like content type definitions, form definitions are also JSON-driven. Cloud CMS has integrated the open source AlpacaJS forms engine into its product, which you might want to consider for your own projects, even if you don’t need Cloud CMS.

Working with Content via the API

Cloud CMS can host your application for you. This makes it nice for people who are looking for a one-stop shop for their content-centric mobile and web applications because you don’t need separate services for your content and your app. If you look back at the project dashboard screenshot I showed earlier you’ll see an example–I’ve created an application called “demo”.

What wasn’t obvious to me initially is that even if you are going to host your application elsewhere, you still need to create an application, and within that, a deployment. That’s where you’ll find the API keys needed to work with the API.

In addition to the API keys you’ll need to know your repository ID (which you can get from the Cloud CMS admin console). If you are going to fetch content from anything other than the master branch you also need the branch ID. Once you have that, querying for nodes is straightforward, as shown in this gist:

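The original gist is not reproduced here, so below is a minimal sketch using the Cloud CMS JavaScript driver (Gitana). The IDs, credentials, and the custom:article type are placeholders, and the chained calls are an approximation of the driver’s API to check against its documentation:

var Gitana = require('gitana');

// connect using the API keys from your application's deployment
Gitana.connect({
    "clientKey": "someClientKey",
    "clientSecret": "someClientSecret",
    "username": "someUsername",
    "password": "somePassword"
}, function(err) {
    if (err) { console.log(err); return; }

    // "this" is the platform; drill down to the repository and branch
    this.readRepository('someRepositoryId')
        .readBranch('master')
        .then(function() {
            // "this" is now the branch; run a Mongo-style node query
            this.queryNodes({ "_type": "custom:article" }).each(function() {
                console.log(this.getId() + ': ' + this.get('title'));
            });
        });
});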

Under the covers, Cloud CMS runs MongoDB and Elasticsearch. When you query for nodes, you’re using Mongo. When you search the full text of your nodes, you’re using Elasticsearch.

Everything in Cloud CMS is stored as JSON, so creating content is as simple as calling createNode and passing in JSON for the property values:

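Again, the gist is missing, so here is a hedged sketch that builds on the connection example above (the config object and the same caveats carry over):

Gitana.connect(config, function(err) {
    this.readRepository('someRepositoryId')
        .readBranch('master')
        .then(function() {
            // "this" is the branch; create a node from raw JSON properties
            this.createNode({
                "_type": "custom:article",
                "title": "Hello, World",
                "body": "This is my first Cloud CMS node"
            }).then(function() {
                console.log('Created node: ' + this.getId());
            });
        });
});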

Security

Every object in Cloud CMS can have an ACL and child objects can inherit ACLs from their parents. There are out-of-the-box roles you can use in those ACLs, including: Connector, Consumer, Contributor, Editor, Collaborator, and Manager. Unfortunately, to set permissions on an object you have to switch from the project user interface to the administrative user interface. Still, this is the only vendor in the round-up offering object-level permissions.

Pricing

Cloud CMS offers a free 14-day trial. After that you have to switch to one of four monthly plans:

  • Starter: $20/month. Limited to a single project; email support.
  • Platform: $200/month. Unlimited projects and email support.
  • Premium: $800/month. Unlimited projects, phone support, developer support, and additional AWS features.
  • Enterprise: $2,400/month. Private cloud with an on-premises option.

The ability to run the entire stack on-premises is an interesting option for those not ready for the cloud, although I haven’t explored how practical that really is for Cloud CMS.

The usual caveats apply to pricing here. Like other CaaS offerings, Cloud CMS places limits on total storage space and data transfer so look at the details to make sure you understand what this will cost you each month.

Overall Impressions

Cloud CMS is a full-featured platform for content management. Where some startups focus on the minimum viable product, Cloud CMS has gone for a kitchen sink approach that approximates the functionality you might expect in more mature, on-premises ECM offerings. Cloud CMS can even be the platform that runs your application, if that’s what you need.

Content management professionals will appreciate the advanced content modeling and forms features, which are just one example of functionality that reflects the founders’ ECM industry heritage. The flip side of that coin, however, is a complexity also reminiscent of those systems. The documentation and tutorial videos help, but there is a learning curve here.

Cloud CMS can compete against CaaS players like Prismic.io and Contentful as well as established ECM vendors like Alfresco and Documentum. Compared to other CaaS players it has much richer functionality, but it is also more complicated to use. To compete against them successfully Cloud CMS may need to further streamline the UI so that users can harness that power without being overwhelmed by other features.

Compared to established vendors like Alfresco and Documentum, Cloud CMS offers a hosted system running in the cloud with a similar feature list while maintaining the ability to customize the platform. Customers looking to move to the cloud may find an easier migration to Cloud CMS than to one of the newer CaaS players. In this respect Cloud CMS also competes with other traditionally on-prem document management offerings now offered in the cloud like Hippo onDemand and Nuxeo Cloud.

With two ECM veterans at the helm and impressive functionality it will be interesting to watch Cloud CMS go after one or both of these markets.


Content-as-a-Service Review: Contentful

It’s time for the next vendor in my Content-as-a-Service (CaaS) round-up. In this post I’ll be taking a look at Contentful. If you missed my overview of the emerging CaaS market you might want to take a look and then come back.

Contentful came out of beta to be generally available in May of 2014, so it is the youngest vendor of the three in my round-up. The company is based in Berlin and has a number of well-known clients including EA, Disney, Viacom, Asics, Nike, Playboy, and McAfee.

My CaaS overview described functionality you’ll find in all CaaS offerings. Similar to my post on Prismic.io, in this post on Contentful I’ll focus on the areas that typically differentiate one CaaS offering from another, namely:

  • User interface for content authors
  • Creating content types
  • Working with content via the API
  • Security
  • Pricing

Finally, I’ll wrap up with my overall impressions and takeaways.

User Interface

In Contentful you have one or more “spaces”. You can think of a space as a repository. It’s a collection of “entries”, which are instances of content types, and “assets”, which are file-based assets like images. In addition, each space has its own users, roles, and API keys.

When you log in to Contentful you’ll be sitting in one of your spaces (the first one in the list). The Contentful user interface is clean and quite minimal. It’s easy to get going quickly because you have a limited function set. You can define your content model, manage entries, manage assets, or manage your API keys and that’s about it.

[Screenshot: Contentful user interface]

Options for organizing lists of entries and assets are limited to filtering based on status (Published, Changed, Draft, Archived), content type (for entries), and file type (for assets). There is a saved search feature and saved searches can be organized into folders. But, there is no notion of hierarchical content storage.

This minimalistic approach to content management is what I characterized as a Good Thing in my Content-as-a-Service overview.

Content Types

Every content management system has a way to describe the data stored in the repository, which is referred to as a “content model”. In Contentful, each space has its own content model, which is simply a set of content type definitions. A content type is a set of typed properties. There are property types you’d expect, like text, date, number, and decimal number, as well as “pointer” types that reference other objects in the system, where those objects are either entries or file-based assets.

For example, the screenshot below shows a Promo content type I created. It has a name, a description, and a set of fields such as pubDate, geoCode, image, alt, etc. In the case of the image field, it points to a file-based asset.
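
To give you a feel for the underlying representation, here is a rough sketch of what a content type like this looks like as JSON when fetched via the Contentful API. The shape follows Contentful’s content type format, but the field list is illustrative rather than a dump of my actual Promo type:

{
    "name": "Promo",
    "description": "A promotional content block",
    "fields": [
        { "id": "name", "name": "Name", "type": "Text" },
        { "id": "pubDate", "name": "Publish Date", "type": "Date" },
        { "id": "geoCode", "name": "Geo Code", "type": "Array", "items": { "type": "Symbol" } },
        { "id": "image", "name": "Image", "type": "Link", "linkType": "Asset" },
        { "id": "alt", "name": "Alt Text", "type": "Text" }
    ]
}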

[Screenshot: Contentful content type]

Unfortunately, there is no support for cross-cutting concerns (aspects), so if you want to repeat property definitions across types you have to do that manually. For example, a type called “Cat Picture” and a type called “Dog Picture” might both have “height” and “width” properties. In Contentful, you have to repeat similar properties across type definitions.

On a related note, there is no way to clone or copy content types in the user interface. If you have two similar content types you have to re-do the property sets in each one or use the Contentful content management API to do this.

Managing your content model in the Contentful UI takes way too many clicks. I much prefer the approach other CaaS offerings take where you edit the content type definitions using JSON rather than the point-and-click approach. The nice thing with Contentful is that you can use the API for everything, including defining your content model–you don’t have to use the UI at all if you don’t want to.

One thing to watch out for: if you want to change a field on a content type (like changing the field’s type) you have to deactivate the content type first. That’s not such a bad thing until you realize you cannot deactivate a content type without first getting rid of the instances of that content type. This could be a challenge once you go to production, so make sure you are happy with your content model before you get too far down the road.

Working with content via the API

Content can either be “published” or “draft”. The API keys you generate for a space can either be “production” or “preview”. The production URL and access key can only fetch published content. The preview URL and access key will see everything.

The Contentful Content Delivery API lets you fetch content by space, which can be further filtered by query terms that are AND’ed together. There was no obvious way to OR query terms.
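
For example, a raw Content Delivery API request with two field filters looks something like this (the space ID, token, and field names are placeholders); both conditions must match for an entry to be returned:

https://cdn.contentful.com/spaces/someSpaceId/entries
    ?access_token=someAccessToken
    &content_type=someContentTypeId
    &fields.pageId=all
    &fields.geoCode=all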

If you’d rather, Contentful offers client libraries in Java, JavaScript, Objective-C, Swift, and Ruby.

Here is a gist that fetches content using JavaScript:

// fetch entries using the Content Delivery API
var contentful = require('contentful');

var spaceId = 'someSpaceId';
var accessToken = 'someAccessToken';
var contentTypeId = 'someContentTypeId';

var client = contentful.createClient({
    space: spaceId,
    accessToken: accessToken,
    secure: true,
    host: 'cdn.contentful.com'
});

// query entries of a given content type, filtered by field values
client.entries({
    'content_type': contentTypeId,
    'fields.pageId': 'all',
    'fields.geoCode': 'all'
}, function(err, entries) {
    if (err) { console.log(err); return; }
    for (var i = 0; i < entries.length; i++) {
        var entry = entries[i];
        console.log('id: ' + entry.sys.id);
        console.log('name: ' + entry.fields.name);
        console.log('pageId: ' + entry.fields.pageId);
        console.log('--------');
    }
});

Sample output:

id: someId1
name: promo-3
pageId: all,page2,page3
--------
id: someId2
name: promo-2
pageId: all,page5
--------
id: someId3
name: promo-1
pageId: all
--------


If all you need to do is fetch content from Contentful, you’ll stick with their Content Delivery API. To create, update, or delete content, use the Content Management API.

For example, here is a gist that creates content using the Content Management API:

// create an entry using the Content Management API
var contentful = require('contentful-management');

var spaceId = 'someSpaceId';
var accessToken = 'someAccessToken';
var contentTypeId = 'someContentTypeId';

var client = contentful.createClient({
    space: spaceId,
    accessToken: accessToken,
    secure: true,
    host: 'api.contentful.com'
});

// a custom "Generic Content" content type
var entry = {
    fields: {
        name: {
            'en-US': "Take a Trip!"
        },
        description: {
            'en-US': "Title for page 4"
        },
        geoCode: {
            'en-US': ["all"]
        },
        loginState: {
            'en-US': ["all"]
        },
        pageId: {
            'en-US': ["page4"]
        },
        placementId: {
            'en-US': ["banner"]
        },
        expDate: {
            'en-US': "2014-10-31T00:00:00-05:00"
        },
        pubDate: {
            'en-US': "2014-10-01T00:00:00-05:00"
        },
        content: {
            'en-US': {title: "Take a Trip", imgSrc:""}
        }
    }
};

client.getSpace(spaceId).catch(function(error) {
    console.log('Could not find space: ' + spaceId);
    throw error;
}).then(function(space) {
    // entries are created in draft; publish separately to make them
    // visible to the production delivery API
    space.createEntry(contentTypeId, entry).catch(function(error) {
        console.log('Error creating entry: ' + error.toString());
        throw error;
    });
});



Note that content is initially created in draft mode. You must publish the content if you want it to be retrievable via the “production” content delivery API.

I should mention that Contentful also offers a sync API, which is particularly useful for mobile applications.

Security

In Contentful, everyone belongs to one or more Organizations. Organization owners can create spaces and can invite users to a space. Users can be editors (edit all content) or developers (edit all content, manage API keys). Space admins can create new content types. The UI shows a “custom role”, but this article says that custom roles will be implemented for the enterprise offering at some point in the future.

Contentful also makes it easy for agencies or consultants to administer an organization on their client’s behalf while the client remains the main contact for billing purposes.

Pricing

Contentful offers the following plans:

  • Free plan, limited to 3 users, 3 spaces, and 1,000 objects.
  • Plus plan, $99 per month, which includes 5 users, 5 spaces, and 5,000 objects.
  • Pro plan, $200 per month, for up to 10 users, 10 spaces, and 10,000 objects.

Most of my clients would probably need the “Enterprise” plan which could cost anywhere between the low thousands to the tens of thousands of dollars per month depending on exactly what is needed.

Also, be aware that Contentful, like other CaaS vendors, places limits on additional things such as API requests, API bandwidth, and API keys. These and other details may change so take a look at the pricing page (click “Compare Plans” for the expanded details) to be sure.

Overall Impressions

Contentful is an extremely basic offering in terms of both the user interface and the capabilities of the underlying platform. But the simplicity of the offering is precisely what makes it so attractive. Developers will appreciate the “API-first” approach to content management, and without a lot of extraneous sub-systems getting in the way, they’ll be able to develop a solution quickly, and then let content authors manage the content with the easy-to-use interface. Contentful’s stripped down, pragmatic approach to content management is a category-defining building block you can use to create really cool content-centric solutions.


You may be surprised at what’s not in Alfresco 5

It won’t be long before we’ll be celebrating Alfresco’s tenth birthday. Sniff, sniff, they grow up so fast!

As part of that growth, it’s only natural that certain areas of the product will reach their end-of-life. Since its first release we’ve seen very little pruning of old or obsolete features, but that changes with the Alfresco Community Edition 5.0.b release.

Some of the things that have been dropped won’t surprise anyone. Some I consider regressions and may actually come back quickly, at least I hope they do. The surprises have been handled a little sloppily–Richard Esplin, the current head of community, apologized for that earlier this week, essentially saying it was due to the rush to get 5.0.b out in time for Alfresco Summit.

You can read Richard’s post and the release notes for the full list of feature removals. In this post I’ll call out the major items you will no longer find in Alfresco Community Edition as of 5.0.b.

Alfresco Explorer

If you’ve paid any attention to my blog or to just about anyone else who speaks or writes about Alfresco, you already knew to avoid the original JSF-based web client called “Alfresco Explorer”, which is accessible at /alfresco.

Alfresco has been saying Explorer was going away for a long time and they’ve finally done it. If you fire up Alfresco 5.0.b Community Edition and go to /alfresco you’ll find our old friend is no more. Good night, sweet prince!

All of my clients have been focused on Alfresco Share for years so if the impact of the change was simply that you couldn’t log in to that old client any longer it wouldn’t be a big deal, but there has been some collateral damage, which brings us to the next section…

Workflow Console, Tenant Console, or Basically Any Console

Unfortunately, some vital consoles leveraged the same JSF code base as Alfresco Explorer. When that went, these consoles went as well. The old saying about babies and bathwater comes to mind.

The removal of the workflow console is particularly irksome. It’s critical to anyone doing anything with either Activiti or jBPM in Community Edition. In my opinion, this is the most important console of the bunch.

The data dictionary console, which is used to enable or disable hot-deployed content models, is also gone. If you only use content models deployed as part of the WAR this won’t affect you.

The tenant console is also gone. This obviously won’t affect you unless you are using the multi-tenancy feature.

The AVM console is also gone, but then again, so is the AVM as I’ll touch on briefly next.

The frustrating thing about these missing consoles is that they aren’t planned to make a return until 5.1, according to Richard. That makes 5.0 Community Edition harder to use than it needs to be. It’s possible that Alfresco will make the console framework available so that the community can help get these back in place quickly.

The AVM

The AVM is the ill-fated Web Content Management offering that Alfresco told you was reaching its End-of-Life back in February of 2012 so, again, you should not be surprised about this one. All but a handful of people have found other WCM solutions.

Lucene

This one sparked a WTF moment on Twitter earlier in the month when I was shocked to realize that 5.0.b Community Edition required Solr to be fully functional. Without it, you can’t do a people search, for example. Actually, you can’t do a full-text search either. So in my book, this makes Solr required to run.

Prior to this change you could choose either Solr or Lucene. I often used Lucene locally because it was one less WAR to deal with and it was the default when doing a manual WAR install. Some people preferred Lucene’s in-transaction indexing over Solr’s asynchronous indexing and eventual consistency.

I understand that Solr is the way forward for Alfresco. It just felt like this one was a bit rushed. I don’t remember any public communication saying that Lucene would no longer be an option in 5.0. The Alfresco Product Support Status page doesn’t list it either. Richard’s post says, “When we added Solr to Alfresco 4, we deprecated Lucene.” That may be true, I’m just not sure Alfresco told anyone, although it is possible I missed it.

SDK, API, & Publishing Features

The release notes for 5.0.b include a “Feature Removals” section. Noteworthy entries include:

  • The old Java SDK has been replaced with the Maven-based SDK on GitHub. This has been a long time coming.
  • The CMIS API endpoints from 3.x and 4.0 have been removed (use the 4.2 URLs). People are constantly using the wrong CMIS service URL for their version. Maybe this will help.
  • The “CML” Web Services SOAP API. Another one that is past due.
  • JCR / JCR-RMI. I rarely see this used.
  • The URL Addressability API. Who needs it when you’ve got web scripts?
  • Social publishing features.
  • Blog publishing features.

These are all positive changes and I suspect will help Alfresco a lot on the engineering and support side.

Future Removals

The release notes also include a note that NFS and jBPM are now officially deprecated. I’ve been expecting the jBPM removal for a while now. If you haven’t started moving everything to Activiti process definitions you should definitely do so now.

Getting the old stuff out of the distribution is great–I’m glad to see it. I hope that going forward Alfresco can do a better job communicating openly in a timely manner about major changes like dropping Lucene. It sounds like Richard is going to take that on as part of his new role in Product Management, which is a very good thing.


Content-as-a-Service Review: Prismic.io

Today I’m going to take a brief look at Prismic.io. Prismic.io is one of three commercial Content-as-a-Service (CaaS) vendors I’m reviewing as part of my CaaS round-up. If you missed my overview of the emerging CaaS market you might want to take a look and then come back.

Prismic.io was founded by Sadek Drobi, one of the co-creators of the Play framework, and Guillaume Bort, one of the Play lead developers. It launched in early October, 2013.

My overview talked about the kind of functionality you’ll find in all CaaS offerings. In this post on Prismic.io I’ll focus on the areas that typically differentiate one CaaS offering from another, namely:

  • User interface for content authors
  • Creating content types
  • Working with content via the API
  • Security
  • Pricing

Finally, I’ll wrap up with my overall impressions and takeaways.

User Interface: Let me show you to your room

Your Prismic.io account starts out with a sample repository called Les Bonnes Choses (The Good Stuff) and some sample content within that. The repository and sample content powers a sample web app for a fictitious pastry shop.

When you first log in to Prismic.io a dashboard lists the Les Bonnes Choses repository and any other repositories you’ve created. A quick side-note on repositories: When creating a new repository I was forced to pick a name no one else had taken. The repository name is used as a sub-domain under prismic.io, so I see why it has to be unique, but this seems like an odd choice for a service I’m sure the founders are hoping will grow wildly.

Anyway, clicking on a specific repository takes you into that repository’s “writing room”, which is the primary interface for not only authoring content but also configuring your repository’s various settings.

The writing room UI is certainly the most polished of any of the CaaS vendors I reviewed. While it looks great, I feel like the fly-in/fly-out animation is a bit overused and many of the icons are unlabeled. The UI also uses colors to indicate status, but the meaning is not obvious at first.

[Screenshot: Prismic.io writing room]

The overall organizational model is activity-centric rather than asset-centric. If you’ve got a team of people pushing out content in a rather flat hierarchy this will probably work well. If you like a more traditional list-of-assets-in-a-nested-folder hierarchy you’ll have to adjust. The “Live Now”, “Your Documents”, and “Favorites” help filter the list of content.

Releases provide snapshot capability

A helpful concept in Prismic.io is the “content release”. This gives you the ability to plan releases ahead of time. When your content authors write content they can publish it to a release. When the release is published, all content associated with that release goes live. You can control who can access to-be-published content in the API settings.

Creating content types

In Prismic.io, content types are called “Document Masks”. Document masks are defined via the Prismic.io UI by writing JSON that describes the mask. For example, here is a snippet from the Blog Post document mask:

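Because the gist is no longer embedded, here is an abbreviated reconstruction of what the Blog Post mask might look like. The fragment names match the discussion below, but the exact types and config options are from memory, so verify them against the Prismic.io documentation:

{
    "Blog Post" : {
        "title" : {
            "fieldset" : "Title",
            "type" : "StructuredText",
            "config" : { "single" : "heading1" }
        },
        "body" : {
            "fieldset" : "Body",
            "type" : "StructuredText"
        },
        "author" : {
            "fieldset" : "Author",
            "type" : "Link",
            "config" : {
                "select" : "document",
                "masks" : ["author"]
            }
        }
    },
    "Metadata" : {
        "category" : {
            "fieldset" : "Category",
            "type" : "Select",
            "config" : { "options" : ["News", "Opinion", "How-To"] }
        }
    }
}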

Notice that there is an object called “Blog Post” and an object called “Metadata”. These are rendered as tabs in the writing room UI. You can segment your document mask definitions however you see fit. Within the top-level object there are one or more fragments, each of which has a name, like “author”, a fieldset, a type, and a config.

Looking at the “author” config you can see that it is possible to add references to other documents in the repository. In this case the association is between the blog post and an author object. Elsewhere in the blog post example are similar links to “related posts” and “related products”.

I like that the content type definitions are expressed as JSON. It’s a lot easier to work with definitions in this way than with a UI-driven approach, especially when the content type definition language is fairly rich, as it is here.

Another nice feature of the document mask editor is the ability to quickly preview what the form will look like in the writing room.

Collections and Bookmarks

Document masks define the type of documents your content authors can create. Prismic.io also gives you the ability to define collections and bookmarks.

You can think of a collection as a saved search. A collection defines a set of filters (document masks and/or tags). Content authors can then navigate to content by clicking collection names in the writing room. In fact, aside from the filters I mentioned earlier, collections are the only way to organize content.

Your front-end can use the API to get the documents matching a collection by specifying the collection name rather than building the query from scratch. I’ll show you an example of that in a minute.

A bookmark is a collection for a single document. It is kind of like an alias. Suppose you are building an app that needs to promote a “deal of the week”. The deal might change from time-to-time. One way to implement this would be to create a bookmark named “dealOfTheWeek” and point it to a document for this week’s deal. The front-end app can always ask for the same bookmark, but the object it points to can change as needed.

Fetching Content via the API

There are multiple client-side libraries you can use to work with content in Prismic.io including JavaScript, Python, Java, Ruby, .NET, PHP, and Scala. You can get more info from the Prismic.io developer page.

Here’s an example showing how to get the documents in a collection named “blogPosts”:

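The gist is gone, so here is a minimal sketch using the Prismic.io JavaScript kit of that era. The repository URL points at the sample repository, and the module name and callback shapes are approximations to verify against the kit’s documentation:

var Prismic = require('prismic.io').Prismic;

Prismic.Api('https://lesbonneschoses.prismic.io/api', function(err, api) {
    if (err) { console.log(err); return; }
    // collections are exposed as forms in the API
    api.form('blogPosts').ref(api.master()).submit(function(err, response) {
        if (err) { console.log(err); return; }
        response.results.forEach(function(doc) {
            console.log(doc.id + ': ' + doc.slug);
        });
    });
});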

Here’s an example getting the document for a bookmark named “dealOfTheWeek”:

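Likewise, here is a sketch of the bookmark lookup under the same assumptions. The api.bookmarks map resolves a bookmark name to a document ID, which is then used in a query predicate:

Prismic.Api('https://lesbonneschoses.prismic.io/api', function(err, api) {
    if (err) { console.log(err); return; }
    // a bookmark resolves to a single document ID
    var docId = api.bookmarks['dealOfTheWeek'];
    api.form('everything')
       .ref(api.master())
       .query('[[:d = at(document.id, "' + docId + '")]]')
       .submit(function(err, response) {
           if (err) { console.log(err); return; }
           console.log(response.results[0].slug);
       });
});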

In the bookmark example you can see that once the document ID is fetched from the bookmark I constructed a query against the “everything” collection using the ID. This barely scratches the surface of what’s available in the query language–check out the developer docs to learn more.

One extremely disappointing thing about the Prismic.io API is that it is read-only. Yes, you read that right. Prismic.io is a content service that only lets you read content via the API, not create it.

The lack of a write API means that if you have existing content that you want to move to Prismic.io, there’s no way to do that short of hiring a bunch of temps and re-keying it into the system. Potentially worse, if your content originates from some other system and you want to use Prismic.io for delivery, there’s no way to do that either. And of course if you have user-generated content you need to persist from the front-end, you’ll have to write to some other service.

I suspect that a year from now we will all agree that a write API is table stakes for a viable CaaS offering and Prismic.io either will have fixed it or they will have relegated themselves to a microscopic corner of the market focused on tiny, greenfield, content silo projects. We shall see!

Security

By default, the API requires an OAuth token. Per repository, you can change that. For example, you might make it so that the API can fetch content from the “master” release without a key, but fetching content from future releases requires a token. Or, you can make access completely open.

Pricing

With Prismic.io, development is free. You don’t pay until you go to production. There are three paid plans available: Simple gives you all of the features but is limited to three users for $7/month. Team gives you unlimited users for $40/month. Enterprise gives you an SLA and a private cluster, but you’ve got to negotiate pricing.

Something that’s pretty cool for all of my open source friends is that if you make your content available under the Creative Commons 4.0 license, you can use a Simple plan for free.

These plans, pricing, and details may change, so take a look at the official pricing page.

Overall impressions

The lack of a write API for programmatic content creation makes Prismic.io a non-starter for anyone having anything more than the most modest requirements, particularly in cases where there is a large volume of pre-existing content or when other systems need to write into the content store.

If you have the luxury of starting with zero content and all of your content will be created by humans using the snazzy Prismic.io user interface, then it is worth considering.

Prismic.io certainly has the most helpful developer on-ramp of any of the solutions I looked at with lots of client libraries, good examples, decent documentation, and useful starter apps. And I also liked their “free while you are developing” model, which makes it easy for teams to get started building their real solution, rather than a scaled-down PoC.

If Prismic.io puts a write API in place they’ll be a strong CaaS contender.


The emerging Content-as-a-Service market

There is an interesting new market emerging in the world of content management: Commercially-hosted Content-as-a-Service (CaaS). These are vendors who provide a service your applications can leverage for content management. Different than, “Hey look, we’re running our old school CMS in the cloud!”, CaaS is singular in focus and free from the feature bloat and operational complexity typical of the CMS your parents probably used.

At a minimum, CaaS vendors provide the following:

  • a hosted repository,
  • some mechanism for defining the types of content you need to manage,
  • a RESTful API to get content and static assets into and out of the repository,
  • a web-based user interface for managing content,
  • web hooks for taking action when content changes,
  • CDN integration for efficiently serving up static assets, and
  • an up-time and performance SLA.

You then build your web site or mobile app using any technology that suits your needs and fetch content as JSON using the API.

The best approach is to use the service to manage reusable, presentation-agnostic chunks of content. Metadata associated with the content chunks can then be used to make it easier to fetch the content for a variety of contexts. Because it is free of presentation the content can be more easily shared and reused across properties and channels.

Why not Drupal or WordPress?

CaaS vendors do not directly compete with full-featured platforms like Drupal or WordPress. There are Drupal and WordPress modules that add RESTful APIs on top of those platforms, so you could build a web or mobile site that is completely de-coupled from your Drupal back-end. Conversely, you could build a web site on top of a CaaS vendor’s service that had the same look, feel, and features of a site built with a traditional CMS. But both of those examples miss the point of CaaS which is, in a word, simplicity.

I’m not saying products like Drupal and WordPress are hard to use. On the contrary, you can install those tools and have a great looking site up-and-running in minutes. I’ve run this blog on WordPress for years and I am extremely happy with it. And sites like wordpress.com and Drupal Gardens take the hassle out of setting up your own server.

When I say the key to CaaS is simplicity I mean it strips away everything. It makes no assumptions. A hosted CaaS offering should distill content management down to its very essence, implied by the term itself: to manage content. Do nothing else. Take this chunk of JSON, free of any hint of style or presentation, and store it for me, making it available via a tool-agnostic API to my front-end channels to present as I see fit.

This pragmatic approach to content management can be implemented on-premises or on your own cloud-based servers using freely-available technology. I’ll talk more about that in another post. The nice thing about hosted CaaS is that you don’t have to assemble, test, scale, and maintain the solution yourself. Yes, you are giving up some amount of control, the degree to which varies across vendors, but many are willing and able to make that trade-off.

Business model

As with other Software-as-a-Service (SaaS) offerings, CaaS vendors charge a monthly subscription for their service. Some charge additional fees based on things such as number of content objects managed, number of content authors, and data volume. All of the market leaders I looked into provide a free-to-get-started plan to make it easy on developers in the early days of their projects.

Approach appeals to both startups and enterprises

The primary target market for CaaS vendors is clearly start-ups who are writing mobile and/or web apps that need some form of content management. Cost is usually a major factor for this segment, at least until the venture proves itself successful, but so is simplicity and efficiency. There’s no time for complex server installs, any sort of run-and-maintain burden, or pushing new app versions as content evolves. Hosted CaaS is a natural fit for these folks.

But this approach also makes sense for enterprises, many of whom are still wrestling with their legacy content management vendor boat anchors (I’m looking at you, Interwoven). A hosted service that does nothing more than capture and share content chunks is a refreshing contrast to those bloated, over-priced WCM systems that require a huge staff to run and maintain yet still leave end-users frustrated.

Those systems haven’t changed much in nearly two decades and yet they remain firmly embedded in many companies where they are busy managing sites that may have been state of the art in 1999, but in a world where even the concept of a “page” is falling by the wayside, are now woefully outdated.

The content-as-a-service approach (API-first, native JSON, pragmatic, emphasis on reuse) aligns with how mobile apps and modern web sites are built and deployed as well as their content needs. This is true whether those apps are built by scrappy startups or huge enterprises.

Stay tuned for a CaaS round-up

So join me as I take a look at some of the players in the CaaS space. In the coming posts I’ll be looking at Prismic, Contentful, and Cloud CMS. If you have used any of these for your mobile or web project and you want to share your story with other ecmarchitect.com readers, do let me know.


10 ways Alfresco customers can support the community

A prospective Alfresco customer recently asked me about some of the ways an Alfresco customer can support the Alfresco community. Here’s what I said:

  1. Give your employees company time to answer questions in the forum or participate in the community in other ways. Perhaps set up an objective related to something on this list.
  2. Write a blog post about your experience with Alfresco (doesn’t have to be technical) then tweet the link with #Alfresco.
  3. Share your story at a meetup, Alfresco Summit, or some other conference.
  4. Make your office space available for local Alfresco meetups. If there isn’t a regular meetup in your area, start one and keep it going quarterly.
  5. If you customize Alfresco, take the customizations that don’t represent a competitive advantage to your business and contribute them to the community as freely-available add-ons. If you don’t want to take the time to package it up as a formal add-on, at least stick your code on GitHub or a similar public code repository.
  6. Similar to the above, if you hire systems integrators, word your contracts such that they can contribute the code they write for you to the community. (Often the default language assigns IP ownership to the hiring party).
  7. If you choose Community Edition, give your time to the Order of the Bee so that you can help others be successful running Community Edition in production. The Order is particularly interested in Community Edition success stories at the moment.
  8. File helpful bug reports and make sure they are free of information specific to your business so that Alfresco will keep them public.
  9. If you not only find a bug but fix it, contribute the patch. One way to do that is to create a pull request on GitHub in the Alfresco Community Edition project then reference that pull request in an Alfresco Jira.
  10. If you see something wrong or missing on the wiki, log in and fix it.

Really, this list is mostly applicable to anyone who wants to participate in the community, not just customers. What did I leave out? Add more ideas in the comments.


What we’re dying to hear at Alfresco Summit

For the first time ever, I will not be in attendance at this year’s annual Alfresco conference. I’m going to miss catching up with old friends, meeting new ones, learning, and sharing stories.

I’m also going to miss hearing what Alfresco has planned. Now, more than ever, Alfresco needs to inspire. As I won’t be there I need the rest of you to go to Alfresco Summit and take good notes for me. Here’s what you should be listening for…

What Are You Doing With the Money, Doug?

At last year’s conference, Alfresco CEO Doug Dennerline made a quip about how much fun he was having spending all of the money Alfresco had amassed prior to his arrival. Now he’s secured another round of funding.

I think partners, customers, and the community want to hear what the specific plans are for all of that cash. In a Q&A with the community, Doug said he felt like there were too few sales people for a company the size of Alfresco’s. In the old days, Alfresco had an “inbound” model, where people would try the free stuff and call a sales person when they were ready for support. Doug is inverting that and going with a traditional “outbound” model. That obviously takes cash, and it may be critical for the company to grow to where Doug and the investors would like, but it is rather uninspiring to the rest of us. Where are the bold, audacious plans? Where is the disruption? Which brings me to my next theme to listen for…

Keep Alfresco Weird

Remember when Alfresco was different? It was open source. It was lightweight. It appealed to developers and consultants because it could approximate what a Documentum aircraft carrier could do but it had the agility of a speedboat. And, perhaps above all, it was cheap.

Now it feels like that free-wheeling soul, that maverick of ECM, that long-haired hippy love-child, born of a one-night stand between ECM and Linux, is looking in the mirror and realizing it has slowly become its father.

Maybe in some ways, growing up was necessary. Alfresco certainly feels more stable than years past. But what I want to hear is that the scrappiness is still there. I want to see some features that competitors haven’t thought of yet. I want to look into the eyes of the grown-up Alfresco and see (and believe) that the mischievous flicker of youth is still glowing, ready to shake things up.

Successfully Shoot the Gap Or Get Crushed?

Alfresco is in a unique position. There are the cloud-only players on one side who are beating Alfresco on some dimensions (ease-of-use, flawless file sync, ubiquity) and are, at least for now, losing to Alfresco on other dimensions (on-premises capability, security, business relevance). On the other side, you’ve got legacy players. Alfresco is still more nimble than they are, but with recent price increases, Alfresco can no longer beat them on price alone. That gap is either Alfresco’s opportunity or its demise.

Every day those cloud-only players add business-relevant functionality that their (huge) user base demands. They’ve got endless cash. And dear Lord, the marketing. If I have to read one more bullshit TechCrunch article about how Aaron Levie “invented” the alternative to ECM, I’m going to lose it. The bottom line is that the cloud-only guys have their sights set on Alfresco’s bread-and-butter.

And those legacy vendors, the ones Alfresco initially disrupted with an open source model, are not only showing signs of life, but in some cases are actually introducing innovative functionality. If Alfresco turns away from the low-cost leader strategy they miss out on a huge lever needed to unseat incumbent vendors. “Openness” may not be enough to win in a toe-to-toe battle of function points.

So what exactly is the strategy for successfully shooting the gap? We’ve all heard the plans Alfresco has around providing content-centric business apps as SaaS offerings. That sounds great for the niche markets interested in those offerings. But that sounds more like one leg of the strategy, not the whole thing. I don’t think you’re fighting off Google, Microsoft, and Amazon with a few new SaaS offerings a year.

So Take Good Notes For Me

Alfresco has had two years to establish the office in the valley, to get their shit together, and to start kicking ass again. What I’m hoping is that at this year’s Alfresco Summit, they will give us credible details about how that $45 million is going to be spent in such a way as to make all of the customers, partners, employees, and community members glad they bet their businesses and careers on what was once an innovative, game-changing, start-up called Alfresco.

Take good notes and report back!


Using Elasticsearch, Logstash, and Kibana to visualize Apache JMeter test results

In my last blog post I showed how to use Apache JMeter to run a load test against Elasticsearch or anything with a REST API. One of my recommendations was to turn off all of the Listeners so that valuable test client resources are not wasted on aggregating test results. So what’s the best way to analyze your load test results?

Our load test was running against Elasticsearch which just happens to have a pretty nice tool set for ingesting, analyzing, and reporting on any kind of data you might find in a log file. It’s called ELK and it stands for Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed, scalable search engine and document oriented NoSQL store. Logstash is a log parser that can send log data to various outputs. Kibana is a tool for defining dashboards that contain charts, graphs, and tables based on data stored in Elasticsearch.

It is really quite easy to use ELK to create a dashboard that aggregates and displays Apache JMeter test results in real time. Here’s how.

Step One: Configure Apache JMeter to Create a CSV File

Another recommendation in my last post was to use the Apache JMeter GUI only for testing and to run your load test from the command line. For example, this runs my test named “Basic Elasticsearch Test.jmx” from the command line and writes the results to results.csv:

/opt/apache/jmeter/apache-jmeter-2.11/bin/jmeter -n -t Basic\ Elasticsearch\ Test.jmx -l ./results.csv

The results.csv file will get fed to Logstash and ultimately Elasticsearch so it can be reported on by Kibana. The Apache JMeter user.properties file is used to specify what gets written to results.csv. Here is a snippet from mine:

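The snippet itself did not survive here, so below is a representative set of JMeter save service properties. The exact column selection is an assumption; the timestamp_format line is the important one, because it makes the JMeter timestamp parseable by Elasticsearch’s default (ISO 8601) date format:

# write results as CSV with a header row
jmeter.save.saveservice.output_format=csv
jmeter.save.saveservice.print_field_names=true
# match Elasticsearch's default ISO 8601 date format
jmeter.save.saveservice.timestamp_format=yyyy-MM-dd'T'HH:mm:ss.SSS
jmeter.save.saveservice.label=true
jmeter.save.saveservice.response_code=true
jmeter.save.saveservice.successful=true
jmeter.save.saveservice.thread_name=true
jmeter.save.saveservice.latency=true
jmeter.save.saveservice.bytes=true
jmeter.save.saveservice.thread_counts=true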

Pay attention to that timestamp format. You want your Apache JMeter timestamp to match the default date format in Elasticsearch.

Step Two: Configure and Run Logstash

Next, download and unpack Logstash. It will run on the same machine as your test client box (or on a box with file access to the results.csv file that JMeter is going to create). It also needs to be able to get to Elasticsearch over HTTP.

There are two steps to configuring Logstash. First, Logstash needs to know about the results.csv file and where to send the log data. The second part is that Elasticsearch needs a type mapping so it understands the data types of the incoming JSON that Logstash will be sending to it. Let’s look at the Logstash config first.

The Logstash configuration is kind of funky, but there’s not much to it. Here’s mine:

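The config is no longer embedded, so here is a reconstruction that matches the description below. The CSV column names are assumptions based on what JMeter writes, and the host and index values are placeholders:

input {
    file {
        path => "/path/to/results.csv"
    }
}

filter {
    # discard the header row that JMeter writes at the top of the CSV
    if [message] =~ /responseCode/ {
        drop { }
    } else {
        csv {
            columns => ["time", "elapsed", "label", "responseCode",
                        "threadName", "success", "bytes", "grpThreads",
                        "allThreads", "Latency"]
        }
    }
}

output {
    # send parsed results to the monitoring cluster over HTTP
    elasticsearch_http {
        host => "monitoring-cluster-host"
        index => "logstash-%{+YYYY.MM.dd}"
    }
}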

The “input” part tells Logstash where to find the JMeter results file.

The “if” statement in the “filter” part looks for the header row in the CSV file and discards it if it finds it, otherwise, it tells Logstash what columns are in the CSV.

The “output” part tells Logstash what to do with the data. In this case we’ll use the elasticsearch_http plugin to send the data to Elasticsearch. There is also one that uses the native API but when you use that, you have to use a specific version combination of Logstash and Elasticsearch.

A quick side note: In our case, we were running a load test against an Elasticsearch cluster. We use Marvel to report on the health of that cluster. To avoid affecting production performance, Marvel sends its data to a separate monitoring cluster. Similarly, we don’t want to send a bunch of test result data to the production cluster that is being tested, so we configured Logstash to send its data to the monitoring cluster as well.

That’s all the config that’s needed for this particular exercise.

Here are a couple of Logstash tips. First, if you need to see what’s going on you can add a sysout to the configuration by adding this line between ‘output {‘ and ‘elasticsearch_http {‘ before starting logstash:

stdout { codec => rubydebug }

The second tip is about re-running Logstash and forcing it to re-parse a log file it has already read. Logstash remembers where it is in the log. It does this by writing a “sincedb” file. So if you need to re-parse the results.csv file, clear out your sincedb files (mine live in ~/.sincedb*). You may also have to add “start_position => beginning” to your Logstash config on the line immediately following the path statement.

Okay, Logstash is ready to read the Apache JMeter CSV and send it to Elasticsearch. Now Elasticsearch needs to have an index and a type mapping ready to hold the log data. If you’ve spent any time at all with Elasticsearch you should be familiar with creating a type mapping. In this case, what you want to do is create a type mapping template. That way, Logstash can create an index based on the current date, and it will use the correct type mapping each time.

Here is the type mapping I used:

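The mapping is not reproduced here, so below is a sketch of an index template along those lines. The field list mirrors the Logstash columns above; the key detail is mapping the “time” field as a date. You would register it with something like curl -XPUT 'http://monitoring-host:9200/_template/jmeter_results' -d @template.json:

{
    "template": "logstash-*",
    "mappings": {
        "logs": {
            "properties": {
                "time": { "type": "date", "format": "dateOptionalTime" },
                "elapsed": { "type": "long" },
                "label": { "type": "string", "index": "not_analyzed" },
                "responseCode": { "type": "string", "index": "not_analyzed" },
                "threadName": { "type": "string", "index": "not_analyzed" },
                "success": { "type": "boolean" },
                "bytes": { "type": "long" },
                "grpThreads": { "type": "long" },
                "allThreads": { "type": "long" },
                "Latency": { "type": "long" }
            }
        }
    }
}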

Now Logstash is configured to read the data and Elasticsearch is ready to persist it. You can test this at this point and verify that the data is going all the way to Elasticsearch. Start up Logstash like this:

/opt/elasticsearch/logstash-1.4.2/bin/logstash -f ./jmeter-results.conf

If it looks happy, go start your load test. Then use Sense (part of Marvel) or a similar tool to inspect your Elasticsearch index.

Step 3: Visualize the Results

Now it is time to visualize all of those test results coming from the load test. To do that, go download and unpack Kibana. I followed a tip in this blog post and unpacked it into $ES_HOME/plugins/kibana/_site on my monitoring cluster but you could use some other HTTP server if you’d rather.

Now open a browser and go to Kibana. You can link to a Logstash dashboard, a sample dashboard, an unconfigured dashboard, or a blank dashboard. Pick one and start playing with it. Once you get the hang of it, create your JMeter Dashboard starting from a blank dashboard. Here’s what our dashboard looked like when it was done:

[Screenshot: Apache JMeter results dashboard]

Using Logstash and Kibana we can see, in real time, the throughput our Apache JMeter test is driving (bottom left) and a few different panels breaking down response latency. You can add whatever makes sense to you and your team. For example, we want all of our responses to come back within 250 ms, so the chart in the bottom right-hand corner shows how we’re doing against that goal for this test run.

One gotcha to be aware of: by default, Kibana looks at the Elasticsearch timestamp. But that’s the time that Logstash indexed the content, not the actual time that the HTTP request came back to Apache JMeter. That time gap could be small if you are running Logstash while your test is running and your machine has sufficient resources, or it could be very large if you wait to parse your test results until some time after the test run. Luckily, the timestamp field that Kibana uses is configurable, so make sure all of your graphs are charting the appropriate timestamp field, which is the “time” field that JMeter writes to the CSV file.


Using Apache JMeter to Test Elasticsearch (or any REST API)

I’m helping a client streamline their Web Content Management processes, part of which includes moving from a static publishing model to a dynamic content-as-a-service model. I’ll blog more on that some other time. What I want to talk about today is how we used Apache JMeter to validate that Elasticsearch, which is a core piece of infrastructure in the solution, could handle the load.

Step 1. Find some test data to index with Elasticsearch

My client runs a well-known commerce site that most of my U.S. readers would be familiar with, but the site’s content requirements are relatively modest. On go-live, the content service might have 10 or 20 thousand content objects at most. But we wanted to test using a data set that was much larger than that.

We set out to find a real world data set with at least 1 million records, preferably already in JSON, as that’s what Elasticsearch understands natively. Amazon Web Services has a catalog of public data sets. The Enron Email data set looked most promising.

We ended up going with a news database with well over a million articles because the client already had an app that would convert the news articles into JSON and index them in Elasticsearch. By using the Elasticsearch Java API and batching the index operations using its bulk API we were able to index 1.2 million news articles in a matter of minutes.

Step 2: Choosing the Testing Tool & Approach

We looked at a variety of tools for running a load test against a REST API including things like siege, nodeload, Apache ab, and custom scripts. We settled on Apache JMeter because it seemed like the most full-featured option, plus I already had some familiarity with the tool.

For this particular exercise, we wanted to see how hard we could push Elasticsearch while keeping response time within an acceptable window. Once we established our maximum load with a minimal Elasticsearch cluster, we would then prove that we could scale out roughly linearly.

Step 3: Define the test in Apache JMeter

JMeter tests are defined in JMX files. The easiest way to create a JMX file is to use the JMeter GUI. Here’s how I defined the basic load test JMX file…

First, I created a thread group. Think of this like a group of test users. The thread group defines how many simultaneous users will run the test, how fast the ramp-up will be, and how many loops through the test each user will make. You can see in the screenshot below that I used parameters for each of these settings to make them easy to change through configuration.

JMeter Thread Group

Within the thread group I added some HTTP Request Defaults. This defines my Elasticsearch host and port once so I don’t have to repeat myself across every HTTP request that’s part of the test.

JMeter HTTP Request Defaults

Next are my User Defined Variables. These define values for the variables in my test. Look at the screenshot below:

JMeter User Defined Variables

You’ll notice that there are three different kinds of variables in this list:

  1. Hard-coded values, like 50 for rampUp and 2000 for loop. These likely won’t change across test runs.
  2. Properties, like thread, ES_HOST, and ES_PORT. These point to properties in my JMeter user.properties file.
  3. FileToString values, like for PAGE_GEO_QUERY. These point to Elasticsearch query templates that live in JSON files on the file system. JMeter reads in those templates and uses them for the body of HTTP requests (see the sketch just after this list). More on the query templates in a minute.
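To make those three flavors concrete, here’s roughly what the Value column contains for each (the query file name is illustrative):

rampUp          50
loop            2000
thread          ${__P(THREAD,50)}
ES_HOST         ${__P(ES_HOST,127.0.0.1)}
PAGE_GEO_QUERY  ${__FileToString(${__P(JSONTEMPLATE_ROOT)}/page_geo_query.json)}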

The third configuration item in my test definition is a CSV Data Set Config. I didn’t want my Elasticsearch queries to use the same values on every HTTP request. Instead I wanted that data to be randomized. Rather than asking JMeter to randomize the data, I created a CSV file with randomized data. Reading data from a CSV to use for the test run is less work for JMeter to do and gives me a repeatable, but random, set of data for my tests.

JMeter CSV Data Set Config

You can see that the filename is prefaced with “${CSVDATA_ROOT}”, a property declared in the User Defined Variables. Its value resides in my JMeter user.properties file and tells JMeter where to find the CSV data set.
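The CSV itself is nothing fancy: one column per replacement variable, one row per iteration. Hypothetically, for the geo-and-dates queries described later, something like:

geo,contentType,startDate,endDate
dallas,article,2013-01-01,2013-06-30
austin,video,2013-04-01,2013-09-30

The column names just have to line up with the replacement variables in your query templates (either via a header row like this one or via the Variable Names field in the config).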

Here is a snippet of my user.properties file:
ES_HOST=127.0.0.1
ES_PORT=9200
ES_INDEX=content-service-content
ES_TYPE=wcmasset
THREAD=200
JSONTEMPLATE_ROOT=/Users/jpotts/Documents/metaversant/clients/swa/code/es-test/tests/jsontemplates
CSVDATA_ROOT=/Users/jpotts/Documents/metaversant/clients/swa/code/es-test/tests/data

Next come the actual HTTP requests that will be run against Elasticsearch. I added one HTTP Request Sampler for each Elasticsearch query. I have multiple HTTP Request Samplers defined, but depending on the kind of load I’m trying to test, I typically leave all but one disabled for a given run.

JMeter HTTP Request

You can see that I didn’t have to specify the server or port because the HTTP Request Defaults configuration took care of that for me. I specified the path, which is the Elasticsearch URL, and the body of the request, which resides in a variable. In this example, the variable is called PAGE_GEO_DATES_UNFILTERED_QUERY. That variable is defined in User Defined Variables and it points to a FileToString value that resolves to a JSON file containing the Elasticsearch query.
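Concretely, the only sampler fields I had to fill in look something like this, assuming ES_INDEX and ES_TYPE are pulled in from user.properties the same way ES_HOST is. Note the __eval wrapper: in my experience JMeter won’t expand the ${...} placeholders that FileToString read in from the file unless you ask it to:

Method:    POST
Path:      /${ES_INDEX}/${ES_TYPE}/_search
Body Data: ${__eval(${PAGE_GEO_DATES_UNFILTERED_QUERY})}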

Okay, so what are these query templates? You’ve probably used curl or Sense (part of Marvel) to run Elasticsearch queries. A query template is that same JSON with replacement variables instead of actual values to search for. JMeter will merge the test data from the randomized test data CSV with the replacement variables in the query template, and use the result as the body of the HTTP request.

Here’s a sketch of such a query template–a filtered query (this is Elasticsearch 1.x syntax) with four replacement variables used as filter values. The field and variable names here are illustrative rather than the client’s actual content model:

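{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "term": { "geo": "${geo}" } },
            { "term": { "contentType": "${contentType}" } },
            { "range": { "publishDate": { "gte": "${startDate}", "lte": "${endDate}" } } }
          ]
        }
      }
    }
  },
  "size": 25
}

When the body is evaluated, JMeter swaps ${geo}, ${contentType}, ${startDate}, and ${endDate} with values from the current row of the CSV data set–that’s what the __eval wrapper shown above is for.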

JMeter lets you inspect the response that comes back from the HTTP Request using assertions. However, the more assertions you have, the more work JMeter has to do, so it is recommended that you have as few as possible when doing a load test. In my test, I added a single assertion for each HTTP Request that looks only at the response header to make sure that I am getting back JSON from the server.
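In Response Assertion terms (see the screenshot below), that’s about as thin as it gets–something like:

Apply to:               Main sample only
Response Field to Test: Response Headers
Pattern Matching Rules: Contains
Patterns to Test:       application/json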
JMeter Response Assertion

JMeter provides a number of Listeners that summarize the responses coming back from the test. You may find things like the Assertion Results, View Results Tree, and Summary Report very helpful while you are writing and testing your JMX file in the JMeter GUI, but you will want to make sure that all of your Listeners are completely disabled when running your load test for real.

At the end of this step I’ve got a repeatable test that will run 400,000 queries against Elasticsearch (that’s 200 threads x 2,000 loops x 1 enabled HTTP request). Because everything is configurable I can easily make changes as needed. The next step is running the test.

Step 4: Run the test

The first thing you have to deal with before running the test is driving enough traffic to tax your server without over-driving the machine running JMeter or saturating the network. This takes some experimentation. Here are some tips:

  • Don’t run your test using the JMeter GUI. Use the command line instead (see the example command after this list).
  • Don’t run Elasticsearch on the same machine that runs your JMeter test.
  • As mentioned earlier, use a very simple assertion that does as little as possible, such as checking the response header.
  • Turn off all Listeners. I’ll give you an approach for gathering and visualizing your test results that will blow those away anyway.
  • Don’t exceed the maximum recommended number of threads (users) per test machine, which is 300.
  • Use multiple JMeter client machines to drive a higher concurrent load, if needed.
  • Make sure your Elasticsearch query is demanding enough to actually tax the server.
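On the first point, the non-GUI invocation is short–something like this, with illustrative file names:

jmeter -n -t es-load-test.jmx -q user.properties -l results.csv

# -n  run in non-GUI mode
# -t  the test plan (JMX file) to execute
# -q  additional properties file (ES_HOST, THREAD, the *_ROOT paths, and so on)
# -l  where to log results–this is the CSV you’ll feed to Logstash later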

This last point was a gotcha for us. We simply couldn’t run enough parallel JMeter clients to stress the Elasticsearch cluster. CPU and RAM on the Elasticsearch nodes were barely taxed, while the JMeter client machines were maxed out. Increasing the number of threads didn’t help–it just caused the response times JMeter reported to get longer and longer due to the shortage of resources on the client machines.

The problem was that many of our Elasticsearch queries were returning empty result sets. We had indexed 1.2 million news articles whose metadata spanned very broad ranges, so when we plugged our randomized test data into the filter queries, the resulting filters were too narrow and matched nothing. Returning an empty result set is neither realistic nor difficult for the Elasticsearch server.

Once we fixed that, a single test client could drive our desired load, and we proved to ourselves that an Elasticsearch cluster consisting of a single load-balancing node and two master/data nodes (two replicas in total) could handle that load with an acceptable response time. We then scaled roughly linearly by adding another three nodes to the cluster (one load-balancer and two master/data nodes) and driving it with an additional JMeter client machine.

Visualizing the Results

When you do this kind of testing, it doesn’t take long before you want to visualize the results. Luckily, Elasticsearch has a pretty good offering for exactly that: ELK (Elasticsearch, Logstash, & Kibana). In my next post I’ll describe how we used ELK to produce a real-time JMeter test results dashboard.


Independent Alfresco community forms to guarantee freely-available open source ECM forever

Something very interesting is afoot in the Alfresco community. A subset of the community has formed an independent organization called The Order of the Bee, aimed at making sure the freely-available open source platform for Enterprise Content Management stays freely available, forever.

The group’s members, who hail from all parts of the globe, include customers, partners, independent practitioners, and even Alfresco Software employees. Despite their varied backgrounds and interests, they all have at least one thing in common: they want to make sure that Alfresco Community Edition stays free and open.

Alfresco has always provided what is essentially an “open core” distribution. The on-premises software ships in two editions: Community Edition is the freely-available software licensed under the LGPLv3 and Enterprise Edition is commercially licensed. But lately there has been growing concern amongst community members that Alfresco Software, the commercial company behind the product, doesn’t always have the best interests of the community in mind. Thus was born The Order of the Bee, a reference to the community keynote I delivered at Alfresco Summit 2013.

The Order began forming around the same time I stepped down as Alfresco’s Chief Community Officer. The timing is uncanny, and I am a founding member of the Order, but the two events were not coordinated–the timing really is coincidental.

Check out the web site to see what the Order is all about. If you feel compelled to participate, be sure to submit the contact form. And follow the group on Twitter.
