Category: Content Management

Enterprise Content Management (ECM), Web Content Management (WCM), Document Management (DM). Whatever you call it, this category covers market happenings and lessons learned.

New version of “Alfresco Mention” add-on

I’ve pushed a new version of Alfresco Mention. It’s an add-on that gives users the ability to use @mentions to cause the specified user to be notified. The old version supported @mentions in document comments. The new version expands on that to also include discussion topics and replies.

The add-on still requires users to know the username of the user they are mentioning–there is nothing in the UI that will help them figure that out, like a pop-up or a typeahead or anything like that. If you want to add it, pull requests are welcome.

Even without the UI, multiple customers find this add-on handy. Instead of making a comment and then manually sending an email to let their co-worker know they’ve seen a document, for example, they just @mention them, and the add-on sends them a notification with a link to the item.
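To make the mechanics concrete, here is a minimal Python sketch of the general idea: scan comment text for @mentions and build one notification per mentioned user. This is not the add-on's actual code. The pattern, the function names, and the notification fields are all assumptions made up for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the add-on): an "@" that is
# not preceded by a word character or another "@", followed by a username made of
# letters, digits, ".", "_" or "-". The lookbehind skips e-mail addresses.
MENTION_PATTERN = re.compile(r"(?<![\w@])@([A-Za-z0-9][A-Za-z0-9._-]*)")


def extract_mentions(text):
    """Return the unique usernames mentioned in text, in order of first appearance."""
    usernames = []
    for match in MENTION_PATTERN.finditer(text):
        # Drop trailing punctuation such as a sentence-ending period.
        username = match.group(1).rstrip("._-")
        if username and username not in usernames:
            usernames.append(username)
    return usernames


def build_notifications(text, author, item_url):
    """Build one notification per mentioned user, skipping self-mentions.

    The dictionary keys are hypothetical; a real implementation would hand
    these off to whatever mail or activity service is available.
    """
    return [
        {"to": username, "from": author, "link": item_url}
        for username in extract_mentions(text)
        if username != author
    ]
```

A real implementation would also need to confirm that each username exists before notifying anyone, which is exactly the part a typeahead in the UI would make easier.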

This add-on is known to work with 5.2 and 6.1, either Enterprise or Community.

Alfresco acquires Technology Services Group

Interesting news today in the Alfresco world. The software vendor has acquired Technology Services Group, a Chicago-based Alfresco partner that delivers both professional services and software solutions to the document management and case management market (press release).

TSG has been in the ECM market for a long time. I remember when Documentum was their bread-and-butter. But they’ve definitely widened their aperture over the years and have expanded beyond traditional consulting and into product development, which is typically hard for a professional services firm to get right. It will be interesting to see how TSG’s team, which includes people who are often simultaneously client-facing consultants and product-focused engineers, will be integrated into Alfresco, where professional services and product development are much more segregated.

TSG has a deep bench of talented developers, but beyond the talent acquisition, Alfresco can now add some additional products to the price sheet. OpenContent and OpenAnnotate products will likely be the first that Alfresco will push.

OpenContent provides a user interface on top of content repositories such as Alfresco and others. The demos I have seen were very focused on Case Management, but other use cases can be satisfied.

OpenAnnotate, as the name suggests, allows users to mark up documents and videos without leaving their web browser. Alfresco has other partners that provide annotation add-ons, but I’ve never really dug into any of them, including OpenAnnotate.

It wasn’t mentioned in the press release, but OpenMigrate is another offering from TSG where I see interest from clients. It can help with complex migrations from one ECM platform to another. Making it easier for clients to move off of FileNet or SharePoint to Alfresco is a no-brainer. It might be a bit frustrating when customers use the tool to go the other way, but Alfresco can use that as a selling point: If you don’t like us, here’s a tool to help you leave.

TSG has been doing a lot of promotion lately around its own NoSQL-based and cloud-native document management solution, which can definitely be seen as an alternative to Alfresco’s more traditional repository built on a relational back-end. Alfresco has been busy working to decompose its monolithic stack into a set of more focused, containerized services, but it is still fundamentally the same architecture that has been in place since 2005. Dave Giordano, TSG’s owner and founder, will be taking on the role of Chief Strategy Officer for Alfresco. Might he be able to convince the rest of Alfresco that such a fundamental architectural shift is necessary for Alfresco? We shall see.

I’ll also be curious to see if any of the TSG offerings become freely-available as open source. TSG has always used the “Open” moniker on their product names, and briefly dabbled in making their code available as open source, but sources say they saw only downside, so they made their source code available only to their customers. Maybe the acquisition will mean that TSG’s products will shift to open core. That would certainly open up the TSG product catalog for implementation by other partners and potentially the community beyond that.

Every acquisition will face integration challenges. I hope those are met and overcome quickly and that this is more than a superficial product grab because this could be a chance for a much-needed injection of vision and innovation into Alfresco.

Alfresco Developer Series Tutorials for SDK 4.0

Back in February, when SDK 4.0 was still in beta, I upgraded the Alfresco Developer Series tutorials on a branch to help early adopters learn how to use the platform. I was holding off on merging the branch into master until SDK 4.0 was out of beta and until I was seeing more end-user adoption of Alfresco 6.x.

Since then, SDK 4.0 has left beta. The vast majority of my customers are still running 5.2, but I’m seeing enough interest in 6.x on StackOverflow and in the forums that I decided it was time to merge to master.

If you are looking for the older tutorials based on SDK 3.0, they are still there–just use the SDK-3.0.1 tag.

In addition to the source code projects, the tutorial content itself also changed slightly. Those HTML pages have been re-generated, published, and linked to from the Alfresco Developer Series home page.

Now when developers come across the tutorials they’ll be referencing source code and instructions that are up-to-date with the latest SDK.

Gartner uses false equivalency to warn against building custom content services

I guess it shouldn’t be too surprising that Gartner, a firm that gets paid by the software vendors it reviews, would be against the idea of custom content services. If companies stopped paying ridiculously high license fees and maintenance for bloated ECM platforms, it could cut into Gartner’s bottom line.

What is surprising is that they would use such a blatant false equivalency in describing why they think it is a bad idea to build your own content services platform. A recent blog post by Marko Sillanpaa starts out with this little gem:

You would not build a database from scratch…why would you ever accept the idea of building a content services platform from scratch?

Come on, Marko, you can do better.

Yes, at a very high level, one might describe a content management repository as a “database for documents”. But when the conversation shifts from needing to convey the basic concept of what a content repository is to why we might choose to custom build some or all of a content services platform, the comparison breaks down.

A relational database is a fundamental, commoditized element of a technical stack. For most uses, when someone needs to store rows and columns of data and then query for that data, any database will do. There may be differences in how databases perform, the platforms they support, or specific features they implement (clustering, replication), but, for the most part, they are the same.

Further, a relational database is very precise in the functionality it delivers. Only in the most niche applications would you hear someone say, “That database does too much for us–it delivers way more functionality than we need.”

So, for the vast majority of cases, it would never make sense to build a custom database for our larger application because so many are available that do exactly what we need and no more.

A content services platform, however, couldn’t be more different in these respects.

First, agreeing on what “content services platform” means is a challenge. Marko says, “the repository is more than file storage. It’s about version control, editor integrations, workflow, records management, etc.”. Really? I have several clients who require exactly none of those features but still need a content services platform.

Next is the issue of granularity. When you think about building a custom content-centric application, no one looks at FileNet, Documentum, or Alfresco and says, “Yeah, we’re just going to drop those in to our solution”. That’s because those platforms want to be the solution. Unlike a database, which is happy to focus on rows, columns, indices, and queries, legacy ECM platforms tried to do everything. My friends over at TSG do a good job of summarizing some of this in their blog post (which is how I came across the Gartner post in the first place). I love this quote:

modern content services needs to be rebuilt from the ground up focused on a new approach rather than rely on paradigms from the 1990’s that haven’t worked

I couldn’t agree more. And if the legacy ECM vendors aren’t going to modernize and rightsize their platforms, then service providers will continue to do so, and content-as-a-service vendors like CloudCMS and Contentful will keep doing well.

Now, like Gartner, I obviously have my own bias–it is more interesting to me to build custom content-centric solutions for clients than it is to try to turn aircraft carriers (ECM platforms) into speedboats (nimble, fit-to-purpose customized solutions). But maybe in the age-old buy-versus-build discussion we can avoid fallacious reasoning when advising customers which path is best for them?

If not, and Gartner wants some ideas, here’s a list of some good false equivalencies I came up with that they are welcome to use:

  • We would never build a custom web server, so why would we ever build a custom web application?
  • We would never build a custom enterprise service bus, so why would we ever build a custom order fulfillment system?
  • We would never build a custom desk, so why would we ever build a custom office building?
  • We would never build a custom keyboard, so why would we ever build a custom gaming rig?
  • We would never build a custom water heater, so why would we ever build a custom house?

ECM: You Ain’t Gonna Need It

Sometimes I feel like I spend as much time telling people why they don’t need an Enterprise Content Management (ECM) platform as I do helping people implement one. Today’s blog post by my friends over at TSG underscores another use case where a full-blown ECM platform may be overkill: Serving as a repository for huge volumes of files.

Their blog post says they are loading 20,000 documents per second into DynamoDB with a goal of getting up to 11 billion documents. The actual files are stored in S3, and that load rate does not include uploading files to S3 buckets, so this doesn’t exactly mimic a real-world bulk document import scenario, but that’s not what TSG was trying to test.

TSG correctly points out that it is the metadata repository, which legacy vendors often base on relational databases, that struggles in high-volume implementations, so their test focuses on the ability of Dynamo to store a high volume of data while maintaining performance.

Let me shift slightly from case management, which is what TSG focuses on, to the more general problem firms have when they generate hundreds of millions of files that they need to manage and deliver to their stakeholders.

I have seen multiple clients and prospects who have very demanding requirements in terms of data volume while at the same time requiring very little in terms of what I’d call traditional ECM functionality. Often, they need nothing more than a RESTful API to get data into and out of the repository and some basic searching across a few metadata fields. Insurance companies and financial services companies are two examples of industries with lots of such use cases.

TSG is using native AWS services to manage metadata (DynamoDB) and file storage (S3), but this can also be done on-premises leveraging either commercial or open source solutions to provide a nearly-infinite scale, highly-performant, fully redundant solution.

The important part here is that you do not need an ECM platform to do this. In fact, many high-profile customers of legacy ECM vendors like Documentum, Alfresco, and FileNet, are actively moving away from those platforms for use cases like these.

Why? Companies tell me those platforms are proving too difficult to scale, too complex to implement and maintain, and too expensive in terms of licensing cost.

My company, Metaversant, can build you a minimal content management system in your own data center or in AWS. We call our solution Magritte. It includes:

  • Scalable and distributed object storage built on commodity disk drives
  • Flexible content model
  • REST API for CRUD functions
  • Basic permissions on objects
  • Metadata search (and full-text search if you need it)
  • Ability to manage billions of objects
  • $0 in mandatory licensing costs (commercial support of underlying open source components is available at the customer’s option)

That’s enough functionality for many use cases. In fact it seems like the exact right amount of functionality for managing things like customer statements, contracts, and agreements in large insurance or financial services companies.
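To show how little code the core idea needs, here is a toy Python sketch of that split: one store for the bytes, one for the metadata, CRUD on top, and a simple exact-match metadata search. It is not Magritte's implementation. The class, its method names, and the in-memory dictionaries (standing in for something like S3 and DynamoDB) are assumptions for illustration only.

```python
import hashlib
import uuid


class MinimalContentStore:
    """Toy content store: objects and metadata are kept in separate places.

    Both dictionaries are in-memory stand-ins (an assumption for this sketch)
    for a real object store and a real metadata store.
    """

    def __init__(self):
        self._objects = {}   # object id -> raw bytes
        self._metadata = {}  # object id -> metadata dictionary

    def create(self, content, metadata):
        """Store the bytes and their metadata, returning a new object id."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = content
        self._metadata[object_id] = {
            **metadata,
            "size": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
        }
        return object_id

    def read(self, object_id):
        """Return the bytes and a copy of the metadata for an object."""
        return self._objects[object_id], dict(self._metadata[object_id])

    def update_metadata(self, object_id, changes):
        """Merge changes into the metadata of an existing object."""
        self._metadata[object_id].update(changes)

    def delete(self, object_id):
        """Remove both the bytes and the metadata."""
        del self._objects[object_id]
        del self._metadata[object_id]

    def search(self, **criteria):
        """Return ids of objects whose metadata matches every criterion exactly."""
        return [
            object_id
            for object_id, metadata in self._metadata.items()
            if all(metadata.get(key) == value for key, value in criteria.items())
        ]


store = MinimalContentStore()
statement_id = store.create(b"statement", {"type": "statement", "customer": "c-1"})
print(store.search(type="statement", customer="c-1"))
```

Keeping the bytes and the metadata behind separate stores is the design choice that matters here, because each side can then be scaled, replicated, or swapped out on its own.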

What legacy ECM platforms include–to be sure, for a cost, in terms of both real dollars and complexity–are things like:

  • Support for not just object storage but also additional file system types (Glacier, NAS, SAN, EMC Centera)
  • Extensible, formal (schema-based) content model
  • REST API for every aspect of the platform
  • Foundational/native APIs, client libraries, or SDKs
  • Complex, fine-grained access control lists, including ability to support inheritance and “deny”
  • Extensible platform (hooks where developers can add code to alter or enhance the platform functionality)
  • Support for additional file protocols such as FTP, SMB, IMAP, SMTP, WebDAV
  • Support for content repository standards such as JCR and CMIS
  • Transformation engine (generates previews and thumbnails)
  • Workflow engine
  • Rules engine
  • Analytics, reporting, & dashboards
  • Integrations with third-party systems such as Outlook, SAP, Salesforce, Google Docs, Box, Dropbox
  • Web-based user interface
  • Application components/framework for extending the web UI or for building new custom web UIs
  • Forms engine
  • Mobile applications
  • Desktop Sync
  • Business-specific applications (Records Management, Media Management, Reporting)

Can those traditional ECM features be added to a minimal content management solution like Magritte? Of course. And if you add enough of them you might be better off with a legacy ECM vendor (assuming you can get it to scale to meet your needs, which may be a big assumption depending on your operational constraints).

But if you don’t need a lot or any of those additional features, why start out implementing and paying for an entire aircraft carrier if what you really need is a speedboat?

In software there’s a phrase, “You Ain’t Gonna Need It”, which aims to defer development of features until they are actually needed instead of developing them now for some future need, which may never materialize. In ECM, you might take on the complexity of an entire platform because the “E” in “Enterprise” makes you think you are implementing something the entire company will leverage. That hasn’t panned out–just look at how many companies have multiple so-called “Enterprise” Content Management systems.

Instead of continuing to install these giant platforms, most of which go unused, let’s implement right-sized solutions on top of clusterable, scalable, open source components that talk to each other via APIs. ECM: You Ain’t Gonna Need It.

(Updated 6/3/2020 to fix a minor typo)

Alfresco Developer Series Tutorials Upgraded to SDK 4.0

Alfresco SDK 4.0 has not been released yet, but it is in beta and developers are starting to use it.

To help developers who want to learn the platform using SDK 4.0 and Alfresco 6, I’ve created an sdk-4.0 branch on the Alfresco Developer Series tutorials project. The tutorials on that branch have been upgraded to the new SDK 4.0 structure.

Aside from the project reorganization and the introduction of the Docker modules, the underlying code did not have to change.

In addition to the tutorial source code projects, I also updated the tutorial content bodies to reflect the Docker and Docker Compose features of SDK 4.0 and the minor changes regarding how projects are run and tested. While I was in there I enhanced the markdown with syntax highlighting for code snippets.

Once SDK 4.0 goes GA I will merge the sdk-4.0 branch in the tutorials project into master. Those needing tutorials that leverage the old SDK 3.0.1 will still be able to get them via the sdk-3.0.1 tag.

Incidentally, I used the Alfresco SDK Upgrader script to upgrade these projects and it saved me a lot of time. If you want to know more about upgrading your 3.0 projects to 4.0, either manually or via the script, see this blog post on upgrading.

So, if you are going to learn Alfresco using the latest release, give the updated tutorials a try. If you find any issues, please create an issue on GitHub, or, better yet, fix the issue and open a pull request.

Photo Credit: Veere: tools, by docman, CC BY-NC 2.0

I’m joining the Cloud CMS advisory team

I’ve known Michael Uzquiano, Cloud CMS CTO, and Malcolm Teasdale, Cloud CMS CEO, for at least a decade, so it’s been fun to watch as Cloud CMS has matured over the years.

As a Cloud CMS partner, I often talk to them about issues or ideas related to projects I’m doing for clients with Cloud CMS. Over the years we’ve occasionally found time to chat about more strategic topics. Both the tactical and strategic conversations have been valuable for both sides, but they were definitely ad hoc.

Recently, Malcolm asked if I’d be up for making those conversations more regular and potentially a bit deeper by making me part of the advisory team. I like both the people and the tech, so this sounded like a great idea to me.

Nothing changes for my clients or my company, Metaversant Group, other than the additional insight I might gain from discussions with the Cloud CMS team.

As far as this blog goes, I’ll continue to write about Cloud CMS and others in this space as I’ve done in the past, on topics that interest me. My friends at Cloud CMS might help inform me on technical details for Cloud CMS-related posts, but they won’t be making any editorial decisions. I know my readers value my transparency and openness and that will continue.

I think de-coupled content management is just starting to hit its stride and I’ve enjoyed working with Cloud CMS thus far. I’m definitely looking forward to continuing the discussions with the Cloud CMS crew about company and product direction.

Upgrading from Alfresco SDK 3.0 to 4.0

Alfresco recently announced the beta release of SDK 4.0. The release is long-overdue. Developers had become frustrated that Alfresco published generally-available releases of the platform while seemingly ignoring the fact that there was no compatible SDK that could be used to customize and extend version 6.x of the platform. At DevCon this week, Alfresco said they recognize that was not handled as well as it could have been and pushed hard to get the new release out.

Version 4.0 of the SDK uses the same familiar structure that developers used in previous versions and continues to use Maven for dependency management and packaging. But there are some significant changes happening under-the-covers.

Prior releases of the SDK used an embedded version of Tomcat and an in-memory database to allow devs to launch and run Alfresco, along with their customizations, without having to separately download and install the platform. Adding in a tool that does hot Java class reloading such as JRebel or Hotswap Agent adds a greater productivity boost because changes to things like actions, behaviors, and web scripts can be run immediately, with no restart in most cases.

From a developer’s perspective, your “flow” doesn’t change–the SDK still bootstraps your project into a familiar structure and runs Alfresco with your changes, along with hot-swapping, if you want. The SDK no longer uses embedded Tomcat and H2. Instead, it relies on Docker and Docker Compose. When developers run an SDK project, images from Docker Hub (Community Edition) or Quay.io (Enterprise Edition) are downloaded, overlaid with the developer’s customizations, and launched.

If that sounds painful, relax, it’s not that bad. And the SDK 4.0 docs have everything you need to get productive quickly.

If you’re like me, though, you have many projects, open source and otherwise, that you must now upgrade so you can test them against 6.x. Doing it manually isn’t terrible but it is a bit mind-numbing and can be error-prone. Never fear, though; for help, read on!

Lots of projects to upgrade? DevCon hackers have you covered!

I had the pleasure of participating in the Hack-a-Thon at DevCon again this year, organized, as usual, by community icon, Axel Faust. I wasn’t sure what project I would work on when I woke up that morning, but when I saw there was a group of folks interested in working with SDK 4.0, I joined the team.

First, the group of eight fellow hackers started testing the SDK. For many it was their first time working with SDK 4.0. Windows, MacOS, and Linux were all represented and the group covered the various types of archetypes (all-in-one, repo-only, share-only). Every developer was successful bootstrapping a project and launching the Docker containers using the script that ships with the SDK.

JRebel has worked fine for me in SDK 4.0 for both Community Edition and Enterprise Edition, but no one in the group could get HotSwap Agent, the free alternative to JRebel, working. Filip promised to file an issue on GitHub, so hopefully it is easy to fix.

While the crew of testers were hammering away, I documented the steps needed to upgrade from 3.0 to 4.0 and filed a pull request to add that to the already-helpful SDK 4.0 documentation. Ole has already merged it. Thanks, Ole!

With the upgrade steps documented and the rest of the team familiar with the tool, we moved on to the next phase: Automating the upgrade. The result is a new GitHub project called alfresco-sdk-upgrader that you can leverage to upgrade your own SDK projects. It isn’t as full-featured as we wanted. For example, if you’ve customized your SDK pom files you’ll need to manually merge those changes. But I think it is still useful in its current state.

Here’s a video of the script in action:

You can see that I start out with a project based on SDK 3.0.1. The alfresco-sdk-upgrader script does everything needed to convert it from SDK 3.0.1 to 4.0. After it runs, the video shows the new project structure and then you can see that the run script fires up the Docker containers.

Mitch and Omar did a lot of work on the script. I don’t think any of us were planning on writing bash when we arrived that morning, but they happily rolled up their sleeves and knocked it out. We’d love it if you’d test it out on your projects and, if you feel so inclined, make it better by filing a pull request.

Even if you don’t want to use the script, you should give SDK 4.0 a try while it is still in beta so you can provide your feedback. And, if you’re curious about what other fun stuff got cranked out at the Hack-a-Thon, take a look here.

Photo Credit: Upgrade in Progress by Ged Carroll, CC-by-2.0

Alfresco needs business-focused innovation to reclaim its “visionary” status

Alfresco DevCon is coming up, so I’ve been wondering about what kind of new and innovative things Alfresco might be sharing with us at the conference. That got me thinking about whether or not Alfresco is still innovating and if those innovations need to appeal to developers or business users for Alfresco to stay relevant. My opinion on that might surprise you. Let me explain.

Back in 2010 I wrote a blog post called “Alfresco, NoSQL, and the future of ECM”. In that post I pointed out that NoSQL offered many features attractive to developers of content-centric solutions, such as the lack of a schema, ease of replication, and the ability to scale. I predicted that new content management and document management vendors would enter the market with native NoSQL solutions, existing vendors would start to take advantage of NoSQL, and customers would develop their own content-centric solutions built on NoSQL instead of relational repositories.

It didn’t take long for all of these predictions to come true (not that they were much of a stretch!). New content management players like Contentful and CloudCMS arrived (See “The Emerging Content-as-a-service market”, 2014), both of which rely heavily on NoSQL stores.

Nuxeo, which Gartner named a visionary in the ECM space, now offers MongoDB instead of or alongside a relational database. Nuxeo claims to be the most performant content services platform on the market, due in large part to their move to a NoSQL back-end.

Alfresco never did anything serious around NoSQL but it is interesting to note that one of their partners did. Chicago-based Technology Services Group made a big investment in Hadoop back in 2015, essentially offering it as a back-end alternative to Documentum and Alfresco as part of their OpenContent offering. TSG has multiple clients on Hadoop including a not-for-profit, a pharmaceutical firm, and a nuclear power plant. According to TSG’s founder and president, Dave Giordano, his clients running the Hadoop-based repository couldn’t be happier. Now the firm has added Amazon’s DynamoDB as an additional back-end repository option.

TSG is providing Hadoop and Dynamo as back-end options for their business solutions. But what about something developers can take advantage of when building their own solutions? Some colleagues and I did some experimentation a couple of years ago around building a simple content repository using DynamoDB for metadata storage, Amazon S3 for object storage, and Lambda for the API and it worked pretty well.

Sometimes all you really need is a place to store digital objects and a place to manage metadata about those objects. You don’t need a full ECM platform installation to do that. When TSG sells OpenContent it is the solution they are selling–the back-end is just an implementation detail.

Which brings me back to that 2010 blog post. In addition to predictions about NoSQL eventually being a featured architectural component of content management systems, I also wondered what the rise of NoSQL meant for Alfresco:

“Where does that leave Alfresco? It seems their positioning as a developer-focused, “Internet-scale” repository ultimately leads to them competing directly against NOSQL repositories for certain types of applications.” — Jeff Potts, 2010

I actually worked at Alfresco around this time. Part of my job was to reach out to developers to convince them to build their solutions on top of Alfresco. The broader developer audience was not on board. A big reason is that those developers were already using things like MongoDB and CouchDB for JSON stores. These were much lighter, more flexible, and far more scalable. There is just no comparison between native JSON repositories and Alfresco by these measures.

Several years later, I still get inquiries from people that can be summarized as, “We’re thinking of building this custom solution that has nothing to do with managing office documents but does need an unstructured repository. Do you think Alfresco would be a good fit?”. The answer is usually no. This isn’t a knock on Alfresco–it’s just about purpose-of-fit. If you don’t need versioning, check-in/check-out, online editing, or transformations, why pay the overhead?

So, to answer the question from my past self about where that leaves Alfresco, it was never really a contest. Developers adopted technologies like MongoDB and others in droves. Rather than a light-but-scalable piece of infrastructure that devs routinely incorporate into larger solutions, Alfresco is a full-fledged platform–with all of the good and bad that entails–whose price tag and footprint demand serious justification before being implemented.

What this means for Alfresco today

Back when I wrote the NoSQL blog post, Alfresco thought its most likely entry point was via developers who needed a repository, grabbed Community Edition, and eventually converted into paying customers. But the very broad population of developers has other technologies–not Alfresco–top of mind when it comes to building custom applications. People are continuing to download Alfresco, but I think the “who” and “why” have shifted.

If you look at what Alfresco has done lately, the 6.0 and 6.1 releases are mostly about customization and deployment. The Application Developer Framework (ADF), the new Docker containers and Helm charts in 6.0, and SDK 4.0, which is heavily Docker-based, are all welcome additions.

Absolutely, the platform has to be easier to extend, customize, and deploy, so I’m glad to see that being addressed, but my customers don’t actually care as much about those things. There have been some great new end-user features added recently, such as the Search and Insights Engine and the Digital Workspace, but more are needed if Alfresco wants to reclaim its “visionary” status.

Alfresco is not in the “content repository” market. Developers can create a schema-less, scalable, replicated repository easily with NoSQL and other technologies. Scoff at the buzzwords if you want, but I think “Digital Business Platform” actually describes Alfresco really well. The key is that a “Digital Business Platform” isn’t for developers, although they need to extend and customize it. The platform is for business users.

At DevCon, we’re going to see a ton about ADF and Docker, and those topics are important to the DevCon audience. But my customers are looking for innovative, business-friendly features ready to use, out-of-the-box. It may sound strange coming from me, but those end-user innovations are what will keep Alfresco relevant and appealing to the market they are actually in.

Photo Credit: Mirror, by Vadzim Vinakur, CC BY-NC 2.0