Category: Apache Jackrabbit

Apache Jackrabbit is the reference implementation of the JSR-170 (Java Content Repository API) specification.

Back to the Future of Content Repositories

mcflyFive years ago I wrote a blog post called, “Alfresco, NoSQL, and the Future of ECM“. Today is “Back to the Future Day”–the exact date Marty McFly time-traveled to in the movie Back to the Future. The movie made many observations about what life would be like on October 21, 2015–some were spot on, some not so much. I thought it fitting that we take a look at my old blog post and see where we are now with regard to content repositories and NoSQL.

One point of the post was that NOSQL might be a more fitting back-end for content repositories than relational:

But why shouldn’t the Content Management tier benefit from the scalability and replication capabilities of a NOSQL repository? And why can’t a NOSQL repository have an end-user focused user interface with integrated workflow, a form service, and other traditional DM/CMS/WCM functionality? It should, it can and they will.

This has definitely turned out to be the case. New content management solution vendors like CloudCMS have built their platform on NOSQL technology while older vendors, such as Nuxeo, have started to integrate NOSQL into their solutions.

Open source projects are also taking advantage of the technology. Apache Jackrabbit provides an implementation of the JCR standard. Its “next generation” offering, Jackrabbit Oak, is essentially JCR with MongoDB as the back-end.The second point of the post was that as NOSQL repositories become more widely adopted, they compete directly with content repositories in use cases where those content repositories are used primarily as a back-end for developers’ custom content-centric solutions.

In other words, 5 or 10 years ago, if you were a developer looking to implement a custom application, and you wanted something other than a relational back-end, you might build your application on top of something like Alfresco. Now developers may be less likely to go that route. That’s because today there is an explosion of stacks out there. Many of them assume a NOSQL back-end. Look at mean.io as just one example, which combines Node.js, Express (the Node.js web framework), AngularJS, and MongoDB and wraps it up with time-saving tooling.

Many people use Alfresco as a back-end. Their front-end uses a RESTful API implemented as web scripts to talk to the repository. The value the repository brings to the table is the ability to store documents in a hierarchy along with custom metadata defined in a content model. They may not be using Alfresco Share or much of the other functionality that Alfresco bundles with their offering–for their custom solution Alfresco is just a repository. When it is used like this, Alfresco is doing nothing more than what NOSQL repositories offer, and, in fact, it does less because NOSQL repositories have a more flexible schema and are built to be clustered and massively distributable–for free.

Years ago, Alfresco shifted its focus away from developers looking to build custom solutions on top of a bare repository. Its developer outreach is now more about customizing Alfresco Share and the underlying repository. Nuxeo, on the other hand, has doubled-down on its developer focus. I’ll spend some more time on this in a future post.

I guess this trend wasn’t terribly hard to predict five years ago, but it does feel kinda nice to see it come to pass. Now, if I could just have a hoverboard.

Adobe acquires Day Software for $240 million

I’ll admit it. I did not see this one coming: Adobe announced today that it will acquire Day Software for $240 million USD. (Thanks, @pmonks, for the heads-up tweet).

Honestly, I thought Adobe would acquire Alfresco by the end of last year and I was surprised when it didn’t happen. They had done a big OEM deal making Alfresco part of LiveCycle and they did a gigantic Alfresco implementation as part of standing up Adobe’s acrobat.com site. Heck, Adobe even hosted Alfresco’s community event back in 2008. All small potatoes in the grand scheme of things, I know, but I can’t help but feel like the proud parent who’s daughter brought home a keeper, only to find out the guy’s been dating a hottie from Switzerland the whole time.

Day has a FAQ up on their site. As you would expect, Day promises that current customers have nothing to fear and that the products will continue to live on.

Day has some really cool stuff in both their commercial products and their open source projects (Sling, Jackrabbit). I hope the acquisition gives Day a huge injection of resources they are able to invest in the open source side of things.

Congrats to Erik Hansen, David Nuescheler, Kevin Cochrane, and the rest of the Day team!

Apache Chemistry: A Proposed Reference CMIS Implementation

Kas Thomas, over at CMSWatch, says a new Apache project is in the works. Chemistry is a reference CMIS implementation that has committers from Day, Alfresco, and Nuxeo. The goal is to provide a vendor-neutral reference implementation and compatibility tests around the proposed CMIS specification.

It looks like this will be a standard set of REST APIs and SOAP services that implement the CMIS spec and hook into a back-end repository. Not surprisingly, the first back-end the Chemistry team plans to support is Apache Jackrabbit, Apache’s reference implementation of the JCR API.

Catching up on XForms, XRX, XProc, and Orbeon

I recently spent a little time looking at open source components we could assemble to provide a basic web form authoring solution embedded within one of our SaaS offerings. Rather than full-blown Web Content Management, all that the solution really called for was the ability for non-technical users to enter data in a form and to upload binary objects which may be related in some way to that form data. There could be several forms with some chunks of forms being reused, and at some point, it might be nice for non-technical people to create their own forms.

For the forms piece I immediately thought of XForms because (1) I knew we wanted the data stored as XML and (2) I like the MVC pattern that XForms follows.

It had been a while since I played with XForms directly. Alfresco’s web forms engine is currently based on the Chiba implementation of XForms, but you don’t normally get exposed to the XForms details. There are a few things going on in the world of XForms that caught my attention:

XProc. XProc is a W3C specification for an XML Pipeline Language. If you’ve ever worked with Apache Cocoon you’ll get this concept immediately as Cocoon was an early implementor of the XML pipeline approach. Think of raw XML going in a pipeline on one end, having it processed with one or more steps as it goes through the pipeline, and then possibly new XML emerging from the other end. Those processing steps can be thought of as modules that can be reused and recombined in different ways to build new pipelines.

One of our past clients was doing something similar to this with their own home grown solution. They were taking XML data feeds from sporting events, and then performing various operations on that XML before it was eventually posted on the web site in the form of scoreboards and stats pages. They called the process definition a “workflow” and it was described in XML. XProc would be ideal for something like that.

XRX. XRX stands for XForms/REST/XQuery. It is not a standard–it’s an approach for building web applications. It means using XForms on the front-end to present and capture data, REST between the front-end and the back-end, and XQuery to retrieve and transform XML from the back-end. This approach allows you to build a web application without any object-relational mapping. The data you are dealing with is always XML so there is no translation necessary.

eXist. eXist is an open source, native XML database. If you’re dealing exclusively in XML, why go to the trouble of translating your XML into rows and columns (and then back in to XML when it is retrieved)? Native XML databases do a better job of storing XML with no translation required while preserving your ability to efficiently do things like XQuery and XPath statements across the entire scope of your dataset. I had previously played with Apache Xindice but Xindice doesn’t support XQuery which is a major focus for eXist (plus, things seem a little quiet over at the Xindice project).

Orbeon Forms is a server-side XForms implementation. If you’re looking for an open source forms solution, you need to take a look. Orbeon is XForms, XProc, and eXist, all rolled into a single offering. You can merge the Orbeon WAR file with your web app’s WAR or you can deploy Orbeon in its own web app and simply tell it to handle all of the XForms tasks. Orbeon also has a graphical forms builder but I didn’t get a chance to play with that.

Thinking I might want to use Apache Sling/Jackrabbit as my repository, I decided to see how easy it would be to persist the XForms data into Jackrabbit instead of eXist, as Orbeon’s tutorial does by default. As I suspected, it turned out to be a 2 minute task. Because Sling provides a REST API into Jackrabbit, and because XForms can persist data via REST natively, it was simply a matter of changing the post URL from the eXist REST URL to the Sling REST URL and it was a done deal. Deciding whether or not Jackrabbit (instead of or in combination with eXist) is the right way to go is a decision for another day.

I’ll provide an update at some point down the road after we’ve done some implementation work on this embedded forms stuff and we’ll see how it actually held up.

Slinging some ideas around RESTful content

Via Seth Gottlieb news that Apache Sling has been officially released. Sling is interesting–I’ve played with it only a bit. You can read more about it on Seth’s post but essentially it is a REST API that sits on top of Apache Jackrabbit. Jackrabbit is the reference implementation of the JCR spec.

I’m not crazy about the JCR API because it is Java-only (yes, I know there are bridges out there). Plus, it doesn’t seem to be rich enough for many types of implementations. For example, Alfresco is a JCR-compliant (Level One) repository, but you don’t see too many people doing JCR-only interactions with Alfresco.

What is interesting to me, though, is the idea that you can abstract repositories at a higher level: the REST API. If we’re all going to talk to our repositories via REST, why not do it in a standard way?

Alfresco introduced a REST framework in 2.1 called “web scripts” (learn more). But Alfresco does not yet have a full-blown “REST API”. Yes, there are a few out-of-the-box REST calls but for the most part, when you interact with Alfresco via REST you are going to roll your own API. On Optaros projects, this has not yet been a huge burden. Quite the opposite, in fact–we’ve been able to develop everything from web script-backed JSR-168 portlets to a streamlined version of the Alfresco web client (soon to be released as an open source project), all on web scripts.

As part of 3.0, Alfresco anticipates rolling out additional out-of-the-box URLs to more fully establish the REST API. The new 3.0 web clients are based almost entirely on REST so they might as well build a REST API that we can all use.

When I look at Apache Sling, I’m thinking, why don’t we agree on a standard REST API for working with content repositories. Then, we could write a front-end once and theoretically use it with multiple back-end repositories. If someone then wants to use a JCR-compliant repository behind the scenes, then that’s cool, but it isn’t a requirement.

Obviously, there is a granularity challenge here. If the API is too granular the front-end ends up making too many calls. If you are aggregating multiple calls from within an intermediate application and then returning the result to the front-end (rather than making a bunch of AJAX calls initiated from the browser client), that’s not as much of an issue. If the API is too coarse, it is too hard to reuse across many different types of front-end applications.

Still, if we agree that REST is the preferred interaction model in the “content as a service” world, at least for the moment, and further, that front-end developers tend to want to have the option of using non-Java technologies for the presentation layer, a standard REST API for interacting with content repositories makes sense, whether that’s Sling, whatever Alfresco comes up with, or the best of both.

Maybe the Atom Publishing Protocol is close to what I’m thinking here. Maybe Alfresco thinks so too. I noticed the Abdera JAR was added to the Alfresco Community dependencies fairly recently. Abdera is an Apache incubator project for working with Atom.

We actually have an engagement with Alfresco right now to develop some of the new 3.0 web client modules. When I get some time I’ll explore this idea a little further by checking in on the 3.0 web client team, looking at Abdera, and doing a deeper dive on sling.