Category: XML

The lingua franca of the Internet.

Catching up on XForms, XRX, XProc, and Orbeon

I recently spent a little time looking at open source components we could assemble into a basic web form authoring solution embedded within one of our SaaS offerings. Rather than full-blown Web Content Management, all the solution really called for was the ability for non-technical users to enter data in a form and to upload binary objects that may be related in some way to that form data. There could be several forms, with some chunks reused across forms, and at some point it might be nice for non-technical people to create their own forms.

For the forms piece I immediately thought of XForms because (1) I knew we wanted the data stored as XML and (2) I like the MVC pattern that XForms follows.

It had been a while since I had played with XForms directly. Alfresco’s web forms engine is currently based on the Chiba implementation of XForms, but you aren’t normally exposed to the XForms details. There are a few things going on in the world of XForms that caught my attention:

XProc. XProc is a W3C specification for an XML Pipeline Language. If you’ve ever worked with Apache Cocoon you’ll get this concept immediately, as Cocoon was an early implementor of the XML pipeline approach. Think of raw XML entering a pipeline at one end, being processed by one or more steps as it moves through, and possibly new XML emerging from the other end. Those processing steps can be thought of as modules that can be reused and recombined in different ways to build new pipelines.

One of our past clients was doing something similar with their own home-grown solution. They were taking XML data feeds from sporting events and performing various operations on that XML before it was eventually posted on the web site as scoreboards and stats pages. They called the process definition a “workflow”, and it was described in XML. XProc would be ideal for something like that.
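XProc pipelines are themselves written in XML, so what follows is not XProc. It is a minimal Python sketch, assuming a made-up feed format, that mimics the idea: small reusable steps, each taking an XML document and returning one, chained into a pipeline much like the sports-feed “workflow” described above. The step names add_winner and to_scoreboard and the game/team/scoreboard elements are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Each step takes the root element of a document and returns a root element,
# so steps can be reused and recombined into different pipelines.
Step = Callable[[ET.Element], ET.Element]


def run_pipeline(xml_text: str, steps: List[Step]) -> str:
    """Parse XML, pass it through each step in order, and serialize the result."""
    root = ET.fromstring(xml_text)
    for step in steps:
        root = step(root)
    return ET.tostring(root, encoding="unicode")


def add_winner(game: ET.Element) -> ET.Element:
    """Hypothetical step: annotate a <game> with the name of the top-scoring team."""
    teams = game.findall("team")
    best = max(teams, key=lambda t: int(t.get("score", "0")))
    game.set("winner", best.get("name", ""))
    return game


def to_scoreboard(game: ET.Element) -> ET.Element:
    """Hypothetical step: transform the annotated game into a scoreboard vocabulary."""
    board = ET.Element("scoreboard", winner=game.get("winner", ""))
    for team in game.findall("team"):
        ET.SubElement(board, "line", team=team.get("name", "")).text = team.get("score", "0")
    return board


# Hypothetical feed document.
feed = '<game><team name="Lions" score="3"/><team name="Bears" score="1"/></game>'
print(run_pipeline(feed, [add_winner, to_scoreboard]))
```

Because every step has the same shape, building a different pipeline from the same steps is just a matter of passing a different list.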

XRX. XRX stands for XForms/REST/XQuery. It is not a standard; it’s an approach for building web applications: XForms on the front end to present and capture data, REST between the front end and the back end, and XQuery to retrieve and transform XML on the back end. This approach lets you build a web application without any object-relational mapping. The data you are dealing with is always XML, so no translation is necessary.

eXist. eXist is an open source, native XML database. If you’re dealing exclusively in XML, why go to the trouble of translating your XML into rows and columns (and then back into XML when it is retrieved)? Native XML databases do a better job of storing XML, with no translation required, while preserving your ability to run XQuery and XPath queries efficiently across your entire dataset. I had previously played with Apache Xindice, but Xindice doesn’t support XQuery, which is a major focus for eXist (plus, things seem a little quiet over at the Xindice project).
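To make “querying across the entire dataset” concrete, here is a rough Python stand-in that uses only the standard library’s ElementTree, which supports a limited subset of XPath. It is not eXist’s API: a real native XML database indexes the stored documents and supports full XQuery, whereas this sketch simply parses every document in a hypothetical in-memory collection and applies one path expression to each.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical in-memory "collection": document name -> XML text.
collection: Dict[str, str] = {
    "form-001.xml": "<form><request><employee>Ann</employee><days>3</days></request></form>",
    "form-002.xml": "<form><request><employee>Bob</employee><days>10</days></request></form>",
}


def query_collection(docs: Dict[str, str], path: str) -> List[Tuple[str, str]]:
    """Apply an ElementTree path expression to every document in the collection,
    returning (document name, element text) pairs for each match."""
    results = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        for el in root.findall(path):
            results.append((name, el.text or ""))
    return results


# Which employees asked for exactly 10 days? The predicate tests a child's text.
print(query_collection(collection, "request[days='10']/employee"))
```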

Orbeon. Orbeon Forms is a server-side XForms implementation. If you’re looking for an open source forms solution, you need to take a look. Orbeon is XForms, XProc, and eXist, all rolled into a single offering. You can merge the Orbeon WAR file with your web app’s WAR, or you can deploy Orbeon as its own web app and simply tell it to handle all of the XForms tasks. Orbeon also has a graphical forms builder, but I didn’t get a chance to play with that.

Thinking I might want to use Apache Sling/Jackrabbit as my repository, I decided to see how easy it would be to persist the XForms data into Jackrabbit instead of eXist, which Orbeon’s tutorial uses by default. As I suspected, it turned out to be a two-minute task. Because Sling provides a REST API into Jackrabbit, and because XForms can persist data via REST natively, it was simply a matter of changing the post URL from the eXist REST URL to the Sling REST URL, and it was a done deal. Whether Jackrabbit (instead of or in combination with eXist) is the right way to go is a decision for another day.
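A minimal Python sketch of why the switch was so small, assuming the form data goes over the wire as a raw XML body: the request is the same for both back ends, and only the target URL changes. Both URLs below are placeholders, and the PUT method is an assumption (eXist’s REST interface stores a document on PUT; how Sling treats a given method and path depends on its configuration), so match whatever your XForms submission actually uses.

```python
import urllib.request

# Placeholder endpoints (hypothetical): adjust host, port, and paths to your install.
EXIST_URL = "http://localhost:8080/exist/rest/db/forms/request.xml"
SLING_URL = "http://localhost:8080/content/forms/request.xml"


def build_submission(instance_xml: str, url: str, method: str = "PUT") -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying XForms instance data."""
    return urllib.request.Request(
        url,
        data=instance_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method=method,
    )


instance = "<request><employee>Ann</employee></request>"
# Moving from eXist to Sling: swap EXIST_URL for SLING_URL, nothing else changes.
req = build_submission(instance, SLING_URL)
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen.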

I’ll provide an update down the road, once we’ve done some implementation work on this embedded forms stuff and can see how it actually holds up.

Gilbane: Microsoft Offers Office Document Formats to ECMA for Standardization

From Gilbane Report News, Microsoft Offers Office Document Formats to ECMA for Standardization. According to the post, several industry leaders have formed…

…an open technical committee that Ecma members can join to standardize and fully document the Open XML formats for Word, Excel and PowerPoint from the next generation of Office technologies, code-named Office “12,” as an Ecma standard, and to help maintain the evolution of the formats.

Another take on XML eForms


The average person—for example, the HR person who wants to take vacation requisitions online, or the accountant who wants to computerize purchase orders—doesn’t want to get IT involved for one or two lousy forms. They don’t want to lay out six-to-10 figures for enterprise-class software for forms creation and a back-end system to support the 25 people who will use the simple forms.


I agree with this sentiment. It is frustrating that the industry still struggles to catch up to where Lotus Notes was nearly 15 years ago.

I love the idea of capturing form data as XML, but the XForms tools I’ve played with have been either too simplistic for complex forms (Documentum Forms Builder), too difficult to integrate with back-end repositories (Adobe), or not ready for prime time (Chiba). All of them require more than business expertise to implement.

Trying out Forrest for writing doc

I’m trying out a new approach (new for me, anyway) to writing doc. Over the years I’ve gotten so tired of messing around with Word. Even if you get your styles to cooperate, the format you wind up with is the format your doc will be in forever. Sure, you can go to PDF using Distiller, but what about other formats, like HTML? Ever seen the HTML Word produces? Yuck!

I’ve always wanted to write my doc in a simple XML vocabulary and then transform it into the desired format. I took a look at DocBook, and it looked promising, but it seemed like too much for what I needed. Plus, I didn’t have a structured authoring tool.

On Thursday I came across Apache Forrest. I had seen Forrest before while messing around with Cocoon but had never taken the time to explore it. As it turns out, it was just what I was looking for: a simple XML-based publishing system. Under the covers it uses the power of Cocoon to turn a simple XML format into a basic web site, PDF, or whatever format you need. Forrest is easy to get up and running, but its format, simple as it may be, is a bit too painful for unassisted plain-text editing. So I still needed a structured authoring tool.
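Forrest does its transformations with Cocoon pipelines and XSLT; the Python sketch below only illustrates the single-source idea behind it. The input vocabulary (document, header, body, section, title, p) is loosely modeled on Forrest’s document format but should be treated as an assumption, and to_html is a hypothetical helper, not part of Forrest.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in a small, Forrest-like vocabulary.
SOURCE = """<document>
  <header><title>Install Guide</title></header>
  <body>
    <section><title>Requirements</title><p>Java and Ant.</p></section>
  </body>
</document>"""


def to_html(xml_text: str) -> str:
    """Hypothetical transform: map the simple doc vocabulary onto basic HTML."""
    doc = ET.fromstring(xml_text)
    title = doc.findtext("header/title", "")
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    for section in doc.findall("body/section"):
        # Each section title becomes a second-level heading.
        ET.SubElement(body, "h2").text = section.findtext("title", "")
        for para in section.findall("p"):
            ET.SubElement(body, "p").text = para.text or ""
    return ET.tostring(html, encoding="unicode")


print(to_html(SOURCE))
```

Another output format, such as plain text, would be another function over the same source rather than a rewrite of the doc.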

A colleague reminded me that James Clark wrote a mode for Emacs called nxml that works with RelaxNG schemas. He also wrote a tool called trang that converts RelaxNG schemas into RelaxNG compact syntax, which is what nxml understands. Both are easy to install.

So now, thanks to Apache Forrest, Emacs’ nxml-mode, and trang, I’ve got a sweet little XML publishing setup.

XML editor reviews

Evaluating XML editors. Check out this nice review of XML editors (thanks to Owen Ambur of …). We would have liked to see some other options included, such as Ektron’s browser-based XML editor and the well-regarded tool from Xopus, along with perhaps more emphasis on usability. In any case, the long list of criteria here suggests that there is more than first meets the eye when looking at tools for managing structured content… [CMSWatch Trends and Features]

Dynamic charting of Documentum data using Cocoon and Xindice

Got charting working. The XMLDB pieces I noted in Step 3 and Step 4 of this post were actually very easy. The syntax for getting XML into Xindice is simple, as is the querying. Once I got that going, it was just a matter of hooking up the pieces of my pipeline to do what I wanted. I did have to tweak the XSLT that produces the SVG, because I hadn’t built it to handle that many data points (the bars were too wide, there wasn’t enough graph area, etc.).
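The bar-width problem comes down to deriving each bar’s width from the number of data points instead of fixing it. My fix was in the XSLT; the sketch below swaps in Python to show the same calculation, with a hypothetical bar_chart_svg function and arbitrarily chosen default dimensions. It assumes non-negative values.

```python
import xml.etree.ElementTree as ET
from typing import List

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart_svg(values: List[float], width: int = 400, height: int = 200,
                  gap: float = 2.0) -> str:
    """Hypothetical helper: render non-negative values as SVG bars whose width
    shrinks as the number of data points grows, so every bar fits the graph area."""
    if not values:
        raise ValueError("need at least one data point")
    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(height))
    slot = width / len(values)       # horizontal space available per data point
    bar_w = max(slot - gap, 1.0)     # leave a small gap between adjacent bars
    peak = max(values) or 1.0        # avoid dividing by zero when all values are 0
    for i, v in enumerate(values):
        bar_h = height * v / peak    # the tallest bar fills the full height
        ET.SubElement(svg, "rect",
                      x=f"{i * slot:.1f}", y=f"{height - bar_h:.1f}",
                      width=f"{bar_w:.1f}", height=f"{bar_h:.1f}")
    return ET.tostring(svg, encoding="unicode")


print(bar_chart_svg([1, 2, 4]))
```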

The cron and xmldb samples were really helpful in getting this working, both from a code perspective and from a functional perspective. As I stored XML in Xindice, I’d pop over to the xmldb browser sample and browse my collection to verify that it worked as expected. I used the cron sample out of the box to set up a task that runs the DQL queries against Documentum on a schedule. Going forward, I’ll need to build an admin/config interface into my app for creating the cron task and browsing the xmldb collections.

Documentum-Cocoon integration progress

When I dusted off my Documentum-Cocoon integration stuff, I had to do a bit of fixing up. It seems that my WDK install had either rearranged some classpath entries (maybe different versions of JARs Cocoon depended on, behind its own) or made the classpath too long. In any case, as a temporary fix I had to update the catalina.bat file to remove the WDK entries.

I then noticed that when I ran any pipelines that used my Documentum-Cocoon components, the components didn’t seem to be getting called. My loggers weren’t showing any entries and the page was just coming up blank. It turned out I had taken a little too much out of my classpath: Tomcat obviously needs to be able to find the Documentum DFC classes, because my components rely on them. It was frustrating that nothing returned a helpful message to alert me to my blunder.

Something helpful in this situation is the Cocoon Status page in the Samples area, which can display the classpath. If the classpath shown there doesn’t include the DFC JAR and the Documentum config directory, you could be in trouble.
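The same check can be scripted. This is a small Python sketch, not a Cocoon or Documentum tool: it splits a classpath string (for example, copied from the Status page or taken from the CLASSPATH environment variable) and reports whether it contains the DFC JAR and a directory holding the DFC config file. The names dfc.jar and dfc.properties are assumptions; adjust them to match your install.

```python
import os
from typing import Dict


def check_dfc_classpath(classpath: str,
                        jar_name: str = "dfc.jar",            # assumed JAR name
                        config_file: str = "dfc.properties",  # assumed config file name
                        ) -> Dict[str, bool]:
    """Report whether a classpath string includes the DFC JAR and a directory
    that contains the DFC config file."""
    entries = [e for e in classpath.split(os.pathsep) if e]
    has_jar = any(os.path.basename(e).lower() == jar_name.lower() for e in entries)
    has_config = any(os.path.isfile(os.path.join(e, config_file))
                     for e in entries if os.path.isdir(e))
    return {"dfc_jar": has_jar, "config_dir": has_config}


if __name__ == "__main__":
    report = check_dfc_classpath(os.environ.get("CLASSPATH", ""))
    for item, found in report.items():
        print(f"{item}: {'found' if found else 'MISSING'}")
```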