Category: Open Source

Screencast: Basic Alfresco-Kaltura integration

Bryan Spaulding, Media Practice Lead at Optaros, and I have been thinking about lightweight digital asset management and Alfresco. Alfresco can manage any kind of asset, including rich media. It has some built-in functionality for doing image transformations and you can easily integrate with open source solutions like ffmpeg to work with video. But many of our clients need something more, especially when it comes to video.

That’s where Kaltura comes in. Kaltura is a fully hosted video solution that provides full analytics, flexible and customizable players and playlists, and robust back-end CDN and hosting services. You can also download the open source Kaltura Community Edition and run it yourself if you want.

There are a variety of ways Alfresco and Kaltura could work together. We decided to start with a basic integration focused on the Alfresco DM repository. The idea is to use that as a foundation, expanding in the future based on community and client feedback to include deeper functionality for the DM repository or broader integration with other Alfresco products like Alfresco Share and Alfresco WCM.

In this short screencast, I demo the basic CRUD functions the integration provides. You will probably want to hit the “full screen” icon on the Kaltura player to see the detail.

The integration is available as open source. You can download it from Kaltura’s community site and use it on your projects, or better yet, expand on it and contribute back the code. The readme included with the source covers installation and configuration.

Yet another reason to love Open Source Content Management

Man, I don’t miss delivering solutions on top of Documentum. After reading Laurence Hart’s post on Documentum Developer Edition, I’m reminded how much I take for granted working exclusively in the open source content management world.

Laurence’s post was intended to discuss the ins and outs of Documentum’s efforts to make it easier for developers, and, as usual, he’s done a good job of that. But it also underscores the benefits enjoyed by those who work in open source land. In case you don’t know how good you’ve got it, my open source brothers and sisters, check it out:

Developers working with closed source ECM vendors have to pay to get the software

As Laurence points out,

“There are lots of independent consultants out there that have trouble keeping-up with the technology because they can’t afford to become partners for the requisite fee.”

If you are a developer looking to go deep on closed source software, you have no choice but to pay. There’s no other way to get access to the software. Sometimes you can’t even get access to the documentation or the bug database without a paid-up partner account (or a client that lets you use theirs).

[UPDATE: Jerry Silver, from EMC, points out that the Documentum Developer Edition is a free download. My original post made it sound like you had to be part of the partner program to obtain the download.]

With open source, the barrier to entry is much lower. You pay nothing to get the software. It’s all about the time and energy you put into learning the product and implementing cool solutions.

To be fair, commercial open source vendors often charge partner fees as well, but the bottom line is that it costs nothing to get started with the code.

Developers working with closed source ECM vendors struggle with giant developer footprints

I feel sorry for Laurence’s laptop:

“The complete Development install calls for 3GB of RAM (after a 1.7+GB download).  That is no small thing for a development laptop.  It needs to be on a newer machine.  If you can move the database service to a different box, that will make your life easier.”

Oh dear. A 1.7GB download for a developer setup? Am I downloading a VM image or a content management server? Let’s look at Alfresco for a comparison. Assuming you are starting from scratch, and assuming you are going to go full-on with the Alfresco platform, your total download is right around 300MB. That includes:

  • Alfresco SDK
  • Alfresco WAR
  • Alfresco WCM (Deployment listener and add-on to core repo)
  • Apache Tomcat
  • Sun JDK
  • MySQL (Server and connector)

All of which runs comfortably in 2GB of RAM and won’t even cause your fan to kick on in 4GB.

Developers working with closed source ECM vendors have less choice

Optaros consultants are now split fairly evenly in their choice of OS across Windows, Mac OS X, and some flavor of Linux. Some people prefer MySQL and some prefer PostgreSQL. Mostly we use Eclipse for Java development but everyone’s got a preference. I use Tomcat for everything locally while others like JBoss. The point is, developers want to use their tools the way they want to. It’s not a stubbornness thing; it’s an efficiency thing.

Within my CMS I want the same flexibility. I want to tweak settings. I want to name my database what I want. I want the flexibility to deploy across as many (or as few) nodes as I need to. From Laurence’s post, it sounds like Documentum clearly falls down here.

Developers working with closed source ECM vendors can’t see the code

It’s obvious, I know. For developers that work with open source it is extremely natural to use the CMS source code when debugging or for reference. You don’t even think about it–it’s just there and you use it. Imagine the frustration of someone who works with a closed source CMS and has to routinely decompile classes to figure out what’s going on. That truly sucks. What good is a “Developer Edition” that doesn’t come with source code?

Partner defections from closed source are on the rise

I’ve seen recent announcements from multiple partners who were previously exclusive to closed source vendors but are now adding open source to their partner list. This is a reflection of increasing demand by customers who are realizing the business value of open source, especially in tough economic times, as well as partners’ desire to make up for sagging demand in the proprietary world. But could it also be that more firms are realizing how much more productive and pleasant it is to work with open source content management?

Help your employer/client see the light

Open source ECM technologies like Alfresco, Drupal, Liferay, Lucene, and many others, are now at or beyond their closed source equivalents. If you are a developer who’s sick of the shackles closed source CMS places on you, why not suggest exploring open source alternatives?

Notes from OSCON 2009 in San Jose

I’m back from San Jose. My colleague, Dave Gynn, and I had fun at the O’Reilly Open Source Conference (OSCON) and learned a lot. Dave’s ability to pick out open source rockstars from a crowd is uncanny. It was pretty sweet seeing Larry Wall (and his family) hanging out and then hearing him speak. Although there are all kinds of topics on all things Open Source, the conference does have a heavy Perl bias.

Dave and I decided we were glad we went but we don’t feel like we have to be there every year going forward. This was my first time, but Dave said the general excitement level seemed low for some reason. Maybe it was Allison Randal’s seriously downbeat welcome address. Not sure. Anyway, here are my rough notes from some of the sessions I attended…

“Open Source in Government” was a big theme at OSCON this year. Speakers tried to instill a sense of urgency in the audience by saying that the window of opportunity for getting the government behind open source in a big way will only be open for a few more months. If you want to get involved, check out some of these links:

Data.gov mash-up contest
http://sunlightlabs.com/contests/appsforamerica2/

Machine readable datasets from the US Govt
http://www.data.gov/

Help the government make better use of open source
http://www.opensourceforamerica.org/

Some folks from Liferay presented on a new UI framework they’ve created called Alloy. Alloy is aimed at providing a single framework that addresses HTML, CSS, and JavaScript in a way that is abstracted from the underlying libraries. Alloy basically extends/subclasses jQuery and YUI. Liferay is now migrating a lot of their OOTB portlets to the new framework. It is expected to ship as part of 5.3. This talk was more about the “why” and less about the “what”. I would have liked to see more examples/demos.

Went to a talk on “using Django for election audits” that turned out to be more about how screwed up our elections process is and the minutiae of performing an audit on election results with not so much on how Django was used to solve the problem. The speaker did give a shout out to the Django Debug Toolbar that might prove to be useful. The presenter is looking for help with the project. He needs everything from UI help to people who can send him election results from their local election boards.

Saw a decent talk on Apache CouchDB. Couch is a schema-less database that is built for massive distributed scalability. Instead of SQL you use map-reduce functions to query. Key to Couch is the concept of “eventual consistency”–in a Couch app, data can be consistent over time instead of right now. Couch always knows either the correct old value or the correct current value, but it may take time to propagate the current value to every node in the system.
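If the “map-reduce instead of SQL” idea is new to you, here’s a minimal sketch of the pattern in plain Python. To be clear, this is not CouchDB’s API: the documents, field names, and the `query_view` helper are all made up for illustration, and everything runs in memory. It just shows the shape of the thing: a map function emits key/value pairs per document, and a reduce function collapses the values for each key.

```python
from collections import defaultdict

# Hypothetical sample documents; in a schema-less store each one can differ in shape.
docs = [
    {"_id": "1", "type": "post", "author": "dave", "words": 120},
    {"_id": "2", "type": "post", "author": "jeff", "words": 300},
    {"_id": "3", "type": "comment", "author": "dave"},
    {"_id": "4", "type": "post", "author": "dave", "words": 80},
]


def map_post(doc):
    """Emit (key, value) pairs only for the documents this view cares about."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def reduce_sum(values):
    """Collapse all the values emitted under one key into a single result."""
    return sum(values)


def query_view(documents, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


words_by_author = query_view(docs, map_post, reduce_sum)
print(words_by_author)  # {'dave': 200, 'jeff': 300}
```

The equivalent SQL would be a `SELECT author, SUM(words) ... GROUP BY author`; the difference is that the “query” is a pair of functions applied to whatever documents happen to exist.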

Noteworthy bullet points:

  • Couch can idle in 4MB of RAM. With a couple of production databases Couch will use about 20MB.
  • Canonical is including Couch in the Karmic Koala release. This will give apps running on Karmic the ability to easily sync data between nodes. Couch will also be running as part of Ubuntu One which means Karmic desktops can sync data with the Ubuntu cloud (See the Ubuntu wiki).
  • Someone is currently working on a JavaScript implementation of Couch. Among other things, this would give you the ability to replicate your CouchDB to a local version of Couch running in someone’s browser.
  • Current ACL is limited to “you are either an admin or you aren’t”. ACL for writers *might* make it into 1.0. ACL for readers won’t.

I went to the “JRuby on AppEngine” talk not for the JRuby, but because it was the only Google AppEngine session I could find. I was looking for some factoids on who’s using AppEngine. Here’s what they said:

  • 200,000 registered developers
  • 85,000 applications
  • Household names such as: eBay, Best Buy, Forbes, Whitehouse.gov.

Whitehouse.gov was a cool scalability story for AppEngine. They used AppEngine to moderate questions submitted during Obama’s first online town hall. According to the Google Code blog,

“During the 48-hour open voting period, the site peaked at 700 hits per second, and 92,934 people submitted 104,073 questions and cast 3,605,984 votes. In total, over one million unique visitors visited the site before the town hall. Even while the site was featured on major news outlets and even the Google homepage the other 50,000 apps built on App Engine were fully supported and experienced no adverse effects.”

The Erlang talk provided a good history of the language. I would have liked more on the language itself and less of the detailed history behind Ericsson’s telecom switches (even though Erlang played a critical role in those products). I was aware that CouchDB is built with Erlang but the speaker mentioned a couple of other open source projects that leverage Erlang that I hadn’t heard of: ejabberd is an Erlang-based chat server and RabbitMQ is an Erlang-based messaging server.

The “building a business on an open source distributed cloud” talk by Bradford Stephens was good. The speaker’s company, Visible Technologies, mines social networks and the internet in general for consumer sentiment on its customers’ brands. Their system ingests vast subsets of the Internet, parses the results, processes them, and indexes them so that they can run analytics against them for their clients. They moved from an all-Microsoft stack to an open source stack and have been very happy with it.

This was the third “noSQL”-themed talk I saw. He made a good point that when we design apps, we should be saying, “I need persistence” and then figure out what is the best provider of that given scalability and other constraints rather than starting out with “I need a relational database”.

The open source stack used by Visible Technologies includes the usual search players (Lucene, Nutch, Solr) as well as one I haven’t heard of: Katta is used to shard large Lucene indexes across multiple servers. They also use a couple of Hadoop sub-projects, HBase and ZooKeeper, and several others.

The New York Times API and NPR API talks were very good. I didn’t realize how many different APIs NYT has exposed. You can check out their APIs around people, news, search, movies, and books at http://developer.nytimes.com. Their blog is also worth checking out.

Lots of apps have been built using the NYT API. A personal favorite is InstantWatcher. It is a mash-up of NYT’s movies API with Netflix that helps you find good movies available to watch instantly.

NPR’s talk focused less on their specific API and more on how it is being used. Noteworthy bullets:

  • You can build API calls with their query generator (requires a free API key) or by hand (doc).
  • NPR offers tiered key levels. If you create something cool and drive a little traffic their way, you can get your key upgraded to a higher tier.
  • There are no rate limits. NPR believes they have built an infrastructure that can take “anything we can throw at it”.
  • The API has 2,000 users and serves 24 million requests (per ?) averaging 2 million requests per month.
  • 50% of the API requests are for NPRML with less than 0.1% requesting ATOM. NPR API results are also available as JSON, RSS, and several other formats.
  • The NPR Digital Media team blogs at http://www.npr.org/blogs/inside/
  • Interesting side-note: NPR is currently migrating off of Oracle 10g to MySQL
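Building an API call “by hand” mostly comes down to assembling a query string. Here’s a rough Python sketch of that step. The endpoint and every parameter name below are placeholders I made up (note the example.org host); the real ones are in NPR’s documentation.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not NPR's documented URL.
BASE_URL = "https://api.example.org/query"


def build_query_url(api_key, topic_id, output="JSON", num_results=10):
    """Assemble a GET URL from a handful of query parameters.

    The parameter names (id, output, numResults, apiKey) are hypothetical.
    """
    params = {
        "id": topic_id,
        "output": output,
        "numResults": num_results,
        "apiKey": api_key,
    }
    # urlencode handles escaping and keeps the insertion order of the dict.
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("YOUR_API_KEY", 1001)
print(url)
# https://api.example.org/query?id=1001&output=JSON&numResults=10&apiKey=YOUR_API_KEY
```

Swapping the `output` value is how you would ask for a different result format, assuming the API exposes format selection as a query parameter.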

After the NYT and NPR talks, they held a developer meet-up of sorts. Unfortunately I had to head to the airport so I missed out on that.

Apache Directory Studio: Look it up

In my work with ECM technology I am almost always dealing with an LDAP directory. Even on my laptop I run OpenLDAP because it is easier for me to slam in a bunch of test users and groups by generating and importing LDIF files than it is to otherwise prep a repository for tests or demos. Plus, once you’ve configured your platform to authenticate against your local LDAP directory there’s a decent chance it will work with your client’s assuming the schemas aren’t terribly different.
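For the curious, generating that LDIF doesn’t take much. Here’s a minimal Python sketch of the kind of script I mean. The base DN, the `ou=people` and `ou=groups` containers (which are assumed to already exist in the directory), the user naming pattern, and the throwaway passwords are all assumptions for a local test box; adjust them to match your own schema.

```python
def make_test_ldif(count, base_dn="dc=example,dc=com", group_cn="testers"):
    """Return LDIF text for `count` test users plus one group containing them all.

    Assumes ou=people and ou=groups already exist under base_dn.
    """
    if count < 1:
        # groupOfNames requires at least one member, so refuse an empty group.
        raise ValueError("count must be at least 1")

    entries = []
    user_dns = []
    for i in range(1, count + 1):
        uid = f"testuser{i}"
        dn = f"uid={uid},ou=people,{base_dn}"
        user_dns.append(dn)
        entries.append("\n".join([
            f"dn: {dn}",
            "objectClass: inetOrgPerson",
            f"uid: {uid}",
            f"cn: Test User {i}",
            f"sn: User{i}",
            f"mail: {uid}@example.com",
            f"userPassword: {uid}",  # throwaway password, test use only
        ]))

    group = [
        f"dn: cn={group_cn},ou=groups,{base_dn}",
        "objectClass: groupOfNames",
        f"cn: {group_cn}",
    ]
    group += [f"member: {dn}" for dn in user_dns]
    entries.append("\n".join(group))

    # LDIF separates entries with a blank line.
    return "\n\n".join(entries) + "\n"


ldif = make_test_ldif(3)
print(ldif)
```

Write the output to a file and load it with whatever import tool your directory provides (OpenLDAP ships `ldapadd`, for example).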

If you also frequently find yourself tweaking the directory and you’re an Eclipse user, you should download and install the Apache Directory Studio Eclipse plug-in and put an end to clunky, platform-specific directory admin tools.

As an aside, as readers know, I run Ubuntu as my primary OS. On Ubuntu, installing OpenLDAP is as easy as running “sudo apt-get install slapd”. But pre-compiled Windows binaries of OpenLDAP seem elusive. Apache Directory Server might be a decent alternative. The Studio plug-in works with both.

Open Source CMS Alfresco Releases 3.0 Preview

Alfresco has just announced the availability of the Alfresco Labs 3.0 Preview. If you’ve been regularly updating from HEAD there may not be a whole lot of stuff that’s new to you but if you haven’t, it might be a good time to see what the team in Maidenhead has been up to.

The first thing you’ll notice is that Alfresco has changed the name of their freely-available Community edition to “Labs”. Alfresco has always insisted that this edition is a developer build that really isn’t suitable for production use. The name change is an attempt to further drive that point home.

Surf’s Up

Alfresco Surf is essentially Alfresco’s name for the web script framework plus some pre-built components with a framework for defining and assembling pages. The web script framework (and therefore, Surf-based sites) can now be run separately from the Alfresco repository process. This has actually been possible since 2.9 Community but now Alfresco is starting to do something with it (See “Share the Love”). In fact, some of my Optaros teammates have been working hard for Alfresco (as a client) to develop some of the content-centric components that are part of Surf and one of the new clients, Share. So Surf is essentially a web application development framework built on REST, JavaScript, FreeMarker, and YUI that you could use to build your own web apps without ever touching an Alfresco repository if you really wanted to. Assuming you do want to pull content from the repository, Surf lets you make remote calls from within Web Script controllers back to the Alfresco repository, or via AJAX using YUI components from the browser.

Share the Love

Alfresco is using Surf to build its new web client offerings. One such offering is called Share. If you’ve been following Alfresco’s progress you’ll probably recognize it by its code name, Slingshot. Share is a collaborative workspace that allows you to spawn “sites” that include things like a Document Library, Blog, Discussion Forum, Wiki, Team Calendar, and Activity Feeds. Activity Feeds are sort of like a Facebook News Feed, but instead of tracking who poked whom, you are alerted when someone updates a document, makes a new blog post, etc. The Share client will be the core of Alfresco’s frontal assault on Microsoft SharePoint.

Speaking of, Share implements the SharePoint protocol. What does that really mean? It means that if one of the things you liked about Microsoft SharePoint was how you could work with a SharePoint Shared Workspace from within Microsoft Office applications, you no longer have to settle for an all-Microsoft stack on the back-end. You can use an Alfresco server instead. That means your users can have the functionality they like when collaborating on Office apps, while the IT department gets to keep their options open from operating system to database to application server and doesn’t have to worry about scalability concerns inherent in SharePoint. Unlike prior Alfresco add-ons for Microsoft Office integration, this approach requires no additional installations on the client because Office already has the hooks for talking to SharePoint, and Alfresco Share implements the SharePoint protocol.

John Newton, Alfresco CTO, said in his blog post on the release that we should expect another Labs update in September with an Enterprise release to follow some time in October.

Apache CouchDB looks interesting

Here’s something to add to my “dive deeper when I have the time” list: Apache CouchDB. It’s a document database accessible via REST, which by itself isn’t terribly unique. What caught my eye was that it was built from the ground up to be distributed. You can replicate documents across multiple nodes, maintain partial replicas, and sync for offline use. The roadmap has some significant features that need to be implemented before most people would use it in production, but still, it’s something to keep an eye on.

Kablink press release goes kerplunk

I don’t know why this rankled me so much. Maybe I should just write it off as somebody’s PR firm getting a little too aggressive. But check out this claim made in an announcement yesterday by open source collaboration software company Kablink (formerly ICEcore):

“The only open source collaboration solution to offer workflow” (Source)

I know. I had to read it twice.

Maybe Kablink defines “open source” or “collaboration” or “workflow” differently than I do. But solutions like Plone, Drupal, and Alfresco have had workflow of some kind for quite a while. It isn’t like there’s just one other open source collaboration offering out there with workflow, there are several. I’m not sure how Kablink thought they’d get this one past anyone. Maybe they’ll comment here to attempt to justify their claim.

Thoughts on social software and events

It sounds like Ringside has some work brewing around events. I haven’t updated my Ringside source code in a while so I don’t know how much of this can be played with right now but I’m anxious to take a look and you can bet I’ll report back here when I do.

The problem with today’s event sites is that they are too focused (live music, social gatherings, etc.) and too isolated (people have to sign up to use them, they are really only used for RSVP-ing, etc.). I’ve also found that finding interesting events can be tough. I think Meetup.com is a particularly bad offender–they’ve got a weird taxonomy thing going with their events. Their search doesn’t appear to be full-text indexed across meetup names or descriptions. Try to search meetup for “Alfresco”, for example. Although I know there are multiple Alfresco meetup groups out there, you won’t turn up one with a keyword search even with your search scope set to “100 miles of USA”. And when you create an event, it seems like there is a limited taxonomy for categorization. You have to decide if your meetup is about “Software” or “Technology”. Why would I pay them to host an event no one can find, attended by a set of people whose profiles I can only leverage in the context of meetup.com?

This is a sticking point for me. We all belong to different communities with different interests. And sometimes those overlap. Our social graphs shouldn’t be in silos. Neither should the events we attend. Managing your connections across networks together and exposing events to sub-sections of your connections (or across your entire network, regardless of where it is hosted) is really powerful. After all, as Bob says, it is through these events that we form and strengthen those connections in the first place. Hopefully, this is what you’ll be able to do with Ringside.

His post got me thinking about what I might like to do with events in my own community. So here’s a list off the top of my head. Maybe Bob will comment on how/if this maps to the Ringside roadmap.

Attend/host flag & security settings. An individual ought to be able to publish an event, make public/private settings about that event, and indicate whether they are attending or hosting the event.

Event matching/de-duplication. What would be great is if there was a way to match up events. If I say I’m going to a Wilco concert, and you say you’re going to a Wilco concert, there needs to be a way to figure out if those are the same event.
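One naive way to approach the matching problem is to boil each event down to a normalized key and group on it. The Python sketch below does that with title, venue, and date. The field names and sample data are made up, and exact-key matching is only a starting point: it won’t catch “Wilco” versus “Wilco w/ special guest”, which would need some kind of fuzzy comparison.

```python
import re
from collections import defaultdict
from datetime import date


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def event_key(event):
    """Two postings with the same key are treated as the same real-world event."""
    return (normalize(event["title"]), normalize(event["venue"]), event["day"])


def group_same_events(events):
    """Map each event key to the list of people who posted it."""
    groups = defaultdict(list)
    for event in events:
        groups[event_key(event)].append(event["posted_by"])
    return dict(groups)


# Hypothetical postings from three different people.
events = [
    {"title": "Wilco", "venue": "Example Hall", "day": date(2030, 1, 15), "posted_by": "jeff"},
    {"title": "WILCO!", "venue": "example hall", "day": date(2030, 1, 15), "posted_by": "jim"},
    {"title": "Wilco", "venue": "Sample Theater", "day": date(2030, 2, 1), "posted_by": "bob"},
]

matches = group_same_events(events)
for key, people in matches.items():
    print(key, people)
```

Here jeff and jim land in the same group even though they typed the event differently, while bob’s show on a different night stays separate.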

In-network/same-event notification. Once you figure out two events are the same, people in the same network can discover the fact they share similar interests. The system should facilitate this kind of thing.

Targeted event promo. You should also be able to publicize an event to particular cross-sections of your graph. I might want to host a Ringside meet-up that only goes to my open source/E2.0 friends without spamming my family about it.

Interest level indication. An individual ought to be able to specify whether they are thinking about attending an event or are definitely attending an event. For example, someone might post an event they would only go to if someone else from one of their networks is also going.

Events as tags. Obviously events integrate with the rest of the model. Activity feeds certainly have to know when someone attended an event. But you should also have the ability to tag any item with an event. A photo library app needs to be able to let users tag photos that pertain to certain events, for example.

Event discovery. Events should be easily discoverable by tag/topic, full-text keyword search, by geography, and by attendee. I’d like to see a mash-up between Dopplr and an event database, for example, that knows what kind of events I like to attend and then cross references that with my travel schedule so that if I am traveling to San Francisco, and one of my favorite bands happens to be playing, the site can let me know that, including which of my friends might also be planning to attend (or would attend if they knew I was going to be in town).

Slice-and-dice RSS subscriptions. I should be able to get an RSS feed for each of the following: All events happening in a particular cross-section of my social graph, all events happening in a particular tag/topic, all events in a geography, all events in a particular date range, all events attended by a particular individual in one of my networks, or any combination of these (Live music shows happening in Dallas that my friend Jim is going to).
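The “any combination of these” part is the interesting bit, and it is easy to model as small filters that compose. Here’s a Python sketch under my own assumptions: the `Event` fields, the filter names, and the sample events are hypothetical, and it stops at producing the list of matching titles rather than serializing actual RSS.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    title: str
    city: str
    day: date
    tags: set = field(default_factory=set)
    attendees: set = field(default_factory=set)


# Each factory returns a predicate: a function that takes an Event and returns True/False.
def by_tag(tag):
    return lambda e: tag in e.tags


def in_city(city):
    return lambda e: e.city == city


def attended_by(person):
    return lambda e: person in e.attendees


def between(start, end):
    return lambda e: start <= e.day <= end


def feed_items(events, *predicates):
    """Return titles of events matching every predicate, ordered by date."""
    return [
        e.title
        for e in sorted(events, key=lambda e: e.day)
        if all(p(e) for p in predicates)
    ]


events = [
    Event("Band night", "Dallas", date(2030, 3, 1), {"live-music"}, {"jim", "ann"}),
    Event("Open mic", "Dallas", date(2030, 3, 5), {"live-music"}, {"ann"}),
    Event("Java meetup", "Dallas", date(2030, 3, 2), {"software"}, {"jim"}),
    Event("Jazz trio", "Austin", date(2030, 3, 3), {"live-music"}, {"jim"}),
]

# "Live music shows happening in Dallas that my friend Jim is going to"
jim_dallas_music = feed_items(
    events, by_tag("live-music"), in_city("Dallas"), attended_by("jim")
)
print(jim_dallas_music)  # ['Band night']
```

Because every filter has the same shape, each feed URL could simply encode which filters to apply, and a new dimension (say, “hosted by”) is one more small function.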

RSVP options. People should have the option of whether or not to track attendance to an event. Even for an event they are not hosting, they may or may not care who else is attending.

Configurable reminders. People need to be able to choose whether or not to send reminders to attendees. Attendees need to be able to opt out of receiving reminders.

Event ratings, comments, and UGC. People should be able to rate, comment on, and upload content related to an event.

Flexible event types. Events don’t have to be of any particular type. An event is really just a span of time during which something that might be potentially interesting to others is happening. “I’m going to Taco Bell for lunch tomorrow” and “I’ll be spending an hour in the Ubuntu forums Saturday” are both legitimate events that people might want to publish.

Calendar view with the same filtering capability as the “slice-and-dice RSS feeds” requirement. And the calendar ought to be widget-able so that anyone can embed it on their own site.

Standard calendar options for events including start and end time, duration, “all day event”, recurring event. I guess if the event was (or could be exported as) an iCal compliant piece of data that might be enough?
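To give a feel for what “exported as iCal” might involve, here’s a minimal Python sketch that writes a single event as iCalendar text. It is a sketch under assumptions, not a complete implementation: the function name and PRODID value are made up, it expects timezone-aware datetimes, and it skips recurrence rules, all-day events, and the folding of long lines that the iCalendar spec calls for.

```python
from datetime import datetime, timezone

ICAL_UTC = "%Y%m%dT%H%M%SZ"


def escape_text(value):
    """Escape the characters that are special inside iCalendar text values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def to_ical(uid, summary, start, end, now=None):
    """Render one event as a VCALENDAR containing a single VEVENT.

    start, end, and now must be timezone-aware; they are written in UTC.
    """
    now = now or datetime.now(timezone.utc)

    def utc(dt):
        return dt.astimezone(timezone.utc).strftime(ICAL_UTC)

    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Event Sketch//EN",  # hypothetical product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{utc(now)}",
        f"DTSTART:{utc(start)}",
        f"DTEND:{utc(end)}",
        f"SUMMARY:{escape_text(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"


ics = to_ical(
    uid="lunch-1@example.com",
    summary="Lunch, Taco Bell",
    start=datetime(2030, 1, 15, 18, 0, tzinfo=timezone.utc),
    end=datetime(2030, 1, 15, 19, 0, tzinfo=timezone.utc),
    now=datetime(2030, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(ics)
```

Start, end, and duration fall out of DTSTART and DTEND; “all day” and recurring events would need date-only values and recurrence rules, which is where a real iCalendar library starts to earn its keep.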

Who’s bringing what. Obviously everyone is familiar with the concept in a social gathering (You aren’t the guy who always just brings the chips, are you? Come on, make an effort, man). But this is also relevant to professional events, particularly for “un-conference” or bar camp type events where attendees are expected to present.

What about an ecommerce component? Maybe you ought to be able to sell tickets for an event. This could open up a can of worms regarding capacity, tiered pricing and availability, ticket authenticity verification, etc., but it might be cool/fun to provide something that could loosen the stranglehold a small number of vendors have on the “live event” market. Just a thought. At the very least, if an event requires a charge, you should at least be able to link to a shopping cart somewhere.