I’m back from San Jose. My colleage, Dave Gynn, and I had fun at the O’Reilly Open Source Conference (OSCON) and learned a lot. Dave’s ability to pick out open source rockstars from a crowd is uncanny. It was pretty sweet seeing Larry Wall (and his family) hanging out and then hearing him speak. Although there are all kinds of topics on all things Open Source, the conference does have a heavy Perl bias.
Dave and I decided we were glad we went but we don’t feel like we have to be there every year going forward. This was my first time, but Dave said the general excitement level seemed low for some reason. Maybe it was Allison Randal’s seriously downbeat welcome address. Not sure. Anyway, here are my rough notes from some of the sessions I attended…
“Open Source in Government” was a big theme at OSCON this year. Speakers tried to instill a sense of urgency in the audience by saying that the window of opportunity for getting the government behind open source in a big way will only be open for a few more months. If you want to get involved, check out some of these links:
Data.gov mash-up contest
http://sunlightlabs.com/contests/appsforamerica2/
Machine readable datasets from the US Govt
http://www.data.gov/
Help the government make better use of open source
http://www.opensourceforamerica.org/
Some folks from Liferay presented on a new UI framework they’ve created called Alloy. Alloy is aimed at providing a single framework that addresses HTML, CSS, and JavaScript in a way that is abstracted from the underlying libraries. Alloy basically extends/subclasses JQuery and YUI. Liferay is migrating a lot of their OOTB portlets now to the new framework. It is expected to ship as part of 5.3. This talk was more about the “why” and less about the “what”. I would have liked to see more examples/demos.
Went to a talk on “using Django for election audits” that turned out to be more about how screwed up our elections process is and the minutiae of performing an audit on election results with not so much on how Django was used to solve the problem. The speaker did give a shout out to the Django Debug Toolbar that might prove to be useful. The presenter is looking for help with the project. He needs everything from UI help to people who can send him election results from their local election boards.
Saw a decent talk on Apache CouchDB. Couch is a schema-less database that is built for massive distributed scalability. Instead of SQL you use map-reduce functions to query. Key to Couch is the concept of “eventual consistency”–in a Couch app, data can be consistent over time instead of right now. Couch always knows either the correct old value or the correct current value, but it may take time to propogate the current value to every node in the system.
Noteworthy bullet points:
- Couch can idle in 4MB of RAM. With a couple of production databases Couch will use about 20MB.
- Canonical is including Couch in the Karmic Koala release. This will give apps running on Karmic the ability to easily sync data between nodes. Couch will also be running as part of Ubuntu One which means Karmic desktops can sync data with the Ubuntu cloud (See the Ubuntu wiki).
- Someone is currently working on a JavaScript implementation of Couch. Among other things, this would give you the ability to replicate your CouchDB to a local version of Couch running in someone’s browser.
- Current ACL is limited to “you are either an admin or you aren’t”. ACL for writers *might* make it into 1.0. ACL for readers won’t.
I went to the “JRuby on AppEngine” talk not for the JRuby, but because it was the only Google AppEngine session I could find. I was looking for some factoids on who’s using AppEngine. Here’s what they said:
- 200,000 registered developers
- 85,000 applications
- Household names such as: eBay, Best Buy, Forbes, Whitehouse.gov.
Whitehouse.gov was a cool scalability story for AppEngine. They used AppEngine to moderate questions submitted during Obama’s first online town hall. According to the Google Code blog,
“During the 48-hour open voting period, the site peaked at 700 hits per second, and 92,934 people submitted 104,073 questions and cast 3,605,984 votes. In total, over one million unique visitors visited the site before the town hall. Even while the site was featured on major news outlets and even the Google homepage the other 50,000 apps built on App Engine were fully supported and experienced no adverse effects.”
The Erlang talk provided a good history of the language. I would have liked more on the language itself and less of the detailed history behind Ericsson’s telecom switches (even though Erlang played a critical role in those products). I was aware that CouchDB is built with Erlang but the speaker mentioned a couple of other open source projects that leverage Erlang that I hadn’t heard of: ejabberd is an Erlang-based chat server and RabbitMQ is an Erlang-based messaging server.
The “building a business on an open source distributed cloud” talk by Bradford Stephens was good. The speaker’s company, Visible Technologies, mines social networks and the internet in general for consumer sentiment on its customer’s brands. Their system ingests vast subsets of the Internet, parses the results, processes it, and indexes it so that they can run analytics against it for their clients. They moved from an all-Microsoft stack to an open source stack and have been very happy with it.
This was the third “noSQL”-themed talk I saw. He made a good point that when we design apps, we should be saying, “I need persistence” and then figure out what is the best provider of that given scalability and other constraints rather than starting out with “I need a relational database”.
The open source stack used by Visible Technologies includes the usual search players (Lucene, Nutch, Solr) as well as one I haven’t heard of: Katta is used to shard large Lucene indexes across multiple servers. They also use a couple of Hadoop sub-projects, HBase and ZooKeeper, and several others.
The New York Times API and NPR API talks were very good. I didn’t realize how many different API’s NYT has exposed. You can check out their API’s around people, news, search, movies, and books at http://developer.nytimes.com. Their blog is also worth checking out.
Lots of apps have been built using the NYT API. A personal favorite is InstantWatcher. It is a mash-up of NYT’s movies API with Netflix that helps you find good movies available to watch instantly.
NPR’s talk focused less on their specific API and more on how it is being used. Noteworthy bullets:
- You can build API calls with their query generator (requires a free API key) or by hand (doc).
- NPR offers tiered key levels. If you create something cool and drive a little traffic their way, you can get your key upgraded to a higher tier.
- There are no rate limits. NPR believes they have built an infrastructure that can take “anything we can throw at it”.
- The API has 2,000 users and serves 24 million requests (per ?) averaging 2 million requests per month.
- 50% of the API requests are for NPRML with less than 0.1% requesting ATOM. NPR API results are also available as JSON, RSS, and several other formats.
- The NPR Digital Media team blogs at http://www.npr.org/blogs/inside/
- Interesting side-note: NPR is currently migrating off of Oracle 10g to MySQL
After the NYT and NPR talks, they held a developer meet-up of sorts. Unfortunately I had to head to the airport so I missed out on that.
Thanks a lot for mentioning my talk at OSCON 🙂 I appreciate the kind words! You can check out the slides at http://www.slideshare.net/lusciouspear/building-a-business-on-hadoop-hbase-and-open-source-distributed-computing — the video should be up in a few hours!
You a PNW local? Feel free to stop by the Hadoop User Group sometime if you’re interested — last Wed of every month. Details usually on my blog.