Storj.io: An open source, massively distributed object store and API for developers

I’ve been playing with a new object storage solution that’s kind of cool. It’s called Storj. Before I describe how it works, let me start by comparing it to a more familiar solution.

Probably the best-known example of object storage is Amazon S3. It allows you to define buckets and then upload files into those buckets. Amazon charges you based on the amount you store and the amount you transfer, plus a little based on the total number of objects stored. There are three tiers of storage based on frequency of access and pricing varies by region, but for discussion purposes let’s say it is about $0.023 per GB per month. To store 500 GB that would cost about $138 per year without transfer fees.

For that $138 you can be sure that Amazon is replicating your data across multiple facilities and devices. Amazon says that S3 offers 99.999999999% durability. That’s pretty impressive.

But one consideration with using S3 or any other traditional cloud storage solution is that your data is sitting in data centers owned by a single vendor. Of course you could take steps to replicate that data to other providers, but that is kind of a pain. Even then you will still end up with your data sitting behind a relatively small number of vendors, none of whom are really geared toward transparency and openness.

Storj.io was built to address this problem. It’s an open source, distributed object storage platform. Like S3, the model consists of buckets and files in those buckets. The difference is in how your data is stored. When you upload a file to Storj, your file is broken into small pieces called shards, encrypted using keys you hold, and then uploaded to several nodes around the world.

Here’s where it gets really interesting. The nodes that store data are not owned by a single entity. Instead, nodes are run by “storage farmers”. Disk farming is kind of like crypto currency farming, but instead of solving mathematical computations to earn crypto coins, farmers receive micro payments based on how much their space gets utilized. Storj actually leverages the Ethereum blockchain to make this work, and if you are interested in the nitty gritty details, you should check out the whitepaper.

A farmer might be an individual with 50 GB of spare disk space or it could be an organization with lots and lots of space. You don’t know and you don’t really care. Their space gets selected based on a number of factors, which includes things like stability of the node, bandwidth, and total space available. If a farmer tries to tamper with any data on their node they get dropped and they don’t get paid.

Right now Storj is offering 25 GB of free space for one year. After that, their current pricing is $.015 per GB per month. So using my 500 GB example, that’s $90 per year without transfer fees. And if you have some extra storage sitting around, you could become a farmer and offset your costs a little bit.

To be clear, Storj is a tool for developers. After signing up you’ll get presented with a GUI for creating buckets, but when it’s time to start moving data into those buckets you’ll need the API. Right now there is a NodeJS library or you can use a command-line tool provided by a native installer.

This service definitely looks promising, but it is important to know that it is still early. One thing to think about is what happens if farmers start dropping out of the network. When your file is split into shards, each shard is copied to multiple nodes. In my quick test, shards were spread across five nodes, which is plenty to give me confidence that I will be able to get my file back.

If a node drops offline, it is supposed to trigger a replication of your shard to another farmer using one of the remaining good nodes. This works great unless all of the farmers who hold one of your shards drop at once, but with 19,000 farmers and climbing, and assuming your shards are always on multiple nodes, the chances of that happening seem very, very low. The docs say that Storj is working on rolling out additional mirroring strategies. And, you can always use the API to ask Storj which nodes your file is sharded across. It looks like you can make an API call to move a shard yourself, but I haven’t tried that yet.

One last thing to point out is that this is an open source project. You are welcome to contribute. You can even grab the software and run a completely private Storj network, if you want.

I feel like some of my clients are still getting used to the idea of putting their data in the cloud. And some like one throat to choke. A distributed cloud like this may be a tougher sell for conservative customers, even if the security and the durability are there. Still, I love the concept. What do you think?

ECM Architect

One comment