Ethereum | ECM Architect

I’ve been exploring blockchain technology lately. In this post I’ll share what I’ve learned so far. I should point out that my interest is squarely on how blockchain can be applied to solving business problems–I’m less interested in cryptocurrency.

Blockchain: A Brief Intro

A blockchain, as the name suggests, is a linked chain of blocks. Each block contains arbitrary data and a link to the previous block. Because these links are created using cryptographic hashes, blocks are hard to alter. The chain is continuously growing. Transactions run which add new blocks, and those transactions are validated by other servers before they are accepted into the chain.

So a blockchain is like a database, but instead of running on a central server controlled by a single entity, the blockchain is distributed across many servers, and, depending on which blockchain you are talking about, the blocks in the chain may be visible to the public.

The reason blockchain is often referred to in the context of cryptocurrency is because cryptocurrencies use a blockchain as a distributed ledger to track cryptocurrency trades. But that is only one of many possible use cases.

Wrapping up this brief intro, realize there is no single blockchain. You can run your own private blockchain internally. Or, you could set up a private blockchain between your company and your partners. Or, you could leverage one of the public blockchain networks. It really depends on exactly what you are trying to do.

For more on the origins of blockchain, see The Wired Guide to the Blockchain.

Use Cases

To identify problems that are good candidates for blockchain applications it helps to summarize the technology’s most noteworthy characteristics:

Data blocks are immutable. Once added to the chain they can never be changed. This will lend itself to problems where we need to retain the data for a long time or where we need to ensure that once created, the data does not change. Currency trades have already been mentioned, but you can easily come up with other examples such as health records, vehicle service records, government documents, or electronic votes.
Distributed/No single point of control. The distributed nature of the blockchain is attractive for multiple reasons. First, there is no single point of failure. If a server goes down or a company goes out of business, the chain is still intact and functional. Second, the data is automatically made available to every node in the network. This means you don’t have to solve data sync or replication–it just happens.
Transparency. In a public blockchain, every block is visible to everyone. Some of the data might be encrypted or hard to make sense of out-of-context, but you can inspect it, if you want.
Trustworthiness. I’ve already mentioned the difficulty in hacking the data. Another aspect of trustworthiness is about making sure that any logic that runs as part of the blockchain runs the same way every time. Yes, that logic could contain bugs, backdoors, or bad actors doing nefarious things, but the blockchain guarantees that it will always execute the same way every time.

Here’s a simple example that leverages all of these strengths. Suppose we decide to start a temperature prediction market in which participants try to predict the temperature in a given city on some date months into the future. We could store the prediction (predicted temperature, city, date, user) on the blockchain as well as the actual daily temperature readings.

We’re capturing data and that data is distributed across the network, which is cool, but we can also store logic on the blockchain–some implementations call these “smart contracts”, but they are really just scripts. In this example, when an actual temperature reading is saved the script will determine winners and losers by comparing the actual temp with the predictions.

If we wanted to add a reward mechanism to our prediction market, a prediction could have an associated wager amount. The smart contract would then not only be responsible for tracking the predictions that came true, but could also move funds from losers to winners, all without any middleman to make sure that happens.

This example takes advantage of the strengths of blockchain:

Once a prediction is made or an actual temperature reading is captured it cannot be changed (immutability).
If a server shuts down the actual readings can still be captured, predictions can be validated, and wagers can be paid (distributed).
If someone wants to make sure their prediction was captured correctly, they can inspect the block (transparency).
And, finally, no intermediary is needed to pay off the bet–the contract is guaranteed to be executed as written when the conditions that trigger it are met (trustworthiness).

A Developer’s Perspective

I wanted to experiment with some of these concepts on my own, so I started with Ethereum. Ethereum is an open source blockchain platform. You may have heard of people trading a cryptocurrency called Ether (ETH). You can use the same Ethereum platform to develop and run your own distributed applications (“dapps”) regardless of whether or not they are related to cryptocurrency.

Ethereum developers use a command line tool called geth to perform tasks with their Ethereum nodes and a tool called Solidity to compile JavaScript into smart contracts.

To try this out I spun up an Ubuntu server, installed geth and Solidity, then worked through the Greeter example. It’s not very exciting–Greeter is essentially a hello world example. But, it does help you understand what it takes to start up a local test network and to write and compile smart contracts.

One of the immediate learnings is that transactions on Ethereum cost money (called “gas”). Even if you are running a test network, you must first mine enough Ether on your test network before you can run a transaction. On the live network, the cost of running a transaction goes to the miners who are running servers that are busy validating transactions and updating the blockchain.

Another gotcha for me was that your miner has to keep working while you deploy your smart contract. It makes sense now–the smart contract is stored on the blockchain and therefore it must be validated like any other transaction, but it was definitely not clear to me while working through the tutorial.

Ethereum was definitely interesting, but I was looking for something a little more business-oriented. That’s when I found Hyperledger Fabric.

Hyperledger is an umbrella open source project that includes several blockchain related sub-projects. One of those is called Fabric, which, like Ethereum, is a blockchain implementation, but Fabric has some features that make it more attractive to businesses, such as a modular architecture and the ability to assign permissions to data and smart contracts.

Another very attractive sub-project is Hyperledger Composer. Composer is a set of tools that abstract away some of the nitty-gritty details related to creating and managing smart contracts (called “chaincode” in Hyperledger parlance).

The easiest way to try Fabric and Composer is the IBM Blockchain Platform Playground Tutorial (IBM Is a major contributor to the Hyperledger project). This is essentially a quick intro that you work through in your web browser.

The Playground Tutorial is worth it to get exposed to the concepts, but I wanted to run everything locally. You can do that easily with Docker. IBM’s Developer Tutorial walks you through setup and then takes you through a very simple “commodity trading” example. This tutorial is just a little deeper than the Playground, but at least you are running everything in containers running on your machine.

Finally, I came across J Steven Perry’s Hyperledger tutorial on developerWorks. This three part tutorial was awesome. You essentially build a “perishable goods business network” that models how contracts, shipments, and payments work with growers, shippers, and importers. By the time you are done, you’ve got GPS and temperature sensors sending data to the blockchain, chaincode that is looking at the contract and sensor data to determine how much stakeholders should get paid based on contract terms, and a RESTful API that you can call to make the whole thing work.

This tutorial really highlights how a Hyperledger solution can be used to share data between business partners in a distributed fashion while ensuring that only relevant operations and data are exposed depending on stakeholder role.

The developer-friendly tooling also flattened the learning curve significantly when compared to my out-of-the-box experience with Ethereum. I’m not saying one platform is inherently better than the other–it depends on what you are trying to do. I just felt much more productive with Hyperledger Fabric and Composer.

Tangential Topics

As I was exploring blockchain I went down a few interesting “bunny trails” here and there. Rather than go into details, I’ll just provide links in case any of it looks interesting:

IBM Blockchain Developer Center. Lots of quality learning resources.
Evernym. A startup focused on digital identity.
Sovrin. A foundation focused on self-sovereign identity and decentralized trust.
Hyperledger Indy. A distributed ledger specifically designed for handling identity.
IPFS. Decentralized object storage that some say could replace the way we reference objects today using HTTP.
Juan Benet TEDx San Francisco talk on IPFS
Filecoin. Take IPFS and add monetization to incentivize disk space farmers. Similar to storj.io.
Ethereum White Paper.
Proof of Work.
Hashcash Algorithm.

Summary

I’m really glad I took the time to dig into blockchain. The hype is definitely at a fever pitch. They say blockchain engineers are in high demand right now. I suspect that one day we’ll all have blockchain in our toolbox, selecting it to be part of our solutions when it make sense, just like we do with any other technology.

Photo Credit: Chain, by Astro, CC-BY 2.0

I’ve been playing with a new object storage solution that’s kind of cool. It’s called Storj. Before I describe how it works, let me start by comparing it to a more familiar solution.

Probably the best-known example of object storage is Amazon S3. It allows you to define buckets and then upload files into those buckets. Amazon charges you based on the amount you store and the amount you transfer, plus a little based on the total number of objects stored. There are three tiers of storage based on frequency of access and pricing varies by region, but for discussion purposes let’s say it is about $0.023 per GB per month. To store 500 GB that would cost about $138 per year without transfer fees.

For that $138 you can be sure that Amazon is replicating your data across multiple facilities and devices. Amazon says that S3 offers 99.999999999% durability. That’s pretty impressive.

But one consideration with using S3 or any other traditional cloud storage solution is that your data is sitting in data centers owned by a single vendor. Of course you could take steps to replicate that data to other providers, but that is kind of a pain. Even then you will still end up with your data sitting behind a relatively small number of vendors, none of whom are really geared toward transparency and openness.

Storj.io was built to address this problem. It’s an open source, distributed object storage platform. Like S3, the model consists of buckets and files in those buckets. The difference is in how your data is stored. When you upload a file to Storj, your file is broken into small pieces called shards, encrypted using keys you hold, and then uploaded to several nodes around the world.

Here’s where it gets really interesting. The nodes that store data are not owned by a single entity. Instead, nodes are run by “storage farmers”. Disk farming is kind of like crypto currency farming, but instead of solving mathematical computations to earn crypto coins, farmers receive micro payments based on how much their space gets utilized. Storj actually leverages the Ethereum blockchain to make this work, and if you are interested in the nitty gritty details, you should check out the whitepaper.

A farmer might be an individual with 50 GB of spare disk space or it could be an organization with lots and lots of space. You don’t know and you don’t really care. Their space gets selected based on a number of factors, which includes things like stability of the node, bandwidth, and total space available. If a farmer tries to tamper with any data on their node they get dropped and they don’t get paid.

Right now Storj is offering 25 GB of free space for one year. After that, their current pricing is $.015 per GB per month. So using my 500 GB example, that’s $90 per year without transfer fees. And if you have some extra storage sitting around, you could become a farmer and offset your costs a little bit.

To be clear, Storj is a tool for developers. After signing up you’ll get presented with a GUI for creating buckets, but when it’s time to start moving data into those buckets you’ll need the API. Right now there is a NodeJS library or you can use a command-line tool provided by a native installer.

This service definitely looks promising, but it is important to know that it is still early. One thing to think about is what happens if farmers start dropping out of the network. When your file is split into shards, each shard is copied to multiple nodes. In my quick test, shards were spread across five nodes, which is plenty to give me confidence that I will be able to get my file back.

If a node drops offline, it is supposed to trigger a replication of your shard to another farmer using one of the remaining good nodes. This works great unless all of the farmers who hold one of your shards drop at once, but with 19,000 farmers and climbing, and assuming your shards are always on multiple nodes, the chances of that happening seem very, very low. The docs say that Storj is working on rolling out additional mirroring strategies. And, you can always use the API to ask Storj which nodes your file is sharded across. It looks like you can make an API call to move a shard yourself, but I haven’t tried that yet.

One last thing to point out is that this is an open source project. You are welcome to contribute. You can even grab the software and run a completely private Storj network, if you want.

I feel like some of my clients are still getting used to the idea of putting their data in the cloud. And some like one throat to choke. A distributed cloud like this may be a tougher sell for conservative customers, even if the security and the durability are there. Still, I love the concept. What do you think?

ECM Architect

Tag: Ethereum

Exploring Blockchain