Hyperledger Fabric Consensus Explained

What is consensus?

Definition: Consensus is the mechanism that ensures all copies of a distributed ledger are the same i.e., at all times I will have exactly the same copy of the ledger as you. This is critical – imagine my copy of the ledger saying you owe me $100 bucks whereas your copy of the ledger saying I owe you $100 bucks – and the ensuing mess.

Consensus is not a new thing that was invented with blockchain technology. It is an essential component of any distributed database (a distributed database is a database in which multiple copies of the database exist on multiple computers, referred to as nodes in literature) and algorithms for establishing consensus (Paxos, Raft, BFT, etc.) were developed in distributed systems literature way before blockchain was invented.

How does Hyperledger Fabric achieve consensus?

Hyperledger Fabric achieves consensus through its ordering service. This service establishes a total order on the transactions submitted to the network.

This is best illustrated with the WhatsApp analogy. Have you ever used WhatsApp, Slack, Teams, HipChat, RocketChat or another chat application where you received messages out of order? What do we mean by out of order – it means the order in which you received the messages was not the same as the order in which messages were sent. It happened to me once when I was using the built-in chat in OfferUp to communicate with a buyer. I sent two messages and the buyer received the one I sent later first, followed by the one I sent first later. So the order in which I sent messages was (A, B) but the receiver received the messages as (B, A).

Why does it happen? Imagine a chat room with one hundred or a thousand participants. Messages are being generated at a fast rate – lets say more than 10 per second. There are two architectures possible: 1) an architecture in which there is a backend server to which messages are submitted; this server then announces availability of new messages to the receivers (known as a broadcast) followed by the receivers pulling the new messages from the server. 2) Another way to architect the system is to implement a p2p network in which there is no central server.

Lets try to understand what happens in both cases. In case of the central server, because of network latency it is possible that the order in which the server will receive the messages is not the same as the order in which messages were sent e.g., if a computer is geographically close to the server, its message may arrive earlier than a computer who generated the message first, but whose message has to travel a greater distance. This is not the only factor. Its possible that the first computer may be on a higher bandwidth connection than the second. Then, in practice if we consider a large-scale system, there will not be a single backend server – the backend server software would be running on multiple computers to divide and conquer the flood of incoming messages. You might think that the problem can be avoided by having a timestamp as part of the message when its generated on the client – then the server can unambiguously determine which message came first. But think about what will happen in practice. Lets say the server got a message A with timestamp 12:00:00 and it broadcasted it to all the receivers. Then 3 minutes later it got a message B with timestamp 11:57:00. It knows that B should come before A but now it cannot undo the broadcast that has been done – it cannot ask receivers to undo all the actions they may have taken as result of the broadcast of A. To push it even further, in practice it is possible that the clocks on different clients in different geographies and time zones won’t be in sync with each other so one cannot rely on timestamps in messages to establish chronological order of messages. And let’s not even involve this guy (nevermind if you don’t get the prank).

In the other case of p2p network, the messages spread using a gossip protocol. Here it is even more likely that messages can arrive in different order on different nodes since the gossip involves periodic, pairwise, inter-process interactions with some form of randomness in the peer selection. Btw in case you didn’t notice, this is a new problem; in previous paragraph we were discussing messages arriving out of order on the server, but now we have switched to messages arriving in different order on different nodes.

In fact, this phenomenon of messages arriving in different order on different nodes happens with Bitcoin also. Bitcoin protocol ensures that blocks are generated every 10 minutes or so with some spread. But still its possible for two blocks to be generated very close to each other in (terms of time) and when that happens depending on which block reaches a node first, there are temporary forks in the blockchain (also known as branches, illustrated graphically in this article). But then bitcoin protocol ensures that only one branch will survive – the way it does this is by mandating that in the presence of competing branches, all nodes have to select the longest branch. And due to the nature of the system, it is guaranteed that a longest branch will emerge eventually – read the probabilistic calculations in this article for details. This is how Bitcoin achieves consensus. And also Ethereum.

Hyperledger Fabric achieves consensus in a different way. It relies on a backend service (known as the ordering service) that intermediates the messages between senders and receivers. This backend service will ensure that all receivers will see messages in same order – it follows that if all receivers see messages in same order, they will perform the same actions/commits etc. Voila! consensus is achieved. How does it do this? By using Apache Kafka, a widely used open source pubsub service developed much before blockchain was invented. In fact I believe it is also used by applications like WhatsApp, Slack, Teams etc. just for this very purpose – so that all clients see messages coming in the same order and forms the backbone of these applications on the backend. In case of applications like WhatsApp, its not a big deal if some messages arrive in different order on different peers but it makes a big difference in case of a blockchain where it can cause the ledger to fall out of sync between peers.

The consensus mechanism is one of the key ways in which Hyperledger Fabric is different from other blockchains such as Bitcoin or Ethereum and its important to understand what it is and what it is not. For sometime I thought Fabric doesn’t really come with a consensus mechanism but that is not true. Fabric is using the intermediate server architecture we covered above where messages are sent to a server which then broadcasts the messages to receivers ensuring that all receivers will see the exact same order of messages. Btw, note that this order need not be the chronological order – in fact as we have seen above the concept of chronological order is not very well defined in a distributed system. It just needs to be a order – the simplest way to order messages is to order them in the order in which they are received on the server and this is exactly what the Solo orderer does. The Solo orderer is called so because it runs a single instance of the orderer and in this case it is trivial to establish a total order on messages. In practice in a production system, we don’t want a single point of failure and thus want to have more than one ordering node – that is where Apache Kafka is used. As explained in this article: “In Kafka, only the leader does the ordering and only the in-sync replicas can be voted as leader. This provides crash fault-tolerance (CFT) and finality happens in a matter of seconds. While Kafka is crash fault tolerant, it is not Byzantine fault tolerant, (BFT) which prevents the system from reaching agreement in the case of malicious or faulty nodes”. What it is saying is that there is a master node which does the ordering and if the master node fails, someone else is available to take over. This is known as crash fault tolerance. There is another, much more difficult, type of fault that is studied in distributed database replication – known as the Byzantine Fault. Kafka does not protect against that. For more details on Byzantine Fault refer to here and here.

Whereas Bitcoin and Ethereum use the p2p network without any intermediating service in between who is in charge of establishing a total order on the transactions. Incidentally this means the temporary forks in Bitcoin will never happen with Hyperledger Fabric – a desirable property in enterprise applications I think. With Bitcoin a seller has to wait for 6 blocks or 1 hour as a rule of thumb to be sure that the payment made to them will end up in the blockchain [ref].

The ordering service is completely agnostic to the contents of the messages – it does not look into the message to see what it is. Thus it does not look or analyze in any way the read-write set produced by the endorsing peers. Its sole purpose is to establish total order on the messages. Messages could be thought of as events. Messages, transactions, events are all synonymous in this discussion.

Few concluding notes: you may encounter articles on the web saying HL Fabric uses BFT or PBFT for consensus but that is not true and a result of people copying and pasting something they read on the internet without verifying the facts. As of this writing, Fabric (v1.4) uses Kafka for consensus. Kafka in turn uses Zookeeper. Upcoming versions of Fabric will replace Kafka with Raft – another protocol to achieve consensus. The work is being tracked here. Also read ordering-faqs. Also some articles on the web will state Fabric’s consensus protocol is pluggable – while this is not incorrect, it is easier said than done. If you want to use a protocol other than Solo or Kafka, you will have to write your own plugin and compile your own custom binary of fabric-orderer. For details refer to the question titled “I want to write a consensus implementation for Fabric. Where do I begin?” in the orderer-faqs.

Advertisements
This entry was posted in Software. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s