What is consensus?
Definition: Consensus is the mechanism that
ensures all copies of a distributed ledger are the same i.e., at all times
I will have exactly the same copy of the ledger as you. This is critical –
imagine my copy of the ledger saying you owe me $100 bucks whereas your copy of
the ledger saying I owe you $100 bucks – and the ensuing mess.
Consensus is not a new
thing that was invented with blockchain technology. It is an essential
component of any distributed database (a distributed database is a database in
which multiple copies of the database exist on multiple computers, referred to
as nodes in literature) and algorithms for establishing consensus (Paxos, Raft,
BFT, etc.) were developed in distributed systems literature way before blockchain
How does Hyperledger Fabric achieve consensus?
achieves consensus through its ordering service. This service establishes a
total order on the transactions submitted to the network.
This is best illustrated
with the WhatsApp analogy. Have you ever used WhatsApp, Slack, Teams, HipChat,
RocketChat or another chat application where you received messages out of
order? What do we mean by out of order – it means the order in which you
received the messages was not the same as the order in which messages were
sent. It happened to me once when I was using the built-in chat in OfferUp to
communicate with a buyer. I sent two messages and the buyer received the one I
sent later first, followed by the one I sent first later. So the order in which
I sent messages was (A, B) but the receiver received the messages as (B, A).
Why does it happen? Imagine a chat room with
one hundred or a thousand participants. Messages are being generated at a fast
rate – lets say more than 10 per second. There are two architectures possible:
1) an architecture in which there is a backend server to which messages are
submitted; this server then announces availability of new messages to the
receivers (known as a broadcast) followed
by the receivers pulling the new messages from
the server. 2) Another way to architect the system is to implement a p2p network in which there is no central server.
Lets try to understand what happens in both
cases. In case of the central server, because of network latency it is possible that the order in
which the server will receive the messages is not the same as the order in
which messages were sent e.g., if a computer is geographically close to the
server, its message may arrive earlier than a computer who generated the
message first, but whose message has to travel a greater distance. This is not
the only factor. Its possible that the first computer may be on a higher
bandwidth connection than the second. Then, in practice if we consider a
large-scale system, there will not be a single backend server – the backend
server software would be running on multiple computers to divide and conquer
the flood of incoming messages. You might think that the problem can be avoided
by having a timestamp as part of the message when its generated on the client –
then the server can unambiguously determine which message came first. But think
about what will happen in practice. Lets say the server got a message A with
timestamp 12:00:00 and it broadcasted it to all the receivers. Then 3 minutes
later it got a message B with timestamp 11:57:00. It knows that B should come
before A but now it cannot undo the broadcast that has been done – it cannot
ask receivers to undo all the actions they may have taken as result of the
broadcast of A. To push it even further, in practice it is possible that the
clocks on different clients in different geographies and time zones won’t be in
sync with each other so one cannot rely on timestamps in messages to establish
chronological order of messages. And let’s not even involve this guy
(nevermind if you don’t get the prank).
In the other case of p2p network, the messages spread using a gossip protocol. Here it is even more likely that messages can arrive in different order on different nodes since the gossip involves periodic, pairwise, inter-process interactions with some form of randomness in the peer selection. Btw in case you didn’t notice, this is a new problem; in previous paragraph we were discussing messages arriving out of order on the server, but now we have switched to messages arriving in different order on different nodes.
In fact, this phenomenon of messages arriving in different order on different nodes happens
with Bitcoin also. Bitcoin protocol ensures that blocks are generated every 10
minutes or so with some spread. But still its possible for two blocks to be
generated very close to each other in (terms of time) and when that happens
depending on which block reaches a node first, there are temporary forks in the blockchain (also known as branches, illustrated graphically in this article). But then bitcoin protocol ensures
that only one branch will survive – the way it does this is by mandating that
in the presence of competing branches, all nodes have to select the longest
branch. And due to the nature of the system, it is guaranteed that a longest
branch will emerge eventually – read the probabilistic calculations in this article
for details. This is how Bitcoin achieves consensus. And also Ethereum.
Hyperledger Fabric achieves consensus in a
different way. It relies on a backend service (known as the ordering service)
that intermediates the messages between senders and
receivers. This backend service will ensure that all
receivers will see messages in same order – it follows that if all receivers see messages in same order, they
will perform the same actions/commits etc. Voila! consensus is
achieved. How does it do this? By using Apache Kafka, a widely used open source pubsub service developed much before blockchain
was invented. In fact I believe it is also used by applications like WhatsApp,
Slack, Teams etc. just for this very purpose – so that all clients see messages
coming in the same order and forms the backbone of these applications on the
backend. In case of applications like WhatsApp, its not a big deal if some
messages arrive in different order on different peers but it makes a big
difference in case of a blockchain where it can cause the ledger to fall out of
sync between peers.
The consensus mechanism is one of the key ways in which Hyperledger Fabric is different from other blockchains such as Bitcoin or Ethereum and its important to understand what it is and what it is not. For sometime I thought Fabric doesn’t really come with a consensus mechanism but that is not true. Fabric is using the intermediate server architecture we covered above where messages are sent to a server which then broadcasts the messages to receivers ensuring that all receivers will see the exact same order of messages. Btw, note that this order need not be the chronological order – in fact as we have seen above the concept of chronological order is not very well defined in a distributed system. It just needs to be a order – the simplest way to order messages is to order them in the order in which they are received on the server and this is exactly what the Solo orderer does. The Solo orderer is called so because it runs a single instance of the orderer and in this case it is trivial to establish a total order on messages. In practice in a production system, we don’t want a single point of failure and thus want to have more than one ordering node – that is where Apache Kafka is used. As explained in this article: “In Kafka, only the leader does the ordering and only the in-sync replicas can be voted as leader. This provides crash fault-tolerance (CFT) and finality happens in a matter of seconds. While Kafka is crash fault tolerant, it is not Byzantine fault tolerant, (BFT) which prevents the system from reaching agreement in the case of malicious or faulty nodes”. What it is saying is that there is a master node which does the ordering and if the master node fails, someone else is available to take over. This is known as crash fault tolerance. There is another, much more difficult, type of fault that is studied in distributed database replication – known as the Byzantine Fault. Kafka does not protect against that. For more details on Byzantine Fault refer to here and here.
Whereas Bitcoin and Ethereum use the p2p network without any intermediating service in
between who is in charge of establishing a total order on the transactions.
Incidentally this means the temporary forks in Bitcoin will never happen with
Hyperledger Fabric – a desirable property in enterprise applications I think.
With Bitcoin a seller has to wait for 6 blocks or 1 hour as a rule of thumb to
be sure that the payment made to them will end up in the blockchain [ref].
The ordering service is completely agnostic to
the contents of the messages – it does not look into the message to see what it
is. Thus it does not look or analyze in any way the read-write set produced by the endorsing peers.
Its sole purpose is to establish total order on the messages. Messages could be
thought of as events. Messages, transactions, events are all synonymous in this
Few concluding notes: you may encounter articles on the web saying HL Fabric uses BFT or PBFT for consensus but that is not true and a result of people copying and pasting something they read on the internet without verifying the facts. As of this writing, Fabric (v1.4) uses Kafka for consensus. Kafka in turn uses Zookeeper. Upcoming versions of Fabric will replace Kafka with Raft – another protocol to achieve consensus. The work is being tracked here. Also read ordering-faqs. Also some articles on the web will state Fabric’s consensus protocol is pluggable – while this is not incorrect, it is easier said than done. If you want to use a protocol other than Solo or Kafka, you will have to write your own plugin and compile your own custom binary of fabric-orderer. For details refer to the question titled “I want to write a consensus implementation for Fabric. Where do I begin?” in the orderer-faqs.