Consistency and Consensus

Linearizability

In short, linearizability makes a system appear as if there were only one copy of the data. In particular, as soon as a new value has been written or read, all subsequent reads see that value. Because network delays are unbounded, linearizability is often slow in practice, and many systems can get by with the weaker guarantee of causal consistency, which imposes only a partial order (see the sketch after the list below).

  • Consensus algorithms: linearizable
  • Single-leader replication: potentially linearizable
  • Multi-leader replication: not linearizable
  • Leaderless replication: not linearizable
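
To make this concrete, here is a minimal Python sketch of the stale-read behavior that linearizability rules out (the class and names are hypothetical, and the "replication" is just a timer): a leader/follower register with asynchronous replication, where a read from the follower can return an old value even after a write has completed.

```python
import threading
import time

class AsyncReplicatedRegister:
    """Toy leader/follower register with asynchronous replication."""

    def __init__(self, replication_delay=0.1):
        self.leader = None      # up-to-date copy
        self.follower = None    # lags behind by replication_delay
        self.delay = replication_delay

    def write(self, value):
        self.leader = value  # write is acknowledged immediately...
        threading.Timer(self.delay, self._replicate, args=[value]).start()

    def _replicate(self, value):
        self.follower = value  # ...but reaches the follower later

    def read_from_follower(self):
        return self.follower

reg = AsyncReplicatedRegister()
reg.write("new")
print(reg.read_from_follower())  # None: stale read, not linearizable
time.sleep(0.2)
print(reg.read_from_follower())  # "new": follower has caught up
```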

Distributed Transactions

Two-phase commit (2PC) proceeds in two phases: prepare and commit. The coordinator first asks every participant to prepare; once all participants have replied that they can and will commit if instructed, under any circumstances, the coordinator makes the commit decision and tells the participants to commit.
This is a blocking atomic commit protocol: if the coordinator goes down while everyone is prepared, the participants are stuck waiting indefinitely, likely holding locks and blocking other transactions.
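
A minimal single-process sketch of the protocol's control flow (the Participant class and in-memory log are simplifying assumptions; a real implementation would use durable logs and network messages):

```python
class Participant:
    """Toy participant: promises during prepare, then commits or aborts."""

    def __init__(self, name):
        self.name = name
        self.prepared = False

    def prepare(self):
        # A real participant would force the transaction's writes and
        # its "yes" vote to durable storage before replying.
        self.prepared = True
        return True  # "yes": will commit if told to, no matter what

    def commit(self):
        assert self.prepared
        print(f"{self.name}: committed")

    def abort(self):
        print(f"{self.name}: aborted")

def two_phase_commit(coordinator_log, participants):
    # Phase 1 (prepare): a single "no" or a timeout means abort.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # The decision must be made durable *before* telling anyone;
        # if the coordinator crashes here, recovery reads this log.
        coordinator_log.append("commit")
        for p in participants:
            p.commit()   # Phase 2 (commit)
    else:
        coordinator_log.append("abort")
        for p in participants:
            p.abort()

log = []
two_phase_commit(log, [Participant("db1"), Participant("db2")])
```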

In general, a nonblocking atomic commit protocol requires a perfect failure detector, i.e., a reliable way to tell whether a node has crashed. In a network with unbounded delays, a timeout is not a reliable failure detector, because a request may merely be delayed rather than lost.
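
To see why, consider a toy timeout-based detector (names are hypothetical): a delayed heartbeat from a healthy node is indistinguishable from a crash, so the detector can be wrong in exactly the situation where recovery would depend on it.

```python
import time

def suspect_crashed(last_heartbeat_at, timeout_s=1.0):
    # Suspects a node has crashed if no heartbeat arrived within
    # timeout_s seconds. With unbounded network delays, silence from a
    # slow-but-alive node looks identical to silence from a dead one.
    return time.monotonic() - last_heartbeat_at > timeout_s

# A node whose last heartbeat is 2s old is suspected, even if its next
# heartbeat is merely stuck in a congested network.
print(suspect_crashed(time.monotonic() - 2.0))  # True
```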

Consensus Providers

There are a few popular fault-tolerant consensus algorithms, e.g.

  • Viewstamped Replication
  • Paxos
  • Raft
  • Zab

Most of these decide on a sequence of values, which makes them total order broadcast algorithms. Rather than implement them from scratch in every application, we can use tools that provide these consensus mechanisms as a service, like

  • ZooKeeper
  • etcd

For example, HBase, Hadoop YARN, and Kafka achieve fault tolerance by relying on ZooKeeper running in the background and "outsourcing" the hard consensus logic to it.
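
As an illustration of what this outsourcing looks like, here is a sketch using kazoo, a Python ZooKeeper client (the connection string, paths, and identifiers are assumptions for the example). The application gets leader election without writing any consensus code itself:

```python
from kazoo.client import KazooClient

# Connect to a ZooKeeper ensemble (address is an assumption).
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

def lead():
    # Runs only while this process holds leadership; ZooKeeper's
    # consensus protocol (Zab) ensures at most one leader at a time.
    print("I am the leader; doing leader-only work")

# kazoo's Election recipe builds leader election on top of
# ZooKeeper's ephemeral sequential znodes.
election = zk.Election("/myapp/election", identifier="worker-1")
election.run(lead)  # blocks until elected, then calls lead()
```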