Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (Extended Version)
Fedor Ryabinin, Alexey Gotsman, Pierre Sutra
TL;DR
This work addresses the fragility and complexity of leaderless state-machine replication (SMR) protocols by presenting EPaxos*, a simpler and correct variant of Egalitarian Paxos. The authors introduce a streamlined failure-recovery mechanism and provide formal proofs of correctness for both thrifty and non-thrifty modes. They also generalize the protocol to all failure thresholds $f$ and $e$ such that $n \ge \max\{2e+f-1, 2f+1\}$, and show that this number of processes is optimal. EPaxos* preserves the fast path: a command executes in $2$ message delays provided no more than $e$ processes fail and all concurrently submitted commands commute with it. This reduces latency for clients that are not co-located with a leader, without sacrificing safety or availability. The results are relevant to wide-area deployments and inform the design of robust, leaderless SMR systems that maintain non-zero throughput with up to $f$ process crashes.
Abstract
Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases the latency for clients that are not co-located with it. In response to these drawbacks, Egalitarian Paxos introduced an alternative, leaderless approach that allows replicas to order commands collaboratively. Not relying on a single leader allows the protocol to maintain non-zero throughput even if any $f$ processes out of a total of $n = 2f+1$ crash. The protocol furthermore allows any process to execute a command $c$ fast, in $2$ message delays, provided no more than $e = \lceil\frac{f+1}{2}\rceil$ other processes fail, and all concurrently submitted commands commute with $c$; the latter condition is often satisfied in practical systems. Egalitarian Paxos has served as a foundation for many other replication protocols. However, the protocol is very complex, ambiguously specified, and suffers from nontrivial bugs. In this paper, we present EPaxos*, a simpler and correct variant of Egalitarian Paxos. Our key technical contribution is a simpler failure-recovery algorithm, which we have rigorously proved correct. Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds $f$ and $e$ such that $n \ge \max\{2e+f-1, 2f+1\}$, a number of processes that we show to be optimal.
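
To make the bound concrete, the following minimal Python sketch evaluates $\max\{2e+f-1, 2f+1\}$ and checks that the threshold $e = \lceil\frac{f+1}{2}\rceil$ quoted above is the largest one compatible with $n = 2f+1$. The function names `min_processes` and `classic_fast_threshold` are hypothetical and chosen only for illustration; the code does arithmetic on the stated formulas and is not part of the protocol.

```python
def min_processes(e: int, f: int) -> int:
    """Smallest n with n >= max{2e + f - 1, 2f + 1} (the bound stated in the abstract)."""
    if e < 0 or f < 0:
        raise ValueError("failure thresholds must be non-negative")
    return max(2 * e + f - 1, 2 * f + 1)


def classic_fast_threshold(f: int) -> int:
    """e = ceil((f + 1) / 2), computed with integer arithmetic."""
    return (f + 2) // 2


if __name__ == "__main__":
    for f in range(1, 6):
        e = classic_fast_threshold(f)
        # With this e, the bound collapses to the usual n = 2f + 1.
        print(f"f={f} e={e} n={min_processes(e, f)}")
```

For every $f$, `min_processes(classic_fast_threshold(f), f)` equals $2f+1$, while raising $e$ by one pushes the required number of processes above $2f+1$. This is consistent with the abstract's claim that the original setting is one point in the spectrum the generalized bound covers.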
