RACS-SADL: Robust and Understandable Randomized Consensus in the Cloud
Pasindu Tennage, Antoine Desjardins, Lefteris Kokoris-Kogias
TL;DR
RACS addresses robustness gaps in cloud consensus by combining a Raft-inspired synchronous path with a randomized fallback mode that operates under adversarial networks. It preserves one-round-trip commit latency in synchronous periods and remains easy to integrate because it extends Raft, while introducing SADL as a separate, throughput-focused dissemination layer. Experimental results on Amazon EC2 show that RACS achieves 28k cmd/sec under adversarial WAN conditions (versus 2.8k for Raft/Multi-Paxos) and that SADL-RACS can reach up to 380k cmd/sec in WAN-scale setups and 420k cmd/sec in LAN, with latency around 300 ms in synchronous WAN scenarios. The work provides formal proofs and practical guidance for deploying randomized consensus in clouds, and points to future directions including energy efficiency and enhanced partition resilience.
Abstract
Widely deployed consensus protocols in the cloud are often leader-based and optimized for low latency under synchronous network conditions. However, cloud networks can experience disruptions such as network partitions, high-loss links, and configuration errors. These disruptions interfere with the operation of leader-based protocols, as their view-change mechanisms interrupt normal-case replication and cause the system to stall. We propose RACS, a novel randomized consensus protocol that ensures robustness against adversarial network conditions. RACS achieves optimal one-round-trip latency under synchronous network conditions while remaining resilient to adversarial network conditions. RACS follows a simple design inspired by Raft, the most widely used consensus protocol in the cloud, and therefore enables seamless integration with the existing cloud software stack. Experiments with a prototype running on Amazon EC2 show that RACS achieves 28k cmd/sec throughput, ninefold higher than Raft under adversarial cloud network conditions. Under synchronous network conditions, RACS matches the performance of Multi-Paxos and Raft, achieving a throughput of 200k cmd/sec with a median latency of 300 ms, confirming that RACS introduces no unnecessary overhead. Finally, SADL-RACS, a throughput-optimized version of RACS, achieves a throughput of 500k cmd/sec, delivering 150% higher throughput than Raft.
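The relative-throughput claims above ("ninefold higher", "150% higher") can be checked with a short calculation. The sketch below is illustrative only: the helper name `relative_gain` is hypothetical, the 2.8k cmd/sec Raft baseline is the figure quoted in the TL;DR, and the 200k cmd/sec Raft baseline is an assumption based on the statement that RACS matches Raft under synchronous conditions.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Return how much higher `new` is than `baseline`, as a multiple of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Throughput figures in cmd/sec, taken from the TL;DR and abstract.
racs_adversarial = 28_000
raft_adversarial = 2_800     # Raft/Multi-Paxos figure quoted in the TL;DR
sadl_racs = 500_000
raft_synchronous = 200_000   # assumption: Raft equals RACS's 200k under synchrony

adversarial_gain = relative_gain(racs_adversarial, raft_adversarial)
sadl_gain = relative_gain(sadl_racs, raft_synchronous)

print(f"RACS vs. Raft (adversarial): {adversarial_gain:.1f}x higher")
print(f"SADL-RACS vs. Raft (synchronous): {sadl_gain:.0%} higher")
```

Under these assumptions, "ninefold higher" corresponds to a tenfold throughput ratio (28k versus 2.8k), and "150% higher" corresponds to a 2.5x ratio (500k versus 200k).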
