Optimizing Distributed Protocols with Query Rewrites [Technical Report]
David Chu, Rithvik Panchapakesan, Shadaj Laddad, Lucky Katahanas, Chris Liu, Kaushik Shivakumar, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard
TL;DR
This work introduces rule-driven rewrites to scale distributed protocols by treating decoupling and partitioning as general, correct-by-construction optimizations applicable to any protocol expressed in Dedalus. By leveraging order-insensitivity, monotonicity, and data-dependency analyses (FDs/CDs), the authors devise preconditions and mechanisms for coordination-free decoupling and partitioning, achieving throughput gains across Voting, 2PC, and Paxos ($5\times$ for 2PC and $3\times$ for Paxos). The approach provides formal correctness arguments via local rewrites and linearizability, and matches the throughput of state-of-the-art ad hoc rewrites of Paxos, while highlighting opportunities for automation. The results suggest a practical path toward automated optimizers for distributed protocols, with implications for scalable cloud systems and future research into adaptive, dataflow-inspired protocol design.
Abstract
Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. New scalable distributed protocols are developed through careful analysis and rewrites, but this process is ad hoc and error-prone. This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimization. Distributed protocol rewrites entail a new burden: reasoning about spatiotemporal correctness. We leverage order-insensitivity and data dependency analysis to systematically identify correct coordination-free scaling opportunities. We apply this analysis to create preconditions and mechanisms for coordination-free decoupling and partitioning, two fundamental vertical and horizontal scaling techniques. Manual rule-driven applications of decoupling and partitioning improve the throughput of 2PC by $5\times$ and Paxos by $3\times$, and match state-of-the-art throughput in recent work. These results point the way toward automated optimizers for distributed protocols based on correct-by-construction rewrite rules.
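The link between order-insensitivity and coordination-free partitioning can be illustrated with a minimal Python sketch. It is not the paper's Dedalus formulation or its rewrite rules; the names `count_votes`, `partition`, and `partitioned_count`, and the use of a vote tally as the handler, are assumptions made here for illustration only. The sketch shows that when a handler's result does not depend on message order, and messages are routed by a key so that related messages land on the same partition, the merged partitioned result equals the unpartitioned one.

```python
from collections import Counter
from itertools import permutations


def count_votes(messages):
    """Hypothetical order-insensitive handler: tally votes per ballot id."""
    tally = Counter()
    for ballot_id, _voter in messages:
        tally[ballot_id] += 1
    return tally


def partition(messages, n_partitions):
    """Route each message by ballot id so all votes for a ballot co-locate."""
    parts = [[] for _ in range(n_partitions)]
    for msg in messages:
        parts[msg[0] % n_partitions].append(msg)
    return parts


def partitioned_count(messages, n_partitions):
    """Run the handler independently on each partition, then merge the tallies."""
    merged = Counter()
    for part in partition(messages, n_partitions):
        merged.update(count_votes(part))
    return merged


# Illustrative input: (ballot_id, voter) pairs with integer ballot ids.
messages = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f")]
expected = count_votes(messages)

# Every arrival order, partitioned three ways, yields the same tally.
all_orders_agree = all(
    partitioned_count(list(order), 3) == expected
    for order in permutations(messages)
)
```

Because each ballot id maps to exactly one partition, no partition needs to consult another before producing its share of the result, which is the intuition behind calling such a rewrite coordination-free.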
