LARK -- Linearizability Algorithms for Replicated Keys in Aerospike
Andrew Gooding, Kevin Porter, Thomas Lopatic, Ashish Shinde, Sunil Sayyaparaju, Srinivasan Seshadri, V. Srinivasan
TL;DR
LARK tackles the tension between strong consistency, latency, and availability in distributed key-value stores by replacing per-partition quorum logs with Partition Availability Conditions (PAC) and a log-free, per-key replication path. It achieves linearizability through per-key duplicate resolution and background migration while tolerating bounded view skew, enabling immediate partition readiness after leader changes. The work provides formal safety arguments and a TLA+ specification, and demonstrates substantial availability gains across RF values (partition unavailability scaling as approximately $p^{f+1}$ for LARK vs. $\binom{2f+1}{f+1}p^{f+1}$ for quorum-log baselines), plus favorable throughput during outages without sacrificing steady-state latency. Practically, LARK reduces infrastructure costs and enables zero-downtime rolling restarts in production Aerospike deployments, making it a compelling approach for real-time, strongly consistent KV stores.
Abstract
We present LARK (Linearizability Algorithms for Replicated Keys), a synchronous replication protocol that achieves linearizability while minimizing latency and infrastructure cost, at significantly higher availability than traditional quorum-log consensus. LARK introduces Partition Availability Conditions (PAC) that reason over the entire database cluster rather than fixed replica sets, improving partition availability under independent failures by roughly 3x when tolerating one failure and 10x when tolerating two. Unlike Raft, Paxos, and Viewstamped Replication, LARK eliminates ordered logs, enabling immediate partition readiness after leader changes -- with at most a per-key duplicate-resolution round trip when the new leader lacks the latest copy. Under equal storage budgets -- where both systems maintain only f+1 data copies to tolerate f failures -- LARK continues committing through data-node failures while log-based protocols must pause commits for replica rebuilding. These properties also enable zero-downtime rolling restarts even when maintaining only two copies. We provide formal safety arguments and a TLA+ specification, and we demonstrate through analysis and experiments that LARK achieves significant availability gains.
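The "roughly 3x" and "10x" figures can be checked against the leading-order scaling terms $p^{f+1}$ (LARK) and $\binom{2f+1}{f+1}p^{f+1}$ (quorum-log baselines): their ratio is the binomial coefficient $\binom{2f+1}{f+1}$, which does not depend on $p$. The sketch below is illustrative arithmetic only, not the paper's analysis. It assumes $p$ is an independent per-node failure probability, and the function names and the value of `p` are hypothetical.

```python
from math import comb


def lark_unavailability(p: float, f: int) -> float:
    """Leading-order scaling term quoted for LARK: p^(f+1)."""
    return p ** (f + 1)


def quorum_log_unavailability(p: float, f: int) -> float:
    """Leading-order scaling term quoted for quorum-log baselines:
    C(2f+1, f+1) * p^(f+1)."""
    return comb(2 * f + 1, f + 1) * p ** (f + 1)


def improvement_factor(f: int) -> int:
    """Ratio of the two terms above; p cancels, leaving C(2f+1, f+1)."""
    return comb(2 * f + 1, f + 1)


if __name__ == "__main__":
    p = 0.01  # assumed per-node failure probability, for illustration only
    for f in (1, 2):
        ratio = quorum_log_unavailability(p, f) / lark_unavailability(p, f)
        print(f"f={f}: factor={improvement_factor(f)}, ratio={ratio:.6f}")
```

For f = 1 the factor is $\binom{3}{2} = 3$, and for f = 2 it is $\binom{5}{3} = 10$, matching the gains stated in the abstract.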
