Practical Rateless Set Reconciliation
Lei Yang, Yossi Gilad, Mohammad Alizadeh
TL;DR
The paper addresses the inefficiency of existing set reconciliation methods by introducing Rateless Invertible Bloom Lookup Tables (Rateless IBLT), a rateless, universal encoder that streams coded symbols encoding the set difference without requiring prior knowledge of the difference size $d$. Each set item is mapped to the $i$-th coded symbol with a carefully designed probability $\rho(i)=\frac{1}{1+\alpha i}$ (with $\alpha=0.5$), and a closed-form sampling method makes this mapping cheap to compute. A peeling decoder recovers the difference with an average communication overhead that converges to $1.35$ coded symbols per differing item as $d$ grows, at a computation cost of $O(\ell\log d)$ per set item, where $\ell$ is the item length. The authors provide a density-evolution analysis, Monte Carlo validation, a compact Go implementation, and an extensive evaluation against state-of-the-art schemes, showing substantial reductions in communication and computation for both large and small differences. They also explore irregular Rateless IBLTs, which further reduce communication overhead at the cost of some computation speed. The practical impact is greatest for distributed systems that need scalable, low-latency state reconciliation between peers: the paper demonstrates real-world benefits on Ethereum state synchronization and suggests applicability to other blockchains and large-scale replicated services.
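To make the mapping probability concrete, the following minimal Python sketch evaluates $\rho(i)$ and sums it over the first $m$ coded symbols. It assumes 0-based symbol indices (so that $\rho(0)=1$); the function names `rho`, `expected_mappings`, and `log_estimate` are illustrative and do not come from the paper's Go implementation.

```python
import math

ALPHA = 0.5  # value of alpha quoted in the TL;DR


def rho(i: int, alpha: float = ALPHA) -> float:
    """Probability that an item maps to coded symbol i (0-based index assumed)."""
    return 1.0 / (1.0 + alpha * i)


def expected_mappings(m: int, alpha: float = ALPHA) -> float:
    """Expected number of the first m coded symbols that a single item maps to."""
    return sum(rho(i, alpha) for i in range(m))


def log_estimate(m: int, alpha: float = ALPHA) -> float:
    """Integral of rho over [0, m]; a lower bound on expected_mappings(m)."""
    return math.log(1.0 + alpha * m) / alpha


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        print(m, round(expected_mappings(m), 2), round(log_estimate(m), 2))
```

Under these assumptions, the expected number of symbols an item touches grows only logarithmically in the number of symbols generated, which is consistent with the logarithmic factor in the per-item computation cost stated above.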
Abstract
Set reconciliation, where two parties hold fixed-length bit strings and run a protocol to learn the strings they are missing from each other, is a fundamental task in many distributed systems. We present Rateless Invertible Bloom Lookup Tables (Rateless IBLT), the first set reconciliation protocol, to the best of our knowledge, that achieves low computation cost and near-optimal communication cost across a wide range of scenarios: set differences of one to millions, bit strings of a few bytes to megabytes, and workloads injected by potential adversaries. Rateless IBLT is based on a novel encoder that incrementally encodes the set difference into an infinite stream of coded symbols, resembling rateless error-correcting codes. We compare Rateless IBLT with state-of-the-art set reconciliation schemes and demonstrate significant improvements. Rateless IBLT achieves 3--4x lower communication cost than non-rateless schemes with similar computation cost, and 2--2000x lower computation cost than schemes with similar communication cost. We show the real-world benefits of Rateless IBLT by applying it to synchronize the state of the Ethereum blockchain, and demonstrate 5.6x lower end-to-end completion time and 4.4x lower communication cost compared to the system used in production.
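The idea of streaming coded symbols until the receiver can decode may be easier to follow with a toy example. The Python sketch below is not the authors' implementation: it assumes items are non-negative 64-bit integers, uses SHA-256 as a stand-in checksum, stores each coded symbol as a hypothetical `[xor_sum, xor_checksum, count]` triple, and decides each item-to-symbol mapping with a per-index hash instead of the paper's closed-form sampling. All function names are illustrative.

```python
import hashlib

ALPHA = 0.5  # mapping parameter quoted in the TL;DR


def _h(data: bytes) -> int:
    """64-bit hash derived from SHA-256 (an assumption, not the paper's choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def maps_to(item: int, i: int) -> bool:
    """Deterministically decide whether item maps to symbol i, with probability rho(i)."""
    u = _h(b"map" + item.to_bytes(8, "big") + i.to_bytes(8, "big")) / 2**64
    return u < 1.0 / (1.0 + ALPHA * i)


def checksum(item: int) -> int:
    return _h(b"sum" + item.to_bytes(8, "big"))


def encode(items, m):
    """First m coded symbols of a set, each as [xor_sum, xor_checksum, count]."""
    symbols = [[0, 0, 0] for _ in range(m)]
    for x in items:
        for i in range(m):
            if maps_to(x, i):
                symbols[i][0] ^= x
                symbols[i][1] ^= checksum(x)
                symbols[i][2] += 1
    return symbols


def subtract(a, b):
    """Symbol-wise difference; items present in both sets cancel out."""
    return [[sa[0] ^ sb[0], sa[1] ^ sb[1], sa[2] - sb[2]] for sa, sb in zip(a, b)]


def peel(symbols):
    """Peeling decoder: repeatedly extract items from symbols holding exactly one item."""
    symbols = [list(s) for s in symbols]
    only_a, only_b = set(), set()
    progress = True
    while progress:
        progress = False
        for s in symbols:
            if s[2] in (1, -1) and checksum(s[0]) == s[1]:
                x, sign = s[0], s[2]
                (only_a if sign == 1 else only_b).add(x)
                # Remove the recovered item from every symbol it maps to.
                for i, t in enumerate(symbols):
                    if maps_to(x, i):
                        t[0] ^= x
                        t[1] ^= checksum(x)
                        t[2] -= sign
                progress = True
    done = all(s == [0, 0, 0] for s in symbols)
    return only_a, only_b, done


def reconcile(a, b, max_symbols=200):
    """Consume ever longer prefixes of the symbol streams until decoding succeeds."""
    sa, sb = encode(a, max_symbols), encode(b, max_symbols)
    for m in range(1, max_symbols + 1):
        only_a, only_b, done = peel(subtract(sa[:m], sb[:m]))
        if done:
            return only_a, only_b, m
    raise RuntimeError("difference not decoded within max_symbols")
```

For example, `reconcile(set(range(100)), set(range(3, 105)), 100)` should return the three items held only by the first party, the five held only by the second, and the number of coded symbols consumed. Neither side needs to know the difference size in advance, because symbol `i` is the same no matter how many symbols are eventually sent. A real implementation would decode incrementally rather than restarting the peeling for every prefix, as this toy does.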
