Rateless Bloom Filters: Set Reconciliation for Divergent Replicas with Variable-Sized Elements

Pedro Silva Gomes, Carlos Baquero

TL;DR

This work addresses set reconciliation for variable-sized elements by introducing Rateless Bloom Filters (RBFs) and a hybrid protocol combining RBFs with Rateless IBLTs (RIBLTs). The key novelty is a parameter-free, adaptive approach that handles unknown symmetric-difference sizes and adapts the Bloom-filter capacity to the actual difference, achieving near-optimal communication cost. Empirical results show substantial metadata reductions (up to 92%) and competitive total data transfers across a wide range of similarities, especially when $J(S_A,S_B)\le 0.85$. The method balances three-phase processing (unidirectional Bloom-filter streaming, bidirectional partitioning, and rateless IBLT reconciliation) to efficiently reconcile variable-sized datasets in distributed systems.

Abstract

Set reconciliation protocols typically make two critical assumptions: elements are fixed-size, and the difference cardinality, d, is very small. To handle variable-sized elements, the current practice is to synchronize fixed-size digests of the elements. When the number of differences is considerable, such as after a network partition, this approach can be inefficient. We propose a two-stage hybrid protocol that adds a preliminary Bloom filter step designed specifically for this regime. The core technical challenge is determining the optimal Bloom filter size without knowing d. We address it with the Rateless Bloom Filter (RBF), a dynamic filter that adapts to arbitrary symmetric differences and closely matches the communication complexity of an optimally configured static filter without requiring any prior parametrization. Our evaluation on sets of variable-sized elements shows that, for Jaccard indices below 85%, the RBF-IBLT hybrid protocol reduces total communication cost by more than 20% in the best case compared to the state of the art.

Paper Structure

This paper contains 33 sections, 5 equations, 9 figures, 1 algorithm.

Figures (9)

  • Figure 1: Insertion into a standard Bloom filter ($m=12, k=3$). Elements $x_1$ and $x_2$ are inserted.
  • Figure 2: Lookup in a standard Bloom filter. $x_1$ is a true positive, $y_1$ is a true negative, and $y_2$ is a false positive.
  • Figure 3: An example IBLT.
  • Figure 4: Set reconciliation with IBLTs.
  • Figure 5: Static Bloom Filters vs. Rateless Bloom Filters.
  • ...and 4 more figures
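
The insertion and lookup behaviour described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch of a standard (static) Bloom filter using the same parameters, $m=12$ and $k=3$. This is not the paper's implementation: the class name, the method names, and the choice of salted SHA-256 digests as the $k$ hash functions are assumptions made purely for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal static Bloom filter with m bits and k hash positions per element."""

    def __init__(self, m: int = 12, k: int = 3):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: bytes):
        # Assumption: the k hash functions are SHA-256 digests salted with
        # the hash index; any k independent hash functions would do.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: bytes) -> None:
        # Set every bit position selected by the k hashes.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: bytes) -> bool:
        # True only if all k positions are set; may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=12, k=3)
for x in (b"x1", b"x2"):
    bf.insert(x)

# Inserted elements are always reported as present (no false negatives).
found_x1 = bf.might_contain(b"x1")
found_x2 = bf.might_contain(b"x2")
```

A lookup that returns `False` is definitive, which corresponds to the true negative $y_1$ in Figure 2. A lookup that returns `True` for an element that was never inserted is a false positive, as with $y_2$. Whether a particular non-member collides depends on the hash functions, so no specific false positive is claimed here.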