Better space-time-robustness trade-offs for set reconciliation

Djamal Belazzougui, Gregory Kucherov, Stefan Walzer

TL;DR

The paper addresses set reconciliation: recovering the symmetric difference of two similar sets from their sketches, with a tunable trade-off between space, time, and failure probability. It builds on space-efficient IBLTs, augments them with a stash, and uses a refined analysis of anomalies to show that the probability of failing to recover the symmetric difference decays exponentially, as $2^{-\,\Omega(r)}$. A secondary contribution is a mechanism that identifies which of the two sets each element of the symmetric difference comes from, making the output more informative. The result is a parameterizable scheme that approaches the efficiency of IBLTs while giving stronger success guarantees, with potential applications in genomic databases, distributed systems, and streaming reconciliation.

Abstract

We consider the problem of reconstructing the symmetric difference between similar sets from their representations (sketches) of size linear in the number of differences. Exact solutions to this problem are based on error-correcting coding techniques and suffer from a large decoding time. Existing probabilistic solutions based on Invertible Bloom Lookup Tables (IBLTs) are time-efficient but offer insufficient success guarantees for many applications. Here we propose a tunable trade-off between the two approaches combining the efficiency of IBLTs with exponentially decreasing failure probability. The proof relies on a refined analysis of IBLTs proposed in (Baek Tejs Houen et al. SOSA 2023) which has an independent interest. We also propose a modification of our algorithm that enables telling apart the elements of each set in the symmetric difference.
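
To make the IBLT baseline concrete, here is a minimal sketch of the classical construction with per-cell counts and checksums. It is not the authors' space-efficient variant with a stash. The names `IBLT`, `reconcile`, `K`, and the BLAKE2b-based hash are illustrative assumptions, and keys are assumed to be integers in $[0, 2^{64})$. The sign of a pure cell's count indicates which set a recovered key belongs to, which is one simple way to tell the two sides apart.

```python
import hashlib

K = 3  # number of hash functions / blocks (assumed value)

def _h(x: int, salt: int) -> int:
    """64-bit hash of key x under a small integer salt."""
    data = salt.to_bytes(2, "big") + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

class IBLT:
    def __init__(self, cells_per_block: int):
        self.b = cells_per_block
        n = K * cells_per_block
        self.count = [0] * n    # signed number of keys in each cell
        self.key_xor = [0] * n  # XOR of the keys in each cell
        self.chk_xor = [0] * n  # XOR of the key checksums in each cell

    def _cells(self, x: int):
        # One cell per block, so the K positions of a key are distinct.
        return [i * self.b + _h(x, i) % self.b for i in range(K)]

    def _apply(self, x: int, sign: int):
        for c in self._cells(x):
            self.count[c] += sign
            self.key_xor[c] ^= x
            self.chk_xor[c] ^= _h(x, 999)

    def insert(self, x: int):
        self._apply(x, +1)

    def subtract(self, other: "IBLT") -> "IBLT":
        """Cell-wise difference; shared keys cancel out."""
        d = IBLT(self.b)
        d.count = [p - q for p, q in zip(self.count, other.count)]
        d.key_xor = [p ^ q for p, q in zip(self.key_xor, other.key_xor)]
        d.chk_xor = [p ^ q for p, q in zip(self.chk_xor, other.chk_xor)]
        return d

    def decode(self):
        """Peel pure cells. Mutates the table; returns (ok, only_a, only_b)."""
        only_a, only_b = set(), set()
        stack = list(range(len(self.count)))
        while stack:
            c = stack.pop()
            pure = (self.count[c] in (1, -1)
                    and self.chk_xor[c] == _h(self.key_xor[c], 999))
            if not pure:
                continue
            x, sign = self.key_xor[c], self.count[c]
            (only_a if sign == 1 else only_b).add(x)
            self._apply(x, -sign)          # remove x from all its cells
            stack.extend(self._cells(x))   # these cells may now be pure
        ok = not any(self.count) and not any(self.key_xor)
        return ok, only_a, only_b

def reconcile(a, b, cells_per_block):
    """Sketch both sets, subtract the sketches, and decode the difference."""
    ta, tb = IBLT(cells_per_block), IBLT(cells_per_block)
    for x in a:
        ta.insert(x)
    for x in b:
        tb.insert(x)
    return ta.subtract(tb).decode()
```

In practice each party would send only its own table; `reconcile` builds both locally purely for illustration. Decoding can fail when no pure cell remains, in which case `ok` is `False`, and this is the failure probability that the paper's trade-off targets.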

Paper Structure

This paper contains 12 sections, 3 theorems, 1 figure, 1 table, and 5 algorithms.

Key Result

Theorem 1

Whenever $n > c_k m$, a random $k$-hypergraph is peelable except with probability $\mathcal{O}(1/n^{k-2})$.
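
The notion of peelability in Theorem 1 can be illustrated with a short simulation. The sketch below assumes that $n$ is the number of vertices (table cells), that $m$ is the number of hyperedges (keys), and that each hyperedge consists of $k$ distinct vertices chosen uniformly at random; this summary does not spell out these conventions. The helper names `random_hypergraph` and `is_peelable` are hypothetical. A hypergraph counts as peelable if repeatedly removing a hyperedge that contains a vertex of degree 1 eventually removes every hyperedge.

```python
import random

def random_hypergraph(n, m, k, rng):
    """m hyperedges, each a list of k distinct vertices from range(n)."""
    return [rng.sample(range(n), k) for _ in range(m)]

def is_peelable(n, edges):
    """Return True if greedy peeling removes every hyperedge."""
    incident = [set() for _ in range(n)]  # edge indices touching each vertex
    for e, vs in enumerate(edges):
        for v in vs:
            incident[v].add(e)
    stack = [v for v in range(n) if len(incident[v]) == 1]
    removed = 0
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # stale entry: the degree changed since v was pushed
        (e,) = incident[v]
        for u in edges[e]:
            incident[u].discard(e)
            if len(incident[u]) == 1:
                stack.append(u)
        removed += 1
    return removed == len(edges)
```

For example, `is_peelable(4000, random_hypergraph(4000, 1000, 3, random.Random(1)))` tests a sparse instance in which $n$ is four times $m$. Repeating such trials for different ratios $n/m$ gives an empirical picture of the threshold that $c_k$ captures.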

Figures (1)

  • Figure 1: IBLT implementation from HPW22 with added time limit in decode.
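
The figure's code is not reproduced in this summary. As a rough illustration of what a time-limited decode might look like, the sketch below assumes an XOR-only cell layout with no counts or checksums, in which a cell "looks pure" when the value it holds hashes back to that cell. The function names `cells_of`, `xor_sketch`, and `decode_with_budget` and the parameter `max_steps` are hypothetical, and keys are assumed to be nonzero integers below $2^{64}$. Because a cell holding several keys can occasionally look pure by chance, decoding toggles keys in and out of the output and stops once the step budget is exhausted.

```python
import hashlib

K = 3  # number of blocks / hash functions (assumed value)

def cells_of(x: int, block: int):
    """The K cells of key x, one in each block of `block` cells."""
    out = []
    for i in range(K):
        data = i.to_bytes(1, "big") + x.to_bytes(8, "big")
        d = hashlib.blake2b(data, digest_size=8).digest()
        out.append(i * block + int.from_bytes(d, "big") % block)
    return out

def xor_sketch(keys, block: int):
    """Table whose cells hold only the XOR of the keys hashed to them."""
    t = [0] * (K * block)
    for x in keys:
        for c in cells_of(x, block):
            t[c] ^= x
    return t

def decode_with_budget(table, block: int, max_steps: int):
    """Peel cells that look pure; return the key set, or None on failure."""
    t = list(table)
    out = set()
    stack = list(range(len(t)))
    steps = 0
    while stack:
        c = stack.pop()
        x = t[c]
        if x == 0 or c not in cells_of(x, block):
            continue              # empty cell, or it does not look pure
        if steps == max_steps:
            return None           # time limit reached
        steps += 1
        out ^= {x}                # toggle: a wrongly peeled key can be undone
        cs = cells_of(x, block)
        for d in cs:
            t[d] ^= x
        stack.extend(cs)
    return out if not any(t) else None
```

The cell-wise XOR of two such sketches is a sketch of the symmetric difference, so `decode_with_budget` can be applied to it directly. With this layout alone the result does not say which set each key came from.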

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Lemma 3