Range-Based Set Reconciliation via Range-Summarizable Order-Statistics Stores

Elvio G. Amparore

Abstract

Range-Based Set Reconciliation (RBSR) synchronizes ordered sets by recursively comparing summaries of contiguous ranges and refining only the mismatching parts. While its communication complexity is well understood, its local computational cost depends on the storage backend, which must answer repeated range-summary, rank, and enumeration queries during refinement. We argue that a natural storage abstraction for RBSR implementations based on composable range aggregates is a range-summarizable order-statistics store (RSOS): a dynamic ordered-set structure supporting composable summaries of contiguous ranges together with rank/select navigation. This abstraction identifies and formalizes the backend contract needed for efficient recursive refinement, combining range-summary support with order-statistics navigation for balanced partitioning. We then show that a specific augmentation of B+-trees with subtree counts and composable summaries realizes an RSOS, and we derive corresponding bounds on local reconciliation work in this abstract storage model. Finally, we introduce AELMDB, an extension of LMDB that realizes this design inside a persistent memory-mapped engine, and evaluate it through an integration with Negentropy. The results show that placing the reconciliation oracle inside the storage tree substantially reduces local reconciliation cost on the evaluated reconciliation-heavy workloads compared with an open-source persistent baseline based on auxiliary tree caches, while the window-subrange ablation further confirms the usefulness of the systems optimizations built on top of the core aggregate representation.
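
To make the backend contract described in the abstract concrete, the following is a minimal in-memory sketch of an RSOS-style interface, assuming integer keys. The class name `NaiveRSOS`, the `fingerprint` function, and the choice of summary monoid (element count plus a sum of 64-bit fingerprints) are illustrative assumptions, not the paper's definitions. The structure is a plain sorted list, so range summaries take linear time here, unlike the augmented B+-tree the paper describes.

```python
import bisect
import hashlib

MOD = 1 << 64  # summaries are added modulo 2^64 (an assumption of this sketch)


def fingerprint(item: int) -> int:
    """Hypothetical element summary: first 64 bits of SHA-256 of the key."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def combine(s, t):
    """Monoid operation on summaries: add counts and fingerprint sums."""
    return (s[0] + t[0], (s[1] + t[1]) % MOD)


class NaiveRSOS:
    """Sorted-list stand-in for a range-summarizable order-statistics store."""

    def __init__(self, items=()):
        self._keys = sorted(set(items))

    def insert(self, item: int) -> None:
        i = bisect.bisect_left(self._keys, item)
        if i == len(self._keys) or self._keys[i] != item:
            self._keys.insert(i, item)

    def rank(self, key: int) -> int:
        """Number of stored elements strictly smaller than key."""
        return bisect.bisect_left(self._keys, key)

    def select(self, i: int) -> int:
        """The i-th smallest stored element (0-based)."""
        return self._keys[i]

    def range_summary(self, lo: int, hi: int):
        """(count, fingerprint sum) over the half-open range [lo, hi)."""
        a, b = self.rank(lo), self.rank(hi)
        total = 0
        for k in self._keys[a:b]:  # linear scan; a tree would avoid this
            total = (total + fingerprint(k)) % MOD
        return (b - a, total)
```

Composability means that the summary of `[lo, hi)` equals `combine` applied to the summaries of `[lo, mid)` and `[mid, hi)` for any split point `mid`; `rank` and `select` are what allow a range to be split into parts of roughly equal size.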

Paper Structure (32 sections, 4 theorems, 49 equations, 1 figure, 3 tables, 2 algorithms)

Key Result

Proposition 4.1

Assume that whenever the protocol returns Skip on a queried range $[l,u)$, one has $X \cap [l,u) = Y \cap [l,u)$. Then RBSR computes the exact symmetric difference $\Delta(X,Y)$.
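
The statement can be illustrated with a single-process simulation of the recursive refinement, assuming integer keys. This is a sketch, not the paper's algorithm: the function names are hypothetical, both replicas are visible to one function, the split point is taken from the union of the two sides for simplicity, and ranges are compared exactly so that the sound-skip hypothesis holds by construction. A fingerprint-based comparison would satisfy the hypothesis only when no collision occurs.

```python
import bisect


def slice_range(sorted_items, lo, hi):
    """Elements of a sorted list that fall in the half-open range [lo, hi)."""
    a = bisect.bisect_left(sorted_items, lo)
    b = bisect.bisect_left(sorted_items, hi)
    return sorted_items[a:b]


def reconcile(xs, ys, lo, hi, leaf_size=2):
    """Symmetric difference of sorted lists xs and ys restricted to [lo, hi)."""
    assert leaf_size >= 1, "leaf_size must be positive for termination"
    x_part = slice_range(xs, lo, hi)
    y_part = slice_range(ys, lo, hi)

    # Skip: the exact comparison makes this decision sound by construction.
    if x_part == y_part:
        return set()

    # Small mismatching range: enumerate the elements and diff them directly.
    if len(x_part) + len(y_part) <= leaf_size:
        return set(x_part) ^ set(y_part)

    # Refine: split at the median key so both halves are non-empty and smaller.
    merged = sorted(set(x_part) | set(y_part))
    mid = merged[len(merged) // 2]
    left = reconcile(xs, ys, lo, mid, leaf_size)
    right = reconcile(xs, ys, mid, hi, leaf_size)
    return left | right
```

For example, `reconcile([1, 2, 3, 5, 8, 13], [2, 3, 4, 8, 21], 0, 100)` yields the same set as `set(xs) ^ set(ys)`, since every skipped range contains identical elements on both sides and every other range is either enumerated or refined further.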

Figures (1)

  • Figure 1: Simplified AELMDB on-disk page layout. Aggregate metadata is stored in the per-database descriptor and in branch-node prefixes, while leaf pages keep the records themselves. Extensions of the LMDB format are underlined.

Theorems & Definitions (21)

  • Definition 3.1: Ordered universe
  • Definition 3.2: Replica state
  • Definition 3.3: Reconciliation instance
  • Definition 3.4: Element-summary monoid
  • Definition 3.5: Range aggregate
  • Definition 3.6: Range comparison map
  • Definition 3.7: Rank and select
  • Definition 3.8: Balanced $b$-partition
  • Definition 3.9: RSOS
  • Proposition 4.1: Protocol correctness under sound skip decisions
  • ...and 11 more