Improved Interactive Protocol for Synchronizing From Deletions

Haolun Ni, Lev Tauz, Ryan Gabrys, Lara Dolecek

TL;DR

This work advances data synchronization under deletions in two ways: it embeds multi-deletion correction codes into a baseline interactive protocol, and it introduces adaptive segmenting. The proposed scheme has three modules (Matching, Deletion Recovery, and Error Correction). It lowers communication cost by tuning the segment length, and its running time remains polynomial. The authors derive an upper bound on the expected number of transmitted bits that applies to a broad class of deletion correction codes. Their experiments show lower redundancy than the baseline. The approach enables low-redundancy synchronization in settings with deletions, with possible extensions to broader edit models and more efficient multi-deletion codes.

Abstract

Data synchronization is a fundamental problem with applications in diverse fields such as cloud storage, genomics, and distributed systems. This paper addresses the challenge of synchronizing two files, one of which is a subsequence of the other and related through a constant rate of deletions, using an improved communication protocol. Building upon prior work, we integrate advanced multi-deletion correction codes into an existing baseline protocol, which previously relied on single-deletion correction. Our proposed protocol reduces communication cost by leveraging more general partitioning techniques as well as multi-deletion error correction. We derive a generalized upper bound on the expected number of transmitted bits, applicable to a broad class of deletion correction codes. Experimental results demonstrate that our approach outperforms the baseline in communication cost. These findings establish the efficacy of the improved protocol in achieving low-redundancy synchronization in scenarios where deletion errors occur.
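The file relationship described in the abstract can be simulated in a few lines. This is a minimal sketch under stated assumptions. It uses a binary alphabet and an i.i.d. deletion channel, where each symbol of X is deleted independently with probability beta. That channel is one common reading of "a constant rate of deletions"; the paper's exact model is not restated here. The helpers `delete_iid` and `is_subsequence` are hypothetical names. The code only generates the two files and checks the subsequence relation. It does not implement the synchronization protocol itself.

```python
import random

def delete_iid(x, beta, rng):
    """Assumed channel: keep each symbol of x independently with probability 1 - beta."""
    return [b for b in x if rng.random() >= beta]

def is_subsequence(y, x):
    """Return True if y can be obtained from x by deleting symbols (greedy scan)."""
    it = iter(x)
    # For each symbol of y, advance through x until a matching symbol is consumed.
    return all(any(b == a for a in it) for b in y)

rng = random.Random(0)
n, beta = 10_000, 0.01          # illustrative length and deletion rate (assumptions)
x = [rng.randint(0, 1) for _ in range(n)]
y = delete_iid(x, beta, rng)
print("deletions:", len(x) - len(y), "| y is a subsequence of x:", is_subsequence(y, x))
```

The greedy scan is enough for the check. If y is a subsequence of x at all, it also embeds by matching each symbol at its earliest possible position.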

Paper Structure

This paper contains 17 sections, 15 theorems, 60 equations, and 3 figures.

Key Result

Theorem 1

Let $s > 0$ be the segment length multiplier. Let $c > 0$ be the delimiter length coefficient in the Deletion Recovery Module. Let $w$ be the maximum number of deletions that can be corrected with a set of multi-deletion correction codes. Let $a_i \geq 1$, $i \in \{1, \dots, w\}$, be the efficiency of the …
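The parameters $s$ and $w$ in Theorem 1 interact through how many deletions land in a single segment. The sketch below makes two assumptions that are not stated here. First, deletions are i.i.d. at rate $\beta$. Second, the segment length is $L_S = s/\beta$. The second assumption is consistent with Figure 3's choice $L_S = \frac{2}{\beta}$, but the paper's exact rule may differ. Under these assumptions, the number of deletions in a segment is Binomial($L_S$, $\beta$) with mean close to $s$. The code computes the probability that a segment suffers at most $w$ deletions. That is the regime a set of codes correcting up to $w$ deletions would cover. The helpers `segment_length` and `prob_at_most_w_deletions` are hypothetical names. The output illustrates the trade-off only; it is not the paper's bound on the redundancy coefficient $r$.

```python
import math

def segment_length(s, beta):
    """Hypothetical rule L_S = s / beta, rounded to a positive integer."""
    return max(1, round(s / beta))

def prob_at_most_w_deletions(L, beta, w):
    """P[Binomial(L, beta) <= w]: chance that i.i.d. deletions hit a segment at most w times."""
    return sum(math.comb(L, k) * beta**k * (1 - beta) ** (L - k) for k in range(w + 1))

beta = 0.01  # illustrative deletion rate (assumption)
for s in (0.5, 1, 2, 4):
    L = segment_length(s, beta)
    probs = [prob_at_most_w_deletions(L, beta, w) for w in (1, 2, 3)]
    print(f"s={s}: L_S={L}, P[<=1,2,3 deletions] = " + ", ".join(f"{p:.3f}" for p in probs))
```

Longer segments (larger $s$) mean fewer segments. However, each segment is then more likely to exceed $w$ deletions. This tension is why the segment length and the deletion-correcting capability $w$ are tuned together.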

Figures (3)

  • Figure 1: Graph of the redundancy coefficient $r$ as a function of $s$ for various values of $w$ with fixed parameters $a = 1$ and $c = 3$.
  • Figure 2: Number of bits transmitted for both the baseline and improved protocols across varying segment length coefficients, where the segment length is scaled by the coefficient. Each subfigure represents a different deletion rate.
  • Figure 3: Number of bits transmitted for the baseline and improved protocols at different deletion rates, with sequence length $n = 50{,}000$ and segment length $L_S = \frac{2}{\beta}$.
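The Figure 3 setup can be mimicked by a small simulation. It partitions a length-$n$ sequence into segments of length $L_S = \frac{2}{\beta}$, applies deletions, and tallies how many fall in each segment. This is a sketch that assumes $\beta$ is the deletion rate and that deletions are i.i.d. The specific $\beta$ values in the loop are illustrative, not the rates used in the paper's experiments. The helper `per_segment_deletions` is a hypothetical name. Only the workload is simulated here; no bits of either protocol are counted.

```python
import random
from collections import Counter

def per_segment_deletions(n, beta, seg_len, rng):
    """Delete each of n positions with prob. beta; return the deletion count of every segment."""
    counts = Counter()
    for i in range(n):
        if rng.random() < beta:
            counts[i // seg_len] += 1      # position i belongs to segment i // seg_len
    num_segments = -(-n // seg_len)        # ceiling division: last segment may be shorter
    return [counts[j] for j in range(num_segments)]

rng = random.Random(1)
n = 50_000
for beta in (0.001, 0.005, 0.01):          # illustrative deletion rates (assumption)
    seg_len = round(2 / beta)              # L_S = 2 / beta, as in Figure 3
    dels = per_segment_deletions(n, beta, seg_len, rng)
    hist = sorted(Counter(dels).items())   # (deletions in segment, number of segments)
    print(f"beta={beta}: {len(dels)} segments, mean {sum(dels) / len(dels):.2f} deletions/segment, {hist}")
```

With $L_S = \frac{2}{\beta}$ the mean number of deletions per segment stays near 2 for every deletion rate. What changes with $\beta$ is the number of segments, roughly $n\beta/2$.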

Theorems & Definitions (18)

  • Theorem 1
  • Theorem 2: from Lemma 1, ori
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 1
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof
  • ...and 8 more