An Instance-Based Approach to the Trace Reconstruction Problem

Kayvon Mazooji, Ilan Shomorony

TL;DR

This paper defines the "Levenshtein difficulty" of a problem instance $(s,T)$ as the probability that the resulting traces do not provide enough information for correct recovery with full certainty, and describes how $T$ must scale for the Levenshtein difficulty to go to zero.

Abstract

In the trace reconstruction problem, one observes the output of passing a binary string $s \in \{0,1\}^n$ through a deletion channel $T$ times and wishes to recover $s$ from the resulting $T$ "traces." Most of the literature has focused on characterizing the hardness of this problem in terms of the number of traces $T$ needed for perfect reconstruction either in the worst case or in the average case (over input sequences $s$). In this paper, we propose an alternative, instance-based approach to the problem. We define the "Levenshtein difficulty" of a problem instance $(s,T)$ as the probability that the resulting traces do not provide enough information for correct recovery with full certainty. One can then try to characterize, for a specific $s$, how $T$ needs to scale in order for the Levenshtein difficulty to go to zero, and seek reconstruction algorithms that match this scaling for each $s$. We derive a lower bound on the Levenshtein difficulty, and prove that $T$ needs to scale exponentially fast in $n$ for the Levenshtein difficulty to approach zero for a very broad class of strings. For a class of binary strings with alternating long runs, we design an algorithm whose probability of reconstruction error approaches zero whenever the Levenshtein difficulty approaches zero. For this class, we also prove that the error probability of this algorithm decays to zero at least as fast as the Levenshtein difficulty.
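To make the setup concrete, the following is a minimal sketch of the deletion channel and a brute-force Monte Carlo estimate of the Levenshtein difficulty for very short strings. It is not the paper's algorithm. It assumes an i.i.d. deletion channel with deletion probability `p`, and it assumes that "enough information for correct recovery with full certainty" means that $s$ is the only length-$n$ binary string containing every trace as a subsequence. All function names are hypothetical.

```python
import itertools
import random


def deletion_channel(s, p, rng):
    """Return one trace: each symbol of s is deleted independently with probability p (assumed i.i.d.)."""
    return "".join(b for b in s if rng.random() >= p)


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting symbols."""
    it = iter(s)
    return all(c in it for c in t)


def consistent_strings(traces, n):
    """All length-n binary strings that contain every trace as a subsequence (brute force, small n only)."""
    candidates = ("".join(bits) for bits in itertools.product("01", repeat=n))
    return [c for c in candidates if all(is_subsequence(t, c) for t in traces)]


def estimate_difficulty(s, T, p, trials, seed=0):
    """Monte Carlo estimate of the probability that T traces leave more than one consistent string.

    Assumption: this is how the Levenshtein difficulty of (s, T) is interpreted here.
    """
    rng = random.Random(seed)
    ambiguous = 0
    for _ in range(trials):
        traces = [deletion_channel(s, p, rng) for _ in range(T)]
        if len(consistent_strings(traces, len(s))) > 1:
            ambiguous += 1
    return ambiguous / trials
```

For example, `estimate_difficulty("00110011", T=20, p=0.3, trials=200)` gives a rough empirical value for one instance. The enumeration over all $2^n$ candidates limits this sketch to small $n$.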

Paper Structure (12 sections, 7 theorems, 29 equations, 1 algorithm)

Key Result

Theorem 1

Let $c^* = \ell \log(\frac{1}{1-p^r})$. For a sequence of strings $\{s_n\} \in {\cal Q}(r,\ell n)$ where $r, \ell$ are constants such that $r \geq 1$ and $0 < \ell \leq 1$, the instance difficulty satisfies, as $n\to \infty$,

Theorems & Definitions (7)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4