Channels with Input-Correlated Synchronization Errors

Roni Con, João Ribeiro

TL;DR

This work develops a general theory for channels with input-correlated synchronization errors. It introduces the notion of admissible channels and proves that their information capacity is achieved by stationary ergodic inputs and equals the coding capacity, i.e., $\mathsf{ICap}(Z)=\mathsf{SCap}(Z)=\lim_{m\to\infty}\mathsf{SCap}^{(m)}(Z)=\mathsf{CCap}(Z)$. The authors provide a comprehensive set of capacity theorems, including the existence of dense capacity-achieving codes, and show that these results extend to multi-trace channels and to runlength-dependent deletion models relevant to DNA-based data storage. Leveraging the Pernice–Li–Wootters and Brakensiek–Li–Spang frameworks, they construct efficient capacity-achieving codes for single- and multi-trace runlength-dependent deletions, with linear-time encoding and near-linear to quadratic-time decoding, and establish lower bounds on the capacity of threshold deletion channels. The work unifies and extends prior results (e.g., MDK18) by permitting input-dependent error distributions and by offering explicit code constructions that approach capacity under realistic error correlations, with potential impact on DNA storage and other synchronization-error-prone systems.

Abstract

"Independent and identically distributed" errors do not accurately capture the noisy behavior of real-world data storage and information transmission technologies. Motivated by this, we study channels with input-correlated synchronization errors, meaning that the distribution of synchronization errors (such as deletions and insertions) applied to the $i$-th input $x_i$ may depend on the whole input string $x$. We begin by identifying conditions on the input-correlated synchronization channel under which the channel's information capacity is achieved by a stationary ergodic input source and is equal to its coding capacity. These conditions capture a wide class of channels, including channels with correlated errors observed in DNA-based data storage systems and their multi-trace versions, and generalize prior work. To showcase the usefulness of the general capacity theorem above, we combine it with techniques of Pernice-Li-Wootters (ISIT 2022) and Brakensiek-Li-Spang (FOCS 2020) to obtain explicit capacity-achieving codes for multi-trace channels with runlength-dependent deletions, motivated by error patterns observed in DNA-based data storage systems.
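To make the channel model concrete, the following is a minimal simulation sketch of a runlength-dependent deletion channel and its multi-trace version, assuming each symbol is deleted independently with a probability that depends only on the length of the run containing it. The function names (`runlength_deletion_channel`, `traces`, `threshold_rule`) and the threshold-style rule are illustrative assumptions, not the paper's definitions.

```python
import random
from itertools import groupby
from typing import Callable, List


def runlength_deletion_channel(
    x: str, del_prob: Callable[[int], float], rng: random.Random
) -> str:
    """Pass x through a deletion channel whose deletion probability
    depends on the length of the run each symbol belongs to.

    Assumption (illustrative): symbols within a run are deleted
    independently, each with probability del_prob(run_length).
    """
    out = []
    for symbol, run in groupby(x):
        run_len = len(list(run))
        p = del_prob(run_len)
        # Count how many symbols of this run survive.
        kept = sum(1 for _ in range(run_len) if rng.random() >= p)
        out.append(symbol * kept)
    return "".join(out)


def traces(
    x: str, del_prob: Callable[[int], float], t: int, seed: int = 0
) -> List[str]:
    """Multi-trace version: t independent channel outputs for the same input."""
    rng = random.Random(seed)
    return [runlength_deletion_channel(x, del_prob, rng) for _ in range(t)]


def threshold_rule(tau: int, d: float) -> Callable[[int], float]:
    """Hypothetical threshold-style rule: symbols in runs of length at
    least tau are deleted with probability d, other symbols never."""
    return lambda run_len: d if run_len >= tau else 0.0


if __name__ == "__main__":
    x = "0011100001"
    for y in traces(x, threshold_rule(tau=3, d=0.5), t=3, seed=1):
        print(y)
```

Because the deletion probability is a function of the run containing each symbol, the error distribution at position $i$ depends on the input beyond $x_i$ itself, which is the kind of input correlation the abstract describes.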

Paper Structure

This paper contains 53 sections, 45 theorems, 105 equations, 2 figures, 1 table.

Key Result

Theorem 1

Let $Z$ be an admissible channel (see the section labeled sec:admissible). Then its information capacity equals its coding capacity, and the information capacity is achieved by stationary ergodic sources.

Figures (2)

  • Figure 1: Lower bounds on the capacity of the $\textup{BDC-Thr}(\tau,d)$ for $\tau=2$.
  • Figure 2: Lower bounds on the capacity of the $\textup{BDC-Thr}(\tau,d)$ for $\tau=3$.

Theorems & Definitions (118)

  • Theorem 1: Informal. See thm:cap-gen for a formal statement
  • Theorem 2: Efficient capacity-achieving single-trace codes, informal. See thm:efficient-bounded-rl for a formal statement
  • Theorem 3: Efficient capacity-achieving multi-trace codes, informal. See thm:efficient-bounded-multi-trace-rl for a formal statement
  • Definition 1: Entropy rate
  • Definition 2: Information rate
  • Definition 3: Information capacity
  • Definition 4: Block-independent process
  • Definition 5: Stationary ergodic process
  • Definition 6: Stationary capacity
  • Definition 7: $m$-th order Markov capacity
  • ...and 108 more