Channels with Input-Correlated Synchronization Errors
Roni Con, João Ribeiro
TL;DR
This work develops a general theory for channels with input-correlated synchronization errors. It introduces the notion of admissible channels and proves that their information capacity is achieved by stationary ergodic inputs and equals the coding capacity, i.e., $\mathsf{ICap}(Z)=\mathsf{SCap}(Z)=\lim_{m\to\infty}\mathsf{SCap}^{(m)}(Z)=\mathsf{CCap}(Z)$. The authors provide a comprehensive set of capacity theorems, including the existence of dense capacity-achieving codes, and show that these results extend to multi-trace channels and to runlength-dependent deletion models relevant to DNA-based data storage. Leveraging the Pernice–Li–Wootters and Brakensiek–Li–Spang frameworks, they construct efficient capacity-achieving codes for single- and multi-trace runlength-dependent deletions, with linear-time encoding and near-linear to quadratic-time decoding, and establish practical lower bounds for threshold deletion channels. The work unifies and extends prior results (e.g., MDK18) by permitting input-dependent error distributions and offering explicit code constructions that approach capacity under realistic error correlations, with potential impact on DNA storage and other systems prone to synchronization errors.
Abstract
"Independent and identically distributed" errors do not accurately capture the noisy behavior of real-world data storage and information transmission technologies. Motivated by this, we study channels with input-correlated synchronization errors, meaning that the distribution of synchronization errors (such as deletions and insertions) applied to the $i$-th input $x_i$ may depend on the whole input string $x$. We begin by identifying conditions on the input-correlated synchronization channel under which the channel's information capacity is achieved by a stationary ergodic input source and is equal to its coding capacity. These conditions capture a wide class of channels, including channels with correlated errors observed in DNA-based data storage systems and their multi-trace versions, and generalize prior work. To showcase the usefulness of the general capacity theorem above, we combine it with techniques of Pernice-Li-Wootters (ISIT 2022) and Brakensiek-Li-Spang (FOCS 2020) to obtain explicit capacity-achieving codes for multi-trace channels with runlength-dependent deletions, motivated by error patterns observed in DNA-based data storage systems.
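To make "input-correlated" concrete, here is a minimal Python sketch of a runlength-dependent deletion channel and its multi-trace version. It rests on our own simplifying assumptions, not the paper's exact model: each symbol lying in a maximal run of length r is deleted independently with probability `delete_prob(r)`, and a multi-trace channel returns t independent passes of the same input. The function names (`runs`, `runlength_deletion_channel`, `multi_trace`) and the threshold-style rule in the demo are hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, List, Tuple


def runs(x: str) -> List[Tuple[str, int]]:
    """Split a string into maximal runs, returned as (symbol, run length) pairs."""
    out: List[Tuple[str, int]] = []
    for ch in x:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def runlength_deletion_channel(
    x: str, delete_prob: Callable[[int], float], rng: random.Random
) -> str:
    """One pass through a hypothetical runlength-dependent deletion channel.

    Assumption: every symbol in a run of length r is deleted independently
    with probability delete_prob(r). The error distribution at position i
    therefore depends on neighbouring inputs, not only on x_i.
    """
    y: List[str] = []
    for sym, r in runs(x):
        p = delete_prob(r)
        for _ in range(r):
            # rng.random() is uniform on [0, 1): p = 0 always keeps, p = 1 always deletes.
            if rng.random() >= p:
                y.append(sym)
    return "".join(y)


def multi_trace(
    x: str, delete_prob: Callable[[int], float], t: int, rng: random.Random
) -> List[str]:
    """Return t traces of x, each produced by an independent pass through the channel."""
    return [runlength_deletion_channel(x, delete_prob, rng) for _ in range(t)]


def threshold_rule(r: int) -> float:
    """Hypothetical rule: only symbols in runs of length >= 3 are deleted, w.p. 0.2."""
    return 0.2 if r >= 3 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    for trace in multi_trace("0011101000", threshold_rule, t=3, rng=rng):
        print(trace)
```

In this sketch, long runs are more fragile than short ones, so whether $x_i$ survives depends on the surrounding symbols of $x$. This is the kind of dependence that an i.i.d. deletion model cannot express.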
