On noisy duplication channels with Markov sources
Brendon McBain, James Saunderson, Emanuele Viterbo
TL;DR
This work analyzes channels with noisy duplications, motivated by nanopore sequencing, and proves an asymptotic equipartition property (AEP) for the output and joint input-output processes when the inputs are ergodic Markov sources. It follows that the channel is information stable for such sources and that the Markov-constrained capacity is $C_{\mathsf{Markov}} = \sup_{P \in \mathcal{P}} I(\mathbb{S}; \mathbb{Y}^{\mathbb{T}})$. The paper also relates the AEP for noisy duplication processes to the AEP for hidden semi-Markov processes (HSMPs) via embedding arguments and the Shannon-McMillan-Breiman (SMB) theorem, bridging random-length outputs and fixed-length analyses. It gives Monte Carlo-based lower bounds on the capacity of the binary symmetric channel with Bernoulli and geometric duplications and discusses how these bounds connect to sticky-channel capacities. By linking randomly indexed entropy rates, semi-Markov process (SMP) embeddings, and HSMPs, the work lays a theoretical foundation for capacity estimation and coding strategies in nanopore-inspired DNA storage systems. It also identifies the construction of capacity-achieving Markov codes as an open challenge and notes the practical relevance of noisy duplication channels to efficient data storage.
Abstract
Channels with noisy duplications have recently been used to model the nanopore sequencer. This paper extends some foundational information-theoretic results to this new scenario. We prove the asymptotic equipartition property (AEP) for noisy duplication processes based on ergodic Markov processes. A consequence is that the noisy duplication channel is information stable for ergodic Markov sources, and therefore the channel capacity constrained to Markov sources is the Markov-constrained Shannon capacity. We use the AEP to estimate lower bounds on the capacity of the binary symmetric channel with Bernoulli and geometric duplications using Monte Carlo simulations. In addition, we relate the AEP for noisy duplication processes to the AEP for hidden semi-Markov processes.
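To make the channel model concrete, the following is a minimal simulation sketch, not the authors' code. It assumes that each input symbol is repeated a random number of times and that every copy passes independently through a binary symmetric channel. The function names, the two-state source parameterized by `stay_prob`, and the exact duplication laws (one extra copy with probability `q` for the Bernoulli case, a geometric number of copies starting at 1 for the geometric case) are illustrative assumptions rather than definitions taken from the paper.

```python
import random


def markov_source(n, stay_prob, rng):
    """Binary first-order Markov chain (assumed source model):
    each symbol repeats the previous one with probability stay_prob."""
    x = [rng.randrange(2)]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < stay_prob else 1 - x[-1])
    return x


def bernoulli_copies(q, rng):
    """Assumed Bernoulli duplication law: 2 copies with probability q, else 1."""
    return 1 + (rng.random() < q)


def geometric_copies(q, rng):
    """Assumed geometric duplication law: P(K = k) = (1 - q) * q**(k - 1), k >= 1."""
    k = 1
    while rng.random() < q:
        k += 1
    return k


def noisy_duplication_channel(x, copies, flip_prob, rng):
    """Duplicate each input symbol copies(rng) times, then flip each copy
    independently with probability flip_prob (a BSC applied per copy)."""
    y = []
    for s in x:
        for _ in range(copies(rng)):
            y.append(s ^ (rng.random() < flip_prob))
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    x = markov_source(1000, 0.8, rng)
    y = noisy_duplication_channel(x, lambda r: geometric_copies(0.3, r), 0.05, rng)
    # The output length is random, which is why the analysis needs
    # entropy rates indexed by a random length.
    print(len(x), len(y))
```

A Monte Carlo capacity bound would additionally need an estimate of the output and joint sequence probabilities, for example via a forward recursion over the hidden duplication states; that step is omitted here because its details depend on the paper's construction.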
