On Achievable Rates Over Noisy Nanopore Channels
V. Arvind Rameshwar, Nir Weinberger
TL;DR
This work analyzes the noisy nanopore channel (NNC), modeled as a cascade of a duplication stage with memory and a memoryless noise stage, driven by a $\tau$-mer Markov input, in order to characterize achievable information rates and guide practical decoding. It derives a tight lower bound on the capacity of the noiseless NNC and computable lower and upper bounds for general noisy NNCs, highlighting when synchronization-like errors limit performance. In two practical regimes, the authors show that rates approaching the noise-free capacity are achievable: (i) for erasure noise with large $\tau$, a no-self-loop de Bruijn Markov input with a simple decoder yields $\lim_{\tau\to\infty} C^{(\tau)}(W_{\text{nn,EC}})=1$; and (ii) at high sampling rates, a decoder based on change-point detection achieves rates near $C_\tau^{\text{no-noise,no-loop}}$, which itself tends to 1 as $\tau$ grows. These results suggest practical strategies for decoding nanopore data with structured memory and random duplications, and they point to avenues for sharpening the bounds and extending the approach to broader noise models.
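To make the role of the no-self-loop restriction concrete, the following is a minimal sketch of a noise-free toy version of the channel. It is an illustration under stated assumptions, not the authors' construction: the function names (`tau_mers`, `has_self_loop`, `duplicate`, `collapse_runs`), the binary alphabet, and the uniform duplication counts are all hypothetical choices. The idea it shows is that when no two consecutive $\tau$-mers are equal, random duplications can be undone by collapsing runs of identical outputs.

```python
import random
from itertools import groupby


def tau_mers(symbols, tau):
    """Return the sliding windows of length tau over the input string."""
    return [symbols[i:i + tau] for i in range(len(symbols) - tau + 1)]


def has_self_loop(states):
    """True if some tau-mer is immediately followed by an identical tau-mer."""
    return any(a == b for a, b in zip(states, states[1:]))


def duplicate(states, rng, max_copies=4):
    """Repeat each state a random number of times (assumed uniform on 1..max_copies)."""
    out = []
    for state in states:
        out.extend([state] * rng.randint(1, max_copies))
    return out


def collapse_runs(samples):
    """Merge each run of identical consecutive samples into one state."""
    return [key for key, _ in groupby(samples)]


rng = random.Random(0)
states = tau_mers("0100110", tau=2)   # no symbol repeats three times in a row
assert not has_self_loop(states)
received = duplicate(states, rng)     # noise-free output of the duplication stage
assert collapse_runs(received) == states
```

If the input did contain a self-loop (for example, the string "0001" with $\tau = 2$ produces the window "00" twice in a row), collapsing runs would merge two distinct channel uses into one, which is the synchronization-like ambiguity the restriction avoids in this toy setting.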
Abstract
In this paper, we consider a recent channel model of a nanopore sequencer proposed by McBain, Viterbo, and Saunderson (2024), termed the noisy nanopore channel (NNC). In essence, an NNC is a duplication channel with structured Markov inputs whose output is corrupted by memoryless noise. We first discuss a (tight) lower bound on the capacity of the NNC in the absence of random noise. Next, we present lower and upper bounds on the capacity of general noisy nanopore channels. We then consider two interesting regimes of operation of an NNC: first, where the memory of the input process is large and the random noise introduces erasures, and second, where the rate of measurements of the electric current (also called the sampling rate) is high. For these regimes, we show that it is possible to achieve information rates close to the noise-free capacity using low-complexity encoding and decoding schemes. In particular, our decoder for the regime of high sampling rates makes use of a change-point detection procedure, a subroutine of immediate relevance for practitioners.
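As a rough illustration of how change-point detection can help in the high-sampling-rate regime, here is a minimal sketch under simplifying assumptions. It is not the procedure from the paper: the detector is a naive threshold on differences of adjacent samples, the noise is assumed bounded and uniform, the current levels are assumed to be integers at least 1 apart, and all names (`sample_channel`, `detect_change_points`, `segment_means`) are hypothetical.

```python
import random


def sample_channel(levels, rng, min_copies=5, max_copies=9, noise=0.1):
    """Emit several noisy samples per level (assumed bounded uniform noise)."""
    samples = []
    for level in levels:
        for _ in range(rng.randint(min_copies, max_copies)):
            samples.append(level + rng.uniform(-noise, noise))
    return samples


def detect_change_points(samples, threshold):
    """Indices i where the jump from sample i-1 to sample i exceeds the threshold."""
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > threshold]


def segment_means(samples, change_points):
    """Average the samples between consecutive change points."""
    if not samples:
        return []
    bounds = [0] + change_points + [len(samples)]
    return [sum(samples[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


rng = random.Random(0)
levels = [1.0, 3.0, 2.0, 4.0, 1.0]            # consecutive levels always differ
samples = sample_channel(levels, rng)
change_points = detect_change_points(samples, threshold=0.5)
estimates = [round(m) for m in segment_means(samples, change_points)]
assert estimates == [1, 3, 2, 4, 1]
```

With noise bounded by 0.1, adjacent samples from the same level differ by at most 0.2, while samples from different levels differ by at least 0.8, so a threshold of 0.5 separates the two cases exactly in this toy setting. With unbounded noise (for example, Gaussian), a detector of this kind would make occasional errors, and a more careful statistical procedure would be needed.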
