On Achievable Rates Over Noisy Nanopore Channels

V. Arvind Rameshwar; Nir Weinberger

TL;DR

This work analyzes the noisy nanopore channel (NNC), modeled as a cascade of a memoryful duplication stage and a memoryless noise stage driven by a $\tau$-mer Markov input, to characterize information rates and guide practical decoding. It derives a tight lower bound for the noiseless NNC capacity and computable lower/upper bounds for general noisy NNCs, highlighting when synchronization-like errors limit performance. In two practical regimes, the authors show that rates approaching the noise-free capacity are achievable: (i) for erasure noise with large $\tau$ via a no-self-loop de Bruijn Markov input and a simple decoder, yielding $\lim_{\tau\to\infty} C^{(\tau)}(W_{\text{nn,EC}})=1$; and (ii) at high sampling rates, a change-point detection-based decoder can achieve rates near $C_\tau^{\text{no-noise,no-loop}}$, which itself tends to 1 as $\tau$ grows. These results offer practically relevant strategies for decoding nanopore data with structured memory and random duplications, and they point to avenues for sharpening bounds and extending the approach to broader noise models.
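The cascade described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' construction: the function names, the geometric dwell-time model with parameter `p_stay`, and the per-sample erasure probability `eps` are all hypothetical choices made for illustration. It shows one property that follows directly from the no-self-loop input: since consecutive $\tau$-mer states always differ, collapsing runs of identical states undoes the duplications when no noise is present.

```python
import random

ERASED = None  # hypothetical marker for an erased observation


def debruijn_no_self_loop_walk(alphabet, tau, m, rng):
    """Sample m tau-mer states; each step shifts in one new symbol,
    and a transition that would repeat the current state is rejected."""
    state = tuple(rng.choice(alphabet) for _ in range(tau))
    states = [state]
    while len(states) < m:
        nxt = state[1:] + (rng.choice(alphabet),)
        if nxt == state:  # only constant tau-mers admit a self-loop
            continue
        state = nxt
        states.append(state)
    return states


def duplicate(states, p_stay, rng):
    """Assumed duplication stage: emit each state once, then repeat it
    while a coin with bias p_stay keeps coming up heads."""
    out = []
    for s in states:
        out.append(s)
        while rng.random() < p_stay:
            out.append(s)
    return out


def erase(seq, eps, rng):
    """Assumed memoryless noise stage: erase each sample independently."""
    return [ERASED if rng.random() < eps else s for s in seq]


def collapse_runs(seq):
    """Merge consecutive identical samples into a single state."""
    out = []
    for s in seq:
        if not out or out[-1] != s:
            out.append(s)
    return out


rng = random.Random(0)
states = debruijn_no_self_loop_walk("ACG", 2, 20, rng)
duplicated = duplicate(states, 0.7, rng)
noisy = erase(duplicated, 0.2, rng)
print(len(states), len(duplicated), collapse_runs(duplicated) == states)
```

With erasures present, run-collapsing alone is no longer sufficient, because an erased sample can hide a state boundary; the sketch only generates the noisy output and does not attempt to reproduce the decoder analyzed in the paper.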

Abstract

In this paper, we consider a recent channel model of a nanopore sequencer proposed by McBain, Viterbo, and Saunderson (2024), termed the noisy nanopore channel (NNC). In essence, an NNC is a duplication channel with structured, Markov inputs, that is corrupted by memoryless noise. We first discuss a (tight) lower bound on the capacity of the NNC in the absence of random noise. Next, we present lower and upper bounds on the channel capacity of general noisy nanopore channels. We then consider two interesting regimes of operation of an NNC: first, where the memory of the input process is large and the random noise introduces erasures, and second, where the rate of measurements of the electric current (also called the sampling rate) is high. For these regimes, we show that it is possible to achieve information rates close to the noise-free capacity, using low-complexity encoding and decoding schemes. In particular, our decoder for the regime of high sampling rates makes use of a change-point detection procedure -- a subroutine of immediate relevance for practitioners.
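To give a sense of what a change-point detection subroutine does in the high-sampling-rate regime, here is a minimal sketch. It is a generic sliding-window mean-difference detector, not the procedure specified in the paper, and it assumes a scalar piecewise-constant current with additive Gaussian noise. The names `detect_change_points` and `segment_means`, the window length `w`, and the threshold `thr` are hypothetical.

```python
import random


def window_mean(y, a, b):
    """Mean of y[a:b]; callers guarantee b > a."""
    return sum(y[a:b]) / (b - a)


def detect_change_points(y, w, thr):
    """Score each index t by the absolute difference between the means of
    the w samples after t and the w samples before t. Keep indices whose
    score exceeds thr, picking the strongest first and suppressing any
    candidate closer than w to an already accepted one."""
    n = len(y)
    score = {
        t: abs(window_mean(y, t, t + w) - window_mean(y, t - w, t))
        for t in range(w, n - w + 1)
    }
    candidates = sorted((t for t in score if score[t] > thr),
                        key=lambda t: -score[t])
    accepted = []
    for t in candidates:
        if all(abs(t - s) >= w for s in accepted):
            accepted.append(t)
    return sorted(accepted)


def segment_means(y, cps):
    """Average the samples between consecutive change points."""
    bounds = [0] + list(cps) + [len(y)]
    return [window_mean(y, a, b) for a, b in zip(bounds, bounds[1:])]


# Toy signal: three current levels, 50 samples each, small Gaussian noise.
rng = random.Random(0)
levels = [0.0, 3.0, 1.0]
y = [lvl + rng.gauss(0.0, 0.1) for lvl in levels for _ in range(50)]
cps = detect_change_points(y, w=10, thr=1.0)
print(cps, [round(v, 2) for v in segment_means(y, cps)])
```

The intuition is that a high sampling rate yields many samples per state, so averaging within each detected segment suppresses the noise, and the problem then resembles the noise-free channel. The window length and threshold would need tuning to the actual dwell times and noise level.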

Paper Structure (14 sections, 18 theorems, 69 equations, 2 figures, 1 algorithm)

Introduction
Notation and Preliminaries
Notation
Channel Model
Channel Capacity
Organization of the Paper
Capacity of the Noiseless Nanopore Channel
General Bounds on the Capacity of the Noisy Nanopore Channel
Achievable Rates Over NNCs with Erasure Noise For Long $\tau$-mer Lengths
Properties of de Bruijn Markov Processes With No Self-Loops
Proof of Theorem \ref{thm:erasuremain}
A Change-Point Detection-Based Decoder for High Sampling Rates
Conclusion and Future Work
Proof of Lemma \ref{lem:eig}

Key Result

Theorem 2.1

The ergodic-capacity $C(W_\text{\normalfont nn})$ is given by where the supremum is over all stationary and ergodic transition kernels $P_{S|S^-}$ of the de Bruijn Markov process $S^m$.

Figures (2)

Figure 1: The noisy nanopore channel $W_\text{nn}$
Figure 2: (a) Our lower bound for $C(W_\text{nn,EC})$, for an i.i.d. duplication channel with parameter $p = 0.999$; (b) Our upper bound for $C(W_\text{nn,EC})$, for an i.i.d. duplication channel with parameter $p = 0.3$. In both cases, we use $|\mathcal{X}| = 3$ and $\tau = 2$.

Theorems & Definitions (38)

Theorem 2.1
Remark
Theorem 3.1
Remark
Corollary 3.1
proof
Example 3.1: Elementary i.i.d. duplication channel
Example 3.2: Binomial duplication channel
Lemma 3.1
proof
...and 28 more
