
On Achievable Rates Over Noisy Nanopore Channels

V. Arvind Rameshwar, Nir Weinberger

TL;DR

This work analyzes the noisy nanopore channel (NNC), modeled as a cascade of a memoryful duplication stage and a memoryless noise stage driven by a $\tau$-mer Markov input, to characterize information rates and guide practical decoding. It derives a tight lower bound for the noiseless NNC capacity and computable lower/upper bounds for general noisy NNCs, highlighting when synchronization-like errors limit performance. In two practical regimes, the authors show that rates approaching the noise-free capacity are achievable: (i) for erasure noise with large $\tau$ via a no-self-loop de Bruijn Markov input and a simple decoder, yielding $\lim_{\tau\to\infty} C^{(\tau)}(W_{\text{nn,EC}})=1$; and (ii) at high sampling rates, a change-point detection-based decoder can achieve rates near $C_\tau^{\text{no-noise,no-loop}}$, which itself tends to 1 as $\tau$ grows. These results offer practically relevant strategies for decoding nanopore data with structured memory and random duplications, and they point to avenues for sharpening bounds and extending the approach to broader noise models.

Abstract

In this paper, we consider a recent channel model of a nanopore sequencer proposed by McBain, Viterbo, and Saunderson (2024), termed the noisy nanopore channel (NNC). In essence, an NNC is a duplication channel with structured Markov inputs whose output is corrupted by memoryless noise. We first discuss a (tight) lower bound on the capacity of the NNC in the absence of random noise. Next, we present lower and upper bounds on the capacity of general noisy nanopore channels. We then consider two interesting regimes of operation of an NNC: first, where the memory of the input process is large and the random noise introduces erasures; and second, where the rate of measurements of the electric current (the sampling rate) is high. In both regimes, we show that information rates close to the noise-free capacity are achievable with low-complexity encoding and decoding schemes. In particular, our decoder for the high-sampling-rate regime uses a change-point detection procedure, a subroutine of immediate relevance to practitioners.
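The change-point idea can be illustrated on synthetic data. This is *not* the authors' decoder, whose algorithm is not reproduced in this summary. It swaps in standard binary segmentation with a penalized squared-error cost.

All modeling choices here are hypothetical:

- each $\tau$-mer has its own current level (`0.5 * index`);
- each state is held for 20 to 60 samples, to mimic a high sampling rate;
- the noise is Gaussian;
- the input is a no-self-loop walk on the de Bruijn graph, so consecutive states always have different levels.

Decoding detects change points, averages each segment, and maps the average to the nearest level.

```python
import itertools
import math
import random


def binary_segmentation(x, penalty, min_len=5):
    """Mean-shift change points in x by recursive binary segmentation:
    a split is kept if it lowers the segment's sum of squared errors
    (SSE) by more than `penalty`."""
    n = len(x)
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x**2
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(a, b):  # SSE of x[a:b] around its own mean
        tot = s1[b] - s1[a]
        return (s2[b] - s2[a]) - tot * tot / (b - a)

    cps, stack = [], [(0, n)]
    while stack:
        a, b = stack.pop()
        if b - a < 2 * min_len:
            continue
        base = sse(a, b)
        gain, k = max((base - sse(a, j) - sse(j, b), j)
                      for j in range(a + min_len, b - min_len + 1))
        if gain > penalty:
            cps.append(k)
            stack += [(a, k), (k, b)]
    return sorted(cps)


def simulate_trace(num_states, q=3, tau=2, sigma=0.05, seed=1):
    """Hypothetical high-sampling-rate current trace: a no-self-loop walk
    on the de Bruijn graph, each tau-mer held for 20 to 60 samples."""
    rng = random.Random(seed)
    states = list(itertools.product(range(q), repeat=tau))
    level = {s: 0.5 * i for i, s in enumerate(states)}  # assumed injective levels
    s = rng.choice(states)
    path, trace, cps = [], [], []
    for _ in range(num_states):
        if trace:
            cps.append(len(trace))  # true change point: a new state starts
        path.append(s)
        dwell = rng.randint(20, 60)
        trace.extend(level[s] + rng.gauss(0.0, sigma) for _ in range(dwell))
        s = rng.choice([s[1:] + (a,) for a in range(q) if s[1:] + (a,) != s])
    return trace, path, cps, level


def decode(trace, level, sigma):
    """Segment the trace, then map each segment mean to the nearest level."""
    n = len(trace)
    # BIC-style penalty ~ sigma^2 log n; the factor 10 is an arbitrary margin
    cps = binary_segmentation(trace, penalty=10 * sigma ** 2 * math.log(n))
    bounds = [0] + cps + [n]
    decoded = []
    for a, b in zip(bounds, bounds[1:]):
        mean = sum(trace[a:b]) / (b - a)
        decoded.append(min(level, key=lambda st: abs(level[st] - mean)))
    return cps, decoded


trace, path, true_cps, level = simulate_trace(30)
cps, decoded = decode(trace, level, sigma=0.05)
print(len(true_cps), len(cps), decoded == path)
```

Real nanopore level maps are not injective and the noise is not this small. In practice, the nearest-level step would need a proper detector.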

Paper Structure

This paper contains 14 sections, 18 theorems, 69 equations, 2 figures, and 1 algorithm.

Key Result

Theorem 2.1

The ergodic capacity $C(W_\text{nn})$ is given by where the supremum is over all stationary and ergodic transition kernels $P_{S|S^-}$ of the de Bruijn Markov process $S^m$.
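Theorem 2.1 optimizes over transition kernels $P_{S|S^-}$ of the de Bruijn Markov process. A basic ingredient is therefore the entropy rate of such a kernel:

$$H(S \mid S^-) = -\sum_s \pi(s) \sum_t P(t \mid s) \log_2 P(t \mid s).$$

Mutual information never exceeds input entropy, so this entropy rate upper-bounds any information rate the kernel can deliver through the channel.

The sketch below computes it for uniform kernels. The kernels, function names, and power-iteration solver are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def de_bruijn_kernel(q, tau, self_loops=True):
    """Uniform transition kernel P(t|s) over the successors of each tau-mer."""
    kernel = {}
    for s in itertools.product(range(q), repeat=tau):
        succ = [s[1:] + (a,) for a in range(q)]
        if not self_loops:
            succ = [t for t in succ if t != s]
        kernel[s] = {t: 1.0 / len(succ) for t in succ}
    return kernel


def stationary(kernel, iters=10000, tol=1e-14):
    """Stationary distribution pi = pi P by power iteration
    (assumes the chain is irreducible and aperiodic)."""
    pi = {s: 1.0 / len(kernel) for s in kernel}
    for _ in range(iters):
        new = dict.fromkeys(kernel, 0.0)
        for s, row in kernel.items():
            for t, p in row.items():
                new[t] += pi[s] * p
        done = max(abs(new[s] - pi[s]) for s in kernel) < tol
        pi = new
        if done:
            break
    return pi


def entropy_rate_bits(kernel):
    """H(S | S^-) in bits under the stationary distribution of the kernel."""
    pi = stationary(kernel)
    return -sum(pi[s] * p * math.log2(p)
                for s, row in kernel.items() for p in row.values() if p > 0)


print(entropy_rate_bits(de_bruijn_kernel(3, 2)))                    # full graph
print(entropy_rate_bits(de_bruijn_kernel(3, 2, self_loops=False)))  # no self-loops
```

The uniform kernel on the full de Bruijn graph has a uniform stationary distribution, so its entropy rate is $\log_2 3$ bits.

On the no-self-loop graph with $q = 3$ and $\tau = 2$, the chain is aperiodic: it has cycles of lengths 2 and 3. Its uniform-kernel entropy rate lies between 1 bit and $\log_2(1+\sqrt{3})$ bits, the maximum over all kernels on that graph.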

Figures (2)

  • Figure 1: The noisy nanopore channel $W_\text{nn}$
  • Figure 2: (a) Our lower bound for $C(W_\text{nn,EC})$, for an i.i.d. duplication channel with parameter $p = 0.999$; (b) Our upper bound for $C(W_\text{nn,EC})$, for an i.i.d. duplication channel with parameter $p = 0.3$. In both cases, we use $|\mathcal{X}| = 3$ and $\tau = 2$.
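The sketch below makes the cascade of Figure 1 concrete, using the alphabet size and memory of Figure 2 ($|\mathcal{X}| = 3$, $\tau = 2$). It simulates three stages: a de Bruijn Markov input, an i.i.d. duplication stage, and a memoryless erasure stage.

The duplication law is an **assumption**. We read the elementary i.i.d. duplication channel as emitting each state once and, independently with probability $p$, a second time. The paper's convention for $p$ may differ.

The following are also illustrative assumptions:

- the erasure probability `eps`;
- the uniform choice of successor states;
- erasing whole states rather than current samples.

```python
import random

ERASURE = None  # marker for an erased output symbol


def nnc_simulate(m, q=3, tau=2, p=0.3, eps=0.1, self_loops=True, seed=0):
    """Send m input states through a toy noisy nanopore channel."""
    rng = random.Random(seed)
    # Markov input: a uniform random walk on the de Bruijn graph of tau-mers
    s = tuple(rng.randrange(q) for _ in range(tau))
    inputs = []
    for _ in range(m):
        inputs.append(s)
        succ = [s[1:] + (a,) for a in range(q)]
        if not self_loops:
            succ = [t for t in succ if t != s]
        s = rng.choice(succ)
    # Stage 1 (assumed convention): each state once, plus one copy w.p. p
    duplicated = []
    for st in inputs:
        duplicated.extend([st] * (2 if rng.random() < p else 1))
    # Stage 2: memoryless erasure noise on each duplicated symbol
    outputs = [ERASURE if rng.random() < eps else st for st in duplicated]
    return inputs, duplicated, outputs


def collapse_runs(seq):
    """Merge runs of identical consecutive symbols into a single symbol."""
    merged = []
    for x in seq:
        if not merged or merged[-1] != x:
            merged.append(x)
    return merged


inputs, _, outputs = nnc_simulate(200, eps=0.0, self_loops=False)
print(collapse_runs(outputs) == inputs)
```

With `eps=0` and `self_loops=False`, collapsing runs of identical outputs recovers the input exactly, because consecutive input states always differ. With self-loops allowed, a repeated input state and a duplication look the same. That is the ambiguity a no-self-loop input removes.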

Theorems & Definitions (38)

  • Theorem 2.1
  • Remark
  • Theorem 3.1
  • Remark
  • Corollary 3.1
  • Proof
  • Example 3.1: Elementary i.i.d. duplication channel
  • Example 3.2: Binomial duplication channel
  • Lemma 3.1
  • Proof
  • ...and 28 more