On the Capacity of Insertion Channels for Small Insertion Probabilities

Busra Tegin, Tolga M Duman

TL;DR

This work analyzes the binary insertion channel in the regime of small insertion probability $\alpha$, deriving the capacity expansion $C(\alpha) = 1 + \alpha \log(\alpha) + G_1 \alpha + \mathcal{O}(\alpha^{3/2-\epsilon})$ with $G_1 \approx 0.4901$. The authors split the rate into entropy terms and compute the leading terms using a run-length framework, a modified insertion process, and a perturbed process that bounds the remaining ambiguities. Achievability is established using i.i.d. Bernoulli$(1/2)$ inputs, while the converse uses stationary ergodic inputs and run-length truncation to show that the terms up to order $\alpha$ are tight. The result is a capacity approximation whose error vanishes as $\mathcal{O}(\alpha^{3/2-\epsilon})$ for small $\alpha$, with potential extensions to nonbinary alphabets and related synchronization-error channels, relevant to DNA storage and data reconstruction.

Abstract

Channels with synchronization errors, such as deletion and insertion errors, are crucial in DNA storage, data reconstruction, and other applications. These errors introduce memory to the channel, complicating its capacity analysis. This paper analyzes binary insertion channels for small insertion probabilities, identifying dominant terms in the capacity expansion and establishing capacity in this regime. Using Bernoulli(1/2) inputs for achievability and a converse based on the use of stationary and ergodic processes, we demonstrate that capacity closely aligns with achievable rates using independent and identically distributed (i.i.d.) inputs, differing only in higher-order terms.

Paper Structure

This paper contains 11 sections, 13 theorems, 44 equations.

Key Result

Theorem 1

Let $C(\alpha)$ be the capacity of the insertion channel with insertion probability $\alpha$. Then, for small $\alpha$ and any $\epsilon > 0$, we have $C(\alpha) = 1 + \alpha \log(\alpha) + G_1 \alpha + \mathcal{O}(\alpha^{3/2-\epsilon})$, where $G_1$ is a constant approximated as $G_1 \approx 0.4901$. Specifically, we express $G_1$ as $G_1 = \Tilde{G}_L + R_L$, where $\Tilde{G}_L = -\log(e) + \frac{1}{2} \sum_{l=1}^{L} 2^{-l-1} l \log l + \frac{1}{2} \sum_{a=1}^{L} \sum_{b=1}^{L} (b+1) 2^{-a-b} \cdots$
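
To get a feel for the expansion $C(\alpha) = 1 + \alpha \log(\alpha) + G_1 \alpha + \mathcal{O}(\alpha^{3/2-\epsilon})$, the following minimal sketch evaluates its leading terms numerically and computes the single-sum term of $\Tilde{G}_L$ for a chosen truncation length $L$. It rests on assumptions not stated in this summary: logarithms are taken base 2 (capacity in bits per channel use), and the $\mathcal{O}(\alpha^{3/2-\epsilon})$ remainder is simply dropped. The function names `capacity_approx` and `run_length_partial_sum` are hypothetical and chosen for illustration only. The double-sum term of $\Tilde{G}_L$ is truncated in the text above, so it is deliberately not implemented.

```python
import math

# Value of G_1 quoted in the text; treated here as a given constant.
G1_APPROX = 0.4901


def capacity_approx(alpha: float, g1: float = G1_APPROX) -> float:
    """Leading terms 1 + alpha*log2(alpha) + g1*alpha of the expansion.

    Assumes base-2 logarithms and ignores the O(alpha^(3/2 - eps)) remainder,
    so the value is only meaningful for small alpha.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if alpha == 0.0:
        # alpha * log2(alpha) -> 0 as alpha -> 0, giving the noiseless capacity.
        return 1.0
    return 1.0 + alpha * math.log2(alpha) + g1 * alpha


def run_length_partial_sum(L: int) -> float:
    """Single-sum term (1/2) * sum_{l=1}^{L} 2^(-l-1) * l * log2(l) of G~_L."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    return 0.5 * sum(2.0 ** (-l - 1) * l * math.log2(l) for l in range(1, L + 1))


if __name__ == "__main__":
    for alpha in (1e-4, 1e-3, 1e-2):
        print(f"alpha={alpha:g}  approx capacity={capacity_approx(alpha):.6f}")
    for L in (5, 10, 50):
        print(f"L={L}  single-sum term={run_length_partial_sum(L):.6f}")
```

Because the summand decays like $2^{-l}$, the single-sum term settles quickly as $L$ grows, which is consistent with writing $G_1$ as a truncated sum plus a small remainder $R_L$.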

Theorems & Definitions (23)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Lemma 6
  • proof
  • Lemma 7
  • ...and 13 more