
On the Coding Capacity of Reverse-Complement and Palindromic Duplication-Correcting Codes

Lev Yohananov, Moshe Schwartz

TL;DR

The work determines the asymptotic coding capacity of duplication-correcting codes under reverse-complement and palindromic duplications. It shows a dichotomy: the capacity vanishes for duplication length $k\ge 2$, while for $k=1$ it is alphabet-dependent and is achieved by explicit optimal constructions (e.g., $R_q(*)^{\mathrm{rc}}_1=\log_q(q-2)$ for even $q\ge 4$ and $R_q(*)^{\mathrm{pal}}_1=\log_q(q-1)$). The analysis hinges on structural characterizations, namely common-descendant criteria, signatures for $k=1$, and $k$-summaries for $k\ge 2$, which yield tight bounds and explicit code sizes. These results clarify the fundamental limits of duplication errors in DNA-storage-like channels and guide the design of optimal $k=1$ codes, while leaving open questions for finite error budgets ($t<\infty$) and multi-duplication correction. Overall, the paper gives a complete picture of capacity behavior across duplication types and lengths, with practical implications for robust sequence-storage coding schemes.
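The capacity values quoted above can be collected into a small lookup, shown here as a sketch. The function name `capacity` and its `kind` parameter are hypothetical; the function encodes only the cases stated in the summary (capacity $0$ for $k\ge 2$, $\log_q(q-2)$ for reverse-complement duplications with $k=1$ and even $q\ge 4$, and $\log_q(q-1)$ for palindromic duplications with $k=1$) and raises an error for anything else rather than guessing.

```python
import math


def capacity(q: int, k: int, kind: str) -> float:
    """Return the asymptotic coding capacity quoted in the summary (hypothetical helper).

    q    -- alphabet size
    k    -- duplication length
    kind -- "rc" for reverse-complement, "pal" for palindromic duplications
    Only the cases stated in the summary are covered; other inputs raise ValueError.
    """
    if kind not in ("rc", "pal"):
        raise ValueError("kind must be 'rc' or 'pal'")
    if q < 2 or k < 1:
        raise ValueError("need alphabet size q >= 2 and duplication length k >= 1")
    if k >= 2:
        return 0.0  # capacity vanishes for every duplication length k >= 2
    if kind == "pal":
        return math.log(q - 1, q)  # log_q(q - 1)
    if q >= 4 and q % 2 == 0:
        return math.log(q - 2, q)  # log_q(q - 2), stated for even q >= 4
    raise ValueError("k = 1 reverse-complement capacity is only quoted for even q >= 4")


# Compare the two k = 1 capacities for a few even alphabet sizes.
for q in (4, 6, 8):
    print(q, round(capacity(q, 1, "rc"), 4), round(capacity(q, 1, "pal"), 4))
```

For every even $q\ge 4$ the reverse-complement capacity is strictly below the palindromic one, since $q-2<q-1$.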

Abstract

We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications, as well as palindromic (reverse) duplications. We show that except for duplication-length $1$, the coding capacity is $0$. When the duplication length is $1$, the coding capacity depends on the alphabet size, and we construct optimal codes.
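To make the two error models concrete, here is a minimal sketch over the binary alphabet. It assumes the usual definitions, which the abstract does not spell out: a reverse-complement duplication of length $k$ takes a length-$k$ substring $v$ and inserts its reverse complement immediately after it (with complement $a\mapsto 1-a$ on $\mathbb{Z}_2$), while a palindromic duplication inserts the plain reverse of $v$ instead. The helper names `rc_dup`, `pal_dup` and `one_step_descendants` are hypothetical.

```python
def complement(s: str) -> str:
    """Bitwise complement over Z_2, with symbols written as '0' and '1'."""
    return "".join("1" if c == "0" else "0" for c in s)


def rc_dup(x: str, i: int, k: int) -> str:
    """Reverse-complement duplication: u v w -> u v rc(v) w, where v = x[i:i+k]."""
    if i < 0 or k < 1 or i + k > len(x):
        raise ValueError("duplicated substring must lie inside x")
    v = x[i:i + k]
    return x[:i + k] + complement(v)[::-1] + x[i + k:]


def pal_dup(x: str, i: int, k: int) -> str:
    """Palindromic (reverse) duplication: u v w -> u v reverse(v) w, where v = x[i:i+k]."""
    if i < 0 or k < 1 or i + k > len(x):
        raise ValueError("duplicated substring must lie inside x")
    v = x[i:i + k]
    return x[:i + k] + v[::-1] + x[i + k:]


def one_step_descendants(x: str, k: int, dup) -> set:
    """All words obtainable from x by exactly one length-k duplication of the given kind."""
    return {dup(x, i, k) for i in range(len(x) - k + 1)}


print(rc_dup("0110", 1, 2))   # 011000
print(pal_dup("0110", 0, 2))  # 011010
print(sorted(one_step_descendants("01", 1, pal_dup)))  # ['001', '011']
```

For $k=1$ the reversal has no effect, so under these assumptions a palindromic duplication simply repeats one symbol (01 becomes 001 or 011), whereas a reverse-complement duplication inserts the complementary symbol (01 becomes 011 or 010).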

Paper Structure (8 sections, 16 theorems, 85 equations, 1 table)

Key Result

Lemma 1

Let $a\in\mathbb{Z}_2$ be a bit. Then for any $w\in\mathbb{Z}_2^*$.

Theorems & Definitions (39)

  • Example 1
  • Example 2
  • Definition 1
  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Corollary 3
  • proof
  • Example 3
  • ...and 29 more