On the Coding Capacity of Reverse-Complement and Palindromic Duplication-Correcting Codes

Lev Yohananov; Moshe Schwartz

TL;DR

The work determines the asymptotic coding capacity of duplication-correcting codes under reverse-complement and palindromic duplications. It shows a dichotomy: the capacity vanishes for duplication length $k\ge 2$, while for $k=1$ the capacity depends on the alphabet size and is achieved by explicit optimal constructions (e.g., $R_q(*)^{\mathrm{rc}}_1=\log_q(q-2)$ for even $q\ge 4$ and $R_q(*)^{\mathrm{pal}}_1=\log_q(q-1)$). The analysis hinges on structural characterizations such as common-descendant criteria, signatures for $k=1$, and $k$-summaries for $k\ge 2$, yielding tight bounds and explicit code sizes. These results clarify the fundamental limits of correcting duplication errors in DNA-storage-like channels and guide the design of optimal $k=1$ codes, while leaving open the case of a finite number of duplications ($t<\infty$). Overall, the paper gives a complete picture of capacity behavior across duplication types and lengths, with practical implications for robust sequence-storage coding schemes.
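As a quick numerical illustration of the two $k=1$ formulas quoted above, the following minimal sketch evaluates $\log_q(q-2)$ and $\log_q(q-1)$ for a given alphabet size. The function names are hypothetical, and the range check on `q` for the palindromic case is an assumption, since the summary states no range for it.

```python
import math


def rc_capacity_k1(q: int) -> float:
    """Evaluate log_q(q - 2), the quoted k = 1 reverse-complement capacity.

    The summary states this formula for even q >= 4 only.
    """
    if q < 4 or q % 2 != 0:
        raise ValueError("formula is quoted for even q >= 4 only")
    return math.log(q - 2, q)


def pal_capacity_k1(q: int) -> float:
    """Evaluate log_q(q - 1), the quoted k = 1 palindromic capacity.

    Assumption: any alphabet size q >= 2 is accepted here.
    """
    if q < 2:
        raise ValueError("alphabet size must be at least 2")
    return math.log(q - 1, q)


if __name__ == "__main__":
    # q = 4 corresponds to the DNA alphabet.
    print(rc_capacity_k1(4))
    print(pal_capacity_k1(4))
```

For the DNA alphabet ($q=4$) these evaluate to $\log_4 2 = 1/2$ and $\log_4 3 \approx 0.79$.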

Abstract

We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications, as well as palindromic (reverse) duplications. We show that except for duplication-length $1$, the coding capacity is $0$. When the duplication length is $1$, the coding capacity depends on the alphabet size, and we construct optimal codes.
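The two error types can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's formal definition: a duplication of length $k$ is taken to copy a length-$k$ window and insert the transformed copy immediately after it, and the complement is assumed to map a symbol $a\in\mathbb{Z}_q$ to $a+q/2 \bmod q$ (the paper's complement convention may differ). All function names are hypothetical.

```python
from typing import Tuple

Word = Tuple[int, ...]


def complement(a: int, q: int) -> int:
    """Assumed complement on Z_q (q even): pair a with a + q/2 mod q."""
    return (a + q // 2) % q


def _check_window(x: Word, i: int, k: int) -> None:
    # The duplicated window x[i:i+k] must lie inside the word.
    if k < 1 or i < 0 or i + k > len(x):
        raise ValueError("window x[i:i+k] is out of range")


def rc_duplicate(x: Word, i: int, k: int, q: int) -> Word:
    """Insert the reversed, complemented copy of x[i:i+k] right after it."""
    _check_window(x, i, k)
    window = x[i:i + k]
    copy = tuple(complement(a, q) for a in reversed(window))
    return x[:i + k] + copy + x[i + k:]


def pal_duplicate(x: Word, i: int, k: int) -> Word:
    """Insert the reversed copy of x[i:i+k] right after it."""
    _check_window(x, i, k)
    window = x[i:i + k]
    return x[:i + k] + window[::-1] + x[i + k:]


if __name__ == "__main__":
    x = (0, 1, 2, 3)
    print(rc_duplicate(x, 1, 2, q=4))
    print(pal_duplicate(x, 1, 2))
```

Under these assumptions, a palindromic duplication of length $1$ simply repeats a symbol, while a reverse-complement duplication of length $1$ inserts the complement of a symbol next to it.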

Paper Structure (8 sections, 16 theorems, 85 equations, 1 table)


Introduction
Preliminaries
Reverse-Complement Duplication of Length $k=1$
The Binary Case, $q=2$
The Non-binary Case, $q\geqslant 4$
Reverse-Complement Duplication of Length $k\geqslant 2$
Palindromic Duplication
Conclusion

Key Result

Lemma 1

Let $a\in\mathbb{Z}_2$ be a bit. Then for any $w\in\mathbb{Z}_2^*$.

Theorems & Definitions (39)

Example 1
Example 2
Definition 1
Lemma 1
proof
Theorem 2
proof
Corollary 3
proof
Example 3
...and 29 more
