On the Coding Capacity of Reverse-Complement and Palindromic Duplication-Correcting Codes
Lev Yohananov, Moshe Schwartz
TL;DR
The work determines the asymptotic coding capacity of duplication-correcting codes under reverse-complement and palindromic duplications. It shows a dichotomy: the capacity vanishes for duplication length $k\ge 2$, while for $k=1$ it is alphabet-dependent and is achieved by explicit optimal constructions (e.g., $R_q(*)^{\mathrm{rc}}_1=\log_q(q-2)$ for even $q\ge 4$ and $R_q(*)^{\mathrm{pal}}_1=\log_q(q-1)$). The analysis rests on structural characterizations, namely common-descendant criteria, signatures for $k=1$, and $k$-summaries for $k\ge 2$, which yield tight bounds and explicit code sizes. These results clarify the fundamental limits of duplication errors in DNA-storage-like channels and guide the design of optimal $k=1$ codes, while leaving open questions for finite error budgets ($t<\infty$) and for correcting multiple duplication types. Overall, the paper gives a complete picture of capacity behavior across duplication types and lengths, with practical implications for robust sequence-storage coding schemes.
Abstract
We derive the coding capacity of duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications and for palindromic (reverse) duplications. We show that, except for duplication length $1$, the coding capacity is $0$. When the duplication length is $1$, the coding capacity depends on the alphabet size, and we construct optimal codes.
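For duplication length $1$, the palindromic case is easy to illustrate: reversing a single symbol leaves it unchanged, so a palindromic duplication of length $1$ is the same as a tandem duplication of length $1$. The sketch below uses the standard code of all words with no two equal adjacent symbols; collapsing runs undoes any number of length-1 duplications, and the code has $q(q-1)^{n-1}$ words, so its rate $\frac{1}{n}\log_q\bigl(q(q-1)^{n-1}\bigr)$ tends to $\log_q(q-1)$. This is an illustrative sketch under those assumptions, not necessarily the paper's own construction; the helper names (`duplicate`, `root`, `irreducible_words`, `rate`) are hypothetical, and symbols are modeled as integers $0,\dots,q-1$.

```python
from itertools import product
from math import log


def duplicate(word, i):
    """Length-1 duplication at position i: word[i] is copied right after itself.
    For length-1 substrings, tandem and palindromic duplications coincide."""
    return word[: i + 1] + word[i:]


def root(word):
    """Collapse every run of equal adjacent symbols into a single symbol.
    This undoes any number of length-1 duplications."""
    out = []
    for s in word:
        if not out or out[-1] != s:
            out.append(s)
    return tuple(out)


def irreducible_words(q, n):
    """Hypothetical helper: all length-n words over {0, ..., q-1} with no two
    equal adjacent symbols. Each such word is its own root, so distinct
    codewords never share a common descendant."""
    return [
        w for w in product(range(q), repeat=n)
        if all(w[i] != w[i + 1] for i in range(n - 1))
    ]


def rate(q, n):
    """Rate log_q(|C|) / n of the length-n irreducible-word code."""
    return log(len(irreducible_words(q, n)), q) / n


if __name__ == "__main__":
    q, n = 4, 6
    code = irreducible_words(q, n)
    assert len(code) == q * (q - 1) ** (n - 1)
    # Decoding: a received word with any number of duplications maps back to its root.
    received = duplicate(duplicate(code[7], 2), 0)
    assert root(received) == code[7]
    print(f"rate(n={n}) = {rate(q, n):.4f}, limit log_q(q-1) = {log(q - 1, q):.4f}")
```

The rate converges slowly, since the leading symbol contributes a $1/n$ term, but the limit $\log_q(q-1)$ matches the palindromic $k=1$ value quoted in the TL;DR.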
