On Duplication-Free Codes for Disjoint or Equal-Length Errors
Wenjun Yu, Moshe Schwartz
TL;DR
This work addresses robust coding for tandem-duplication errors in DNA storage by introducing parametric, duplication-free code constructions for three settings: disjoint duplications, equal-length duplications, and their combination. It develops two main approaches: (i) a disjoint-duplication construction using $F=L\cup L^\Delta$ to ensure that codewords are non-confusable, and (ii) an equal-length-duplication construction using $F=L$ together with a $\phi_\ell$ transform to enable de-duplication-based decoding. Both constructions achieve positive asymptotic rates and contain prior constructions as special cases; the analysis covers decoding strategies and establishes non-confusability via mid-cover arguments. The results extend the theory of duplication-correcting codes and offer parametric strategies for DNA storage scenarios with complex mutation patterns, including a combined model for multiple error types. Open questions include characterizing the maximal asymptotic rate for a given $L$ and developing efficient decoders for the disjoint-only construction.
Abstract
Motivated by applications in DNA storage, we study a setting in which strings are affected by tandem-duplication errors. In particular, we look at two settings: disjoint tandem-duplication errors, and equal-length tandem-duplication errors. We construct codes, with positive asymptotic rate, for the two settings, as well as for their combination. Our constructions are duplication-free codes, comprising codewords that do not contain tandem duplications of specific lengths. Additionally, our codes generalize previous constructions, containing them as special cases.
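To make the terminology concrete, the following is a minimal Python sketch of a tandem-duplication error and of a check that a string contains no tandem duplication of given lengths. It is an illustration only, not the paper's construction: the function names and the parameter `lengths` (the set of duplication lengths to exclude) are assumptions introduced here.

```python
from typing import Iterable


def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Apply one tandem-duplication error: copy s[start:start+length]
    and insert the copy immediately after the original window."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("duplication window out of range")
    window = s[start:start + length]
    return s[:start + length] + window + s[start + length:]


def has_tandem_repeat(s: str, length: int) -> bool:
    """Return True if s contains a substring ww with len(w) == length."""
    return any(
        s[i:i + length] == s[i + length:i + 2 * length]
        for i in range(len(s) - 2 * length + 1)
    )


def is_duplication_free(s: str, lengths: Iterable[int]) -> bool:
    """Return True if s contains no tandem repeat of any length in `lengths`
    (a hypothetical parameter standing in for the excluded lengths)."""
    return not any(has_tandem_repeat(s, k) for k in lengths)
```

For example, duplicating the window `CG` in `ACGT` gives `ACGCGT`, which is no longer free of tandem repeats of length 2, while `ACGT` itself is free of repeats of lengths 1 and 2.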
