On Duplication-Free Codes for Disjoint or Equal-Length Errors

Wenjun Yu, Moshe Schwartz

TL;DR

This work addresses coding for tandem-duplication errors in DNA storage. It introduces two parametric, duplication-free code constructions: (i) a construction for disjoint duplications, which uses $F=L\cup L^\Delta$ to ensure that distinct codewords are non-confusable, and (ii) a construction for equal-length duplications, which uses $F=L$ together with a $\phi_\ell$ transform to enable de-duplication-based decoding. A combined setting, in which both error types occur, is also treated. Both constructions yield positive asymptotic rates and contain prior constructions as special cases. The paper gives decoding strategies and analyzes confusability via mid-cover arguments. Open questions include characterizing the maximal asymptotic rate for a given $L$ and developing efficient decoders for the disjoint-only construction.

Abstract

Motivated by applications in DNA storage, we study a setting in which strings are affected by tandem-duplication errors. In particular, we look at two settings: disjoint tandem-duplication errors, and equal-length tandem-duplication errors. We construct codes, with positive asymptotic rate, for the two settings, as well as for their combination. Our constructions are duplication-free codes, comprising codewords that do not contain tandem duplications of specific lengths. Additionally, our codes generalize previous constructions, containing them as special cases.
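
To make the terminology above concrete, the following is a minimal Python sketch of a tandem duplication and of a duplication-free check. The function names (`tandem_duplicate`, `contains_duplication`, `duplication_free_words`) and the brute-force enumeration are illustrative assumptions, not the paper's constructions or decoders; the set `lengths` stands in for a set of forbidden duplication lengths such as $F$.

```python
from itertools import product


def tandem_duplicate(x: str, i: int, k: int) -> str:
    """Return x with its length-k factor starting at index i duplicated in place.

    Writing x = u v w with len(u) = i and len(v) = k, the result is u v v w.
    """
    if k <= 0 or i < 0 or i + k > len(x):
        raise ValueError("factor out of range")
    return x[:i + k] + x[i:i + k] + x[i + k:]


def contains_duplication(x: str, lengths) -> bool:
    """True if x contains a factor v v with len(v) in `lengths`."""
    for k in lengths:
        # Slide a window of width 2k and compare its two halves.
        for i in range(len(x) - 2 * k + 1):
            if x[i:i + k] == x[i + k:i + 2 * k]:
                return True
    return False


def duplication_free_words(alphabet: str, n: int, lengths):
    """Brute-force list of length-n words with no duplication of a length in `lengths`.

    Exponential in n; intended only for small illustrative cases.
    """
    words = ("".join(w) for w in product(alphabet, repeat=n))
    return [w for w in words if not contains_duplication(w, lengths)]


if __name__ == "__main__":
    y = tandem_duplicate("ACGT", 1, 2)  # duplicates the factor "CG"
    print(y, contains_duplication(y, {2}))
    print(duplication_free_words("AC", 3, {1}))
```

A duplication-free code in this sense is a subset of the words returned by `duplication_free_words`; the sketch says nothing about which error patterns such a set can correct, which is the subject of the paper's theorems.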

Paper Structure

This paper contains 6 sections, 7 theorems, 69 equations, 3 figures.

Key Result

Theorem 1

Let $L\subseteq\mathbb{N}$ be a given set of duplication lengths, and set $F=L\cup L^\Delta$. Then the code $C_F$, of length $n\in\mathbb{N}$, can correct any number of disjoint tandem duplications with respect to $\mathcal{T}_L$. Namely, for all distinct $x,y\in C_F$ we have

Figures (3)

  • Figure 1: The decomposition of $z$ into factors in the proof of Lemma \ref{lem:eqmidcover}.
  • Figure 2: The decomposition of $z$ into factors in the proof of Lemma \ref{lem:neqmidcover}.
  • Figure 3: The two decompositions of \ref{eq:twice}.

Theorems & Definitions (21)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Example 1
  • Theorem 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • ...and 11 more