Bounds and Constructions of Codes for Ordered Composite DNA Sequences

Zuo Ye, Yuling Li, Zhaojun Lan, Gennian Ge

TL;DR

This paper establishes equivalence relations among families of composite-error correcting codes (CECCs) and among families of composite-deletion correcting codes (CDCCs) and derives novel and general upper bounds on the sizes of CECCs using refined sphere-packing arguments and probabilistic methods.

Abstract

This paper extends the foundational work of Dollma et al. on codes for ordered composite DNA sequences. We consider the general setting with an alphabet of size $q$ and a resolution parameter $k$, moving beyond the binary ($q=2$) case primarily studied previously. We investigate error-correcting codes for substitution errors and deletion errors under several channel models, including $(e_1,\ldots,e_k)$-composite error/deletion, $e$-composite error/deletion, and the newly introduced $t$-$(e_1,\ldots,e_t)$-composite error/deletion model. We first establish equivalence relations among families of composite-error correcting codes (CECCs) and among families of composite-deletion correcting codes (CDCCs). This significantly reduces the number of distinct error-parameter sets that require separate analysis. We then derive novel and general upper bounds on the sizes of CECCs using refined sphere-packing arguments and probabilistic methods. These bounds together cover all values of parameters $q$, $k$, $(e_1,\ldots,e_k)$ and $e$. In contrast, previous bounds were only established for $q=2$ and limited choices of $k$, $(e_1,\ldots,e_k)$ and $e$. For CDCCs, we generalize a known non-asymptotic upper bound for $(1,0,\ldots,0)$-CDCCs and then provide a cleaner asymptotic bound. On the constructive side, for any $q\ge2$, we propose $(1,0,\ldots,0)$-CDCCs, $1$-CDCCs and $t$-$(1,\ldots,1)$-CDCCs with near-optimal redundancies. These codes have efficient and systematic encoders. For substitution errors, we design the first explicit encoding and decoding algorithms for the binary $(1,0,\ldots,0)$-CECC constructed by Dollma et al., and extend the approach to general $q$. Furthermore, we give an improved construction of binary $1$-CECCs, a construction of nonbinary $1$-CECCs, and a construction of $t$-$(1,\ldots,1)$-CECCs. These constructions are also systematic.

Paper Structure

This paper contains 24 sections, 41 theorems, and 84 equations.

Key Result

Proposition 3.1

For any $\left( e_1,e_2,\ldots,e_{k} \right)\in\mathbb{N}^k$, the families $\mathscr{C}_{q,k}^S\left( n;e_1,e_2,\ldots,e_{k} \right)$ and $\mathscr{C}_{q,k}^S\left( n;e_{k},e_{k-1},\ldots,e_1 \right)$ are equivalent.
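
One way to see how this reversal symmetry cuts down the number of error-parameter tuples that need separate analysis is to count tuples up to reversal. The sketch below is only an illustration under stated assumptions: the function names `canonical` and `classes_up_to_reversal` and the cap `max_e` on each entry are hypothetical and not taken from the paper, and only the symmetry stated in Proposition 3.1 is modeled (any further equivalences in the paper are not reflected here).

```python
from itertools import product


def canonical(e):
    """Pick one representative of the pair {e, reversed(e)}.

    Hypothetical helper: the lexicographically smaller tuple is chosen,
    which is an arbitrary convention for this illustration.
    """
    e = tuple(e)
    return min(e, e[::-1])


def classes_up_to_reversal(k, max_e):
    """Collect representatives of all tuples in {0, ..., max_e}^k,
    identifying each tuple with its reversal (the symmetry of
    Proposition 3.1 only). max_e is an assumed cap for illustration.
    """
    return {canonical(e) for e in product(range(max_e + 1), repeat=k)}


if __name__ == "__main__":
    # (1, 0, 0) and (0, 0, 1) fall into the same class.
    print(canonical((1, 0, 0)))
    # For k = 3 and entries in {0, 1}: 8 tuples collapse to 6 classes.
    print(len(classes_up_to_reversal(3, 1)))
```

For example, with $k=3$ and entries restricted to $\{0,1\}$, the 8 possible tuples collapse to 6 classes, since the 4 palindromic tuples are fixed by reversal and the remaining 4 pair up.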

Theorems & Definitions (55)

  • Example 2.1
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Corollary 3.1
  • Remark 4.1
  • ...and 45 more