Bounds and Constructions of Codes for Ordered Composite DNA Sequences
Zuo Ye, Yuling Li, Zhaojun Lan, Gennian Ge
TL;DR
This paper establishes equivalence relations among families of composite-error correcting codes (CECCs) and among families of composite-deletion correcting codes (CDCCs) and derives novel and general upper bounds on the sizes of CECCs using refined sphere-packing arguments and probabilistic methods.
Abstract
This paper extends the foundational work of Dollma \emph{et al.} on codes for ordered composite DNA sequences. We consider the general setting with an alphabet of size $q$ and a resolution parameter $k$, moving beyond the binary ($q=2$) case primarily studied previously. We investigate error-correcting codes for substitution errors and deletion errors under several channel models, including $(e_1,\ldots,e_k)$-composite error/deletion, $e$-composite error/deletion, and the newly introduced $t$-$(e_1,\ldots,e_t)$-composite error/deletion model. We first establish equivalence relations among families of composite-error correcting codes (CECCs) and among families of composite-deletion correcting codes (CDCCs). This significantly reduces the number of distinct error-parameter sets that require separate analysis. We then derive novel and general upper bounds on the sizes of CECCs using refined sphere-packing arguments and probabilistic methods. Together, these bounds cover all values of the parameters $q$, $k$, $(e_1,\ldots,e_k)$ and $e$. In contrast, previous bounds were established only for $q=2$ and limited choices of $k$, $(e_1,\ldots,e_k)$ and $e$. For CDCCs, we generalize a known non-asymptotic upper bound for $(1,0,\ldots,0)$-CDCCs and then provide a cleaner asymptotic bound. On the constructive side, for any $q\ge2$, we propose $(1,0,\ldots,0)$-CDCCs, $1$-CDCCs and $t$-$(1,\ldots,1)$-CDCCs with near-optimal redundancies. These codes have efficient and systematic encoders. For substitution errors, we design the first explicit encoding and decoding algorithms for the binary $(1,0,\ldots,0)$-CECC constructed by Dollma \emph{et al.}, and extend the approach to general $q$. Furthermore, we give an improved construction of binary $1$-CECCs, a construction of nonbinary $1$-CECCs, and a construction of $t$-$(1,\ldots,1)$-CECCs. These constructions are also systematic.
