Codes for Limited-Magnitude Probability Error in DNA Storage
Wenkai Zhang, Zhiying Wang
TL;DR
This work defines a limited-magnitude probability error (LMPE) channel for composite DNA letters. Each symbol is a probability vector over the four nucleotides with fixed resolution $k$; at most $t$ symbols are in error, and each error has magnitude at most $l$. The paper develops a two-layer coding framework that first protects symbol classes (remainder/quotient structure) and then recovers the actual probability vectors. It gives several explicit constructions (remainder classes, reduced classes, improved Hamming, and BCH-based schemes) and proves asymptotic optimality for the remainder-class approach. Sphere-packing and Gilbert–Varshamov bounds characterize the code-size and rate tradeoffs as $n$ and $k$ grow and as $l$ and $t$ vary, and systematic LMPE codes with Gray mapping support practical deployment. Together, these results give concrete, scalable error-correcting schemes for DNA storage with composite letters that balance redundancy, complexity, and implementation practicality.
Abstract
DNA, with its high density, durability, and replicability, is one of the most appealing storage media. Emerging DNA storage technologies use composite DNA letters, where information is represented by probability vectors, leading to higher information density and lower synthesis costs than regular DNA letters. However, these technologies face inevitable noise and information corruption. This paper explores the channel of composite DNA letters in DNA-based storage systems and introduces block codes for limited-magnitude probability errors on probability vectors. First, outer and inner bounds for limited-magnitude probability error correction codes are provided. Moreover, code constructions are proposed where the number of errors is bounded by $t$, the error magnitudes are bounded by $l$, and the probability resolution is fixed as $k$. These constructions leverage the properties of limited-magnitude probability errors in DNA-based storage systems, leading to improved performance in terms of complexity and redundancy. In addition, the asymptotic optimality of one of the proposed constructions is established. Finally, systematic codes based on one of the proposed constructions are presented, which enable efficient information extraction for practical implementation.
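To make the channel parameters concrete, the following is a minimal Python sketch of composite letters and the error constraint, not the authors' formal definition. It assumes a composite letter with resolution $k$ is stored as four non-negative integer counts summing to $k$ (the probabilities are the counts divided by $k$), and that the magnitude of an error on one symbol is the total upward change in the counts, which equals the total downward change because both vectors sum to $k$. The function names `composite_letters`, `error_magnitude`, and `within_lmpe` are hypothetical and used only for illustration.

```python
def composite_letters(k):
    """Enumerate all composite letters of resolution k.

    Assumption: a letter is a 4-tuple of non-negative integer counts
    (for A, C, G, T) that sums to k; the probability vector is counts / k.
    """
    return [(a, c, g, k - a - c - g)
            for a in range(k + 1)
            for c in range(k + 1 - a)
            for g in range(k + 1 - a - c)]


def error_magnitude(sent, received):
    """Assumed magnitude measure: the total upward change in the counts.

    Both letters sum to k, so the upward and downward changes are equal.
    """
    return sum(max(r - s, 0) for s, r in zip(sent, received))


def within_lmpe(sent_word, received_word, l, t):
    """Check that at most t symbols changed, each by magnitude at most l."""
    magnitudes = [error_magnitude(s, r)
                  for s, r in zip(sent_word, received_word)]
    erroneous = [m for m in magnitudes if m > 0]
    return len(erroneous) <= t and all(m <= l for m in erroneous)


if __name__ == "__main__":
    k = 3
    # The number of letters is the number of 4-part compositions of k.
    print(len(composite_letters(k)))
    sent = [(1, 1, 1, 0), (3, 0, 0, 0)]
    received = [(2, 0, 1, 0), (3, 0, 0, 0)]
    print(within_lmpe(sent, received, l=1, t=1))
```

Under these assumptions, the alphabet has $\binom{k+3}{3}$ letters, so its size grows cubically in $k$, while an error of magnitude at most $l$ can move a letter only to a limited set of nearby letters. The abstract points to this locality as the property the constructions exploit.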
