Constrained Error-Correcting Codes for Efficient DNA Synthesis
Yajuan Liu, Tolga M. Duman
TL;DR
The paper addresses efficient DNA synthesis for storage by designing capacity-achieving constrained codes for a synthesis machine that follows a fixed supersequence template. It develops enumeration-based constructions that enforce both the $\ell$-RLL and $\epsilon$-balance constraints and provides explicit polynomial-time encoders and decoders via unranking and ranking, covering synthesis times $T \in [n, 4n]$ at rates that provably achieve capacity. It further extends these codes to constrained ECCs that correct a single indel using a VT-based scheme, with overall complexity $O(n^4\ell^2)$ and additional redundancy. These results offer practical, provably high-rate coding solutions that reduce synthesis cost and improve reliability in parallel DNA synthesis for storage.
Abstract
DNA synthesis is considered one of the most expensive components of current DNA storage systems. In this paper, we focus on a common synthesis machine that generates multiple DNA strands in parallel following a fixed supersequence, and we propose constrained codes with polynomial-time encoding and decoding algorithms. Unlike existing works, our codes simultaneously satisfy both the $\ell$-runlength-limited and $\epsilon$-balanced constraints. By enumerating all valid sequences, our codes achieve the maximum rate, matching the capacity. Additionally, we design constrained error-correcting codes that correct one insertion or deletion in the obtained DNA sequence while still adhering to the constraints.
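To make the three quantities in the abstract concrete, the following minimal Python sketch checks the two constraints on a strand and computes its synthesis time. It rests on assumptions that the abstract does not state: the supersequence is taken to be the periodic template ACGTACGT..., the $\ell$-RLL constraint is read as "no run of identical bases longer than $\ell$", and $\epsilon$-balance is read as "the GC count deviates from $n/2$ by at most $\epsilon n$". The function names are illustrative and are not taken from the paper.

```python
from itertools import groupby

# Assumption: the machine cycles through this period as its fixed supersequence.
SUPERSEQUENCE_PERIOD = "ACGT"


def max_run_length(strand: str) -> int:
    """Length of the longest run of identical bases (0 for an empty strand)."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def is_rll(strand: str, ell: int) -> bool:
    """Assumed l-RLL reading: no run of identical bases is longer than ell."""
    return max_run_length(strand) <= ell


def is_balanced(strand: str, eps: float) -> bool:
    """Assumed eps-balance reading: |#GC - n/2| <= eps * n."""
    n = len(strand)
    gc_count = sum(base in "GC" for base in strand)
    return abs(gc_count - n / 2) <= eps * n


def synthesis_time(strand: str, period: str = SUPERSEQUENCE_PERIOD) -> int:
    """Number of template symbols consumed before the strand is complete.

    Greedy matching: at each machine cycle the strand is extended only if its
    next base equals the current template symbol.
    """
    if any(base not in period for base in strand):
        raise ValueError("strand contains a base that is not in the template")
    t = 0  # template symbols consumed so far
    for base in strand:
        while period[t % len(period)] != base:
            t += 1  # cycle passes without extending this strand
        t += 1  # cycle that appends `base`
    return t


if __name__ == "__main__":
    for s in ("ACGT", "TTTT", "AACGGT"):
        print(s, synthesis_time(s), is_rll(s, 2), is_balanced(s, 0.1))
```

Under these assumptions a strand of length $n$ needs between $n$ cycles (for example, ACGT) and $4n$ cycles (for example, TTTT), which is the range $[n, 4n]$ of synthesis times for this template.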
