DNA-Correcting Codes: End-to-end Correction in DNA Storage Systems
Avital Boruchovsky, Daniella Bar-Lev, Eitan Yaakobi
TL;DR
This work introduces DNA-correcting codes that jointly address clustering, reconstruction, and error correction in DNA storage. It defines a DNA-specific distance metric and derives necessary and sufficient conditions for unique recovery of input strands under a $(\tau,e_i,e_d)_K$-DNA storage model, with distinct regimes based on the copy-allocation parameter $\tau$. The study develops bounds and constructions for index-correcting codes and analyzes the impact of index length $\ell$ on code size, including a permutation-based case ($\ell=\log(M)$) and a length-expansion approach ($\ell>\log(M)$). It further extends the framework to erroneous data-fields ($e_d>0$) by generalizing the distance and providing constructions that couple data-field ECCs with index-correcting codes. The results offer a principled, end-to-end coding approach for DNA storage with practical implications for reliable data retrieval under realistic synthesis/sequencing noise and one-to-many readout scenarios.
Abstract
This paper introduces a new solution to DNA storage that integrates all three steps of retrieval: clustering, reconstruction, and error correction. DNA-correcting codes are presented as a way to ensure that the output of the storage system is unique for any valid set of input strands. To this end, we introduce a novel distance metric that captures the distinctive behavior of the DNA storage system and provide necessary and sufficient conditions for DNA-correcting codes. The paper also includes several bounds and constructions of DNA-correcting codes.
