DNA-Correcting Codes: End-to-end Correction in DNA Storage Systems

Avital Boruchovsky, Daniella Bar-Lev, Eitan Yaakobi

TL;DR

This work introduces DNA-correcting codes that jointly address clustering, reconstruction, and error correction in DNA storage. It defines a DNA-specific distance metric and derives necessary and sufficient conditions for unique recovery of input strands under a $(\tau,e_i,e_d)_K$-DNA storage model, with distinct regimes based on the copy-allocation parameter $\tau$. The study develops bounds and constructions for index-correcting codes and analyzes the impact of index length $\ell$ on code size, including a permutation-based case ($\ell=\log(M)$) and a length-expansion approach ($\ell>\log(M)$). It further extends the framework to erroneous data-fields ($e_d>0$) by generalizing the distance and providing constructions that couple data-field ECCs with index-correcting codes. The results offer a principled, end-to-end coding approach for DNA storage with practical implications for reliable data retrieval under realistic synthesis/sequencing noise and one-to-many readout scenarios.

Abstract

This paper introduces a new solution to DNA storage that integrates all three steps of retrieval, namely clustering, reconstruction, and error correction. DNA-correcting codes are presented as a unique solution to the problem of ensuring that the output of the storage system is unique for any valid set of input strands. To this end, we introduce a novel distance metric to capture the unique behavior of the DNA storage system and provide necessary and sufficient conditions for DNA-correcting codes. The paper also includes several bounds and constructions of DNA-correcting codes.

Paper Structure

This paper contains 17 sections, 25 theorems, 35 equations, and 1 figure.

Key Result

Corollary 1

It holds that

Figures (1)

  • Figure 1: All possible matchings between $I({\boldsymbol u},Z_1)$ and $I({\boldsymbol u},Z_2)$ for every data field ${\boldsymbol u}\in S(Z_1)$.

Theorems & Definitions (35)

  • Definition 1
  • Claim 1
  • Corollary 1
  • Example 1
  • Lemma 1
  • Theorem 1: Hall, 1935
  • Theorem 2
  • Lemma 2
  • Example 2
  • Lemma 3
  • ...and 25 more