DNA-Correcting Codes: End-to-end Correction in DNA Storage Systems

Avital Boruchovsky; Daniella Bar-Lev; Eitan Yaakobi

TL;DR

This work introduces DNA-correcting codes that jointly address clustering, reconstruction, and error correction in DNA storage. It defines a DNA-specific distance metric and derives necessary and sufficient conditions for unique recovery of input strands under a $(\tau,e_i,e_d)_K$-DNA storage model, with distinct regimes based on the copy-allocation parameter $\tau$. The study develops bounds and constructions for index-correcting codes and analyzes the impact of index length $\ell$ on code size, including a permutation-based case ($\ell=\log(M)$) and a length-expansion approach ($\ell>\log(M)$). It further extends the framework to erroneous data-fields ($e_d>0$) by generalizing the distance and providing constructions that couple data-field ECCs with index-correcting codes. The results offer a principled, end-to-end coding approach for DNA storage with practical implications for reliable data retrieval under realistic synthesis/sequencing noise and one-to-many readout scenarios.
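
The two index-length regimes can be illustrated with a minimal counting sketch. It assumes a binary alphabet, a base-2 logarithm, and that the $M$ strands carry distinct indices; none of these is stated in this summary, and the function names are hypothetical. Under those assumptions, $\ell=\log(M)$ means every index value is used exactly once, so an index assignment is a permutation of the index space, while $\ell>\log(M)$ leaves index values unused.

```python
from itertools import product


def index_space(ell):
    """All binary index strings of length ell (binary alphabet is an assumption)."""
    return ["".join(bits) for bits in product("01", repeat=ell)]


def spare_indices(M, ell):
    """Count index values left unused when M strands get distinct indices."""
    total = 2 ** ell
    if total < M:
        raise ValueError("index too short to give M strands distinct indices")
    return total - M


M = 8
print(len(index_space(3)))    # size of the index space for ell = 3
print(spare_indices(M, 3))    # ell = log2(M): no index value is left over
print(spare_indices(M, 4))    # ell > log2(M): some index values stay unused
```

The sketch only counts index values; it does not model the channel or the code constructions described in the paper.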

Abstract

This paper introduces a new solution to DNA storage that integrates all three steps of retrieval, namely clustering, reconstruction, and error correction. DNA-correcting codes are presented as a unique solution to the problem of ensuring that the output of the storage system is unique for any valid set of input strands. To this end, we introduce a novel distance metric to capture the unique behavior of the DNA storage system and provide necessary and sufficient conditions for DNA-correcting codes. The paper also includes several bounds and constructions of DNA-correcting codes.

Paper Structure (17 sections, 25 theorems, 35 equations, 1 figure)

Introduction
Definitions, Problem Statement, and Related Works
Definitions
Problem Statement
Related Work
Error Free Data-Field
The DNA-Distance
Necessary and Sufficient Conditions for DNA-Correcting Codes
Codes for a Fixed Data-Field Set
Index-Correcting Codes
$\ell=\log(M)$
$\ell>\log(M)$
Erroneous Data-Field
Sufficient and Necessary Conditions
Constructions for $e_d>0$
...and 2 more sections

Key Result

Corollary 1

It holds that

Figures (1)

Figure 1: All possible matchings between $I({\boldsymbol u},Z_1)$ and $I({\boldsymbol u},Z_2)$ for every data field ${\boldsymbol u}\in S(Z_1)$.

Theorems & Definitions (35)

Definition 1
Claim 1
Corollary 1
Example 1
Lemma 1
Theorem 1: Hall, 1935
Theorem 2
Lemma 2
Example 2
Lemma 3
...and 25 more
