Coding for Synthesis Defects

Ziyang Lu; Han Mao Kiah; Yiwei Zhang; Robert N. Grass; Eitan Yaakobi

TL;DR

The study addresses synthesis defects in parallel DNA strand synthesis for data storage by formulating two code families: known-synthesis-defect correcting codes (KDCC), where the defective cycles are known, and synthesis-defect correcting codes (SDCC), where the defect locations are unknown. It develops a reduction from quaternary to binary codes via a signature $\tilde{x}$ and constructs explicit KDCC schemes for $t=1$ and $t=2$, with redundancy $\log 4$ and $\log n+18\log 3$, respectively. For the unknown-defect setting, it introduces defect-locating strands to constrain error locations and provides 1-SDCC and 2-SDCC constructions with redundancy bounds scaling as $O((\log n)^2)$ and $O(M\log n)$ terms, respectively, a substantial saving over naively protecting each strand with its own deletion code. A lower bound shows that the 1-KDCC redundancy is tight up to lower-order terms, so the proposed single-defect KDCC design is nearly optimal, while the SDCC framework offers a scalable approach for multi-strand synthesis with defect localization. Together, the results advance low-redundancy coding for synthesis-based DNA data storage.

Abstract

Motivated by DNA-based data storage systems, we investigate the errors that occur when synthesizing DNA strands in parallel, where the machine appends one nucleotide at a time to each strand according to a template supersequence. If the machine fails at some cycle, then the strands meant to be extended at this cycle are not extended, and we refer to this as a synthesis defect. In this paper, we present two families of codes correcting synthesis defects: t-known-synthesis-defect correcting codes and t-synthesis-defect correcting codes. For the first family, it is assumed that the defective cycles are known, and each codeword is a quaternary sequence. We provide constructions for this family of codes for t = 1, 2, with redundancy log 4 and log n+18 log 3, respectively. For the second family, each codeword is a set of M ordered sequences, and we give constructions for t = 1, 2 to illustrate a strategy for constructing this family of codes. Finally, we derive a lower bound on the redundancy of single-known-synthesis-defect correcting codes, which shows that our construction is almost optimal.
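
The synthesis model in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's formal definition: the template is assumed to be the periodic sequence ACGTACGT..., a defective cycle is assumed to silently skip the scheduled nucleotide (so each affected strand suffers a deletion), and the names `synthesize`, `periodic_template`, and `defective_cycles` are hypothetical.

```python
def periodic_template(num_cycles, alphabet="ACGT"):
    """Assumed template supersequence: the alphabet repeated cyclically."""
    return "".join(alphabet[i % len(alphabet)] for i in range(num_cycles))


def synthesize(strands, template, defective_cycles=frozenset()):
    """Simulate parallel synthesis of `strands` along `template`.

    Cycles are numbered from 1. At each cycle the machine offers one
    nucleotide; every strand whose next pending symbol matches it is
    scheduled to be extended. At a defective cycle (assumption) the
    scheduled symbol is lost but the machine moves on, so the strand
    ends up with that symbol deleted.
    """
    progress = [0] * len(strands)      # index of next pending symbol per strand
    output = [[] for _ in strands]     # nucleotides actually appended
    for cycle_index, nucleotide in enumerate(template, start=1):
        failed = cycle_index in defective_cycles
        for i, strand in enumerate(strands):
            pos = progress[i]
            if pos < len(strand) and strand[pos] == nucleotide:
                if not failed:
                    output[i].append(nucleotide)
                progress[i] += 1       # the symbol is consumed either way
    return ["".join(chars) for chars in output]


strands = ["ACGT", "GATC"]
# With a periodic 4-letter template, 4 cycles per symbol always suffice.
template = periodic_template(4 * max(len(s) for s in strands))

clean = synthesize(strands, template)
one_defect = synthesize(strands, template, defective_cycles={3})
```

In this toy run, cycle 3 offers G, which both strands are waiting for, so a single defective cycle deletes one symbol from each of them. This is the sense in which one synthesis defect can corrupt many strands at once, and why knowing (or locating) the defective cycle constrains where the deletions can be.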

Paper Structure

This paper contains 15 sections, 34 theorems, 47 equations, 4 tables, 1 algorithm.

Introduction
Problem Formulation
Problem Formulation
Organization
Constructions of Known-Synthesis-Defect Correcting Codes
Reduction to binary codes
Constructions for $t=1$
Construction of $t=2$
Constructions of Synthesis-Defect Correcting Codes
Defect-Locating Strands
$1$-Synthesis-Defect Correcting Code
$2$-Synthesis-Defect Correcting Code
A Lower Bound on the Redundancy of Known-Synthesis-Defect Correcting Codes
Conclusion and Future Work
Construction of P-bounded two-deletion correcting code for large P

Key Result

Lemma 1

For ${\boldsymbol x}\in\Sigma^n$, and $\delta\in\mathsf{cycle}({\boldsymbol x})$, we have

Theorems & Definitions (51)

Definition 1
Example 1
Example 2
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Lemma 1
Lemma 2
...and 41 more
