Geno-Weaving: Low-Complexity Capacity-Achieving DNA Storage

Hsin-Po Wang, Venkatesan Guruswami

TL;DR

This paper lays down a rateless code along each strand to encode its index; it then lays down a capacity-achieving block code at the same position across all strands to protect data, and weaves a low-complexity coding scheme that achieves DNA's capacity.

Abstract

As a possible implementation of data storage using DNA, multiple strands of DNA are stored in a liquid container so that, in the future, they can be read by an array of DNA readers in parallel. These readers will sample the strands with replacement to produce a random number of noisy reads for each strand. An essential component of such a data storage system is how to reconstruct data out of these unsorted, repetitive, and noisy reads. It is known that if a single read can be modeled by a substitution channel $W$, then the overall capacity can be expressed by the "Poisson-ization" of $W$. In this paper, we lay down a rateless code along each strand to encode its index; we then lay down a capacity-achieving block code at the same position across all strands to protect data. That weaves a low-complexity coding scheme that achieves DNA's capacity.
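The read model in the abstract can be illustrated with a small simulation. The sketch below is not from the paper. It assumes each read picks a strand uniformly at random, and the function names `sample_read_counts` and `count_histogram` are hypothetical. Under that assumption, with $m$ reads over $n$ strands, the number of reads a given strand receives is binomial with mean $m/n$, which is close to a Poisson distribution when $n$ is large.

```python
import random
from collections import Counter


def sample_read_counts(n, m, seed=0):
    """Draw m reads with replacement from n strands (assumed uniform).

    Returns a list whose entry s is the number of reads of strand s.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n) for _ in range(m))
    return [counts.get(s, 0) for s in range(n)]


def count_histogram(read_counts):
    """Return the empirical fraction of strands that were read k times."""
    n = len(read_counts)
    hist = Counter(read_counts)
    return {k: hist[k] / n for k in sorted(hist)}
```

For example, `count_histogram(sample_read_counts(1000, 2000))` gives the fraction of strands read 0, 1, 2, ... times. Some strands receive no reads at all, which is why a code across strands is needed to protect the data.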

Paper Structure

This paper contains 17 sections, 6 theorems, 43 equations, 5 figures.

Key Result

Theorem 4

Suppose that a genie reveals $S$. (This is equivalent to assuming that each $X^s$ spends $\log_q n$ letters of its length $\ell + \log_q n$ to encode the strand index $s \in [n]$, and that the indexing part is error-free.) Suppose that, as $\ell$, $m$, and $n$ increase, $\lambda \coloneqq m/n$ remains a c… That is, the capacity of DNA coding is determined by the Poisson-ization of the channel $W$ that mo…

Figures (5)

  • Figure 1: DNA strands float in liquid; nanopores will read them in parallel. See WZB21 for a more accurate picture.
  • Figure 2: To prove Theorem \ref{thm:useC}, we will apply a block code at the same position across all strands to protect data.
  • Figure 3: To prove Theorem \ref{thm:useBR}, we will deploy a rateless code on each strand to encode its index and a block code at each position to protect data. The code rate of the latter will depend on $p$. The letter at any intersection will be the mod-$4$ sum of the rateless code symbol and the block code symbol.
  • Figure 4: The decoding part of Theorem \ref{thm:useBR}. Step 1: decode $\mathcal{R}$ to obtain indices. Even-numbered steps: subtract indices from $X$ and decode $\mathcal{B}$ to obtain data. Odd-numbered steps: subtract data from $X$ and decode $\mathcal{R}$ to obtain indices.
  • Figure 5: Step 1: decode indices. Even-numbered steps: subtract indices to decode data. Odd-numbered steps: subtract data to decode indices. This figure caps the number of reads at $\kappa \coloneqq 9$.
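
The mod-$4$ weaving described in the captions of Figures 3 to 5 can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes the four DNA letters are represented as integers 0 to 3 and that symbols are stored as a list of strands, each a list of positions. The names `weave` and `unweave` are hypothetical, and the rateless code $\mathcal{R}$ and block code $\mathcal{B}$ themselves are not implemented; any arrays of symbols stand in for their codewords.

```python
Q = 4  # alphabet size: the four DNA letters, represented here as 0..3


def weave(index_symbols, data_symbols):
    """Combine the two layers letter by letter with a mod-Q sum.

    index_symbols[s][j] stands in for the rateless code symbol of strand s
    at position j; data_symbols[s][j] stands in for the block code symbol.
    """
    return [
        [(r + b) % Q for r, b in zip(r_row, b_row)]
        for r_row, b_row in zip(index_symbols, data_symbols)
    ]


def unweave(woven, known_layer):
    """Subtract one known layer (mod Q) to expose the other layer."""
    return [
        [(x - k) % Q for x, k in zip(x_row, k_row)]
        for x_row, k_row in zip(woven, known_layer)
    ]
```

In the noiseless case, `unweave(weave(R, B), R)` returns `B` and `unweave(weave(R, B), B)` returns `R`. This is the subtraction that the alternating decoding steps in Figures 4 and 5 rely on: once one layer is estimated, removing it leaves the other layer to be decoded.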

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 4
  • Theorem 5: presented in ITA 2024
  • Theorem 6: ShH21
  • Theorem 7: presented in ISIT 2024's satellite workshop
  • Proposition 8
  • Lemma 9