
VL-DNA: Enhance DNA Storage Capacity with Variable Payload (Strand) Lengths

Yixun Wei, Wenlong Wang, Huibing Dong, Bingzhe Li, David Du

TL;DR

The paper tackles primer-payload collisions in PCR-based DNA storage, which severely limit the number of usable primers and tube capacity. It introduces VL-DNA, a post-processing method that uses variable payload lengths (notably 150/160/190/200 bases) to place cut points inside collisions and split them. This recovers many primers and yields substantial capacity gains across encoding schemes. The problem is formulated as a maximum-weight independent set on a collision-conflict graph and solved with a greedy heuristic that runs in $O(n)$ time, where $n$ is the number of collisions. Evaluations across three encoding schemes show thousands of primers recovered and tube-capacity improvements ranging from 18.27% to 19x, demonstrating practical potential for scalable, high-density DNA archival storage.
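The TL;DR casts primer recovery as a maximum-weight independent set (MWIS) problem on a collision-conflict graph. This summary does not define the graph's vertices, edges, or weights, so what follows is a generic greedy MWIS heuristic, not the paper's algorithm. It assumes, hypothetically, that each vertex is a candidate fix (for example, one payload-length choice that splits some collisions), that its weight is the benefit of applying it (for example, primers recovered), and that an edge joins two fixes that cannot both be applied. The ranking rule weight / (degree + 1) is a common textbook choice; because it sorts the vertices, this sketch runs in $O(V \log V + E)$ time rather than the $O(n)$ bound the paper reports for its own heuristic.

```python
from collections.abc import Hashable, Iterable


def greedy_mwis(weights: dict[Hashable, float],
                edges: Iterable[tuple[Hashable, Hashable]]) -> set[Hashable]:
    """Greedy heuristic for a maximum-weight independent set.

    Hypothetical mapping (not from the paper): vertex = candidate fix,
    weight = primers it would recover, edge = two fixes that conflict.
    """
    # Build an adjacency list from the conflict edges.
    adj: dict[Hashable, set[Hashable]] = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Heavy, low-conflict vertices first; sorted() is stable, so ties keep input order.
    order = sorted(weights, key=lambda v: weights[v] / (len(adj[v]) + 1), reverse=True)

    chosen: set[Hashable] = set()
    blocked: set[Hashable] = set()
    for v in order:
        if v in blocked:
            continue
        chosen.add(v)       # take v ...
        blocked.add(v)
        blocked |= adj[v]   # ... and rule out every vertex that conflicts with it
    return chosen


# Hypothetical example: path a - b - c - d with weights 3, 2, 2, 1.
picked = greedy_mwis({"a": 3, "b": 2, "c": 2, "d": 1},
                     [("a", "b"), ("b", "c"), ("c", "d")])
print(sorted(picked))  # ['a', 'c']
```

On this small path the greedy choice (total weight 5) happens to be optimal; in general a greedy MWIS heuristic carries no optimality guarantee, which is why it trades exactness for speed.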

Abstract

DNA storage is a promising archival storage solution to today's big data problem. A DNA storage system encodes digital data into synthetic DNA sequences for storage and decodes the sequences back to digital data via sequencing. For efficient retrieval of target data, existing Polymerase Chain Reaction (PCR)-based DNA storage systems use primers as specific identifiers to tag different sets of DNA strands. However, PCR-based DNA storage systems suffer from primer-payload collisions, which significantly reduce storage capacity. This paper proposes using variable strand lengths, which take advantage of the inherent payload-cutting process, to split collisions and recover primers. The execution time of our scheme is linear in the number of primer-payload collisions. The scheme serves as a post-processing method for any DNA encoding scheme. An evaluation on three state-of-the-art encoding schemes shows that the scheme can recover thousands of usable primers and improve tube capacity by 18.27% to 19x.
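The payload-cutting idea can be illustrated with a toy model. The encoded data stream is cut into consecutive payloads, and a collision counts only if a primer-like subsequence lies entirely within one payload, so a cut point that falls inside that subsequence splits it. The sketch below rests on simplifying assumptions not taken from the paper: a collision is an exact primer match (practical systems usually apply a similarity threshold such as Hamming distance), the primer and stream are made up, and `cut_points` and `collisions` are hypothetical helper names. It compares two orderings of the 150- and 160-base payload lengths mentioned in the TL;DR over the same 310-base stream.

```python
from itertools import accumulate


def cut_points(lengths: list[int]) -> list[int]:
    """Offsets in the stream where consecutive payloads of the given lengths end."""
    return list(accumulate(lengths))


def collisions(stream: str, primer: str, lengths: list[int]) -> list[int]:
    """Start offsets of exact primer matches lying wholly inside one payload.

    Toy model (assumption): a collision is an exact match; real systems
    typically also flag near-matches within a few mismatches.
    """
    bounds = [0] + cut_points(lengths)
    hits = []
    for start, end in zip(bounds, bounds[1:]):
        payload = stream[start:end]
        pos = payload.find(primer)
        while pos != -1:  # step by 1 so overlapping matches are also found
            hits.append(start + pos)
            pos = payload.find(primer, pos + 1)
    return hits


# Made-up 20-nt primer embedded at offset 155 of a 310-base filler stream.
primer = "AGTCGATGCATTGACGATGA"
stream = "C" * 155 + primer + "C" * 135

print(collisions(stream, primer, [150, 160]))  # [155]: payload [150, 310) holds the whole match
print(collisions(stream, primer, [160, 150]))  # []: the cut at offset 160 splits the match
```

In this toy model, changing one payload length shifts every later cut point, so a length choice that removes one collision can affect others downstream; choosing lengths is therefore a combinatorial decision rather than an independent per-collision fix.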

Paper Structure

This paper contains 13 sections, 7 figures, 1 table, and 1 algorithm.

Figures (7)

  • Figure 1: Workflow of a typical DNA storage system
  • Figure 2: Standard PCR and defective PCR with a primer-payload collision
  • Figure 3: Distribution of primers with different numbers of collisions (encoding scheme: Blawat, data: 135MB video)
  • Figure 4: Possible cut points with payload lengths 150/160/190/200
  • Figure 5: Workflow of the DNA archival storage system equipped with VL-DNA
  • ...and 2 more figures