VL-DNA: Enhance DNA Storage Capacity with Variable Payload (Strand) Lengths
Yixun Wei, Wenlong Wang, Huibing Dong, Bingzhe Li, David Du
TL;DR
The paper tackles primer-payload collisions in PCR-based DNA storage, which severely limit usable primers and tube capacity. It introduces VL-DNA, a post-processing method that uses variable payload lengths (notably 150/160/190/200 bases) to create collision-cutting points, enabling recovery of many primers and substantial capacity gains across encoding schemes. The problem is formulated as a maximum-weight independent set on a collision-conflict graph and solved with a greedy heuristic that runs in $O(n)$ time, where $n$ is the number of collisions. Evaluations across three encoding schemes show thousands of primers recovered and tube-capacity improvements ranging from 18.27% to 19x, demonstrating practical potential for scalable, high-density DNA archival storage.
Abstract
DNA storage is a promising archival data storage solution to today's big data problem. A DNA storage system encodes and stores digital data as synthetic DNA sequences and decodes those sequences back to digital data via sequencing. To retrieve target data efficiently, existing Polymerase Chain Reaction (PCR) based DNA storage systems use primers as specific identifiers to tag different sets of DNA strands. However, PCR-based DNA storage systems suffer from primer-payload collisions, which significantly reduce storage capacity. This paper proposes using variable strand lengths, which take advantage of the inherent payload-cutting process, to split collisions and recover primers. The execution time of our scheme is linear in the number of primer-payload collisions. The scheme serves as a post-processing method for any DNA encoding scheme. An evaluation on three state-of-the-art encoding schemes shows that the scheme can recover thousands of usable primers and improve tube capacity by 18.27% to 19x.
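The idea of splitting a collision by moving a cut point can be illustrated with a toy sketch. It is not the paper's implementation: it assumes a collision is an exact substring match between a primer and a payload strand (real systems typically use a similarity threshold), and the helper names `cut` and `collides`, the sequences, and the toy strand lengths are all hypothetical.

```python
def cut(stream, lengths):
    """Cut a payload stream into consecutive strands of the given lengths."""
    strands, pos = [], 0
    for n in lengths:
        strands.append(stream[pos:pos + n])
        pos += n
    return strands


def collides(primer, strands):
    """Assumed collision model: the primer appears verbatim inside a strand."""
    return any(primer in strand for strand in strands)


# Hypothetical primer and a 40-base payload stream that contains it
# at positions 12..19.
primer = "ACGTTGCA"
stream = "A" * 12 + primer + "C" * 20

# Fixed-length cutting: the primer lies entirely inside the first strand.
fixed = cut(stream, [20, 20])

# Variable-length cutting: the cut at position 16 falls inside the
# primer occurrence, so neither strand contains the whole primer.
variable = cut(stream, [16, 24])

print(collides(primer, fixed))     # collision present
print(collides(primer, variable))  # collision split by the cut point
```

With fixed 20-base strands the primer would be unusable for this payload; shifting the cut point to 16 splits the matching region across two strands, so under the assumed model the primer becomes usable again without changing the encoded data.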
