On the Reliability of Information Retrieval From MDS Coded Data in DNA Storage

Serge Kas Hanna

TL;DR

The paper tackles reliable data retrieval in DNA storage when data are protected by an outer and an inner MDS code under i.i.d. substitution errors and nonuniform sequencing coverage. It develops a four-component theoretical framework that links post-consensus nucleotide error rates, inner code decoding outcomes, outer code retrieval conditions, and a computable lower bound on the end-to-end success probability. The main technical contributions are a recurrence-based method for computing the retrieval probability conditioned on a read profile, a Gaussian approximation based on the central limit theorem (CLT) for large systems, and two practical bounds that facilitate optimization of sequencing and synthesis costs. The results yield insights into the optimal allocation of redundancy between the inner and outer codes, demonstrate how nonuniform read distributions increase the number of reads required, and show that inner codes can be crucial in low-read regimes. The framework also accommodates extensions to asymmetric substitutions and non-MDS outer codes, making it a versatile tool for guiding design choices in practical DNA storage systems.

Abstract

This work presents a theoretical analysis of the probability of successfully retrieving data encoded with MDS codes (e.g., Reed-Solomon codes) in DNA storage systems. We study this probability under independent and identically distributed (i.i.d.) substitution errors, focusing on a common code design strategy that combines inner and outer MDS codes. Our analysis demonstrates how this probability depends on factors such as the total number of sequencing reads, their distribution across strands, the rates of the inner and outer codes, and the substitution error probabilities. These results provide actionable insights into optimizing DNA storage systems under reliability constraints, including determining the minimum number of sequencing reads needed for reliable data retrieval and identifying the optimal balance between the rates of inner and outer MDS codes.
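
To make the dependence on the inner and outer code rates concrete, the following is a minimal sketch of a deliberately simplified toy model, not the paper's framework. It assumes that each inner codeword of length `n` and dimension `k` sees independent symbol errors with probability `p`, that the inner decoder succeeds exactly when there are at most `floor((n - k) / 2)` errors, that failed strands are treated as detected erasures (miscorrections are ignored), and that all strands share the same `p` (uniform coverage). The function names and parameters are hypothetical and chosen for illustration only.

```python
from math import comb


def inner_success(n: int, k: int, p: float) -> float:
    """Probability that an (n, k) inner MDS codeword has at most
    floor((n - k) / 2) symbol errors, with i.i.d. error probability p."""
    t = (n - k) // 2  # unique-decoding radius of an MDS code
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1))


def outer_success(N: int, K: int, q: float) -> float:
    """Probability that at least K of N strands are decoded correctly,
    each independently with probability q (failures assumed to be erasures)."""
    return sum(comb(N, j) * q**j * (1 - q) ** (N - j) for j in range(K, N + 1))


def retrieval_probability(N: int, K: int, n: int, k: int, p: float) -> float:
    """Toy end-to-end success probability under the assumptions above."""
    return outer_success(N, K, inner_success(n, k, p))


if __name__ == "__main__":
    # Same total setup, two different splits of redundancy between the codes.
    for (N, K, n, k) in [(20, 16, 30, 26), (20, 18, 30, 22)]:
        prob = retrieval_probability(N, K, n, k, p=0.02)
        print(f"outer ({N},{K}), inner ({n},{k}): {prob:.6f}")
```

Sweeping `K` and `k` in this toy model gives a rough feel for the trade-off between inner and outer redundancy. The paper's analysis additionally accounts for the number of sequencing reads and their distribution across strands, which this sketch omits.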
