Shift-Interleave Coding for DNA-Based Storage: Correction of IDS Errors and Sequence Losses
Ryo Shibata, Haruhiko Kaneko
TL;DR
The paper addresses reliable data storage in DNA in the presence of insertion, deletion, and substitution (IDS) errors and sequence losses. It introduces shift-interleave (SI) coding, which uses two binary low-density parity-check (LDPC) codes and a three-stage encoder (shifting, bit-to-base mapping, interleaving) to generate DNA base sequences with embedded synchronization markers. At the receiver, a non-iterative detector/decoder pair runs a forward-backward algorithm on an IDS-aware hidden Markov model (HMM) for detection, followed by cooperative LDPC decoding that recovers codewords in sequential rounds, with the results of earlier rounds guiding later ones. Numerical results show that SI coding achieves strong bit-error-rate performance under IDS errors and sequence losses, benefits from longer codewords and marker-based synchronization, and outperforms baseline schemes.
Abstract
We propose a novel coding scheme for DNA-based storage systems, called shift-interleave (SI) coding, designed to correct insertion, deletion, and substitution (IDS) errors as well as sequence losses. The SI coding scheme employs multiple codewords from two binary low-density parity-check codes. These codewords are processed into DNA base sequences through shifting, bit-to-base mapping, and interleaving. At the receiver side, an efficient non-iterative detection and decoding scheme sequentially estimates the codewords. Numerical results demonstrate the excellent performance of the SI coding scheme in correcting both IDS errors and sequence losses.
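To make the three encoder stages concrete, the toy sketch below chains a shift, a bit-to-base mapping, and an interleaver in Python. It is an illustration only and not the paper's construction: the abstract does not specify the shift amounts, the mapping table, or the interleaver, so the cyclic shift by word index, the `BASES` table, and the block (row-in, column-out) interleaver are all assumptions, as are the function names. LDPC encoding and marker insertion are omitted; the input words stand in for codewords of the two binary codes.

```python
# Toy sketch of a shift -> map -> interleave pipeline.
# All design choices below are assumptions, not the paper's specification.

BASES = "ACGT"  # assumed mapping: index 2*u + l for bit pair (u, l)


def shift(bits, s):
    """Cyclically shift a list of bits to the right by s positions."""
    s %= len(bits)
    return bits[-s:] + bits[:-s] if s else list(bits)


def map_to_bases(upper, lower):
    """Combine one bit from each of two binary words into one DNA base."""
    return "".join(BASES[2 * u + l] for u, l in zip(upper, lower))


def interleave(strands):
    """Block interleaver: write strands as rows, read out by columns,
    then cut the result back into strands of the original length."""
    m, n = len(strands), len(strands[0])
    flat = "".join(strands[r][c] for c in range(n) for r in range(m))
    return [flat[i * n:(i + 1) * n] for i in range(m)]


def deinterleave(strands):
    """Inverse of interleave()."""
    m, n = len(strands), len(strands[0])
    flat = "".join(strands)
    return ["".join(flat[c * m + r] for c in range(n)) for r in range(m)]


def encode(upper_words, lower_words):
    """Shift the i-th lower word by i (assumed), map bit pairs to bases,
    then interleave the resulting strands."""
    mapped = [
        map_to_bases(u, shift(l, i))
        for i, (u, l) in enumerate(zip(upper_words, lower_words))
    ]
    return interleave(mapped)


if __name__ == "__main__":
    upper = [[0, 1, 1, 0], [1, 0, 0, 1]]  # stand-ins for codewords of code 1
    lower = [[1, 1, 0, 0], [0, 1, 0, 1]]  # stand-ins for codewords of code 2
    print(encode(upper, lower))
```

In this sketch, interleaving spreads the symbols of each mapped word over several output strands, so the loss of one strand removes only part of each word. Whether the SI scheme achieves its loss resilience in the same way is not stated in the abstract.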
