Shift-Interleave Coding for DNA-Based Storage: Correction of IDS Errors and Sequence Losses

Ryo Shibata, Haruhiko Kaneko

TL;DR

The paper addresses reliable DNA-based data storage in the presence of insertion, deletion, and substitution (IDS) errors and sequence losses. It introduces shift-interleave (SI) coding, which takes codewords from two binary LDPC codes, passes them through a three-stage encoder (shifting, bit-to-base mapping, interleaving) to generate DNA base sequences, and embeds synchronization markers. At the receiver, a non-iterative detector/decoder pair uses a forward-backward algorithm on an IDS-aware HMM for detection and cooperative LDPC decoding to recover codewords in sequential rounds, with earlier rounds guiding later ones. Numerical results show that SI coding achieves strong bit-error-rate performance under both IDS errors and sequence losses, benefits from longer codewords and marker-based synchronization, and outperforms baseline schemes.

Abstract

We propose a novel coding scheme for DNA-based storage systems, called shift-interleave (SI) coding, designed to correct insertion, deletion, and substitution (IDS) errors as well as sequence losses. The SI coding scheme employs multiple codewords from two binary low-density parity-check codes. These codewords are processed into DNA base sequences through shifting, bit-to-base mapping, and interleaving. At the receiver side, an efficient non-iterative detection and decoding scheme sequentially estimates the codewords. Numerical results demonstrate the excellent performance of the SI coding scheme in correcting both IDS errors and sequence losses.
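To make the error model concrete, the following is a minimal sketch of a toy channel that applies IDS errors to each stored DNA sequence and then drops whole sequences at random. It is an illustration only, not the channel definition used in the paper: the function names `ids_channel` and `store_and_read`, the parameters `p_ins`, `p_del`, `p_sub`, and `p_loss`, and the per-position error ordering are all assumptions made for this example.

```python
import random

BASES = "ACGT"

def ids_channel(seq, p_ins, p_del, p_sub, rng):
    """Toy IDS channel (assumed model, not taken from the paper).

    At each step a random base is inserted with probability p_ins,
    the current base is deleted with probability p_del, and otherwise
    the base is transmitted and substituted with probability p_sub.
    """
    if not 0.0 <= p_ins < 1.0:
        raise ValueError("p_ins must be in [0, 1) so the loop terminates")
    out = []
    i = 0
    while i < len(seq):
        r = rng.random()
        if r < p_ins:
            out.append(rng.choice(BASES))      # insertion: input pointer stays
        elif r < p_ins + p_del:
            i += 1                             # deletion: skip this base
        else:
            base = seq[i]
            if rng.random() < p_sub:           # substitution by a different base
                base = rng.choice([b for b in BASES if b != base])
            out.append(base)
            i += 1
    return "".join(out)

def store_and_read(seqs, p_ins, p_del, p_sub, p_loss, seed=0):
    """Apply the IDS channel to every sequence, then lose each one
    independently with probability p_loss (a lost sequence becomes None)."""
    rng = random.Random(seed)
    received = []
    for s in seqs:
        noisy = ids_channel(s, p_ins, p_del, p_sub, rng)
        received.append(None if rng.random() < p_loss else noisy)
    return received

if __name__ == "__main__":
    stored = ["ACGTACGTAC", "TTGACCATGA", "GGCATTACGA"]
    print(store_and_read(stored, p_ins=0.05, p_del=0.05, p_sub=0.02, p_loss=0.1))
```

In this sketch a receiver sees sequences of varying length with no indication of where insertions or deletions occurred, and some sequences never arrive at all. These are the two impairments that SI coding is designed to correct.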

Paper Structure

This paper contains 16 sections, 13 equations, 7 figures, and 1 algorithm.

Figures (7)

  • Figure 1: Cascaded IDS and block erasure channel model.
  • Figure 2: Diagram of the proposed system based on the SI coding scheme.
  • Figure 3: Illustration of message passing around a mapping node.
  • Figure 4: $p_{\rm e}=0$
  • Figure 5: $p_{\rm e}=0.1$
  • ...and 2 more figures

Theorems & Definitions (2)

  • Example 1
  • Example 2