Half-Marker Codes for Deletion Channels with Applications in DNA Storage

Javad Haghighat, Tolga M. Duman

TL;DR

This paper introduces half-marker codes, a variant of marker codes for synchronization in DNA storage over insertion, deletion, and substitution (IDS) channels, which reserve one bit per 4-ary symbol for synchronization. The authors formalize the half-marker construction, adapt forward-backward (FB) decoding to the altered statistics, and quantify the mutual information gains via $I(\boldsymbol{u};\boldsymbol{L})$, which aids the design of concatenated LDPC codes. Numerical results show that half-marker codes can achieve higher overall achievable rates and lower end-to-end bit and symbol error rates (BER/SER) than standard markers across multiple parameter settings, with HMC1 and HMC2 often outperforming their standard counterparts. The work points to a practical way of improving DNA storage reliability and throughput, and suggests optimizing LDPC–half-marker schemes as a direction for future studies.

Abstract

DNA storage systems face significant challenges, including insertion, deletion, and substitution (IDS) errors. Therefore, designing effective synchronization codes, i.e., codes capable of correcting IDS errors, is essential for DNA storage systems. Marker codes are a favorable choice for this purpose. In this paper, we extend the notion of marker codes by making the following key observation. Since each DNA base is equivalent to a 2-bit storage unit, one bit can be reserved for synchronization, while the other is dedicated to data transmission. Using this observation, we propose a new class of marker codes, which we refer to as half-marker codes. We demonstrate that this extension has the potential to significantly increase the mutual information between the input symbols and the soft outputs of an IDS channel modeling a DNA storage system. Specifically, through examples, we show that when concatenated with an outer error-correcting code, half-marker codes outperform standard marker codes and significantly reduce the end-to-end bit error rate of the system.
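
The key observation above (each base carries two bits, so a marker can occupy only one of them) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bit-to-base mapping, the hypothetical function names, the marker patterns, and the choice to place the marker bit in the high bit of each half-marker symbol.

```python
# Illustrative sketch only. The mapping, marker patterns, and layout are
# assumptions, not the paper's exact construction.
BASES = "ACGT"  # assumed mapping: symbol value = 2 * high_bit + low_bit


def bits_to_bases(bits):
    """Pack an even-length list of bits into 4-ary symbols, two bits each."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


def insert_standard_markers(data_bits, period, marker=(0, 3)):
    """After every `period` data symbols, insert whole known marker symbols."""
    symbols = bits_to_bases(data_bits)
    out = []
    for start in range(0, len(symbols), period):
        out.extend(symbols[start:start + period])
        out.extend(marker)  # both bits of each marker symbol are known
    return out


def insert_half_markers(data_bits, period, marker_bits=(0, 1)):
    """After every `period` data symbols, insert symbols whose high bit is a
    known marker bit and whose low bit still carries one data bit."""
    block = 2 * period + len(marker_bits)  # data bits consumed per block
    if len(data_bits) % block:
        raise ValueError("data length must be a multiple of the block size")
    out = []
    for start in range(0, len(data_bits), block):
        chunk = data_bits[start:start + block]
        out.extend(bits_to_bases(chunk[:2 * period]))
        tail = chunk[2 * period:]
        out.extend(2 * m + d for m, d in zip(marker_bits, tail))
    return out


def to_dna(symbols):
    """Render 4-ary symbols as a base string under the assumed mapping."""
    return "".join(BASES[s] for s in symbols)


standard = insert_standard_markers([1, 0, 1, 1, 0, 1, 0, 0], period=2)
half = insert_half_markers([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0], period=2)
print(to_dna(standard), to_dna(half))
```

In this toy layout both sequences are 8 symbols long, but the standard-marker sequence carries 8 data bits while the half-marker sequence carries 12, because each half-marker symbol still transports one data bit. How much synchronization information each variant gives the decoder is a separate question that this sketch does not address.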

Paper Structure

This paper contains 5 sections, 11 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Block diagram of the considered DNA storage system.
  • Figure 2: Comparison between inserting standard markers (Top) and half-markers (Bottom) within a $4$-ary data sequence. Marker bits are shown in black, and data bits are shown in white.
  • Figure 3: Bit error rates of concatenated LDPC-marker coding schemes, when $p_{s}=0.02$ and $N_{p}=6$.
  • Figure 4: Overall achievable rate, for $p_{d}=0.02$ and $p_{s}=0.01$.
  • Figure 5: Comparison between the symbol error rates achieved by different synchronization codes.