Tail-Erasure-Correcting Codes
Boaz Moav, Ryan Gabrys, Eitan Yaakobi
TL;DR
This paper addresses error patterns in two-dimensional binary arrays that model DNA-storage systems: tail-erasures at the ends of rows, arbitrary deletions across rows, and combinations of the two. The authors introduce the TE distance $\rho_{TE}$ and construct TE codes, including linear TE codes defined by a TE parity-check tensor and Hasse-derivative-based codes for large $e$. They develop $(t,s)$-DC codes using VT and tensor-product approaches, and design TED codes that combine per-row VT syndromes with nonbinary parity checks. They give explicit encoders and decoders in several regimes and derive upper bounds on code cardinality showing that the constructions are nearly optimal in many cases. Together, these results give a framework of explicit coding schemes for handling tail-erasures and deletions in emerging DNA-storage array platforms.
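As background for the per-row VT syndromes mentioned above, the following is a minimal sketch of the classical Varshamov-Tenengolts (VT) syndrome and of single-deletion recovery within one binary row. It is not the authors' construction: the function names `vt_syndrome` and `vt_correct_single_deletion` are hypothetical, and the brute-force reinsertion decoder is chosen for clarity rather than efficiency. It relies only on the standard fact that a binary word of length $n$ is determined by its syndrome $\sum_{i=1}^{n} i x_i \bmod (n+1)$ together with any one of its single-deletion subsequences.

```python
from typing import List, Optional


def vt_syndrome(x: List[int]) -> int:
    """Return sum_{i=1}^{n} i * x_i mod (n + 1) for a binary word x of length n."""
    n = len(x)
    return sum(i * bit for i, bit in enumerate(x, start=1)) % (n + 1)


def vt_correct_single_deletion(y: List[int], a: int) -> Optional[List[int]]:
    """Recover the length-(len(y) + 1) word with VT syndrome a from y.

    Brute force (an illustrative assumption, not an optimized decoder):
    try every reinsertion of one bit and keep the first candidate whose
    syndrome equals a. All matching candidates are the same word, because
    VT codes correct a single deletion.
    """
    for pos in range(len(y) + 1):
        for bit in (0, 1):
            candidate = y[:pos] + [bit] + y[pos:]
            if vt_syndrome(candidate) == a:
                return candidate
    return None


# One row of an array, its syndrome, and the row after losing one symbol.
row = [1, 0, 0, 1, 1, 0]
a = vt_syndrome(row)
received = row[:2] + row[3:]  # delete the symbol at index 2
recovered = vt_correct_single_deletion(received, a)
```

Here the syndrome `a` acts as side information about the row. In the array setting, such per-row syndromes would be protected by an outer code, which is where the nonbinary parity checks described above would come in.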
Abstract
The increasing demand for data storage has prompted the exploration of new techniques, with molecular data storage being a promising alternative. In this work, we develop coding schemes for a new storage paradigm that can be represented as a collection of two-dimensional arrays. Motivated by error patterns observed in recent prototype architectures, our study focuses on correcting erasures in the last few symbols of each row, and also correcting arbitrary deletions across rows. We present code constructions and explicit encoders and decoders that are shown to be nearly optimal in many scenarios. We show that the new coding schemes are capable of effectively mitigating these errors, making these emerging storage platforms potentially promising solutions.
