Two-Insertion/Deletion/Substitution Correcting Codes
Yuhang Pi, Zhifang Zhang
TL;DR
The paper constructs explicit binary codes that correct up to two errors, each of which may be an insertion, a deletion, or a substitution, a problem motivated by DNA storage. It extends Levenshtein's Varshamov-Tenengolts (VT) framework by applying higher-order VT syndromes to counts of adjacent symbol pairs, and introduces a formal taxonomy of error types, together with a sign-preserving number, to classify and bound the effect of each error pattern. The central contribution is the code family $\mathcal{C}_{k_{1},k_{2},k_{3},k_{4}}$, which for $n\ge 7$ achieves $2$-ins/del/sub correction with redundancy $6\log_{2} n+O(1)$ (one construction attains $6\log_{2} n+8$) while keeping an explicit form. The proofs proceed by rigorous case-by-case analysis, using error segmentation and modular constraints to guarantee unique decoding, and the approach offers a general framework for applying higher-order VT syndromes to more complex error patterns in DNA storage applications.
Abstract
In recent years, the emergence of DNA storage systems has led to widespread interest in codes that correct insertions, deletions, and classic substitutions. In his early work on this problem, Levenshtein showed that the VT codes can correct a single insertion or deletion, and then extended the VT construction to single-insertion/deletion/substitution ($1$-ins/del/sub) correcting codes. Inspired by this, we generalize the recent $1$-del $1$-sub correcting codes with redundancy $6\log_{2}n+O(1)$ to the more general $2$-ins/del/sub correcting codes without increasing the redundancy. Our key technique is to apply higher-order VT syndromes to distinct objects and to carry out a systematic classification of all error patterns.
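As background for the VT construction mentioned above, here is a minimal Python sketch of the classical VT code $\{x\in\{0,1\}^{n} : \sum_{i=1}^{n} i\,x_{i}\equiv a \pmod{n+1}\}$, with a brute-force check that it corrects a single deletion. The function names are hypothetical. The `order` parameter uses $\sum_{i} i^{k}x_{i}$ as one common form of a higher-order syndrome; this is an assumption for illustration, since the exact syndromes, moduli, and objects used in the paper's code family are not stated in this summary.

```python
from itertools import product


def vt_syndrome(x, order=1):
    """Unreduced syndrome sum_{i=1..n} i**order * x_i.

    order=1 is the classical VT syndrome; order>1 is an assumed
    illustrative form of a higher-order syndrome, not the paper's definition.
    """
    return sum((i ** order) * bit for i, bit in enumerate(x, start=1))


def vt_code(n, a=0):
    """All binary words of length n whose VT syndrome is a modulo n + 1."""
    return [x for x in product((0, 1), repeat=n)
            if vt_syndrome(x) % (n + 1) == a]


def single_deletion_ball(x):
    """All words obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def corrects_single_deletion(code):
    """True if the single-deletion balls of distinct codewords are disjoint."""
    owner = {}
    for c in code:
        for y in single_deletion_ball(c):
            # setdefault records the first codeword that produced y.
            if owner.setdefault(y, c) != c:
                return False
    return True
```

For example, `corrects_single_deletion(vt_code(6))` checks exhaustively that no two codewords of length 6 share a word after one deletion, whereas the same check on all of $\{0,1\}^{3}$ fails. The exhaustive search is only practical for small $n$.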
