Combinatorial alphabet-dependent bounds for insdel codes
Xiangliang Kong, Itzhak Tamo, Hengjia Wei
TL;DR
The paper advances the theory of $q$-ary insdel codes by deriving a linear-programming-based sphere-packing upper bound that tightens previous bounds in regimes of large distance or alphabet, and by giving two complementary lower bounds: a hypergraph-matching bound that is asymptotically tight for fixed $n,d$ as $q \to \infty$, and a refined GV-type bound that extends improvements to larger alphabets and distances. It also develops a recursive upper-bound framework and demonstrates improvements over prior results in several parameter regimes, with explicit constructions leveraging RS-code structures. Together, these results sharpen the understanding of how the maximum insdel code size scales with $n$, $d$, and $q$, and have implications for synchronization-robust coding in applications like DNA storage. The techniques combine combinatorial LP duality, hypergraph matchings, RS-code subgraphs, and independence-number bounds to yield tight asymptotics and practical bounds across a broad parameter range.
Abstract
Error-correcting codes resilient to synchronization errors such as insertions and deletions are known as insdel codes. Due to their important applications in DNA storage and computational biology, insdel codes have recently become a focal point of research in coding theory. In this paper, we present several new combinatorial upper and lower bounds on the maximum size of $q$-ary insdel codes. Our main upper bound is a sphere-packing bound obtained by solving a linear programming (LP) problem. It improves upon previous results for cases when the distance $d$ or the alphabet size $q$ is large. Our first lower bound is derived from a connection between insdel codes and matchings in special hypergraphs. This lower bound, together with our upper bound, shows that for fixed block length $n$ and edit distance $d$, when $q$ is sufficiently large, the maximum size of insdel codes is $ \frac{q^{n-\frac{d}{2}+1}}{{n\choose \frac{d}{2}-1}}(1 \pm o(1))$. The second lower bound refines Alon et al.'s recent logarithmic improvement on Levenshtein's GV-type bound and extends its applicability to large $q$ and $d$.
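To make the quantities in the abstract concrete, the following is a minimal Python sketch and not code from the paper. It assumes the standard identity that the insdel distance between two words equals $|x| + |y| - 2\,\mathrm{LCS}(x,y)$, and it evaluates only the leading term $\frac{q^{n-\frac{d}{2}+1}}{{n\choose \frac{d}{2}-1}}$, ignoring the $(1 \pm o(1))$ factor. The function names `lcs_length`, `insdel_distance`, and `leading_term` are hypothetical, and the input check on $d$ is a convenience of this sketch rather than a hypothesis stated in the abstract.

```python
from fractions import Fraction
from math import comb


def lcs_length(x, y):
    """Length of a longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def insdel_distance(x, y):
    """Minimum number of insertions and deletions turning x into y.

    Uses the standard identity |x| + |y| - 2 * LCS(x, y).
    """
    return len(x) + len(y) - 2 * lcs_length(x, y)


def leading_term(n, d, q):
    """Leading term q^(n - d/2 + 1) / C(n, d/2 - 1) from the abstract.

    The (1 +/- o(1)) factor is ignored. Requiring even d with 2 <= d <= 2n
    is an assumption of this sketch; it keeps the binomial coefficient nonzero.
    """
    if d % 2 != 0 or not 2 <= d <= 2 * n:
        raise ValueError("this sketch expects even d with 2 <= d <= 2n")
    t = d // 2
    return Fraction(q ** (n - t + 1), comb(n, t - 1))


if __name__ == "__main__":
    print(insdel_distance("abc", "bcd"))  # delete 'a', insert 'd'
    print(leading_term(3, 4, 6))
    print(leading_term(4, 2, 3))
```

As a sanity check, for $d = 2$ the leading term reduces to $q^n$, which matches the fact that any two distinct words of equal length are at insdel distance at least 2, so the whole space is a code. Exact rational arithmetic via `Fraction` is a design choice here to avoid floating-point rounding for large $q$.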
