Minimum size of insertion/deletion/substitution balls
Yuhang Pi, Zhifang Zhang
TL;DR
This work determines the minimum possible size of the generalized insertion/deletion/substitution ball $\mathcal{B}_{t,s,p}(\mathbf{x})$ over $q$-ary strings $\mathbf{x}$ of length $n$. It proves the universal lower bound $|\mathcal{B}_{t,s,p}(\mathbf{x})|\ge\sum_{i=0}^{t+p}\binom{n+t-s}{i}(q-1)^{i}$, with equality if and only if $t=s=0$, $s=p=0$, $s+p\ge n$, or $\mathbf{x}$ consists of a single run. The proof splits into three regimes, $t=s$, $t<s$, and $t>s$, and uses bijective mappings, injection arguments from base cases, and run-based considerations to show that the bound is attained only under those conditions. The results extend classical Levenshtein-type ball-size bounds and give the exact minimum ball size for a broad family of error processes, with implications for coding under mixed insertion/deletion/substitution errors.
Abstract
Let $n,q,t,s,p$ be non-negative integers where $n\geq s$ and $q\geq 1$. For $\mathbf{x}\in A_{q}^{n}\triangleq\{ 0,1,\ldots,q-1 \}^{n}$, let the $t$-insertion $s$-deletion $p$-substitution ball of $\mathbf{x}$, denoted by $\mathcal{B}_{t,s,p}(\mathbf{x})$, be the set of sequences in $A_{q}^{n+t-s}$ which can be obtained from $\mathbf{x}$ by performing $t$ insertions, $s$ deletions, and at most $p$ substitutions. We establish that for any $\mathbf{x}\in A_{q}^{n}$, $|\mathcal{B}_{t,s,p}(\mathbf{x})|\geq\sum_{i=0}^{t+p}\binom{n+t-s}{i}(q-1)^{i}$, with equality holding if and only if $t=s=0 \,\vee\, s=p=0 \,\vee\, s+p\geq n \,\vee\, r(\mathbf{x})=1$. Here, $r(\mathbf{x})$ denotes the number of runs in $\mathbf{x}$, where a run in $\mathbf{x}$ is a maximal contiguous subsequence of identical symbols.
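As an illustration that is not part of the paper, the following Python sketch enumerates $\mathcal{B}_{t,s,p}(\mathbf{x})$ by brute force for small parameters and compares its size with the stated lower bound and equality condition. The function names (`ball`, `lower_bound`, `runs`, `equality_condition`, `check`) are hypothetical. The sketch assumes that the operations may be applied in any order and that sequences are represented as tuples over `range(q)`. It is only feasible for very small $n$, $t$, $s$, $p$.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def ball(x, t, s, p, q):
    """Brute-force B_{t,s,p}(x): exactly t insertions, exactly s deletions,
    and at most p substitutions, in any order (an assumption of this sketch).
    x is a tuple of symbols from range(q)."""
    if t == 0 and s == 0 and p == 0:
        return frozenset([x])
    out = set()
    if t > 0:  # insert any symbol at any of the len(x) + 1 positions
        for i in range(len(x) + 1):
            for a in range(q):
                out |= ball(x[:i] + (a,) + x[i:], t - 1, s, p, q)
    if s > 0:  # delete any single symbol
        for i in range(len(x)):
            out |= ball(x[:i] + x[i + 1:], t, s - 1, p, q)
    if p > 0:
        out |= ball(x, t, s, p - 1, q)  # leave one substitution unused
        for i in range(len(x)):  # or change one symbol to a different one
            for a in range(q):
                if a != x[i]:
                    out |= ball(x[:i] + (a,) + x[i + 1:], t, s, p - 1, q)
    return frozenset(out)


def lower_bound(n, q, t, s, p):
    """The bound from the abstract: sum_{i=0}^{t+p} C(n+t-s, i) (q-1)^i."""
    return sum(comb(n + t - s, i) * (q - 1) ** i for i in range(t + p + 1))


def runs(x):
    """Number of runs r(x); the empty sequence has 0 runs."""
    return 0 if not x else 1 + sum(x[i] != x[i - 1] for i in range(1, len(x)))


def equality_condition(x, t, s, p):
    """The condition t=s=0 or s=p=0 or s+p>=n or r(x)=1 from the abstract."""
    return (t == s == 0) or (s == p == 0) or (s + p >= len(x)) or runs(x) == 1


def check(n, q, max_ops):
    """Check the bound and the equality condition for every x in A_q^n and
    all t, s, p in 0..max_ops with s <= n."""
    for x in product(range(q), repeat=n):
        for t, s, p in product(range(max_ops + 1), repeat=3):
            if s > n:
                continue
            size, bound = len(ball(x, t, s, p, q)), lower_bound(n, q, t, s, p)
            if size < bound or (size == bound) != equality_condition(x, t, s, p):
                return False
    return True


if __name__ == "__main__":
    print(len(ball((0, 1), 1, 1, 0, 2)), lower_bound(2, 2, 1, 1, 0))
    print(check(3, 2, 2))
```

For example, with $q=2$, $n=2$, $t=s=1$, $p=0$ the bound is $\binom{2}{0}+\binom{2}{1}=3$. The single-run sequence $00$ attains it, since its ball is $\{00,01,10\}$, while $01$ has two runs and its ball is all of $A_2^2$, of size 4.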
