Minimum size of insertion/deletion/substitution balls

Yuhang Pi, Zhifang Zhang

TL;DR

This work determines the minimum size of the generalized insertion/deletion/substitution ball $\mathcal{B}_{t,s,p}(\mathbf{x})$ over $q$-ary strings $\mathbf{x}$ of length $n$. It proves the universal lower bound $|\mathcal{B}_{t,s,p}(\mathbf{x})|\ge\sum_{i=0}^{t+p}\binom{n+t-s}{i}(q-1)^{i}$, with equality exactly when the parameters are degenerate ($t=s=0$, $s=p=0$, or $s+p\geq n$) or $\mathbf{x}$ consists of a single run. The proof splits into three regimes, $t=s$, $t<s$, and $t>s$, and uses bijective mappings, injection arguments from base cases, and run-based considerations to show that the bound is tight only under those conditions. The results extend classical Levenshtein-type ball-size bounds and give exact minimum sizes for a broad family of error processes, with implications for coding under mixed insertion/deletion/substitution errors.

Abstract

Let $n,q,t,s,p$ be non-negative integers where $n\geq s$ and $q\geq 1$. For $\mathbf{x}\in A_{q}^{n}\triangleq\{ 0,1,\ldots,q-1 \}^{n}$, let the $t$-insertion $s$-deletion $p$-substitution ball of $\mathbf{x}$, denoted by $\mathcal{B}_{t,s,p}(\mathbf{x})$, be the set of sequences in $A_{q}^{n+t-s}$ that can be obtained from $\mathbf{x}$ by performing $t$ insertions, $s$ deletions, and at most $p$ substitutions. We establish that for any $\mathbf{x}\in A_{q}^{n}$, $|\mathcal{B}_{t,s,p}(\mathbf{x})|\geq\sum_{i=0}^{t+p}\binom{n+t-s}{i}(q-1)^{i}$, with equality holding if and only if $t=s=0\vee s=p=0\vee s+p\geq n\vee r(\mathbf{x})=1$. Here, $r(\mathbf{x})$ denotes the number of runs in $\mathbf{x}$, where a run in $\mathbf{x}$ is a maximal contiguous substring of identical symbols.
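
The bound and its equality condition can be checked by brute force for very small parameters. The following is a minimal sketch, not the paper's method: it assumes the operations may be applied in any order, and the function names `ball`, `lower_bound`, `runs`, and `equality_expected` are hypothetical helpers introduced here for illustration.

```python
from math import comb


def ball(x, t, s, p, q):
    """Enumerate B_{t,s,p}(x) by exhaustive search.

    Assumption: exactly t insertions, exactly s deletions and at most
    p substitutions are applied, in any order.
    """
    start = (tuple(x), t, s, p)
    seen = {start}
    stack = [start]
    result = set()
    while stack:
        w, ti, si, pi = stack.pop()
        if ti == 0 and si == 0:
            # All insertions and deletions are used; leftover
            # substitutions are optional ("at most p").
            result.add(w)
        successors = []
        if ti > 0:  # insert any symbol at any position
            for i in range(len(w) + 1):
                for a in range(q):
                    successors.append((w[:i] + (a,) + w[i:], ti - 1, si, pi))
        if si > 0:  # delete any symbol
            for i in range(len(w)):
                successors.append((w[:i] + w[i + 1:], ti, si - 1, pi))
        if pi > 0:  # replace a symbol with a different one
            for i in range(len(w)):
                for a in range(q):
                    if a != w[i]:
                        successors.append((w[:i] + (a,) + w[i + 1:], ti, si, pi - 1))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return result


def lower_bound(n, q, t, s, p):
    """Right-hand side of the bound: sum_{i=0}^{t+p} C(n+t-s, i) (q-1)^i."""
    return sum(comb(n + t - s, i) * (q - 1) ** i for i in range(t + p + 1))


def runs(x):
    """Number of runs r(x) of a non-empty sequence x."""
    return 1 + sum(a != b for a, b in zip(x, x[1:]))


def equality_expected(x, t, s, p):
    """Equality condition stated in the abstract."""
    n = len(x)
    return (t == 0 and s == 0) or (s == 0 and p == 0) or s + p >= n or runs(x) == 1
```

For instance, with $q=2$, $t=s=1$, $p=0$, the single-run string `(0, 0, 0)` gives a ball of size 4, which meets the bound `lower_bound(3, 2, 1, 1, 0) == 4`, while `(0, 1, 0)` gives a ball of size 7. The search is exponential in the parameters, so it is only practical for very short strings.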

Paper Structure

This paper contains 6 sections, 43 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Case 1, where $\ast$ and $\ast\ast\ast$ represent an element in $A_{q}\setminus\{ 0 \}$ and a sequence over $A_{q}\setminus\{ 0 \}$, respectively.
  • Figure 2: Case 1, where $\ast$ and $\ast\ast\ast$ represent an element in $A_{q}\setminus\{ 0 \}$ and a sequence over $A_{q}\setminus\{ 0 \}$, respectively.
  • Figure 3: In the first stage, we substitute $x_{n}$ with $\overline{x_{n}}$ and insert a prefix $\overline{x_{1}}$ and a suffix $(\overline{x_{n}})^{t-1}$ from $\mathbf{x}$ to obtain $\mathbf{w}$.
  • Figure 4: For Case 1' in the second stage, we substitute $w_{e}$ with $\overline{x_{e}}$ for all $e\in ([2,p]\cup\{ n \})\setminus\{ i+1 \}$ from $\mathbf{w}$ to obtain $\mathbf{z}$.
  • Figure 5: For Case 2' in the second stage, we substitute $w_{e}$ with $\overline{x_{e}}$ for all $e\in [2,p]$ from $\mathbf{w}$ to obtain $\mathbf{z}$.

Theorems & Definitions (8)

  • proof (×8)