Algorithms for Parameterized String Matching with Mismatches
Apurba Saha, Iftekhar Hakim Kaowsar, Mahdi Hasnat Siyam, M. Sohel Rahman
TL;DR
This work tackles parameterized string matching with mismatches through two independent approaches. The first is a deterministic algorithm for general mismatch tolerance: it uses FFT-based symbol-weight computations and one maximum weighted bipartite matching per alignment to bound the number of mismatches, and it runs in $O(|t| \cdot |\Sigma|^2 \sqrt{|\Sigma|} \log(|t| \cdot |\Sigma|))$ time. The second is a probabilistic hashing-based algorithm for the single-mismatch case that runs in $O(|t| \log |t|)$ time, with collision probabilities analyzed and mitigated via double hashing. The deterministic method encodes parameterized strings, reduces the problem to static matching via a sequence of convolutions, and supports parallelization to accelerate computation. The single-mismatch approach uses polynomial hashing and a segment tree to locate the first mismatch, reaching $O(|t| \log |t|)$ by descending the tree instead of running a binary search; an empirical collision analysis supports practical deployment. Overall, the paper advances fast parameterized matching for general mismatch tolerance and provides a practical, faster hashing-based solution for the single-mismatch case, with clear paths to parallelization and future refinements.
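To make the role of the bipartite matching concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes every symbol is a parameter and that the alphabet is tiny, builds the symbol co-occurrence weights for one alignment directly (where the paper uses FFT-based convolutions), and finds the best bijection by trying all permutations (where the paper uses a maximum weighted bipartite matching). The function names `min_mismatches` and `k_mismatch_positions` are hypothetical.

```python
from itertools import permutations


def min_mismatches(p: str, w: str, alphabet: str) -> int:
    """Fewest mismatching positions between p and w over all bijections
    of `alphabet` onto itself (brute force, small alphabets only)."""
    idx = {c: i for i, c in enumerate(alphabet)}
    m = len(alphabet)
    # weight[a][b] = number of positions i with p[i] == a and w[i] == b
    weight = [[0] * m for _ in range(m)]
    for a, b in zip(p, w):
        weight[idx[a]][idx[b]] += 1
    # A bijection is a permutation; its weight is the number of positions it matches.
    best = max(
        sum(weight[a][perm[a]] for a in range(m))
        for perm in permutations(range(m))
    )
    return len(p) - best


def k_mismatch_positions(p: str, t: str, k: int, alphabet: str) -> list[int]:
    """Start positions in t whose window parameterized-matches p with at most k mismatches."""
    n, m = len(t), len(p)
    return [
        i for i in range(n - m + 1)
        if min_mismatches(p, t[i:i + m], alphabet) <= k
    ]


print(k_mismatch_positions("aba", "bcbba", 0, "abc"))
print(k_mismatch_positions("aba", "bcbba", 1, "abc"))
```

This baseline is exponential in the alphabet size and serves only as a reference against which a faster implementation could be checked.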
Abstract
Two strings are said to parameterized match if there exists a bijection of the parameterized alphabet onto itself that transforms one string into the other. Parameterized matching has applications in software duplication detection, image processing, and computational biology. We consider the problem in which a pattern $p$, a text $t$ and a mismatch tolerance limit $k$ are given, and the goal is to find all positions in $t$ at which $p$ parameterized matches the substring of $t$ of length $|p|$ with at most $k$ mismatches. Our main result is an algorithm for this problem with $O(α^2 n \log n + n α^2 \sqrt{α} \log(n α))$ time complexity, where $n = |t|$ and $α = |Σ|$, which improves on the algorithm by Hazay, Lewenstein and Sokol for $k = \tilde{Ω}(|Σ|^{5/3})$. We also present a hashing-based probabilistic algorithm for this problem when $k = 1$ with $O(n \log n)$ time complexity, which we believe is algorithmically beautiful.
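As an illustration of the definition, here is a minimal sketch of an exact (zero-mismatch) parameterized match check, assuming every symbol belongs to the parameterized alphabet. The function name `parameterized_match` is hypothetical. It builds the symbol mapping in both directions and rejects as soon as the mapping stops being one-to-one; a consistent one-to-one partial mapping over a finite alphabet can always be extended to a bijection of the alphabet onto itself.

```python
def parameterized_match(u: str, v: str) -> bool:
    """Return True if some bijection on symbols maps u to v position by position."""
    if len(u) != len(v):
        return False
    forward = {}   # symbol of u -> symbol of v
    backward = {}  # symbol of v -> symbol of u
    for a, b in zip(u, v):
        # setdefault records the pairing on first sight and returns the
        # previously recorded partner otherwise.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(parameterized_match("abab", "cdcd"))  # a -> c, b -> d
print(parameterized_match("abab", "cdcc"))  # b would need to map to both d and c
```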
