Algorithms for Parameterized String Matching with Mismatches

Apurba Saha; Iftekhar Hakim Kaowsar; Mahdi Hasnat Siyam; M. Sohel Rahman

TL;DR

This work tackles parameterized string matching with mismatches by presenting two independent approaches: a deterministic algorithm for general mismatch tolerance that uses FFT-based symbol-weight computations and per-alignment maximum weighted bipartite matchings to bound mismatches, achieving a time bound of $O(|t| \cdot |\Sigma|^2 \sqrt{|\Sigma|} \log(|t| \cdot |\Sigma|))$; and a probabilistic hashing-based algorithm for the single-mismatch case that runs in $O(|t| \log |t|)$ time, with collision probabilities analyzed and mitigated via double hashing. The deterministic method encodes parameterized strings, reduces the problem to static matching via a sequence of convolutions, and supports parallelization to accelerate computation. The single-mismatch approach uses polynomial hashing and a segment tree to locate the first mismatch efficiently, improving to $O(|t| \log |t|)$ by descending the tree instead of binary search, with empirical collision analysis supporting practical deployment. Overall, the paper advances fast parameterized matching for general mismatch tolerance and provides a practical, faster hashing-based solution for the single-mismatch case, with clear paths to parallelization and future refinements.
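To make the per-alignment quantity concrete, the sketch below counts mismatches at every alignment by brute force. It is not the paper's algorithm: where the summary describes FFT-based symbol weights and a maximum weighted bipartite matching, this version tallies symbol co-occurrences directly and tries every bijection, which is only feasible for tiny alphabets. It assumes that every symbol is parameterized and that the mismatch count is the minimum, over all bijections, of the number of positions that disagree. The names `min_mismatches` and `matches_with_mismatches` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def min_mismatches(p, w, alphabet):
    """Fewest disagreeing positions between p and w over all bijections.

    Assumption: every symbol is parameterized, and p, w have equal length.
    Brute force over all |alphabet|! bijections, for illustration only.
    """
    # counts[(a, b)] = number of positions where p has a and w has b.
    # These play the role of the edge weights in a bipartite matching.
    counts = Counter(zip(p, w))
    best = 0
    for perm in permutations(alphabet):
        mapping = dict(zip(alphabet, perm))
        # Positions that agree once the bijection is applied to p.
        agree = sum(counts[(a, mapping[a])] for a in alphabet)
        best = max(best, agree)
    return len(p) - best


def matches_with_mismatches(p, t, k, alphabet):
    """Start positions in t where p parameterized matches with <= k mismatches."""
    m = len(p)
    return [
        i
        for i in range(len(t) - m + 1)
        if min_mismatches(p, t[i:i + m], alphabet) <= k
    ]
```

For example, `matches_with_mismatches("abab", "cdcdcc", 1, "abcd")` considers three alignments; the last one, `cdcc`, needs one mismatch under the mapping a to c, b to d. Replacing the permutation loop with a maximum weight bipartite matching on `counts` is the step that makes the general approach scale.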

Abstract

Two strings are said to parameterized match when there exists a bijection of the parameterized alphabet onto itself that transforms one string into the other. Parameterized matching has applications in software duplication detection, image processing, and computational biology. We consider the problem in which a pattern $p$, a text $t$ and a mismatch tolerance limit $k$ are given, and the goal is to find all positions in $t$ at which $p$ parameterized matches the length-$|p|$ substring of $t$ with at most $k$ mismatches. Our main result is an algorithm for this problem with $O(\alpha^2 n \log n + n \alpha^2 \sqrt{\alpha} \log(n \alpha))$ time complexity, where $n = |t|$ and $\alpha = |\Sigma|$, which improves on the algorithm by Hazay, Lewenstein and Sokol for $k = \tilde{\Omega}(|\Sigma|^{5/3})$. We also present a hashing-based probabilistic algorithm for this problem when $k = 1$ with $O(n \log n)$ time complexity, which we believe is algorithmically beautiful.
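The definition of an exact parameterized match (zero mismatches) can be illustrated with a short check. This is a minimal sketch, assuming every symbol belongs to the parameterized alphabet; the function name `parameterized_match` is hypothetical and does not come from the paper. It relies on the fact that a consistent one-to-one partial mapping between the symbols that occur can always be extended to a bijection of a finite alphabet onto itself.

```python
def parameterized_match(s: str, t: str) -> bool:
    """Return True if some bijection on symbols transforms s into t.

    Assumption: all symbols are parameterized (no fixed/static symbols).
    """
    if len(s) != len(t):
        return False
    forward = {}   # symbol of s -> symbol of t
    backward = {}  # symbol of t -> symbol of s
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns it afterwards,
        # so any later conflicting pairing is detected here.
        if forward.setdefault(a, b) != b:
            return False
        if backward.setdefault(b, a) != a:
            return False
    return True
```

For instance, `parameterized_match("abab", "cdcd")` holds under the mapping a to c, b to d, while `parameterized_match("ab", "aa")` fails because two distinct symbols would have to map to the same one.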
