Enhanced Graph Pattern Matching
Nicola Cotumaccio
TL;DR
The paper addresses the difficulty of graph pattern matching and narrows the gap with string pattern matching by generalizing matching statistics to graphs. It introduces a graph analogue of the LCP array, defining $\mathsf{LCP}^{\min}_G$ and $\mathsf{LCP}^{\max}_G$, and shows that only $O(p)$ representative values are needed during computation. The main theorem establishes a data structure that, for a graph with parameter $1 \le p \le n$, computes the matching statistics of a string of length $w$ in $O(w p^2 \log \log (p \sigma))$ time, where $\sigma$ is the alphabet size. This work extends Burrows-Wheeler–style techniques to graphs and provides a tractable, parameterized approach to graph pattern matching. The results have potential implications for efficiently querying graph-structured data when the topological complexity $p$ of the graph is small.
Abstract
Pattern matching queries on strings can be solved in linear time by the Knuth-Morris-Pratt (KMP) algorithm. In 1973, Weiner introduced the suffix tree of a string [FOCS 1973] and showed that the seemingly more difficult problem of computing matching statistics can also be solved in linear time. Pattern matching queries on graphs are inherently more difficult: under the Orthogonal Vectors hypothesis, the graph pattern matching problem cannot be solved in subquadratic time [TALG 2023]. The complexity of graph pattern matching can be parameterized by the topological complexity of the considered graph, which is captured by a parameter $ p $ [JACM 2023]. In this paper, we show that, as in the string setting, computing matching statistics on graphs is as difficult as solving standard pattern matching queries. To this end, we introduce a notion of longest common prefix (LCP) array for arbitrary graphs.
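
To make the notion of matching statistics concrete in the string setting, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes one common convention: entry `i` is the length of the longest substring of the pattern starting at position `i` that occurs somewhere in the text (the paper may use a mirrored, suffix-based convention). The function name `matching_statistics` is hypothetical, and the running time is far from the linear bound cited above.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Naive matching statistics of `pattern` with respect to `text`.

    ms[i] is the length of the longest prefix of pattern[i:] that
    occurs as a substring of text (assumed convention, see lead-in).
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("abcab", "xabcy"))  # [3, 2, 1, 2, 1]
```

Under this convention, the pattern occurs in the text exactly when the first entry equals the pattern length, which is why matching statistics are at least as informative as a standard pattern matching query.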
