Enhanced Graph Pattern Matching

Nicola Cotumaccio

TL;DR

The paper generalizes matching statistics from strings to graphs, narrowing the gap between string pattern matching and the inherently harder problem of graph pattern matching. It introduces a graph analogue of the LCP array, defining $\mathsf{LCP}^{\min}_G$ and $\mathsf{LCP}^{\max}_G$, and shows that only $O(p)$ representative values are needed during computation. The main theorem establishes a data structure that, for a graph with parameter $1 \le p \le n$, computes the matching statistics of a string of length $w$ in $O(w p^2 \log \log (p \sigma))$ time, where $\sigma$ is the alphabet size. This work extends Burrows-Wheeler-style techniques to graphs and gives a tractable approach to graph pattern matching parameterized by $p$. The results may be relevant to efficient querying of graph-structured data whose topological complexity is bounded.

Abstract

Pattern matching queries on strings can be solved in linear time by the Knuth-Morris-Pratt (KMP) algorithm. In 1973, Weiner introduced the suffix tree of a string [FOCS 1973] and showed that the seemingly more difficult problem of computing matching statistics can also be solved in linear time. Pattern matching queries on graphs are inherently more difficult: under the Orthogonal Vectors hypothesis, the graph pattern matching problem cannot be solved in subquadratic time [TALG 2023]. The complexity of graph pattern matching can be parameterized by the topological complexity of the considered graph, which is captured by a parameter $p$ [JACM 2023]. In this paper, we show that, as in the string setting, computing matching statistics on graphs is as difficult as solving standard pattern matching queries. To this end, we introduce a notion of longest common prefix (LCP) array for arbitrary graphs.
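To make the notion of matching statistics on strings concrete, here is a minimal Python sketch. It assumes one common definition, which may differ in orientation from the one used in the paper: for each position `i` of the pattern, the length of the longest prefix of `pattern[i:]` that occurs somewhere in the text. The function name and the brute-force strategy are illustrative only; this is not the linear-time suffix-tree method attributed to Weiner above.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Brute-force matching statistics (assumed definition, see lead-in).

    ms[i] is the length of the longest prefix of pattern[i:] that
    occurs as a substring of text.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("abcab", "xabcy"))
```

A standard pattern matching query is the special case of asking whether `ms[0]` equals the pattern length, which is why matching statistics look like the harder problem at first sight.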

Paper Structure (2 sections, 1 theorem)

Introduction
Our results

Key Result

Theorem

Let $G$ be a graph with parameter $1 \le p \le n$ [JACM 2023], where $n$ is the number of vertices. Then, there exists a data structure such that, given a string of length $w$, we can compute the matching statistics of the string with respect to $G$ in $O(w p^2 \log \log (p \sigma))$ time, where $\sigma$ is the alphabet size.
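As a point of comparison for the theorem, the following Python sketch computes matching statistics on a graph by brute force. It rests on several assumptions not stated in the text above: the graph is directed and edge-labeled, it is given as a list of hypothetical `(source, label, target)` triples, and the matching statistic at position `i` is the length of the longest prefix of `pattern[i:]` that labels some walk in the graph. It does not use the parameter $p$ or the LCP arrays, so it is not the data structure of the theorem and does not achieve its running time.

```python
from collections import defaultdict


def graph_matching_statistics(pattern, edges):
    """Brute-force matching statistics of `pattern` on an edge-labeled graph.

    edges: iterable of (source, label, target) triples (assumed input format).
    Returns ms, where ms[i] is the length of the longest prefix of
    pattern[i:] that labels a walk in the graph.
    """
    # step[c][u] = set of vertices reachable from u through an edge labeled c
    step = defaultdict(lambda: defaultdict(set))
    vertices = set()
    for u, c, v in edges:
        step[c][u].add(v)
        vertices.update((u, v))

    ms = []
    for i in range(len(pattern)):
        current = set(vertices)  # a walk may start at any vertex
        length = 0
        for c in pattern[i:]:
            by_source = step.get(c, {})
            reached = set()
            for u in current:
                reached |= by_source.get(u, set())
            if not reached:
                break  # no walk spells the longer prefix
            current = reached
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    example_edges = [(0, "a", 1), (1, "b", 2), (2, "a", 1), (1, "c", 0)]
    print(graph_matching_statistics("abac", example_edges))
```

Each of the $w$ starting positions may scan the rest of the pattern and touch many edges per character, so this baseline is far slower than the $O(w p^2 \log \log (p \sigma))$ bound when $p$ is small.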
