Fine-Grained Complexity of Regular Path Queries

Katrin Casel, Markus L. Schmid

TL;DR

This work analyzes the fine-grained complexity of evaluating regular path queries (RPQs) under arbitrary-path semantics, focusing on the product-graph (PG) approach as a baseline and exploring its optimality. It provides tight upper bounds for non-enumeration tasks (Boole, Test, Witness, Eval, Count) under data- and combined-complexity lenses, along with conditional lower bounds based on OV, com-BMM, SBMM, and OMv hypotheses. For enumeration, PG yields delay linear in the database size, and the paper argues that sublinear delay is unlikely in general, though several promising avenues exist: sublinear delay via super-linear preprocessing, representative-subset enumeration, and restricted RPQ classes. The results illuminate which RPQ-evaluation tasks can be efficiently supported in practice, delineate the limits of PG-based strategies, and identify concrete RPQ families where improved enumeration performance is achievable, informing both theory and graph-database design. The open problem about achieving truly sublinear delay for full RPQ enumeration remains a central challenge with potential broad impact on graph-query processing.

Abstract

A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ-evaluation (called PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
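
To make the PG-approach from the abstract concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm or its exact running-time bounds: the NFA for q is assumed to be given explicitly and without epsilon-transitions, the graph database is a list of labelled edges, and the names `eval_rpq`, `edges`, `nfa_delta`, `start`, and `finals` are hypothetical. The product graph is never materialised; it is explored on the fly by a breadth-first search from each pair (source node, initial state).

```python
from collections import defaultdict, deque


def eval_rpq(edges, nfa_delta, start, finals):
    """Return all pairs (u, v) joined by a path whose label the NFA accepts.

    edges:     iterable of (u, label, v) triples (the graph database)
    nfa_delta: iterable of (p, label, q) triples (NFA transitions, no epsilon)
    start:     initial NFA state (assumed to be a single state)
    finals:    set of accepting NFA states
    """
    # Adjacency lists of the database: node -> [(label, target), ...]
    adj = defaultdict(list)
    nodes = set()
    for u, a, v in edges:
        adj[u].append((a, v))
        nodes.update((u, v))

    # NFA transition relation: (state, label) -> set of successor states
    delta = defaultdict(set)
    for p, a, q in nfa_delta:
        delta[(p, a)].add(q)

    result = set()
    for src in nodes:
        # BFS over product nodes (database node, NFA state)
        seen = {(src, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                result.add((src, node))
            for a, nxt in adj[node]:
                for q in delta.get((state, a), ()):
                    if (nxt, q) not in seen:
                        seen.add((nxt, q))
                        queue.append((nxt, q))
    return result


# Hypothetical example: database x -a-> y -b-> z -b-> z, query a b*
edges = [("x", "a", "y"), ("y", "b", "z"), ("z", "b", "z")]
nfa_delta = [(0, "a", 1), (1, "b", 1)]
print(sorted(eval_rpq(edges, nfa_delta, start=0, finals={1})))
# prints [('x', 'y'), ('x', 'z')]
```

Each search visits every product node at most once, so one source costs time proportional to the size of the product of the database and the NFA. Repeating the search for every source node is the simple baseline whose optimality the paper investigates.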

Paper Structure

This paper contains 19 sections, 31 theorems, 9 equations, 3 tables.

Key Result

Lemma 2.2

Let $G = (V, E)$ be a $\Sigma$-graph. Then $G^R$ can be computed in time $\mathrm{O}(|G|)$.

Theorems & Definitions (64)

  • Remark 2.1
  • Lemma 2.2
  • proof
  • Proposition 2.3
  • proof
  • Remark 2.4
  • Lemma 2.5
  • proof
  • Remark 3.1
  • Lemma 3.2
  • ...and 54 more