Minimizing Conjunctive Regular Path Queries

Diego Figueira, Rémi Morvan, Miguel Romero

TL;DR

This work analyzes the minimization problem for Conjunctive Regular Path Queries (CRPQs) and unions of CRPQs (UCRPQs) in graph databases, establishing decidability and complexity bounds. CRPQ minimization is in 2ExpSpace, via a brute-force enumeration algorithm, and is ExpSpace-hard. UCRPQ minimization is ExpSpace-complete, with the upper bound obtained by computing maximal under-approximations. For UCRPQs restricted to simple regular expressions (SREs), minimization is $Π^p_2$-complete, matching the complexity of containment. The paper also develops a theory of minimality, introducing strong minimality and semantical structure results that relate to the underlying segment graphs, and discusses variable-minimization and extensions to tree patterns as future directions. Overall, the results give algorithms and hardness bounds for minimizing queries over graph databases.

Abstract

We study the minimization problem for Conjunctive Regular Path Queries (CRPQs) and unions of CRPQs (UCRPQs). This is the problem of checking, given a query and a number $k$, whether the query is equivalent to one of size at most $k$. For CRPQs we consider the size to be the number of atoms, and for UCRPQs the maximum number of atoms in a CRPQ therein, motivated by the fact that the number of atoms has a leading influence on the cost of query evaluation. We show that the minimization problem is decidable, both for CRPQs and UCRPQs. We provide a 2ExpSpace upper-bound for CRPQ minimization, based on a brute-force enumeration algorithm, and an ExpSpace lower-bound. For UCRPQs, we show that the problem is ExpSpace-complete, having thus the same complexity as the classical containment problem. The upper bound is obtained by defining and computing a notion of maximal under-approximation. Moreover, we show that for UCRPQs using the so-called "simple regular expressions" consisting of concatenations of expressions of the form $a^+$ or $a_1 + \dotsb + a_k$, the minimization problem becomes $Π^p_2$-complete, again matching the complexity of containment.
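The size measure used in the abstract can be made concrete with a minimal sketch. It assumes a hypothetical representation in which a CRPQ is a list of atoms and a UCRPQ is a list of CRPQs; the names `Atom`, `crpq_size`, and `ucrpq_size` are illustrative and do not come from the paper.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    """One CRPQ atom x -[regex]-> y (hypothetical representation)."""
    source: str
    regex: str   # regular expression kept as an opaque string
    target: str


CRPQ = List[Atom]
UCRPQ = List[CRPQ]


def crpq_size(query: CRPQ) -> int:
    """Size of a CRPQ: its number of atoms."""
    return len(query)


def ucrpq_size(union: UCRPQ) -> int:
    """Size of a UCRPQ: the maximum number of atoms among its CRPQs."""
    return max((crpq_size(q) for q in union), default=0)


# Illustrative queries (not taken from the paper).
gamma = [Atom("x", "a+", "y"), Atom("y", "b", "z"), Atom("x", "a+b", "z")]
delta = [Atom("x", "a+b", "z")]

print(crpq_size(gamma))            # number of atoms of gamma
print(ucrpq_size([gamma, delta]))  # largest CRPQ in the union
```

An instance of the minimization problem is then a pair (query, $k$), and the question is whether some equivalent query has size at most $k$. Computing the size is the easy part; the complexity bounds above concern the search for an equivalent smaller query.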

Paper Structure

This paper contains 30 sections, 19 theorems, 9 equations, 7 figures.

Key Result

proposition 1

Let $\Gamma_1$ and $\Gamma_2$ be UCRPQs. Then the following are equivalent: (i) $\Gamma_1 \subseteq \Gamma_2$; (ii) for every $\xi_1 \in \mathrm{Exp}(\Gamma_1)$, $\xi_1 \subseteq \Gamma_2$; (iii) for every $\xi_1 \in \mathrm{Exp}(\Gamma_1)$ there is $\xi_2 \in \mathrm{Exp}(\Gamma_2)$ such that $\xi_2 \xrightarrow{\mathrm{hom}} \xi_1$.
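Condition (iii) rests on homomorphisms between expansions, which are ordinary conjunctive queries whose atoms are single-letter edges. The following is a minimal brute-force sketch of that homomorphism test, assuming Boolean queries (no output variables) and a hypothetical encoding of an expansion as a list of `(source, letter, target)` triples. The function names are illustrative, and the sketch does not enumerate the (generally infinite) set of expansions, so it is not a containment procedure by itself.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source variable, letter, target variable)


def variables(cq: List[Edge]) -> List[str]:
    """Variables of a conjunctive query, in order of first appearance."""
    seen: List[str] = []
    for source, _, target in cq:
        for var in (source, target):
            if var not in seen:
                seen.append(var)
    return seen


def find_homomorphism(xi2: List[Edge], xi1: List[Edge]) -> Optional[Dict[str, str]]:
    """Return a map from the variables of xi2 to those of xi1 that sends
    every atom of xi2 to an atom of xi1, or None if no such map exists.
    Exhaustive search: exponential in the number of variables of xi2."""
    src_vars = variables(xi2)
    dst_vars = variables(xi1)
    atoms_of_xi1 = set(xi1)
    for image in product(dst_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((h[s], a, h[t]) in atoms_of_xi1 for s, a, t in xi2):
            return h
    return None


# xi1: a path labelled "aa"; xi2: a single "a" edge (illustrative data).
xi1 = [("x", "a", "u"), ("u", "a", "y")]
xi2 = [("p", "a", "q")]

print(find_homomorphism(xi2, xi1))  # a single edge maps into the path
print(find_homomorphism(xi1, xi2))  # the path does not map into one edge
```

In this example the single edge maps into the two-edge path but not the other way around, so the direction of the homomorphism in condition (iii) matters: it goes from an expansion of $\Gamma_2$ into the expansion of $\Gamma_1$.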

Figures (7)

  • Figure 1: The segments of $\gamma$ (labels are omitted). Each segment has a different color. Internal variables are the smaller circles.
  • Figure 2: The segment graph of $\gamma$.
  • Figure 5: A CRPQ $\gamma$.
  • Figure 6: An expansion $\xi$ of $\gamma$, together with its segments.
  • Figure 7: The segment graph of $\xi$.
  • ...and 2 more figures

Theorems & Definitions (22)

  • proposition 1: Folklore, see e.g. Florescu:CRPQ or four-italians
  • proposition 2
  • proposition 3
  • definition 1
  • theorem 4: Semantical Structure
  • proposition 4
  • lemma 1
  • corollary 1
  • Remark 6
  • proposition 5
  • ...and 12 more