Boundedness for Unions of Conjunctive Regular Path Queries over Simple Regular Expressions

Diego Figueira, S. Krishna, Om Swostik Mishra, Anantha Padmanabha

TL;DR

This work analyzes the boundedness problem for UCRPQs restricted to simple regular expressions and establishes that it is $\Pi^P_2$-complete, in contrast to the ExpSpace-completeness of the general UCRPQ case. The authors introduce succinct automata to handle succinctly represented repetitions and show that, whenever the query is bounded, an equivalent bounded query of linear size can be produced efficiently; they also develop a $\Pi^P_2$ procedure to determine maximal boundedness by letters. A key technical contribution is showing that the membership problem for succinct automata lies in NP, which enables the overall containment-based upper bound. The results have implications for FO- and UCQ-rewritability in ontology-mediated query answering and give a precise complexity boundary for RPQ-based query optimization. The work also characterizes by-letter boundedness and identifies a unique maximal alphabet subset, offering a concrete algorithmic path for boundedness analysis in CRPQ/UCRPQ settings.

Abstract

The problem of checking whether a recursive query can be rewritten as a query without recursion is a fundamental reasoning task, known as the boundedness problem. Here we study the boundedness problem for Unions of Conjunctive Regular Path Queries (UCRPQs), a navigational query language extensively used in ontology and graph database querying. The boundedness problem for UCRPQs is ExpSpace-complete. We focus our analysis on UCRPQs using simple regular expressions, which are of high practical relevance and enjoy a lower reasoning complexity. We show that the boundedness problem for this UCRPQ fragment is $\Pi^P_2$-complete, and that an equivalent bounded query can be produced in polynomial time whenever possible. When the query turns out to be unbounded, we also study the task of finding an equivalent maximally bounded query, which we show to be feasible in $\Pi^P_2$. As a side result of independent interest stemming from our developments, we study a notion of succinct finite automata and prove that its membership problem is in NP.

Paper Structure (14 sections, 20 theorems, 2 equations, 4 figures)

Key Result

Lemma 1

Given two UCRPQs $q$ and $q'$, we have $q \subseteq q'$ if, and only if, for every $\lambda \in \mathrm{Exp}(q)$ there is $\lambda' \in \mathrm{Exp}(q')$ such that there exists a homomorphism $\lambda' \to \lambda$.
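The homomorphism test in Lemma 1 can be illustrated on a single pair of expansions. The following is a minimal brute-force sketch, assuming Boolean queries (no free variables to preserve) and expansions encoded as lists of labeled edges `(source, label, target)`; the names `has_homomorphism` and `path` are hypothetical helpers introduced here, not taken from the paper, and the exhaustive search is exponential and meant only for tiny examples.

```python
from itertools import product

def has_homomorphism(src_edges, dst_edges):
    """Return True if some vertex map h sends every labeled edge
    (u, a, v) of src_edges to an edge (h(u), a, h(v)) of dst_edges."""
    src_vertices = sorted({x for u, _, v in src_edges for x in (u, v)})
    dst_vertices = sorted({x for u, _, v in dst_edges for x in (u, v)})
    dst_set = set(dst_edges)
    # Try every assignment of source vertices to destination vertices.
    for image in product(dst_vertices, repeat=len(src_vertices)):
        h = dict(zip(src_vertices, image))
        if all((h[u], a, h[v]) in dst_set for u, a, v in src_edges):
            return True
    return False

def path(word, prefix):
    """Encode the expansion of one atom x -[word]-> y as a labeled path."""
    return [(f"{prefix}{i}", a, f"{prefix}{i + 1}") for i, a in enumerate(word)]

lam = path("aaa", "u")       # an expansion: a path labeled a a a
lam_prime = path("aa", "v")  # another expansion: a path labeled a a

print(has_homomorphism(lam_prime, lam))  # the shorter path maps into the longer one
print(has_homomorphism(lam, lam_prime))  # the longer path cannot fold into the shorter one
```

Checking containment in full would require quantifying over all expansions of both queries, which this sketch does not attempt.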

Figures (4)

  • Figure 1: Consider the query $x \xrightarrow{(aba)^n} y \wedge x \xrightarrow{(aba)^*} z$, where $(aba)^n$ is an SSF with $n = 11$ in binary (i.e., 3). Below, we have the corresponding coloring scheme in $\lambda^+$ with the respective colors according to a homomorphism $h$.
  • Figure 2: (Left) An example of a homomorphism from $\lambda'$ to $\lambda^+$ with a large purely-red interval $I$. In $\lambda'$, red (resp. blue) edges correspond to expansions of recursive atoms (resp. non-recursive atoms), and green vertices correspond to variables of $q$. The $h$-preimage of $I$ contains two intervals $I_1$ and $I_2$ coming from recursive atom expansions of words of length 2 and 3, respectively. (Right) The resulting $\lambda''$. The intervals $I_1, I_2$ are contracted to a single variable in the homomorphism to $\lambda^+$. The images of the variables that appeared to the right of $I$ are now shifted to the left, making the number of uncolored variables grow by $|I|-1$.
  • Figure 3: Query $q_1$ used in the proof of \ref{thm:lowerbound}. Variables with identical $y_{i,\alpha}$ label of the gadgets $D$ and $E$ (across all $E$) represent the same variable (e.g., $y_{1,f}$ in $D$ and $E$ are the same variable).
  • Figure 4: An example for query $q_2$, used in the proof of \ref{thm:lowerbound}, for the clause $(x_2 \vee \neg x_5 \vee \neg y_4)$. There will be one such gadget for every clause of the formula in $q_2$. Variables having identical $y_{i,\textit{tf}}$-label represent the same variable. Only the final variable of paths representing $y_i$-variables from $\Phi$ may have a $y_{i,\textit{tf}}$-label.
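The binary-encoded exponent in the Figure 1 caption can be made concrete with a short sketch. It assumes only what the caption states, namely that the exponent $n$ of $(aba)^n$ is written in binary; the function name `expand_succinct_power` is hypothetical. The point it illustrates is that the expanded word grows exponentially in the number of exponent bits, which is why such repetitions are kept succinct.

```python
def expand_succinct_power(word, exponent_bits):
    """Expand word^n, where n is given as a binary string such as '11'."""
    n = int(exponent_bits, 2)  # decode the binary exponent
    return word * n            # explicit repetition of the word

expanded = expand_succinct_power("aba", "11")
print(expanded)       # abaabaaba
print(len(expanded))  # 9

# Each extra exponent bit roughly doubles the length of the expansion.
for bits in ("1", "11", "111", "1111"):
    print(bits, len(expand_succinct_power("aba", bits)))
```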

Theorems & Definitions (41)

  • Lemma 1: Folklore
  • Proposition 1
  • Theorem 2
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem \ref{prop:mem-succinct-NFA-NP}: Restatement
  • proof
  • Claim 7
  • proof
  • ...and 31 more