Revisiting the Expressiveness Landscape of Data Graph Queries

Michael Benedikt, Anthony Widjaja Lin, Di-De Yen

TL;DR

This work analyzes the expressive power of graph query languages for data graphs, focusing on three canonical families: $RPQ$-based extensions, Walk Logic ($WL$), and first-order logic with transitive closure, and shows how data coupling adds complexity. It demonstrates that $FO(ERDPQ)$ subsumes several existing languages ($WL$, $RDPQ$, $GPC$) while $FO^*(\text{≡data})$ subsumes $RDPQ$ but is incomparable with others, outlining a rich landscape of expressiveness. To unify these approaches, the paper introduces $FO^*(ERDPQ)$, extending $FO(ERDPQ)$ with transitive closure, which subsumes all prior languages and provides a single maximal framework, albeit with non-elementary worst-case data complexity. Additionally, it introduces Multi-Path Walk Logic (MWL), an extension of WL with multi-path comparisons, which is strictly more expressive than WL and is expressible within $FO(ERDPQ)$ but does not reach the full unifying power of $FO^*(ERDPQ)$. The results offer a conceptual and technical bridge for graph querying, guiding future work on tractable fragments and practical implementations.

Abstract

The study of graph queries in database theory has spanned more than three decades, resulting in a multitude of proposals for graph query languages. These languages differ in the mechanisms they provide. We can identify three main families of languages, with the canonical representatives being: (1) regular path queries, (2) walk logic, and (3) first-order logic with transitive closure operators. This paper provides a complete picture of the expressive power of these languages in the context of data graphs. Specifically, we consider a graph data model that supports querying over both data and topology, for example: "Does there exist a path between two different persons in a social network with the same last name?" We also show that an extension of (1), augmented with transitive closure operators, can unify the expressivity of (1)-(3) without increasing the query evaluation complexity.
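
To make the example query from the abstract concrete, the following is a minimal Python sketch that evaluates it directly on a toy data graph. It is not one of the query languages studied in the paper; the graph, the names `data_of`, `edges`, `reachable_from`, and `same_data_pairs`, and the reading of "path" as a directed path of at least one edge are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy data graph: each node carries a data value (a last name).
data_of = {
    "n1": "Smith",
    "n2": "Jones",
    "n3": "Smith",
    "n4": "Lee",
}

# Hypothetical labelled edges, written as (source, label, target).
edges = [
    ("n1", "friend", "n2"),
    ("n2", "friend", "n3"),
    ("n4", "friend", "n1"),
]


def reachable_from(source, edges):
    """Return all nodes reachable from `source` by a path of at least one edge."""
    successors = {}
    for src, _label, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen = set()
    queue = deque(successors.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(successors.get(node, []))
    return seen


def same_data_pairs(data_of, edges):
    """Pairs (u, v) of distinct nodes with equal data values and a path u -> v."""
    return sorted(
        (u, v)
        for u in data_of
        for v in reachable_from(u, edges)
        if u != v and data_of[u] == data_of[v]
    )


print(same_data_pairs(data_of, edges))  # [('n1', 'n3')]
```

The query combines a topological condition (reachability) with a data condition (equal last names), which is the kind of coupling between data and topology that the data graph model is meant to capture.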

Paper Structure

This paper contains 15 sections, 18 theorems, 26 equations, 9 figures.

Key Result

Theorem 1

The subsumptions depicted in Figure 1 all hold.

Figures (9)

  • Figure 1: Prior query languages (extended with data). In the diagrams, for any pair of languages $L$ and $M$, the arrow $L \rightarrow M$ signifies that $M$ is more expressive than $L$. Languages $L$ and $M$ are considered incomparable if there is no directed path (after taking transitive closure) between them.
  • Figure 2: Expressiveness of languages.
  • Figure 3: Data graph $G$, where p stands for parent, s for spouse, and f stands for friend.
  • Figure 4: RDPA $A$ over $\Sigma=\{a\}$ with one register $x \in X_{\textsc{data}}$.
  • Figure 5: Data graph $G$ over $\{a,b\}$, where $\textsc{dataof}(n_1)=\textsc{dataof}(n_2)=0$ and $\textsc{dataof}(n_i)=i$ for $i=3,\dots,7$.
  • ...and 4 more figures

Theorems & Definitions (34)

  • Definition 1
  • Definition 2
  • Definition 3
  • Example 1
  • Theorem 1
  • Theorem 2
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • ...and 24 more