Complexity of Evaluating GQL Queries

Diego Figueira, Anthony W. Lin, Liat Peterfreund

TL;DR

This paper determines the data complexity of evaluating GQL queries on property graphs. It shows that unrestricted GQL evaluation is $\text{P}^{\text{NP}[\log]}$-complete, while restricting to restrictor-free patterns lowers the complexity to $\text{NL}$-complete; both bounds still hold when queries use data from infinite concrete domains. The authors obtain these results by embedding GQL into extensions of first-order logic (transitive closure and existential second-order logic) and by using embedded finite model theory to handle infinite domains and data types. The findings establish tight connections between GQL and relational query languages, enabling meta-querying and more expressive extensions without worsening data complexity, and point to future work on aggregation and broader domain-specific features.

Abstract

GQL has recently emerged as the standard query language over graph databases (particularly, the property graph model). Indeed, this is analogous to the role of SQL for relational databases. Unlike SQL, however, fundamental problems regarding GQL are still unsolved, most notably the complexity of query evaluation. In this paper we provide a complete solution to this problem. In particular, we show that the data complexity of GQL is $\text{P}^{\text{NP}[\log]}$-complete in general, and is $\text{NL}$-complete when the so-called "restrictors" are disallowed. Using techniques from embedded finite model theory, we show that this is true even when the queries use data from infinite concrete domains (for example, the domain of real numbers where arithmetic is allowed in the query). In proving these results, we establish and exploit tight connections between GQL and query languages over relational databases, especially the extension of relational calculus with transitive closure operators, and a fragment of second-order logic.

Paper Structure (21 sections, 27 theorems, 32 equations, 3 figures)

Key Result

Theorem 2

The data complexity of GQL queries is $\text{P}^{\text{NP}[\log]}$-complete. The data complexity of GQL queries without restrictors improves to $\text{NL}$-complete.
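
To give some intuition for why restrictors matter, the following Python sketch contrasts two ways of answering the same toy question: is there a path of even length from a source to a target? It is not the paper's algorithm and uses no GQL syntax. The "even length" condition, the function names, and the edge-list encoding are all assumptions made for illustration. Without a restrictor, repeated nodes are allowed, so the question reduces to plain reachability over (node, parity) pairs. With a restrictor that forbids repeated nodes, this naive version falls back on backtracking search, which can take exponential time in the worst case.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of directed (u, v) pairs into an adjacency dict."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    return adj


def even_walk_exists(edges, source, target):
    """No restrictor: nodes may repeat.

    Breadth-first search over (node, parity) states, i.e. reachability
    in a product graph with at most 2 * |V| states.
    """
    adj = build_adjacency(edges)
    start = (source, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        node, parity = queue.popleft()
        if node == target and parity == 0:
            return True
        for nxt in adj.get(node, []):
            state = (nxt, 1 - parity)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def even_simple_path_exists(edges, source, target):
    """Illustrative restrictor: no node may repeat along the path.

    Naive backtracking over all simple paths from the source.
    """
    adj = build_adjacency(edges)

    def dfs(node, length, visited):
        if node == target:
            # A simple path cannot pass through the target and come back.
            return length % 2 == 0
        for nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                if dfs(nxt, length + 1, visited):
                    return True
                visited.remove(nxt)
        return False

    return dfs(source, 0, {source})


# A directed 3-cycle s -> t -> u -> s (hypothetical example data).
cycle = [("s", "t"), ("t", "u"), ("u", "s")]
walk_answer = even_walk_exists(cycle, "s", "t")
simple_answer = even_simple_path_exists(cycle, "s", "t")
```

On the 3-cycle, the walk s, t, u, s, t has length 4, so `walk_answer` is true, while the only simple path from `s` to `t` has length 1, so `simple_answer` is false. The two semantics can disagree even on tiny graphs, and only the first is answered here by a straightforward reachability check.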

Figures (3)

  • Figure 1: A "property graph" $G_{\mathtt{bike}}$ of bike lanes and road connections between towns and crossroads.
  • Figure 2: Semantics of GQL's "Path Patterns" and "Conditions"
  • Figure 3: The graph $G^\star$ used in the reduction. Lower layer: Copies $G_i^{\mathsf{lo}}$ of each input graph $G_i$ are linked by edges $s\to s_1$, $t_i\to s_{i+1}$, and odd‐index nodes $v_j$ (label $\ell_0$) are appended via $t_j\to v_j$. Upper layer: Fresh copies $G_j^{\mathsf{up}}$ for $j\ge2$ have sources $s_j'$ connected from every $v_i$ with $i<j$, and all upper targets $t_j'$ feed into the global sink $t$. Source nodes $s,s_i,s_j'$ carry label $s$; targets $t,t_i,t_j'$ carry $t$; and each $v_j$ (odd $j$) carries $\ell_0$.

Theorems & Definitions (31)

  • Example 1
  • Theorem 2
  • Corollary 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 21 more