Complexity of Evaluating GQL Queries
Diego Figueira, Anthony W. Lin, Liat Peterfreund
TL;DR
This paper resolves a central question for graph querying: the data complexity of evaluating GQL queries on property graphs. It shows that GQL evaluation is $\text{P}^{\text{NP}[\log]}$-complete in general, and that it becomes $\text{NL}$-complete when patterns are restrictor-free. Both bounds continue to hold when queries use data from infinite concrete domains. The authors obtain these results by embedding GQL into extensions of first-order logic (first-order logic with transitive closure, and existential second-order logic) and by using embedded finite model theory to handle infinite domains and data types. The findings establish tight connections between GQL and relational query languages. These connections enable meta-querying and more expressive extensions without increasing data complexity, and they point to future work on aggregation and broader domain-specific features.
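To make the link between restrictor-free patterns and transitive closure concrete, the following Python sketch evaluates a pattern that, informally, asks for a path of one or more `knows` edges between two nodes. The graph, the labels `knows` and `worksAt`, and the helper `transitive_closure` are hypothetical. The naive fixpoint below runs in polynomial time rather than within the nondeterministic logspace bound of the paper; it only illustrates the semantics of a transitive closure operator, not the authors' construction.

```python
def transitive_closure(edges):
    """Return all pairs (x, y) linked by a path of one or more edges.

    Naive fixpoint iteration: start from the edge relation and keep
    composing it with itself until no new pair appears.
    """
    closure = set(edges)
    while True:
        # Extend every known path (x, y) by one edge (y, w).
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in edges if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical property graph: edges as (source, label, target) triples.
graph = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
]

# "One or more knows-edges" is the transitive closure of the knows-edge relation.
knows = {(s, t) for (s, lbl, t) in graph if lbl == "knows"}
reachable = transitive_closure(knows)
print(sorted(reachable))  # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'carol')]
```

Replacing the fixpoint with a procedure that guesses one successor at a time, storing only the current node, is the standard argument that reachability of this kind lies in $\text{NL}$.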
Abstract
GQL has recently emerged as the standard query language over graph databases (particularly, the property graph model), playing a role analogous to that of SQL for relational databases. Unlike for SQL, however, fundamental problems regarding GQL have so far remained unsolved, most notably the complexity of query evaluation. In this paper we provide a complete solution to this problem. In particular, we show that the data complexity of GQL is $\text{P}^{\text{NP}[\log]}$-complete in general, and $\text{NL}$-complete when the so-called "restrictors" are disallowed. Using techniques from embedded finite model theory, we show that these bounds hold even when queries use data from infinite concrete domains (for example, the domain of real numbers, with arithmetic allowed in the query). In proving these results, we establish and exploit tight connections between GQL and query languages over relational databases, especially the extension of relational calculus with transitive closure operators, and a fragment of second-order logic.
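Restrictors change which paths count as matches, which is why they cannot simply be handled as reachability. As an illustration, the Python sketch below checks whether some path from `source` to `target` spells out a fixed label sequence, either as an arbitrary walk or under a TRAIL-style restriction that forbids repeating an edge. The function `match_sequence`, the graph, and the fixed-sequence pattern are hypothetical simplifications of GQL patterns, and the brute-force search is exponential in general; it is not the evaluation procedure of the paper.

```python
def match_sequence(edges, source, target, labels, trail):
    """Return True if some path from source to target spells out `labels`.

    edges: list of (src, label, dst) triples; each edge is identified by its
    index. With trail=True no edge index may be reused (TRAIL-like
    restriction); with trail=False any walk is allowed.
    """
    def dfs(node, i, used):
        if i == len(labels):
            return node == target
        for idx, (s, lbl, d) in enumerate(edges):
            if s != node or lbl != labels[i]:
                continue
            if trail and idx in used:
                continue  # this edge already appears on the current path
            if dfs(d, i + 1, used | {idx}):
                return True
        return False

    return dfs(source, 0, frozenset())


# Hypothetical graph with a y-labelled self-loop on node m.
graph = [("s", "x", "m"), ("m", "y", "m"), ("m", "z", "t")]
pattern = ["x", "y", "y", "z"]  # needs the self-loop twice

print(match_sequence(graph, "s", "t", pattern, trail=False))  # True
print(match_sequence(graph, "s", "t", pattern, trail=True))   # False
```

The walk matches by traversing the self-loop twice, whereas the trail restriction rules this out, so the two semantics give different answers on the same graph and pattern.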
