GQL and SQL/PGQ: Theoretical Models and Expressive Power

Amélie Gheerbrant; Leonid Libkin; Liat Peterfreund; Alexandra Rogova

TL;DR

The paper formalizes SQL/PGQ and GQL through two core languages: Core PGQ, built on relational algebra (RA), and Core GQL, built on linear composition relational algebra (LCRA). Together they give a concise theoretical model that clarifies the expressiveness and limitations of the two standards. The paper proves that pattern matching in these languages cannot express certain natural queries (e.g., values increasing along the edges of a path) and demonstrates, both theoretically and experimentally, that practical workarounds are inefficient. It then contrasts Core GQL/PGQ with positive recursive SQL and linear Datalog, showing expressivity gaps and suggesting extensions to restore compositionality and two-way interoperability between pattern matching and relational querying. The formalization provides a foundation for guiding future language design and for evaluating extensions, tool support, and performance trade-offs in the next versions of the graph standards, and it emphasizes the need for extensions that capture a broader class of graph queries without sacrificing practical tractability.
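To make the "increasing values along edges" query concrete, here is a minimal Python sketch of what such a query asks for. It is not the paper's formalism or its workaround. The edge-triple layout, the function name `has_increasing_path`, and the choice to require strictly increasing values on a path with at least one edge are all assumptions made for illustration.

```python
from collections import defaultdict


def has_increasing_path(edges, source, target):
    """Return True if some non-empty path from source to target
    has strictly increasing edge values.

    edges: iterable of (src, dst, value) triples (hypothetical layout).
    """
    # Adjacency list: node -> list of (neighbor, edge value).
    out = defaultdict(list)
    for src, dst, value in edges:
        out[src].append((dst, value))

    # A search state is (node, value of the last edge taken);
    # None means no edge has been taken yet.
    stack = [(source, None)]
    seen = set()
    while stack:
        node, last = stack.pop()
        if node == target and last is not None:
            return True
        if (node, last) in seen:
            continue
        seen.add((node, last))
        for nxt, value in out[node]:
            # Only follow edges whose value exceeds the previous one.
            if last is None or value > last:
                stack.append((nxt, value))
    return False
```

The search tracks the value of the last edge taken alongside the current node, which is exactly the kind of comparison between consecutive edges that the summary says pattern matching in Core GQL and Core PGQ cannot express.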

Abstract

SQL/PGQ and GQL are very recent international standards for querying property graphs: SQL/PGQ specifies how to query relational representations of property graphs in SQL, while GQL is a standalone language for graph databases. The rapid industrial development of these standards left the academic community trailing in its wake. While digests of the languages have appeared, we do not yet have concise foundational models, like relational algebra and calculus for relational databases, that enable the formal study of languages, including their expressiveness and limitations. At the same time, work on the next versions of the standards has already begun, to address the perceived limitations of their first versions. Motivated by this, we initiate a formal study of SQL/PGQ and GQL, concentrating on their concise formal model and expressiveness. For the former, we define simple core languages -- Core GQL and Core PGQ -- that capture the essence of the new standards, are amenable to theoretical analysis, and fully clarify the difference between PGQ's bottom-up evaluation and GQL's linear, or pipelined, approach. Equipped with these models, we both confirm the necessity to extend the language to fill in the expressiveness gaps and identify the source of these deficiencies. We complement our theoretical analysis with an experimental study, demonstrating that existing workarounds in full GQL and PGQ are impractical, which further underscores the necessity to correct deficiencies in the language design.

Paper Structure (35 sections, 24 theorems, 28 equations, 3 figures)

Introduction
Formal models of GQL and SQL/PGQ
Limitations of pattern matching
GQL vs Recursive SQL vs Datalog
GQL and SQL/PGQ by examples
Linear Composition Relational Algebra
Schemas, databases, and notations.
Relational Algebra (RA)
Linear Composition Relational Algebra (LCRA)
Expressivity results
The origins of linear composition
GQL and SQL/PGQ: theoretical abstractions
Property Graphs
Pattern Matching: Turning Graphs into Relations
Correspondence with Cypher and GQL
...and 20 more sections

Key Result

Theorem 3.1

For every schema $\mathbf S$, languages $\mathsf{RA}(\mathbf S)$ and $\mathsf{LCRA}(\mathbf S)$ are equivalent.

Figures (3)

Figure 1: A labeled property graph
Figure 2: Semantics of patterns and patterns with output
Figure 3: Timeouts and median running time of $Q^{\mathsf{E}}_{<}$

Theorems & Definitions (25)

Theorem 3.1
Proposition 1
Definition 4.1: Property Graph
Proposition 2
Corollary 1
Proposition 3
Theorem 5.1
Theorem 5.2
Theorem 6.1
Lemma 10.1
...and 15 more
