Path-based Algebraic Foundations of Graph Query Languages

Renzo Angles, Angela Bonifati, Roberto García, Domagoj Vrgoč

TL;DR

This work addresses the lack of a cohesive algebra for evaluating path queries in property graphs, aiming to harmonize concepts in GQL and SQL/PGQ and enable robust query optimization. It introduces a path-based algebra with a core set of operators (Selection, Join, Union), extended by a Recursive Path Algebra using a recursive operator $\phi$, and enriched by path modes through group-by, order-by, and projection. The framework demonstrates expressiveness by encoding core fragments of the two standards and provides formal semantics for selectors/restrictors, as well as evaluation trees that double as logical plans. An open-source parser and a GQL extension showcase how the algebra can serve as a lingua franca for path queries across implementations and standards, promoting comparability and future optimization opportunities.

Abstract

Graph databases are gaining momentum thanks to the flexibility and expressiveness of their data models and query languages. A standardization activity driven by the ISO/IEC standardization body is also ongoing and has already led to the specification of the first versions of two standard graph query languages, namely SQL/PGQ and GQL, in 2023 and 2024, respectively. Apart from the standards, there exists a panoply of concrete graph query languages provided by current graph database systems, each offering different query features. A common limitation of current graph query engines is the absence of an algebraic approach for evaluating path queries. To address this, we introduce an abstract algebra for evaluating path queries, allowing paths to be treated as first-class entities within the query processing pipeline. We demonstrate that our algebra can express a core fragment of path queries defined in GQL and SQL/PGQ, thereby serving as a formal framework for studying both standards and supporting their implementation in current graph database systems. We also show that evaluation trees for path algebra expressions can function as logical plans for evaluating path queries and enable the application of query optimization techniques. Our algebraic framework has the potential to act as a lingua franca for path query evaluation, enabling different implementations to be expressed and compared.
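
To make the idea of treating paths as first-class values more concrete, the following is a minimal Python sketch of set-based selection, join, and union over paths, plus a bounded Kleene-plus style closure standing in for the recursive operator. It is not the paper's formal definition: the representation of a path as a tuple of node identifiers, the function names, and the `max_len` bound (used here only to guarantee termination on cyclic graphs) are all assumptions made for illustration.

```python
from typing import Callable, Set, Tuple

# Assumption: a path is a tuple of node identifiers; edge labels and
# properties are omitted to keep the sketch short.
Path = Tuple[str, ...]


def selection(paths: Set[Path], pred: Callable[[Path], bool]) -> Set[Path]:
    """Keep only the paths that satisfy the predicate."""
    return {p for p in paths if pred(p)}


def join(left: Set[Path], right: Set[Path]) -> Set[Path]:
    """Concatenate p and q whenever p ends at the node where q starts."""
    return {p + q[1:] for p in left for q in right if p[-1] == q[0]}


def union(left: Set[Path], right: Set[Path]) -> Set[Path]:
    """Set union of two collections of paths."""
    return set(left) | set(right)


def recursive(paths: Set[Path], max_len: int) -> Set[Path]:
    """Kleene-plus style closure: join the input with itself repeatedly.

    Assumption: results are capped at max_len edges so that the loop
    terminates on cyclic graphs.
    """
    result = set(paths)
    frontier = set(paths)
    while frontier:
        extended = join(frontier, paths)
        new = {p for p in extended if len(p) - 1 <= max_len} - result
        result |= new
        frontier = new
    return result


# A three-node cycle a -> b -> c -> a, given as paths of length one.
edges: Set[Path] = {("a", "b"), ("b", "c"), ("c", "a")}

# All paths of one or two edges, then only those from "a" to "c".
closure = recursive(edges, max_len=2)
a_to_c = selection(closure, lambda p: p[0] == "a" and p[-1] == "c")
```

Because every function takes and returns a set of paths, the calls compose like the nodes of an evaluation tree: here a selection sits on top of a recursive node, whose input is the base edge set.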

Paper Structure

This paper contains 24 sections, 3 equations, 6 figures, 7 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A graph representing a social network (drawn from the LDBC SNB benchmark).
  • Figure 2: The algebraic plan of a recursive graph query ($\phi$ being the recursive operator corresponding to Kleene plus).
  • Figure 3: Example of query tree obtained from a core path algebra expression.
  • Figure 4: Evaluation tree of a recursive path algebra query.
  • Figure 5: A query plan including order-by, group-by and projection.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 2.1
  • Definition 3.1: Core Path Algebra
  • Definition 4.1
  • Definition 5.1