GPC: A Pattern Calculus for Property Graphs

Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, Domagoj Vrgoč

TL;DR

This paper addresses the gap between practical graph query languages and formal theory by introducing the Graph Pattern Calculus (GPC) as a core, analyzable foundation for GQL and SQL/PGQ. It defines the syntax, a type system, and a formal semantics for GPC, together with extensions such as GPC+ that capture common query classes including UC2RPQs, NREs, and regular queries. The authors analyze expressivity and complexity, showing that the compositional semantics yields finite output, and give results on enumeration, on data versus combined complexity, and on undecidability when arithmetic is added. The work also outlines extensions and describes how the theory has already informed the drafting of the standards, positioning GPC as a tool for the principled study and future enhancement of graph query standards.

Abstract

The development of practical query languages for graph databases runs well ahead of the underlying theory. The ISO committee in charge of database query languages is currently developing a new standard called Graph Query Language (GQL) as well as an extension of the SQL Standard for querying property graphs represented by a relational schema, called SQL/PGQ. The main component of both is the pattern matching facility, which is shared by the two standards. In many respects, it goes well beyond RPQs, CRPQs, and similar queries on which the research community has focused for years. Our main contribution is to distill the lengthy standard specification into a simple Graph Pattern Calculus (GPC) that reflects all the key pattern matching features of GQL and SQL/PGQ, and at the same time lends itself to rigorous theoretical investigation. We describe the syntax and semantics of GPC, along with the typing rules that ensure its expressions are well-defined, and state some basic properties of the language. With this paper we provide the community with a tool to embark on a study of query languages that will soon be widely adopted by industry.

Paper Structure

This paper contains 26 sections, 13 theorems, 46 equations, and 3 figures.

Key Result

Proposition 2

For every well-typed expression $\xi$, variable $x$, and types $\tau, \tau'$,

Figures (3)

  • Figure 1: GPC expressions
  • Figure 2: Typing rules for the GPC type system.
  • Figure 3: Refactorization of a path $p=p_1p_2\cdots p_{10}$ as $p=p'_1p'_2\cdots p'_{7}$ by grouping consecutive edgeless factors

Theorems & Definitions (31)

  • Definition 1
  • Proposition 2
  • proof
  • Definition 3
  • Proposition 4
  • Definition 5
  • Remark 6
  • Definition 7
  • Remark 8
  • Proposition 9
  • ...and 21 more