Efficient Path Query Processing in Relational Database Systems

Diego Rivera Correa, Mirek Riedewald

Abstract

Path queries are crucial for property graphs, and there is growing interest in queries that combine regular expressions over labels with constraints on property values of vertices and edges. Efficient evaluation of such general path queries requires that intermediate results be eliminated early when they cannot be completed to a full result path. Neither state-of-the-art (SOA) graph DBMS nor relational DBMS can currently do this effectively for a large class of queries. We show that this problem can be addressed by giving a relational optimizer "a little help": early filtering opportunities are specified explicitly in the query. To this end, we propose ReCAP, an abstraction that greatly simplifies the implementation of early filtering techniques for any type of property constraint for which such early filtering can be derived. No matter how complex the constraint, one only needs to implement (1) an NFA-style state transition function and (2) a handful of functions that mirror those needed for user-defined aggregates. We show that when using ReCAP, a standard relational DBMS like DuckDB can effectively push property constraints deep into the query plan, beating the SOA graph and relational DBMS by a factor of up to 400,000 over a variety of queries and input graphs.
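The early-filtering idea in the abstract can be illustrated with a minimal sketch. It is not ReCAP's implementation or API: the function names (`agg_init`, `agg_step`, `agg_may_complete`, `agg_final`), the `amount` property, the `BUDGET` threshold, and the sample constraint "total amount along the path is at most `BUDGET`" are all assumptions made for illustration. The label pattern is the regex $\texttt{Domestic}^+ \texttt{Foreign}$ from Figure 4. The pruning test is only sound here because amounts are assumed non-negative, so a running sum that already exceeds the budget can never recover.

```python
from collections import defaultdict

# NFA for the label regex Domestic+ Foreign; state 2 is accepting.
DELTA = {(0, "Domestic"): 1, (1, "Domestic"): 1, (1, "Foreign"): 2}
START, ACCEPTING = 0, {2}

# Hypothetical aggregate-style functions for the illustrative constraint
# "sum of edge amounts <= BUDGET" (names and constraint are assumptions).
BUDGET = 100

def agg_init():
    return 0

def agg_step(acc, amount):
    return acc + amount

def agg_may_complete(acc):
    # Sound only because amounts are assumed non-negative: the sum never shrinks.
    return acc <= BUDGET

def agg_final(acc):
    return acc <= BUDGET

def path_query(edges, source, max_len, early_filter=True):
    """edges: iterable of (src, dst, label, amount).
    Returns (matching paths, number of partial paths kept)."""
    out = defaultdict(list)
    for src, dst, label, amount in edges:
        out[src].append((dst, label, amount))

    results, kept = [], 0
    frontier = [((source,), START, agg_init())]
    for _ in range(max_len):
        next_frontier = []
        for path, state, acc in frontier:
            for dst, label, amount in out[path[-1]]:
                nxt = DELTA.get((state, label))
                if nxt is None:
                    continue  # label does not match the regex
                new_acc = agg_step(acc, amount)
                if early_filter and not agg_may_complete(new_acc):
                    continue  # no completion can satisfy the constraint
                kept += 1
                new_path = path + (dst,)
                if nxt in ACCEPTING and agg_final(new_acc):
                    results.append(new_path)
                next_frontier.append((new_path, nxt, new_acc))
        frontier = next_frontier
    return results, kept

EDGES = [
    ("a", "b", "Domestic", 60), ("b", "c", "Domestic", 60),
    ("c", "d", "Foreign", 10), ("b", "e", "Foreign", 30),
    ("a", "f", "Domestic", 10), ("f", "g", "Foreign", 20),
]
with_filter = path_query(EDGES, "a", 3, early_filter=True)
without_filter = path_query(EDGES, "a", 3, early_filter=False)
```

On this toy graph both runs return the same result paths, but the filtered run keeps fewer partial paths because the prefix a, b, c is discarded as soon as its running sum exceeds the budget. Without the filter, that prefix is extended further and rejected only once it reaches an accepting state.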

Paper Structure

This paper contains 19 sections, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Graph representing accounts (vertices) and transactions between them (edges). The label is shown next to the ID; properties are shown inside the box.
  • Figure 2: Runtime of ReCAP vs. SOA competitors for varying path-length limits. All competitors scale poorly and exceed a 2-hour timeout for $\ell>5$.
  • Figure 3: Query plan for \ref{ex:q1_bitcoin} on Neo4j. All paths up to length $\ell$ are generated first (left-most column), and the constraints are applied only at the end (middle and right-most columns).
  • Figure 4: Left: NFA for regex $\texttt{Domestic}^+ \texttt{Foreign}$. Right: Tabular representation of the NFA.
  • Figure 5: Default construction for property-constraint evaluation. When transitioning upon appending a new edge $e$ to a path, all relevant edge data is stored in list $D$. In an accepting state, property constraint $\varphi$ is evaluated on $D$.
  • ...and 9 more figures

Theorems & Definitions (13)

  • Example 1: Query $Q_A$
  • Definition 1: Property Graph [angles2017foundations]
  • Example 2
  • Definition 2: path, word
  • Example 3
  • Definition 3: regular expression, language
  • Definition 4: path query
  • Definition 5: NFA
  • Example 4: query $Q_B$
  • Example 5: Wasted Computation
  • ...and 3 more