Revisiting the Expressiveness Landscape of Data Graph Queries
Michael Benedikt, Anthony Widjaja Lin, Di-De Yen
TL;DR
This work analyzes the expressive power of graph query languages for data graphs, focusing on three canonical families: $RPQ$-based extensions, Walk Logic ($WL$), and first-order logic with transitive closure, and shows how data coupling adds complexity. It demonstrates that $FO(ERDPQ)$ subsumes several existing languages ($WL$, $RDPQ$, $GPC$), while $FO^*(\text{≡data})$ subsumes $RDPQ$ but is incomparable with the others, outlining a rich landscape of expressiveness. To unify these approaches, the paper introduces $FO^*(ERDPQ)$, which extends $FO(ERDPQ)$ with transitive closure, subsumes all prior languages, and provides a single maximal framework, albeit with non-elementary worst-case data complexity. It also introduces Multi-Path Walk Logic ($MWL$), an extension of $WL$ with multi-path comparisons, which is strictly more expressive than $WL$ and expressible within $FO(ERDPQ)$, but does not reach the full unifying power of $FO^*(ERDPQ)$. The results offer a conceptual and technical bridge for graph querying, guiding future work on tractable fragments and practical implementations.
Abstract
The study of graph queries in database theory has spanned more than three decades, resulting in a multitude of proposals for graph query languages. These languages differ in the mechanisms they offer, and we can identify three main families, with the canonical representatives being: (1) regular path queries, (2) walk logic, and (3) first-order logic with transitive closure operators. This paper provides a complete picture of the expressive power of these languages in the context of data graphs. Specifically, we consider a graph data model that supports querying over both data and topology. An example of such a query is: "Does there exist a path between two different persons in a social network with the same last name?" We also show that an extension of (1), augmented with transitive closure operators, can unify the expressivity of (1)–(3) without increasing the query evaluation complexity.
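
To make the example query concrete, the following is a minimal Python sketch of how it could be evaluated directly on a small data graph. It is not taken from the paper and does not use any of the query languages discussed. The function name `same_name_path_exists`, the encoding of the graph as a dictionary of node data values plus a list of directed edges, and the decision to ignore edge labels are all illustrative assumptions.

```python
from collections import deque


def same_name_path_exists(nodes, edges):
    """Return True if some directed path links two distinct nodes
    that carry the same data value (here: a last name).

    nodes: dict mapping node id -> last name (hypothetical encoding)
    edges: iterable of (source, target) pairs, treated as directed
    """
    # Build an adjacency list from the edge pairs.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    # Run a breadth-first search from every node and compare data values.
    for start, name in nodes.items():
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency.get(current, []):
                if neighbor in seen:
                    continue
                # neighbor differs from start because start is already in seen.
                if nodes[neighbor] == name:
                    return True
                seen.add(neighbor)
                queue.append(neighbor)
    return False


people = {"a": "Smith", "b": "Jones", "c": "Smith", "d": "Lee"}
connected = same_name_path_exists(people, [("a", "b"), ("b", "c"), ("c", "d")])
disconnected = same_name_path_exists(people, [("a", "b"), ("c", "d")])
```

The check combines topology (reachability) with data (equality of last names on the two endpoints), which is the kind of coupling that purely topological path queries cannot express on their own.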
