Distinct Shortest Walk Enumeration for RPQs
Claire David, Nadime Francis, Victor Marsault
TL;DR
The paper addresses the efficient enumeration of all distinct shortest walks that go from a source to a target in a multi-labeled graph database and carry a label conforming to a regular path query (RPQ), accounting for nondeterminism in both the data and the query. It introduces an algorithm whose preprocessing is linear in the database, $O(|\mathcal{D}|\times|\mathcal{A}|)$, and whose delay between outputs is $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is the automaton representing the query and $\lambda$ is the shortest-walk length. It uses a backward-search tree and on-the-fly annotations to avoid both duplicate answers and exponential preprocessing. The method supports $\varepsilon$-transitions and regular-expression inputs at no extra cost, making all-shortest-walks semantics for RPQs practical on graph databases closer to real-life systems. This may benefit query planning and RPQ evaluation in graph DBMSs by improving scalability under realistic labeling and nondeterminism.
Abstract
We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O(|\mathcal{D}|\times|\mathcal{A}|)$ and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
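To make the problem statement concrete, here is a minimal Python sketch of duplicate-free enumeration on a tiny multi-labelled graph. It is not the algorithm of the paper and makes no claim to its complexity bounds: it is a simpler baseline that computes backward distances in the product of the database and the automaton, then explores walks edge by edge while tracking the set of automaton states still able to reach the target in time. The function name `distinct_shortest_walks`, the encoding of edges as `(source, labels, target)` tuples, and the encoding of the automaton as a dictionary without $\varepsilon$-transitions are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def distinct_shortest_walks(edges, delta, initial, final, s, t):
    """Yield each shortest matching walk from s to t once, as a tuple of edge indices.

    edges:   list of (source, labels, target), labels being a set (hypothetical encoding)
    delta:   dict mapping (state, label) to a set of states (NFA, no epsilon-transitions)
    """
    # Index automaton transitions by label.
    by_label = defaultdict(list)
    for (p, a), targets in delta.items():
        for q in targets:
            by_label[a].append((p, q))

    out_edges = defaultdict(list)  # vertex -> indices of outgoing edges
    rev = defaultdict(set)         # product node (v, q) -> predecessor product nodes
    for i, (u, labels, v) in enumerate(edges):
        out_edges[u].append(i)
        for a in labels:
            for p, q in by_label[a]:
                rev[(v, q)].add((u, p))

    # Backward BFS in the product graph: dist[(v, q)] is the length of a
    # shortest way to reach t in a final state, starting from v in state q.
    dist = {(t, f): 0 for f in final}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for pred in rev[node]:
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)

    starts = [q for q in initial if (s, q) in dist]
    if not starts:
        return  # no matching walk at all
    lam = min(dist[(s, q)] for q in starts)

    def extend(vertex, states, remaining, walk):
        if remaining == 0:
            yield tuple(walk)
            return
        for i in out_edges[vertex]:
            _, labels, v = edges[i]
            # Keep only the states from which t is reachable in exactly
            # remaining - 1 further steps, so every kept prefix leads to an answer.
            nxt = {q for p in states for a in labels
                   for q in delta.get((p, a), ())
                   if dist.get((v, q)) == remaining - 1}
            if nxt:
                walk.append(i)
                yield from extend(v, nxt, remaining - 1, walk)
                walk.pop()

    live = {q for q in starts if dist[(s, q)] == lam}
    yield from extend(s, live, lam, [])


# Toy database: edge 0 carries two labels.
edges = [("s", {"a", "b"}, "m"), ("m", {"c"}, "t"),
         ("s", {"a"}, "n"), ("n", {"c"}, "t")]
# Nondeterministic automaton for (a|b)c, with two runs on the word "ac".
delta = {(0, "a"): {1, 3}, (0, "b"): {1}, (1, "c"): {2}, (3, "c"): {2}}

print(list(distinct_shortest_walks(edges, delta, {0}, {2}, "s", "t")))
# [(0, 1), (2, 3)]
```

In this toy instance, listing the paths of the product graph directly would report the walk `(0, 1)` three times, once per combination of label choice and automaton run, which is the kind of duplication the abstract refers to. Grouping automaton states into a set per walk prefix removes those duplicates, since each sequence of edges is explored only once.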
