Distinct Shortest Walk Enumeration for RPQs

Claire David, Nadime Francis, Victor Marsault

TL;DR

The paper addresses the efficient enumeration of all distinct shortest walks, from a source to a target in a multi-labeled graph database, that satisfy a regular path query (RPQ), accounting for nondeterminism in both the data and the query. It introduces an algorithm whose preprocessing is linear in the database size, $O(|\mathcal{D}|\times|\mathcal{A}|)$, and whose delay between consecutive outputs is $O(\lambda\times|\mathcal{A}|)$, where $\lambda$ is the shortest-walk length. The algorithm uses a backward-search tree and on-the-fly annotations to avoid both duplicate answers and exponential preprocessing. It supports $\varepsilon$-transitions and queries given as regular expressions at no extra cost, which makes all-shortest-walks semantics for RPQs practical under the multi-label and nondeterminism conditions found in real-world graph database systems.
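The duplication issue can be made concrete with a small Python sketch. It is an illustration only, not taken from the paper: the chain-shaped database, the toy query `(a|b)*`, and all function names are assumptions chosen for the example. It counts how many labelled copies of a single walk an enumeration would produce if it distinguished walks by the label chosen on each edge.

```python
from itertools import product

def chain_edge_labels(n):
    # Hypothetical toy database: a chain v0 -> v1 -> ... -> vn in which
    # every edge carries both labels "a" and "b".
    return [frozenset({"a", "b"}) for _ in range(n)]

def conforms(word):
    # Toy query (a|b)*, chosen for illustration: any word over {a, b} conforms.
    return all(letter in ("a", "b") for letter in word)

def count_conforming_labellings(edge_labels):
    # Picking one label per edge yields one labelled copy of the same walk.
    return sum(1 for word in product(*edge_labels) if conforms(word))

n = 10
labelled_copies = count_conforming_labellings(chain_edge_labels(n))
distinct_walks = 1  # the chain contains exactly one walk from v0 to vn
print(labelled_copies, distinct_walks)
```

With two labels per edge, the single walk of length $n$ has $2^n$ conforming labellings, so an enumeration working label by label would report the same answer exponentially many times.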

Abstract

We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O(|\mathcal{D}|\times|\mathcal{A}|)$ and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
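To make the problem statement concrete, here is a naive baseline in Python. It is not the paper's algorithm: it tracks sets of automaton states during a breadth-first search (an on-the-fly subset construction), so its cost can be exponential in $|\mathcal{A}|$, which is exactly what the proposed algorithm avoids. The data layout (edges as `(source, labels, destination)` triples, walks as tuples of edge indices), the function names, and the example database and automaton are all assumptions made for illustration, and $\varepsilon$-transitions are not handled.

```python
from collections import defaultdict

def distinct_shortest_walks(edges, source, target, delta, initial, final):
    """Naive baseline: return each shortest conforming walk exactly once.

    edges: list of (src, labels, dst); a walk is a tuple of edge indices.
    delta: dict mapping (state, label) to a set of successor states.
    """
    final = frozenset(final)
    out = defaultdict(list)
    for eid, (src, labels, dst) in enumerate(edges):
        out[src].append((eid, labels, dst))

    start = (source, frozenset(initial))
    preds = {start: []}  # configuration -> list of (previous configuration, edge id)
    level = [start]
    while level:
        accepting = [c for c in level if c[0] == target and c[1] & final]
        if accepting:
            return sorted(w for c in accepting for w in _unfold(c, preds))
        next_level, fresh = [], set()
        for config in level:
            vertex, states = config
            for eid, labels, dst in out[vertex]:
                # All labels of the edge are followed at once, so one walk
                # maps to exactly one sequence of state sets.
                succ = frozenset(q2 for q in states for a in labels
                                 for q2 in delta.get((q, a), ()))
                if not succ:
                    continue
                nxt = (dst, succ)
                if nxt not in preds:
                    preds[nxt] = []
                    fresh.add(nxt)
                    next_level.append(nxt)
                if nxt in fresh:  # keep only predecessors on shortest walks
                    preds[nxt].append((config, eid))
        level = next_level
    return []

def _unfold(config, preds):
    # Rebuild walks by following predecessor links back to the start.
    if not preds[config]:
        yield ()
        return
    for prev, eid in preds[config]:
        for prefix in _unfold(prev, preds):
            yield prefix + (eid,)

# Hypothetical database with parallel, multi-labelled edges from s to u.
edges = [("s", {"a", "b"}, "u"), ("s", {"a"}, "u"), ("u", {"b"}, "t"), ("s", {"c"}, "t")]
# Hypothetical nondeterministic automaton for the query (a|b) b.
delta = {(0, "a"): {1, 2}, (0, "b"): {2}, (1, "b"): {3}, (2, "b"): {3}}
walks = distinct_shortest_walks(edges, "s", "t", delta, {0}, {3})
print(walks)
```

In this example the two parallel edges from `s` to `u` give two distinct walks, while the two labels on edge 0 do not produce a repeated answer, because a walk is identified by its edges rather than by the labels read along it.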

Paper Structure (9 sections, 2 theorems, 4 equations, 2 figures)

Key Result

Theorem 1

Given a nondeterministic automaton $\mathcal{A}$ with set of states $Q$ and transition table $\Delta$ and a database $\mathcal{D}$ with set of vertices $V$, Distinct Shortest Walks($\mathcal{D}$,$\mathcal{A}$) can be enumerated with delay in $\mathbf{O}\mathopen{}\left(|\mathcal{D}|^{}\times|\Delta|

Figures (2)

  • Figure 1: A multi-edge multi-labeled graph database
  • Figure 2: Pseudocode of the main algorithm

Theorems & Definitions (9)

  • Theorem 1: Martens and Trautner (2018), combined with Francis and Marsault (2023)
  • Theorem 2
  • Definition 3
  • Remark 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Example 9