Distinct Shortest Walk Enumeration for RPQs

Claire David; Nadime Francis; Victor Marsault

TL;DR

The paper addresses the efficient enumeration of all distinct shortest walks from a source to a target that satisfy a regular path query (RPQ) in a multi-labeled graph database, accounting for nondeterminism in both the data and the query. It introduces an algorithm whose preprocessing is linear in the size of the database, $O(|\mathcal{D}|\times|\mathcal{A}|)$, and whose delay between consecutive outputs is $O(\lambda\times|\mathcal{A}|)$. Here $\mathcal{A}$ is the query automaton and $\lambda$ is the length of a shortest walk. The algorithm relies on a backward-search tree and on-the-fly annotations to avoid both duplicate outputs and exponential preprocessing. It supports $\varepsilon$-transitions and queries given as regular expressions at no extra cost, which makes all-shortest-walks semantics for RPQs practical in real-world graph databases. This could benefit query planning and RPQ evaluation in graph DBMSs, improving scalability when edges carry multiple labels and queries are nondeterministic.
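The Python sketch below illustrates one simple way to obtain each shortest walk exactly once. It is not the authors' algorithm (which uses a backward-search tree with on-the-fly annotations to reach the bounds above), and its delay is not claimed to match theirs. Assumptions, all hypothetical: the database is a list of edges `(edge_id, src, labels, tgt)` where `labels` is a set; the query is an NFA without ε-transitions given as a dict `delta[(state, label)] -> set of states`; a walk is identified by its sequence of edge ids. A BFS over the reversed product graph gives, for each (vertex, state) pair, its distance to the target in a final state. A depth-first search then extends walks edge by edge, carrying the *set* of automaton states that are exactly on schedule, so a walk is produced once no matter how many labels or runs support it.

```python
from collections import deque

# Hypothetical encodings (assumptions, not from the paper):
#   edges: list of (edge_id, src, labels, tgt), labels being a set of labels,
#          so one edge may carry several labels and parallel edges are allowed.
#   delta: NFA without epsilon-transitions, dict (state, label) -> set of states.


def backward_distances(edges, delta, finals, target):
    """BFS on the reversed product graph (database x automaton).

    Returns dist[(vertex, state)] = length of a shortest walk from that pair
    to (target, f) for some final state f.
    """
    rev = {}
    for _, src, labels, tgt in edges:
        for (q, a), succs in delta.items():
            if a in labels:
                for q2 in succs:
                    rev.setdefault((tgt, q2), []).append((src, q))
    dist = {(target, f): 0 for f in finals}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for pred in rev.get(node, ()):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist


def distinct_shortest_walks(edges, delta, initial, finals, source, target):
    """Yield each shortest accepted walk (a list of edge ids) exactly once."""
    dist = backward_distances(edges, delta, finals, target)
    starts = [q for q in initial if (source, q) in dist]
    if not starts:
        return  # no walk from source to target matches the query
    lam = min(dist[(source, q)] for q in starts)  # shortest length, lambda
    out_edges = {}
    for edge in edges:
        out_edges.setdefault(edge[1], []).append(edge)

    def extend(vertex, states, walk):
        remaining = lam - len(walk)
        if remaining == 0:
            # Only (target, final) pairs have distance 0, so the walk is accepted.
            yield list(walk)
            return
        for edge_id, _, labels, tgt in out_edges.get(vertex, ()):
            # Union over all labels of the edge and all current states, keeping
            # only states that reach the target in exactly the remaining steps.
            nxt = {
                q2
                for q in states
                for a in labels
                for q2 in delta.get((q, a), ())
                if dist.get((tgt, q2)) == remaining - 1
            }
            if nxt:  # every kept branch is guaranteed to produce an output
                walk.append(edge_id)
                yield from extend(tgt, nxt, walk)
                walk.pop()

    yield from extend(source, {q for q in starts if dist[(source, q)] == lam}, [])


if __name__ == "__main__":
    # Edge e1 carries two labels, so the walk e1 e2 has two accepting runs
    # for the query (a|b)c (via states 1 and 2) but is output only once.
    edges = [("e1", "s", {"a", "b"}, "u"), ("e2", "u", {"c"}, "t"),
             ("e3", "u", {"c"}, "t")]
    delta = {(0, "a"): {1}, (0, "b"): {2}, (1, "c"): {3}, (2, "c"): {3}}
    print(list(distinct_shortest_walks(edges, delta, {0}, {3}, "s", "t")))
    # prints [['e1', 'e2'], ['e1', 'e3']]
```

Keeping only states whose remaining distance matches exactly is what makes every branch productive: since $\lambda$ is minimal, any state on an accepting run at step $i$ has distance exactly $\lambda - i$, and any kept state always leads to an output.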

Abstract

We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O(|\mathcal{D}|\times|\mathcal{A}|)$ and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
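To illustrate how multiple labels alone can cause exponentially many repetitions, the sketch below counts how often one walk would be output by a naive method that emits one answer per accepting run in the product of the database and the automaton. This is an assumed, simplified model of such methods, not a specific published algorithm, and `count_accepting_runs` is a hypothetical helper. Even with a deterministic automaton for the query $(a|b)^*$, a walk of $n$ edges each labelled $\{a, b\}$ has $2^n$ accepting runs but is a single answer.

```python
def count_accepting_runs(walk_labels, delta, initial, finals):
    """Count the accepting runs of an NFA over one walk.

    walk_labels[i] is the set of labels carried by the i-th edge of the walk;
    a run picks one label per edge and one transition per step. A naive method
    that outputs one answer per run would output this walk this many times.
    """
    counts = {q: 1 for q in initial}  # number of runs ending in each state
    for labels in walk_labels:
        nxt = {}
        for q, c in counts.items():
            for a in labels:
                for q2 in delta.get((q, a), ()):
                    nxt[q2] = nxt.get(q2, 0) + c
        counts = nxt
    return sum(c for q, c in counts.items() if q in finals)


if __name__ == "__main__":
    delta = {(0, "a"): {0}, (0, "b"): {0}}  # deterministic automaton for (a|b)*
    for n in (1, 5, 10):
        # A single walk of n edges, each labelled {a, b}.
        print(n, count_accepting_runs([{"a", "b"}] * n, delta, {0}, {0}))
    # prints 1 2, then 5 32, then 10 1024
```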

Paper Structure

This paper contains 9 sections, 2 theorems, 4 equations, 2 figures.

Introduction
Contributions.
Outline.
Preliminaries
Sets, lists and queues
Graph databases
Automata and queries
Distinct shortest walks
The algorithm

Key Result

Theorem 1

Given a nondeterministic automaton $\mathcal{A}$ with set of states $Q$ and transition table $\Delta$ and a database $\mathcal{D}$ with set of vertices $V$, Distinct Shortest Walks($\mathcal{D}$,$\mathcal{A}$) can be enumerated with delay in $O(|\mathcal{D}|\times|\Delta|$ …

Figures (2)

Figure 1: A multi-edge multi-labeled graph database
Figure 2: Pseudocode of the main algorithm

Theorems & Definitions (9)

Theorem 1 (Martens and Trautner 2018, combined with Francis and Marsault 2023)
Theorem 2
Definition 3
Remark 4
Definition 5
Definition 6
Definition 7
Definition 8
Example 9
