
Fast Answering Pattern-Constrained Reachability Queries with Two-Dimensional Reachability Index

Huihui Yang, Pingpeng Yuan

TL;DR

The paper introduces Pattern-Constrained Reachability (PCR), which enables composite logical constraints over the edge labels on paths in directed, edge-labeled graphs. It proves that PCR is NP-hard and proposes the Two-Dimensional Reachability (TDR) index, which combines a horizontal multi-way hashing filter with a vertical short-path index to prune the search space during query answering. TDR partitions each vertex's reachable space into groups and projects each group onto the horizontal and vertical dimensions, enabling efficient pruning and faster query processing for PCR and related LCR scenarios. Extensive experiments on real and synthetic graphs show that TDR achieves substantially smaller index size and faster PCR/LCR query answering than state-of-the-art complete-index methods, particularly on large, sparse graphs.

Abstract

Reachability queries ask whether there exists a path from a source vertex to a target vertex in a graph. Recently, several powerful reachability queries, such as Label-Constrained Reachability (LCR) queries and Regular Path Queries (RPQ), have been proposed for emerging complex edge-labeled digraphs. However, they do not allow users to describe complex query requirements by composing query patterns. Here, we introduce composite patterns, logical expressions of patterns that can express complex constraints on the set of labels. Based on patterns, we propose pattern-constrained reachability queries (PCR queries). However, answering PCR queries is NP-hard. Thus, to answer PCR queries efficiently, we build a two-dimensional reachability (TDR for short) index, which consists of a multi-way index (horizontal dimension) and a path index (vertical dimension). Because the number of combinations of labels and vertices is exponential, it is very expensive to build full indices that contain all the reachability information. Thus, the reachable vertices of a vertex are decomposed into blocks, each of which is hashed into both the horizontal-dimension index and the vertical-dimension index. The horizontal and vertical indices serve as a global filter and a local filter, respectively, to prune the search space. Experimental results on 16 real datasets demonstrate that TDR outperforms the state-of-the-art label-constrained reachability indexing technique in both index size and indexing time. TDR can efficiently answer pattern-constrained reachability queries, including label-constrained reachability queries.
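
To see why PCR queries are expensive without an index, the sketch below answers them exactly by breadth-first search over (vertex, label-set) states. It is a minimal baseline under stated assumptions, not the TDR method: it assumes a pattern is a Boolean predicate over the set of labels on a path (the `has`, `AND`, `OR`, `NOT` and `lcr` combinators and the toy graph are hypothetical stand-ins, not the paper's definitions), and it assumes paths may revisit vertices. The number of states can grow as |V| · 2^|L|, which is the kind of cost an index such as TDR aims to avoid.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: vertex -> list of (neighbor, edge label).
Graph = Dict[str, List[Tuple[str, str]]]
# Assumption: a pattern is a predicate over the set of labels on a path.
Pattern = Callable[[FrozenSet[str]], bool]


def has(label: str) -> Pattern:
    """Atomic pattern: the path uses at least one edge labeled `label`."""
    return lambda labels: label in labels


def AND(*patterns: Pattern) -> Pattern:
    return lambda labels: all(p(labels) for p in patterns)


def OR(*patterns: Pattern) -> Pattern:
    return lambda labels: any(p(labels) for p in patterns)


def NOT(pattern: Pattern) -> Pattern:
    return lambda labels: not pattern(labels)


def lcr(allowed: Iterable[str]) -> Pattern:
    """LCR as a special case: every label on the path is in `allowed`."""
    allowed_set = frozenset(allowed)
    return lambda labels: labels <= allowed_set


def pcr_bfs(g: Graph, s: str, t: str, pattern: Pattern) -> bool:
    """Exact, index-free baseline: BFS over (vertex, label-set) states.

    Assumes walks (vertex revisits) are allowed, so the set of labels
    collected so far is all the state needed to decide the query.
    """
    start = (s, frozenset())
    seen = {start}
    queue = deque([start])
    while queue:
        v, labels = queue.popleft()
        # Require at least one edge, then test the pattern on the collected labels.
        if v == t and labels and pattern(labels):
            return True
        for w, lab in g.get(v, []):
            state = (w, labels | {lab})
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Hypothetical toy transport graph (not data from the paper).
example_graph: Graph = {
    "A": [("B", "bus"), ("C", "train")],
    "B": [("D", "bus")],
    "C": [("D", "flight")],
    "D": [],
}

if __name__ == "__main__":
    print(pcr_bfs(example_graph, "A", "D", AND(has("train"), has("flight"))))  # True
    print(pcr_bfs(example_graph, "A", "D", AND(has("bus"), has("flight"))))    # False
    print(pcr_bfs(example_graph, "A", "D", lcr({"train"})))                    # False
```

In this toy graph, A reaches D via {bus} or via {train, flight}, so no single path satisfies "bus AND flight" even though both labels occur between A and D; this is why label sets must be tracked per path rather than per vertex.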

Paper Structure

This paper contains 26 sections, 1 theorem, 6 figures, 5 tables, and 2 algorithms.

Key Result

Theorem 1

PCR is an NP-hard problem.

Figures (6)

  • Figure 1: An illustrative example of three types of reachability queries on a graph of transportation network.
  • Figure 2: An edge-labeled digraph with 10 vertices and 5 labels, and the digraph with vertices and labels hashed.
  • Figure 3: The two-dimensional reachability index. The traversal tree starting from $u_i$ branches into $g_i$ distinct ways (denoted $w_1,\dots, w_{g_i}$), with each way (e.g. $w_j$) comprising one or more branches (e.g. $b_{j,1},\dots, b_{j,n_j}$) starting from neighbors of $u_i$ (e.g. $v_{j,1},\dots, v_{j,n_j}$). Each way is then projected onto both the horizontal and vertical dimensions, with the index in each dimension comprising two sub-indices for reachable vertex ($\mathcal{H}^{vtx}$,$\mathcal{V}^{vtx}$) and labels ($\mathcal{H}^{lab}$, $\mathcal{V}^{lab}$).
  • Figure 4: Indexing time, index space and execution time of $\mathbb{AND}$-, $\mathbb{OR}$- and $\mathbb{NOT}$-queries for ER-datasets with $|V|=200k$
  • Figure 5: Indexing time, index space and execution time of $\mathbb{AND}$-, $\mathbb{OR}$- and $\mathbb{NOT}$-queries for PA-datasets with $|V|=200k$
  • ...and 1 more figure
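
The caption of Figure 3 suggests how the horizontal dimension can act as a global filter: each way of $u_i$ keeps compact summaries of the vertices and labels it can reach. The sketch below is one hypothetical reading of that idea, not the paper's construction. It groups out-edges into ways round-robin, summarizes each way with Bloom-filter-style bitmasks (standing in for $\mathcal{H}^{vtx}$ and $\mathcal{H}^{lab}$) built with a CRC32 hash, and omits the vertical path index entirely. The names `build_horizontal`, `may_reach`, `NUM_WAYS` and `WIDTH` are illustrative assumptions. Every path leaving s starts through exactly one way, and everything after its first edge is reachable from that way's neighbors, so the filter has no false negatives: `False` safely prunes a query, while `True` only means the search must continue.

```python
import zlib
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: vertex -> list of (neighbor, edge label).
Graph = Dict[str, List[Tuple[str, str]]]
Signature = Tuple[int, int]  # (vertex mask, label mask) for one way

NUM_WAYS = 2  # assumed number of ways per vertex
WIDTH = 64    # assumed bitmask width


def bit(key: str) -> int:
    """Map a vertex or label name to one bit of a WIDTH-bit mask (CRC32 hash)."""
    return 1 << (zlib.crc32(key.encode("utf-8")) % WIDTH)


def way_closure(g: Graph, edges: List[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Vertices and edge labels reachable through one way (a group of out-edges)."""
    verts: Set[str] = set()
    labels: Set[str] = set()
    queue: deque = deque()
    for w, lab in edges:
        labels.add(lab)
        if w not in verts:
            verts.add(w)
            queue.append(w)
    while queue:
        v = queue.popleft()
        for w, lab in g.get(v, []):
            labels.add(lab)
            if w not in verts:
                verts.add(w)
                queue.append(w)
    return verts, labels


def build_horizontal(g: Graph) -> Dict[str, List[Signature]]:
    """For each vertex, one (vertex mask, label mask) signature per non-empty way."""
    index: Dict[str, List[Signature]] = {}
    for u, out in g.items():
        sigs: List[Signature] = []
        for i in range(NUM_WAYS):
            edges = out[i::NUM_WAYS]  # round-robin grouping of out-edges (assumption)
            if not edges:
                continue
            verts, labels = way_closure(g, edges)
            vmask = 0
            for v in verts:
                vmask |= bit(v)
            lmask = 0
            for lab in labels:
                lmask |= bit(lab)
            sigs.append((vmask, lmask))
        index[u] = sigs
    return index


def may_reach(index: Dict[str, List[Signature]], s: str, t: str,
              required_labels: Iterable[str] = ()) -> bool:
    """False: no path from s to t carries all required labels. True: keep searching."""
    tbit = bit(t)
    need = 0
    for lab in required_labels:
        need |= bit(lab)
    return any((vmask & tbit) and (lmask & need) == need
               for vmask, lmask in index.get(s, []))


# Hypothetical toy graph (not data from the paper).
example_graph: Graph = {
    "A": [("B", "bus"), ("C", "train")],
    "B": [("D", "bus")],
    "C": [("D", "flight")],
    "D": [],
}
index = build_horizontal(example_graph)
```

For a conjunctive pattern, its positive labels can be passed as `required_labels`; negated labels cannot be checked this way and are left to the exact search. Wider masks or more ways reduce hash collisions (false positives) at the cost of space, which is the kind of size/pruning trade-off a partitioned index has to balance.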

Theorems & Definitions (9)

  • Definition 1: Edge-Labeled Digraph
  • Definition 2: Reachability
  • Definition 3: Pattern
  • Definition 4: Pattern-Constrained Reachability Queries
  • Example 1
  • Example 2
  • Example 3
  • Theorem 1
  • Proof