Batch Hop-Constrained s-t Simple Path Query Processing in Large Graphs

Long Yuan, Kongzhang Hao, Xuemin Lin, Wenjie Zhang

TL;DR

The paper tackles batch processing of hop-constrained $s$-$t$ simple path queries by introducing HC-s path queries and a two-phase approach to identify dominating substructures and share computations. The proposed BatchEnum framework extends the single-query PathEnum with a clustering-based domination detection and a topologically ordered, cache-enabled enumeration that reuses intermediate results. Empirical evaluations on real-world graphs with billions of edges demonstrate substantial speedups (up to 7.7x average, 6.4x typical) and strong scalability compared to baselines and some K-shortest-path methods. This work enables efficient, scalable batch HC-s-t path query processing, with practical implications for fraud detection, biological pathway analysis, and knowledge graph completion.

Abstract

Hop-constrained s-t simple path (HC-s-t path) enumeration is a fundamental problem in graph analysis. Existing solutions for this problem focus on optimizing the processing performance of a single query. In practice, however, multiple HC-s-t path queries are more often issued simultaneously and processed as a batch. Therefore, we study the problem of batch HC-s-t path query processing in this paper and aim to compute the results of all queries concurrently and efficiently as a batch. To achieve this goal, we first propose the concept of the HC-s path query, which precisely characterizes the common computation among different queries. We then devise a two-phase HC-s path query detection algorithm to identify the common HC-s path queries for the given HC-s-t path queries. Based on the detected HC-s path queries, we further devise an efficient HC-s-t path enumeration algorithm in which the common computation represented by HC-s path queries is effectively shared. We conduct extensive experiments on real-world graphs, and the experimental results demonstrate that our proposed algorithm is efficient and scalable when processing multiple HC-s-t path queries on large, billion-scale graphs.
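To make the problem statement concrete, the following is a minimal sketch of what a single HC-s-t path query computes, plus a naive batch that runs each query independently. It is not PathEnum or BatchEnum and shares no computation between queries; the names `hc_st_paths` and `naive_batch`, the adjacency-dict graph representation, and the example graph are all assumptions made for illustration.

```python
def hc_st_paths(adj, s, t, k):
    """Yield every simple path from s to t that uses at most k edges (hops).

    adj is assumed to be a dict mapping each vertex to a list of out-neighbours.
    """
    path = [s]        # current DFS path
    on_path = {s}     # vertices on the current path, for O(1) simplicity checks

    def dfs(u, hops_left):
        if u == t:
            yield list(path)
            return    # a simple path cannot leave t and return to it
        if hops_left == 0:
            return    # hop budget exhausted
        for v in adj.get(u, ()):
            if v in on_path:
                continue  # revisiting v would make the path non-simple
            path.append(v)
            on_path.add(v)
            yield from dfs(v, hops_left - 1)
            path.pop()
            on_path.remove(v)

    yield from dfs(s, k)


def naive_batch(adj, queries):
    """Answer each (s, t, k) query independently, with no shared computation."""
    return {q: list(hc_st_paths(adj, *q)) for q in queries}


# Hypothetical toy graph, not taken from the paper.
adj = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
results = naive_batch(adj, [("a", "d", 2), ("b", "d", 2)])
```

In this toy batch, both queries explore the same sub-paths out of `b` and `c`; that kind of repeated work is what the abstract describes as the common computation to be shared across queries.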

Paper Structure

This paper contains 10 sections, 13 figures, 1 table, 4 algorithms.

Figures (13)

  • Figure 1: A graph $G$ and multiple HC-$s$-$t$ path queries $Q$
  • Figure 2: Procedures of BasicEnum
  • Figure 3: Main observation of our approach
  • Figure 4: Example of query clustering ($\gamma = 0.8$)
  • Figure 5: Example of common HC-$s$ path query
  • ...and 8 more figures