Batch Hop-Constrained s-t Simple Path Query Processing in Large Graphs
Long Yuan, Kongzhang Hao, Xuemin Lin, Wenjie Zhang
TL;DR
The paper tackles batch processing of hop-constrained $s$-$t$ simple path queries by introducing HC-s path queries and a two-phase approach to identify dominating substructures and share computations. The proposed BatchEnum framework extends the single-query PathEnum with a clustering-based domination detection and a topologically ordered, cache-enabled enumeration that reuses intermediate results. Empirical evaluations on real-world graphs with billions of edges demonstrate substantial speedups (up to 7.7x average, 6.4x typical) and strong scalability compared to baselines and some K-shortest-path methods. This work enables efficient, scalable batch HC-s-t path query processing, with practical implications for fraud detection, biological pathway analysis, and knowledge graph completion.
Abstract
Hop-constrained s-t simple path (HC-s-t path) enumeration is a fundamental problem in graph analysis. Existing solutions for this problem focus on optimizing the processing performance of a single query. In practice, however, multiple HC-s-t path queries are often issued simultaneously and processed as a batch. We therefore study the problem of batch HC-s-t path query processing in this paper, aiming to compute the results of all queries concurrently and efficiently as a batch. To achieve this goal, we first propose the concept of an HC-s path query, which precisely characterizes the common computation among different queries. We then devise a two-phase HC-s path query detection algorithm to identify the common HC-s path queries for the given HC-s-t path queries. Based on the detected HC-s path queries, we further devise an efficient HC-s-t path enumeration algorithm in which the common computation represented by HC-s path queries is effectively shared. We conduct extensive experiments on real-world graphs, and the results demonstrate that our proposed algorithm is efficient and scalable when processing multiple HC-s-t path queries on large, billion-scale graphs.
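To make the problem statement concrete, the following is a minimal sketch of a single HC-s-t path query answered by a plain depth-first search, followed by a naive batch loop that answers each query independently. It is not the algorithm proposed in the paper: it shares no computation between queries and only illustrates the baseline that batch processing aims to improve on. The function names (`hop_distances_to`, `enumerate_hc_st_paths`, `answer_batch_naively`), the adjacency-dictionary graph representation, and the distance-based pruning rule are assumptions chosen for illustration.

```python
from collections import deque


def hop_distances_to(target, reverse_adj):
    """BFS over reversed edges: fewest hops from each vertex to `target`."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u in reverse_adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def enumerate_hc_st_paths(adj, s, t, k):
    """Return every simple path from s to t that uses at most k edges."""
    # Build the reversed graph so distances *to* t can be computed once.
    reverse_adj = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            reverse_adj.setdefault(v, []).append(u)
    dist_to_t = hop_distances_to(t, reverse_adj)

    results = []
    path = [s]
    on_path = {s}

    def dfs(v, budget):
        if v == t:
            results.append(list(path))
            return  # a simple s-t path ends the first time it reaches t
        for w in adj.get(v, ()):
            if w in on_path:
                continue  # keep the path simple (no repeated vertex)
            # Prune: after stepping to w, budget - 1 hops remain to reach t.
            if dist_to_t.get(w, k + 1) > budget - 1:
                continue
            path.append(w)
            on_path.add(w)
            dfs(w, budget - 1)
            path.pop()
            on_path.remove(w)

    if dist_to_t.get(s, k + 1) <= k:
        dfs(s, k)
    return results


def answer_batch_naively(adj, queries):
    """Answer each (s, t, k) query on its own, with no shared computation."""
    return [enumerate_hc_st_paths(adj, s, t, k) for (s, t, k) in queries]


if __name__ == "__main__":
    # Toy directed graph used only for illustration.
    graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
    for paths in answer_batch_naively(graph, [("a", "d", 2), ("a", "d", 3)]):
        print(paths)
```

In this toy batch both queries start from the same source `a`, so the naive loop explores the same prefixes twice. Overlap of this kind is the common computation that the abstract proposes to capture with HC-s path queries and share across the batch.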
