GPU-Accelerated Batch-Dynamic Subgraph Matching

Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Yunjun Gao, Yang Liu, Zetao Zhang

TL;DR

This paper introduces GAMMA (GPU-Accelerated Batch-Dynamic Subgraph Matching), an efficient framework for batch-dynamic subgraph matching. GAMMA features a DFS-based, warp-centric matching algorithm and balances its load through warp-level work stealing via shared memory.

Abstract

Subgraph matching has garnered increasing attention for its diverse real-world applications. Because real-world graphs are dynamic, handling evolving graphs without incurring prohibitive overheads has been a focus of research. However, existing approaches for dynamic subgraph matching often proceed serially, retrieving incremental matches for each updated edge individually. This serial approach falls short when updates arrive in batches, which reduces system throughput. GPUs, which can keep a massive number of cores busy simultaneously, are widely recognized for accelerating performance in various domains. Surprisingly, subgraph matching on batch-dynamic graphs, particularly on a GPU platform, has not been systematically explored. In this paper, we bridge this gap by introducing an efficient framework, GAMMA (GPU-Accelerated Batch-Dynamic Subgraph Matching). Our approach features a DFS-based, warp-centric batch-dynamic subgraph matching algorithm. To ensure load balance in the DFS-based search, we propose warp-level work stealing via shared memory. Additionally, we introduce coalesced search to reduce redundant computations. Comprehensive experiments demonstrate the superior performance of GAMMA. Compared to state-of-the-art algorithms, GAMMA achieves a performance improvement of up to hundreds of times.
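
To make the notion of "incremental matches" for a batch of updates concrete, the following is a minimal sequential CPU sketch in Python. It is not GAMMA's GPU algorithm; it only illustrates the problem being solved, assuming an undirected, vertex-labeled graph, a connected query, and a batch of edge insertions only. The function name `incremental_matches` and all parameter names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[int, int]


def incremental_matches(
    q_labels: Dict[int, Hashable],
    q_edges: List[Edge],
    g_labels: Dict[int, Hashable],
    g_adj: Dict[int, Set[int]],
    batch: List[Edge],
) -> List[Tuple[int, ...]]:
    """Return matches of the query that use at least one inserted edge.

    Assumptions (illustrative only): undirected graphs, a connected query,
    no self-loops, and a batch that contains edge insertions only.
    """
    # Work on a copy so the caller's adjacency is left untouched.
    adj = {v: set(ns) for v, ns in g_adj.items()}
    for a, b in batch:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    q_adj: Dict[int, Set[int]] = {u: set() for u in q_labels}
    for u, v in q_edges:
        q_adj[u].add(v)
        q_adj[v].add(u)

    order = sorted(q_labels)
    found: Set[Tuple[int, ...]] = set()

    def extend(mapping: Dict[int, int]) -> None:
        # Depth-first extension of a partial mapping query -> data.
        if len(mapping) == len(q_labels):
            found.add(tuple(mapping[u] for u in order))
            return
        # Pick an unmapped query vertex with at least one mapped neighbor.
        u = next(
            x for x in order
            if x not in mapping and any(w in mapping for w in q_adj[x])
        )
        anchors = [mapping[w] for w in q_adj[u] if w in mapping]
        for cand in adj[anchors[0]]:
            if g_labels[cand] != q_labels[u] or cand in mapping.values():
                continue
            if all(cand in adj[a] for a in anchors):
                mapping[u] = cand
                extend(mapping)
                del mapping[u]

    # Seed the search by mapping every query edge onto every inserted edge,
    # in both orientations. The set `found` removes matches discovered twice.
    for a, b in batch:
        for u, v in q_edges:
            for x, y in ((a, b), (b, a)):
                if g_labels[x] == q_labels[u] and g_labels[y] == q_labels[v]:
                    extend({u: x, v: y})
    return sorted(found)
```

Each returned tuple lists the data vertices assigned to the query vertices in sorted query-vertex order. Because every search starts from an inserted edge, only matches that did not exist before the batch are reported, and a match containing several inserted edges is reported once.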

Paper Structure (23 sections, 14 figures, 3 tables, 1 algorithm)


Figures (14)

  • Figure 1: A running example of batch-dynamic subgraph matching
  • Figure 2: The simplified GPU architecture
  • Figure 3: Overview of GAMMA
  • Figure 4: Preprocessing for candidate table generation. $v_6\sim v_9$ are omitted for brevity. In the example, we use the first 3 bits for vertex label encoding and the remaining 6 bits for counting neighbors with specific labels.
  • Figure 5: A comparison of BFS and DFS in a GPU environment
  • ...and 9 more figures
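
The bit layout mentioned in the Figure 4 caption (3 bits for the vertex label, 6 bits for neighbor counts) can be sketched as a small candidate filter. The caption does not say how the bits are arranged, so everything below is an assumption made for illustration: three labels `A`, `B`, `C`, a one-hot label in the low 3 bits, and three 2-bit saturating counters, one per neighbor label. The names `encode` and `may_match` are hypothetical.

```python
from typing import List

LABELS = ("A", "B", "C")             # assumed label alphabet
LABEL_BITS = 3                       # assumed: one-hot label in the low 3 bits
COUNT_BITS = 2                       # assumed: 6 bits = three 2-bit counters
COUNT_MAX = (1 << COUNT_BITS) - 1    # counters saturate at 3


def encode(label: str, neighbor_labels: List[str]) -> int:
    """Pack a vertex label and per-label neighbor counts into 9 bits."""
    sig = 1 << LABELS.index(label)
    for i, lab in enumerate(LABELS):
        count = min(neighbor_labels.count(lab), COUNT_MAX)
        sig |= count << (LABEL_BITS + COUNT_BITS * i)
    return sig


def may_match(query_sig: int, data_sig: int) -> bool:
    """Return False only if the data vertex cannot match the query vertex."""
    label_mask = (1 << LABEL_BITS) - 1
    if (query_sig & label_mask) != (data_sig & label_mask):
        return False
    for i in range(len(LABELS)):
        shift = LABEL_BITS + COUNT_BITS * i
        data_count = (data_sig >> shift) & COUNT_MAX
        query_count = (query_sig >> shift) & COUNT_MAX
        if data_count < query_count:
            return False
    return True
```

Under these assumptions, a data vertex stays in a query vertex's candidate set only if the labels agree and it has at least as many neighbors of each label as the query vertex does. Saturating the counters keeps the filter safe: it can let through a vertex that later fails, but it never rejects a vertex that could match.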

Theorems & Definitions (7)

  • Example 1
  • Definition 1
  • Definition 2
  • Example 2
  • Example 3
  • Definition 3
  • Example 4